DP-Data Preprocessing

Data preprocessing for images and text.

Images

Tutorial Datasets

TensorFlow

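# TensorFlow 1.x only: this tutorial helper is not available in TensorFlow 2.x
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
#########
# TensorFlow 2.x: load MNIST through tf.keras (labels are integers 0-9)
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale uint8 pixel values from [0, 255] to [0.0, 1.0]
x_train, x_test = x_train / 255.0, x_test / 255.0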
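The TensorFlow 1.x helper returns one-hot labels (`one_hot=True`), while `tf.keras.datasets.mnist.load_data()` returns integer labels. In practice `tf.keras.utils.to_categorical` performs this conversion; the pure-Python sketch below, with a hypothetical `one_hot` function, only illustrates what one-hot encoding produces.

```python
def one_hot(labels, num_classes=10):
    """Turn integer class labels into one-hot rows (hypothetical helper)."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        row = [0.0] * num_classes  # all zeros ...
        row[label] = 1.0           # ... except a 1.0 at the label's index
        rows.append(row)
    return rows


# Usage: two samples with labels 0 and 2 in a 3-class problem
print(one_hot([0, 2], num_classes=3))
```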

Keras

from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
  • x_train has the shape (num_samples, 32, 32, 3) and dtype uint8; y_train has the shape (num_samples, 1) and holds integer class labels 0-9.

There are two common normalization choices: scaling values into [-1, 1], or standardizing them with (x - mean) / std. We prefer the former when we know that different features do not relate to each other (ref).
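The two normalization options above can be sketched in plain Python for a single feature (apply them per feature, or per channel for images). For 8-bit pixels the range is known in advance, so passing `vmin=0, vmax=255` maps [0, 255] onto [-1, 1], i.e. `x / 127.5 - 1`; otherwise the minimum and maximum are taken from the data. The function names are hypothetical, and using the population standard deviation in `standardize` is an assumption.

```python
from statistics import mean, pstdev


def scale_to_range(values, vmin=None, vmax=None, lo=-1.0, hi=1.0):
    """Linearly map values from [vmin, vmax] onto [lo, hi] (default [-1, 1])."""
    vmin = min(values) if vmin is None else vmin
    vmax = max(values) if vmax is None else vmax
    if vmax == vmin:
        raise ValueError("cannot scale a constant feature")
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def standardize(values):
    """Return (x - mean) / std, using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


# Usage: 8-bit pixel values with a known range, then a generic feature
print(scale_to_range([0, 127.5, 255], vmin=0, vmax=255))
print(standardize([1, 2, 3, 4, 5]))
```

Scaling to [-1, 1] keeps every value inside a fixed interval, while standardization centers each feature at zero with unit variance but does not bound it.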

Face Alignment

Face images need alignment preprocessing so that all faces are:

  1. Centered in the image.
  2. Rotated such that the eyes lie on a horizontal line (i.e., both eyes share the same y-coordinate).
  3. Scaled such that all faces are approximately the same size.

To accomplish this, we'll first implement a dedicated Python class that aligns faces using an affine transformation (affine transformations cover rotation, scaling, translation, etc.). We'll then create an example driver Python script that accepts an input image, detects faces, and aligns them.
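The three alignment requirements can be met with a single 2x3 transform that rotates, scales, and translates (a similarity transform, which is a special case of an affine transformation). The sketch below assumes the eye centers are already known, for example from a facial landmark detector (not shown). The function names and the `desired_left_eye` / `out_size` defaults are hypothetical choices, and the output crop is assumed to be square.

```python
import math


def alignment_matrix(left_eye, right_eye, desired_left_eye=(0.35, 0.35), out_size=256):
    """Build a 2x3 matrix [[a, b, tx], [c, d, ty]] mapping input pixels so that
    the eyes are level, a fixed distance apart, and at fixed positions in an
    out_size x out_size crop (hypothetical helper)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    dx, dy = rx - lx, ry - ly
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("eye centers must differ")
    angle = math.atan2(dy, dx)  # tilt of the eye line, in radians
    # Target eye distance: the right eye mirrors the left eye horizontally.
    desired_dist = (1.0 - 2 * desired_left_eye[0]) * out_size
    scale = desired_dist / dist
    # Rotation by -angle (undoes the tilt), combined with uniform scaling.
    cos_a = math.cos(-angle) * scale
    sin_a = math.sin(-angle) * scale
    # Translate so the eye midpoint lands at the target midpoint.
    mx, my = (lx + rx) / 2, (ly + ry) / 2
    target_x, target_y = out_size / 2, desired_left_eye[1] * out_size
    tx = target_x - (cos_a * mx - sin_a * my)
    ty = target_y - (sin_a * mx + cos_a * my)
    return [[cos_a, -sin_a, tx], [sin_a, cos_a, ty]]


def transform_point(matrix, point):
    """Apply a 2x3 affine matrix to one (x, y) point."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


# Usage: a tilted face whose right eye sits lower than its left eye
M = alignment_matrix((100, 120), (160, 150))
print(transform_point(M, (100, 120)))  # left eye after alignment
print(transform_point(M, (160, 150)))  # right eye after alignment
```

With OpenCV, such a matrix could be converted to a NumPy float array and passed to `cv2.warpAffine(image, M, (out_size, out_size))` to produce the aligned crop; that step is not shown here.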

References

Building Blocks: Text Pre-Processing