Peeking into Tensorflow-Keras
1 Introduction
Machine learning algorithms can be broadly categorized as unsupervised or supervised by the kind of “experience” they are allowed to have during the learning process. In this case, the “experience” is a dataset, whose individual examples are sometimes called data points. Unsupervised learning algorithms experience a dataset containing many features, then learn useful properties of the structure of this dataset. Supervised learning algorithms also experience a dataset containing features, but each example is additionally associated with a label or target.
1.1 Neural Network
Neural networks, or connectionist architectures, provide an alternative computational paradigm and can be seen as a step towards the understanding of intelligence. This paradigm departs from traditional von Neumann serial processing and is instead based on distributed processing via connections between simple elements.
The goal of a neural network is to approximate some function by learning the parameters that result in the best approximation. Another way of saying this is that training minimises the difference between the expected output and the actual one, where a loss function is used to measure this difference.
1.2 Modelling
Models are abstractions of reality to which experiments can be applied to improve our understanding of phenomena in the world. They are at the heart of science: models can be used to process data to predict future events, or to organise data in ways that allow information to be extracted from it. There are two common approaches to constructing models.
The first is of a deductive nature. It relies on subdividing the system being modelled into subsystems that can be expressed by accepted relationships and physical laws. These subsystems are typically arranged in the form of simulation blocks and sets of differential equations. The model is consequently obtained by combining all the sub-models.
The second approach favours the inductive strategy of estimating models from measured data. This estimation process will be referred to as “learning from data” or simply “learning” for short.
In general, a neural network consists of layers of neurons where each neuron computes the following activation function:
\[ f(x) = \phi(\mathbf{w}^Tx+b) \]
where \(x\) is the input to the neuron, \(\mathbf{w}\) is a weight vector, \(b\) is a bias term and \(\phi\) is a nonlinearity function. Each neuron receives potentially many inputs and outputs a single number. The nonlinearity is important because it allows layers of neurons to learn non-linear functions. In these layered structures, the output of one layer of units becomes the input to the next layer of units.
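As a minimal sketch of this computation (the input, weight and bias values are illustrative), a single neuron with a sigmoid nonlinearity can be evaluated directly on tensors:

import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])    # input vector
w = tf.constant([0.5, -0.2, 0.1])   # weight vector
b = tf.constant(0.3)                # bias term

# phi(w^T x + b), here with phi = sigmoid
output = tf.math.sigmoid(tf.tensordot(w, x, axes=1) + b)
print(output)  # a single number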
We need to find weights and biases such that the outputs of the network come as close as possible to their true values. The loss function measures this closeness, and an optimizer adjusts the weights and biases to minimise it.
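To make this concrete, here is a sketch (the toy data point and learning rate are illustrative) of a single optimizer step using tf.GradientTape:

import tensorflow as tf

w = tf.Variable([0.0])   # weight to be learned
b = tf.Variable([0.0])   # bias to be learned
x, y_true = tf.constant([2.0]), tf.constant([5.0])  # toy data point

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    y_pred = w * x + b                                 # network output
    loss = tf.reduce_mean(tf.square(y_true - y_pred))  # mean squared error
grads = tape.gradient(loss, [w, b])                    # d(loss)/dw, d(loss)/db
optimizer.apply_gradients(zip(grads, [w, b]))          # adjust weights and biases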
1.3 Tensor
Mathematically, a tensor is a generalization of vectors and matrices. In the context of Tensorflow, a tensor is treated as a multidimensional array.
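For instance (values are illustrative), tensors of different ranks can be created with tf.constant:

import tensorflow as tf

scalar = tf.constant(3)                  # rank-0 tensor
vector = tf.constant([1.0, 2.0, 3.0])    # rank-1 tensor (a vector)
matrix = tf.constant([[1, 2], [3, 4]])   # rank-2 tensor (a matrix)
print(matrix.shape, matrix.dtype)        # (2, 2) <dtype: 'int32'>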
2 Tensorflow
- Tensorflow 2.x has adopted the Keras API as the standard way of writing neural networks
- Tensorflow 2.x uses eager execution by default
When writing a TensorFlow program, the main object that is manipulated and passed around is the tf.Tensor. TensorFlow supports eager execution and graph execution. In eager execution, operations are evaluated immediately. In graph execution, a computational graph is constructed for later evaluation.
tf.Tensor computations can be accelerated on GPUs and TPUs!
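The two execution modes can be contrasted with a small sketch (the matrices are illustrative):

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

# Eager execution: the result is computed immediately.
print(tf.matmul(a, b))

# Graph execution: tf.function traces the Python function into a
# computational graph that is compiled once and then reused.
@tf.function
def matmul_fn(x, y):
    return tf.matmul(x, y)

print(matmul_fn(a, b))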
2.1 Available optimizers in Tensorflow
- Stochastic gradient descent (SGD)
- RMSprop
- Adam
- AdamW
- Adadelta
- Adagrad
- Adamax
- Adafactor
- Nadam
- Ftrl
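Any of these can be passed to model.compile, either by name (with default settings) or as a configured instance. A sketch, with an arbitrary learning rate:

import tensorflow as tf

# Configured instance; the learning rate is an illustrative value.
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Equivalent by-name form with default settings:
# model.compile(optimizer='adam', ...)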
2.2 Available loss functions in Tensorflow
Probabilistic losses
- BinaryCrossentropy class
- CategoricalCrossentropy class
- SparseCategoricalCrossentropy class
- Poisson class
- binary_crossentropy function
- categorical_crossentropy function
- sparse_categorical_crossentropy function
- poisson function
- KLDivergence class
- kl_divergence function
Regression losses
- MeanSquaredError class
- MeanAbsoluteError class
- MeanAbsolutePercentageError class
- MeanSquaredLogarithmicError class
- CosineSimilarity class
- mean_squared_error function
- mean_absolute_error function
- mean_absolute_percentage_error function
- mean_squared_logarithmic_error function
- cosine_similarity function
- Huber class
- huber function
- LogCosh class
- log_cosh function
More loss functions are listed in the Keras documentation.
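The class and function forms behave slightly differently: the class is configurable and returns a reduced (averaged) loss, while the function returns per-sample values. A sketch with illustrative labels and predictions:

import tensorflow as tf

y_true = tf.constant([[0.0], [1.0]])
y_pred = tf.constant([[0.1], [0.8]])

bce = tf.keras.losses.BinaryCrossentropy()
print(bce(y_true, y_pred))                                  # scalar, averaged loss
print(tf.keras.losses.binary_crossentropy(y_true, y_pred))  # per-sample losses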
3 Tensorflow in Action
3.1 Data preparation
- read image using OpenCV
- read image using Pillow
- read image using tf.keras.utils
Image data is usually formatted in 3 dimensions, e.g. (60000, 28, 28). Such a dataset is stored in a 3D tensor (3 axes) whose shape represents 60,000 matrices of 28×28 integers.
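A minimal sketch of the three reading approaches listed above (the file path is illustrative):

import cv2                    # OpenCV
import numpy as np
from PIL import Image         # Pillow
import tensorflow as tf

path = 'digit.png'            # illustrative file path

img_cv = cv2.imread(path)                  # NumPy array, BGR channel order
img_pil = np.array(Image.open(path))       # PIL Image converted to an array
img_tf = tf.keras.utils.load_img(path, target_size=(28, 28))  # resized PIL Image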
3.2 Neural Network Stacking
Multi Layer Perceptron
model_2 = tf.keras.models.Sequential(name="simple-MLP")
model_2.add(tf.keras.layers.Dense(2, input_shape=(1,)))
model_2.add(tf.keras.layers.Dense(1, activation='sigmoid'))
MLP with Feature Extraction
model = tf.keras.models.Sequential(name="simple-CNN")
model.add(tf.keras.layers.Conv2D(filters = 32, kernel_size = (5, 5), activation='relu', padding='same', input_shape = (IMG_SIZE,IMG_SIZE,1)))
model.add(tf.keras.layers.MaxPooling2D(pool_size = (2, 2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.Dense(3, activation='softmax'))
Simple Autoencoder
input_layer = keras.Input(shape=(height, width, 1))
# encoding
x = keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same')(input_layer)
x = keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.MaxPooling2D((2, 2), padding='same')(x)
x = keras.layers.Dropout(0.5)(x)
# decoding
x = keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.UpSampling2D((2, 2))(x)
output_layer = keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
model = tf.keras.Model(inputs=[input_layer], outputs=[output_layer])
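A typical next step, sketched here with common but not mandated choices, is to compile the autoencoder for reconstruction, where the input also serves as the target:

model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit(x_train, x_train, epochs=10, batch_size=128)  # x_train: your image data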
When a network performs feature extraction, convolution operations are used.
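The structure of the simple-CNN defined above can be inspected with model.summary(); the output below was produced this way, with IMG_SIZE set to 180 (as the first output shape shows):

model.summary()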
Model: "simple-CNN"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 180, 180, 32) 832
max_pooling2d (MaxPooling2D) (None, 90, 90, 32) 0
flatten (Flatten) (None, 259200) 0
dense (Dense) (None, 128) 33177728
activation (Activation) (None, 128) 0
dense_1 (Dense) (None, 3) 387
=================================================================
Total params: 33,178,947
Trainable params: 33,178,947
Non-trainable params: 0
_________________________________________________________________
3.3 Training
Batch size defines the number of samples used to compute one gradient update while training a neural network. There are three types of gradient descent with respect to the batch size:
- Batch gradient descent – uses all samples from the training set for each update.
- Stochastic gradient descent – uses only one random sample from the training set per update.
- Mini-batch gradient descent – uses a predefined number of samples from the training set per update.
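In Keras, the batch size is set when calling fit; a sketch with illustrative data names and hyperparameters. batch_size=1 corresponds to stochastic gradient descent, batch_size equal to the training-set size to batch gradient descent, and anything in between to mini-batch:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# x_train / y_train are illustrative names for your training data.
model.fit(x_train, y_train, epochs=5, batch_size=32)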