Tuesday, December 4, 2018

MNIST Example


MNIST - The Hello World of ML

Introduction


This tutorial assumes you know the basics of ML and what Keras is. The purpose of this tutorial is to show how to train a Multi-Layer Perceptron (MLP) / Neural Network (NN) in Keras.

Import Libraries and Load Data

In this example we load the MNIST dataset that comes with Keras. To read more about it, see the official documentation here. We also import all the libraries needed for this project.
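A minimal sketch of what the imports and data loading might look like (this assumes the standalone keras package; with tensorflow.keras the imports are analogous, and the x_train/y_train names are my own):

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# load_data() returns the images and labels already split into train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()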


Process Data Part 1


After loading the data, we need to reshape it. The first layer of our NN will require the input shape. We have 60,000 training images, each of shape 28 × 28. We flatten each image into a single array of 28 * 28 = 784 values, so the result has shape (num_examples, 784). In our example we don't have to worry about color channels, because our images are grayscale.
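A sketch of the reshaping step, using the variable names from the loading sketch above:

# x_train has shape (60000, 28, 28); flatten each 28x28 image into a 784-length vector
x_train = x_train.reshape(x_train.shape[0], 28 * 28)
x_test = x_test.reshape(x_test.shape[0], 28 * 28)
# Resulting shapes: (60000, 784) and (10000, 784)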


Process Data Part 2

The Keras backend defaults to float32, so we cast our input data to that type. Then we divide by 255. Why? This changes our pixel range from 0-255 to 0.0-1.0, which makes things easier for Keras: 0.0 means 0 (black) and 1.0 means 255 (white).
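The cast and scaling might look like this:

# Cast to float32 (the Keras default) and scale pixel values from 0-255 to 0.0-1.0
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255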


Process Data Part 3

Since we have 10 classes (0-9), one-hot (categorical) encoding works well in this example. For a given label, if the number is 3, this function converts the label to [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], in which you can see the position for 3 is marked.
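In Keras this is done with to_categorical; a sketch:

num_classes = 10

# One-hot encode the labels, e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)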


Keras Model

Now let's build our Sequential model. The first layer requires the input shape, which we previously set to 28 * 28 = 784. The number of training examples is not required here, so the input is (*, 784). The first layer's output has shape (*, 32); this number was somewhat arbitrarily picked. After this first layer, you do not need to specify the input shape for each layer. Also worth noting: the relu activation is likely the most common one to use between layers. I use it because it is not only accurate but also fast; read more in Relu Advantage or Why use Relu? For the last layer, we use num_classes, which from above is 10, so the output layer produces 10 results. For the output activation, we use the logistic sigmoid function.
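A sketch of the model as described above (the 32-unit hidden layer and sigmoid output follow the text; the original post's exact layer stack may differ):

# A small Sequential MLP: 784 inputs -> 32 relu units -> 10 sigmoid outputs
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(28 * 28,)))
model.add(Dense(num_classes, activation='sigmoid'))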


Compile

This is where we design the learning process of our system. Binary cross-entropy is used because each of the 10 sigmoid outputs makes a 0/1 decision about our pictures. For metrics, accuracy is fine for now. For the optimizer, which updates our weights, I used Adam for its adaptive learning rate. RMSprop should work just as well on this example; the key difference is that Adam adds bias-correction and momentum to RMSprop. Either works well for this problem.
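A sketch of the compile step as described:

# Binary cross-entropy pairs with the sigmoid outputs; Adam updates the weights
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])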


Fit Model

fit() trains our model for a given number of iterations over the full dataset, which is set by epochs.
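A sketch of the fit call; the batch_size of 128 and 10 epochs are my assumptions, not values from the original post:

history = model.fit(x_train, y_train,
                    batch_size=128,   # assumed value
                    epochs=10,        # assumed value
                    verbose=1,
                    validation_data=(x_test, y_test))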


Model Training

This shows the status of our model training. Since we are not holding out a separate validation set, we use the test set as the validation data. For each epoch (iteration), you can see how well the model trains.


Model's Loss


Model's Accuracy

Here we can see how well the model does for each epoch. After about the 5th or 6th epoch, we start to see the model overfit the training data, and below 3 epochs we can see how it underfits.
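A sketch of how the two plots above could be produced with matplotlib from the history object returned by fit():

import matplotlib.pyplot as plt

# History keys may be 'acc'/'val_acc' (older Keras) or
# 'accuracy'/'val_accuracy' (newer versions); adjust as needed.
plt.figure()
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()

plt.figure()
plt.plot(history.history['acc'], label='train accuracy')
plt.plot(history.history['val_acc'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()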


Full Code


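The original script is not preserved in this copy, but assembling the sketches from each section gives a complete version along these lines (the epoch count and batch size are assumptions):

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

num_classes = 10

# Load and flatten the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], 28 * 28)
x_test = x_test.reshape(x_test.shape[0], 28 * 28)

# Cast to float32 and scale pixels to 0.0-1.0
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# One-hot encode the labels
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

# Build, compile, and train the model
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(28 * 28,)))
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=128,   # assumed value
                    epochs=10,        # assumed value
                    verbose=1,
                    validation_data=(x_test, y_test))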


Conclusion

In this tutorial we built and trained a simple MLP on MNIST with Keras: we loaded and flattened the data, scaled the pixels to 0.0-1.0, one-hot encoded the labels, built a small Sequential model, and watched it begin to overfit after about 5 epochs. From here, try changing the layer sizes, activations, or optimizer and see how the results change.



Leave a comment below!