With the hype around Neural Networks nowadays, there are many Neural Network tutorials scattered around the internet. But because of the complexity of the topic itself, many of you, yes, I say, you, get confused when reading them.
But before starting the tutorial, you must have a goal. Why do you need the tutorial?
The answer is simple: maybe you have something in mind to build using a Neural Network. Or maybe you just don’t want to miss out on this advanced technology.
For those of you who don’t know what this technology can do, here are several examples of its implementations.
Neural Network Application
From traditional approaches to seemingly impossible feats, Neural Networks have been used across various industries. Here are several examples of Neural Network applications:
- In Banking, many big banks are betting their future on this technology: from predicting how much money they need to put inside an ATM (to optimize their refill trips back and forth) to replacing older technology for detecting fraudulent credit card transactions.
- In Advertising, big ad platforms like Google AdSense deploy Neural Networks to further optimize ad relevancy, in order to improve targeting and increase the Click-Through Rate.
- In Healthcare, many academics and start-ups are trying to solve traditional problems that were unsolved before: from Clinical Imaging, assisting doctors in reading MRIs, to Genomics, reading DNA sequences.
- In Automotive. Well, you must have heard: the self-driving car is a huge, huge deal. But I still doubt they can handle the congested roads of Jakarta. Lol.
There are still many other implementations you never imagined before.
Are you a Software Engineer? Feeling that a robot can’t replace you in your job? Take a look at Uizard. These guys are real: they built a system that can generate front-end code.
Deep Machine Learning
What is the difference between the usual Machine Learning techniques and Neural Networks?
Take a look at the image above. Traditional Machine Learning hits some kind of threshold where performance gets stuck, while Deep Learning keeps performing better as you feed it more data.
But, what is the striking difference between these two techniques that makes Deep Learning better?
The concept of the Neural Network itself. While a Machine Learning model is often called a black box that can solve problems, Deep Learning is deep Machine Learning: those black boxes stacked on top of each other.
Imagine that each circle is a Machine Learning black box. With those black boxes stacked on one another, the network can learn more complex problems. With large enough data, it will learn better than a single black box.
So, without further delay, let’s start the Neural Network tutorial.
Neural Network Tutorial with Python
Why Python? Well, Python is the language with the most complete ecosystem of Neural Network libraries.
For this tutorial, I will use Keras.
Keras is a higher-level abstraction over the popular Neural Network library TensorFlow. Because of the high level of abstraction, you don’t have to understand the underlying logic. Just define your Neural Network and you are done.
To get started, you need to install several things.
First, of course, you need Python. You can download it from their website. You need version 3.6+ for this Neural Network tutorial.
After that, use pip to install TensorFlow. Keras is already included inside it.
```shell
pip install tensorflow
pip install jupyter
```
Now you are ready for the action.
Fashion MNIST, the not so common tutorial
Many tutorials use MNIST, the handwritten digit dataset, to introduce you to Neural Networks. While this is good for starting, many people, most often non-academics, get confused: what is the use of recognizing handwritten digits?
That’s why in this Neural Network tutorial, we will not use MNIST, but another dataset, called fashion MNIST.
Fashion MNIST is a dataset of ten categories of clothing in grayscale.
There are 70,000 such images available for us, each 28×28 pixels in grayscale.
The categories are as follows:
- 0 T-shirt/top
- 1 Trouser
- 2 Pullover
- 3 Dress
- 4 Coat
- 5 Sandal
- 6 Shirt
- 7 Sneaker
- 8 Bag
- 9 Ankle boot
Now, after getting the data, what’s next?
This step is always present in any Machine Learning project: the train-validation-test split.
It splits your acquired data into three portions.
First, the training data, which the program uses to train itself. A large enough training set will help your Neural Network understand your data distribution better. In other words, more data will make your network do better, so always give priority to this portion of the split.
Next is the validation data, the portion of data that the network is evaluated against during training. Before trying the network on real-life test data, you need validation data to quickly validate it.
Finally the test data. This is the real-life test data you want to evaluate against. If the network performs well on the test data, you have the confidence that you can bring the network to the production level.
There is a rule of thumb when splitting your data. If your data is not that large, maybe thousands or tens of thousands of samples, use 70-10-20 as the split strategy: 70% of the data for training, 10% for validation, and 20% for the test set.
However, if you have millions of samples ready to be used, 90-5-5 is a better split strategy. If the data is larger still, maybe you can use 98-1-1.
For this tutorial, because the data provided by Keras is already split between train and test sets (60K for training, 10K for testing), you can use it as is. For the validation, let’s take 10% of the training data.
So, it will be 54K images for training, 6K images for validation, and 10K images for testing.
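To make the split concrete, here is a minimal sketch of a 70-10-20 split over 70,000 samples using NumPy. The sizes and seed are hypothetical, just for illustration; Keras handles its own split for us later.

```python
import numpy as np

n = 70_000
rng = np.random.default_rng(42)
indices = rng.permutation(n)  # shuffle so each portion is representative

train_end = int(n * 0.7)
val_end = int(n * 0.8)
train_idx = indices[:train_end]        # 70% -> 49,000 samples
val_idx = indices[train_end:val_end]   # 10% -> 7,000 samples
test_idx = indices[val_end:]           # 20% -> 14,000 samples

print(len(train_idx), len(val_idx), len(test_idx))  # 49000 7000 14000
```

You would then index your images and labels with these three arrays to get the actual subsets.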
Neural Network Construction
Now, you know what to do to prepare the data. Let’s get into the action.
Type jupyter notebook in your command line to get started.
Your browser will open up a window like this. Using jupyter notebook, you can code Python interactively.
Then import several things
```python
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
```
The Fashion MNIST dataset is actually already included in Keras’s own collection of datasets. For other datasets, you might want to load the images via OpenCV or the Python Imaging Library to make them ready for processing and training.
For our Fashion MNIST, let’s load them
```python
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
```
Okay, you are ready now to create your own Neural Network. But how?
Listen, this is important.
For every single Neural Network project you will do in the future, these things always apply when you create a Network.
- Start simple. Use a single layer perceptron and evaluate the result. If it is good, then proceed to deployment.
- If the previous step is not good enough, try to make your network wider and/or deeper. Add several neurons to your single layer perceptron, or add one more layer to the existing network. Evaluate, and if it is good, proceed to deployment. If not, iterate by adding more neurons or layers.
- If you are still stuck after adding several more layers, maybe you need to change your network architecture. Use a Convolutional Neural Network (CNN) for images or a Recurrent Neural Network (RNN) for time series and text.
Follow those three steps and you will get better results.
Let’s apply them to our problem.
Single Layer Perceptron
Let’s start our Neural Network Tutorial with a perceptron.
What is a perceptron?
The usual Neural Network diagram you see everywhere is a perceptron. There are three layers in the image: an Input Layer, one Hidden Layer, and an Output Layer. The Input and Output Layers are always there in every network, so I count only the Hidden Layers to measure how deep a network is. Because only one Hidden Layer is present in that image, I call it a Single Layer Perceptron.
Our images are 28×28 (two-dimensional), so to make them fit the perceptron, let’s flatten them:
```python
# Flatten each 28x28 image into a 784-long vector and scale pixels to [0, 1]
x_train = x_train.reshape(x_train.shape[0], -1) / 255.0
x_test = x_test.reshape(x_test.shape[0], -1) / 255.0

# Turn the integer labels into one-hot vectors
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
```
You will see that the size changed to 784 because of the flattening. Print x_train.shape and x_test.shape to see your data sizes.
Your training data x_train changes from 60,000 x 28 x 28 to 60,000 x 784, and your testing data x_test follows suit, from 10,000 x 28 x 28 to 10,000 x 784.
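You can verify that reshape behaves this way on dummy arrays of the same shapes, without loading the real dataset (the `_demo` names are just placeholders):

```python
import numpy as np

# Dummy arrays with the same shapes as the Fashion MNIST splits
x_train_demo = np.zeros((60_000, 28, 28))
x_test_demo = np.zeros((10_000, 28, 28))

# Keep the first axis (samples), merge the remaining axes into one
print(x_train_demo.reshape(x_train_demo.shape[0], -1).shape)  # (60000, 784)
print(x_test_demo.reshape(x_test_demo.shape[0], -1).shape)    # (10000, 784)
```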
For the hidden layer, let’s set an arbitrary number of cells. It should be simple and small enough to follow our step number 1. Let’s choose 10.
As for the output layer, because we have 10 categories to predict, we need to set it to 10 cells. Each cell will hold 1 if it is the correct category and 0 if not. For example, if you have a Sandal image, the output layer should look like [0 0 0 0 0 1 0 0 0 0]. The index for the Sandal category (5) is 1, the others are 0. Remember, arrays are zero-indexed: the sixth item has index number 5.
This kind of output is called a One-Hot Vector: the “hot” position has the value 1, and the others are all zeros.
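As a quick sketch, here is what that encoding amounts to for the Sandal label, done by hand with NumPy (to_categorical does the same thing for a whole label array at once):

```python
import numpy as np

# One-hot encode label 5 ("Sandal") over 10 categories
label = 5
one_hot = np.zeros(10)
one_hot[label] = 1.0
print(one_hot)  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```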
Back to our Architecture
```python
model = Sequential()
model.add(Dense(10, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```
The Sequential model is the easiest model Keras has. The first model.add call adds your hidden layer of 10 cells, fed from the 784 input cells.
The second add call adds your output layer to the network. Of course, it has 10 cells, as I elaborated before.
ReLU and Softmax are non-linearities. They are what make Neural Networks good: with them, a Neural Network can approximate complex functions and fit many kinds of data distributions.
You don’t have to know what ReLU and Softmax are; it is too complex for a beginner. Just follow these tips:
- Use ReLU whenever possible, on every single hidden layer.
- Use Softmax on an output layer with more than two categories to predict.
- Use Sigmoid on an output layer with two categories.
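If you are curious anyway, here is a minimal NumPy sketch of the standard textbook definitions of these activations (this is for intuition only, not how Keras implements them internally):

```python
import numpy as np

def relu(x):
    # Pass positives through, clamp negatives to zero
    return np.maximum(0, x)

def softmax(x):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(relu(np.array([-1.0, 2.0])))        # [0. 2.]
probs = softmax(np.array([1.0, 2.0, 3.0]))
print(probs.sum())                         # 1.0, so the outputs act like probabilities
```

Softmax turning the outputs into a probability distribution is exactly why it pairs with the one-hot labels above.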
After creating your model, call the compile method to finish it. It usually takes three parameters. Always use categorical_crossentropy for multiple categories and binary_crossentropy for two categories. Use adam or rmsprop as the optimizer, since both are pretty good. And use accuracy as the metric to check your network’s performance.
```python
model.fit(x_train, y_train, epochs=10, validation_split=0.1)
```
Like I said before, we will use 10% of the training data as the validation data, hence validation_split is set to 0.1.
An epoch is one training loop: one epoch exposes all of our training data to the network once. More epochs mean the network gets to know our data better.
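The idea of an epoch can be sketched as a plain loop (a toy stand-in to show the counting, not what Keras actually runs internally):

```python
# 8 stand-in "samples" and 3 epochs
data = list(range(8))
epochs = 3
seen = 0
for epoch in range(epochs):
    for sample in data:   # every sample is shown to the network once per epoch
        seen += 1
print(seen)  # 24, i.e. each of the 8 samples was seen 3 times
```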
And the result is
```
Epoch 10/10
54000/54000 [==============================] - 9s 160us/step - loss: 0.4075 - acc: 0.8598 - val_loss: 0.4305 - val_acc: 0.8522
```
Pretty good: you get 85% accuracy on the validation data. Don’t look at the training accuracy; if the network simply memorized the training data, that number could approach 100%. What matters is the accuracy on the validation data. Since the network has not seen any of it, it shows how well the network generalizes.
Let’s move on to the testing data:
```python
_, test_acc = model.evaluate(x_test, y_test)
print(test_acc)
```
And you will get around 84% accuracy on test data. Good enough for this simple architecture.
Accuracy? What kind of food is that?
It is a metric measuring how good your network’s performance is. 84% accuracy on test data means the network guessed right on around 8,400 of the 10K test images.
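Accuracy is just the fraction of correct guesses. A toy sketch with made-up labels:

```python
import numpy as np

# Five true labels vs. five guesses; four of them match
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 0, 1])
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 0.8, i.e. 4 out of 5 correct
```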
A higher accuracy on test data means a better network. If you think the accuracy should be higher, maybe you need the next step(s) in building your Neural Network.
Make the Network wider
```python
model2 = Sequential()
model2.add(Dense(50, input_dim=784, activation='relu'))
model2.add(Dense(10, activation='softmax'))
model2.compile(loss='categorical_crossentropy',
               optimizer='adam',
               metrics=['accuracy'])
model2.fit(x_train, y_train, epochs=10, validation_split=0.1)
```
Nothing is different besides the number of hidden layer cells: from 10 to 50.
Let’s check it out
```
Epoch 10/10
54000/54000 [==============================] - 9s 167us/step - loss: 0.2735 - acc: 0.9006 - val_loss: 0.3703 - val_acc: 0.8653
```
A whopping 86% accuracy on validation data. Good! It proves that making a bigger network can increase the performance.
Let’s see on our test data
```python
_, test_acc = model2.evaluate(x_test, y_test)
print(test_acc)
```
Yup, it increased to 86% too. Pretty good! That is a couple hundred more images guessed right.
But I want more
Create a deeper Network
```python
model3 = Sequential()
model3.add(Dense(50, input_dim=784, activation='relu'))
model3.add(Dense(50, activation='relu'))
model3.add(Dense(10, activation='softmax'))
model3.compile(loss='categorical_crossentropy',
               optimizer='adam',
               metrics=['accuracy'])
model3.fit(x_train, y_train, epochs=10, validation_split=0.1)
```
Let’s add one more hidden layer with 50 cells, and check it out:
```
Epoch 10/10
54000/54000 [==============================] - 9s 170us/step - loss: 0.2648 - acc: 0.9008 - val_loss: 0.3417 - val_acc: 0.8738
```
Validation accuracy increased by 1%. A better performance.
How about our test data?
```python
_, test_acc = model3.evaluate(x_test, y_test)
print(test_acc)
```
Hmm. It is 86.9% accurate. The improvement is not that big. What’s wrong?
Maybe using a perceptron on images is not the right approach. How about we change it into a…
Convolutional Neural Network
A CNN is a Neural Network that looks at local subsets of our data (patches of the image). It can detect patterns in images better than a perceptron. Read more in my Convolutional Neural Network Tutorial blog post.
Let’s just apply a CNN to our problem:
```python
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten
import numpy as np

# Reload the data in its original 2D form
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Add an explicit channel axis and scale pixels to [0, 1]
x_train = x_train[:, :, :, np.newaxis] / 255.0
x_test = x_test[:, :, :, np.newaxis] / 255.0

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
```
Let’s import the necessary classes and reshape our training data. You can’t flatten it this time, because a CNN reads the images as they are.
If you check x_train now, it has shape 60,000 x 28 x 28 x 1. Why the x 1?
The data a CNN reads must be shaped like this:
total_data x width x height x channels
Width and height are self-explanatory. Channels are like the Red, Green, and Blue planes of an RGB image. For RGB data there are three channels, so the shape ends in x 3. But because we work with grayscale images, where the Red, Green, and Blue values would all be the same, a single channel is enough, hence the x 1.
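A quick shape check makes the channel axis concrete (dummy arrays, same shapes as our images):

```python
import numpy as np

# A single grayscale image gains an explicit channel axis
gray = np.zeros((28, 28))
gray = gray[:, :, np.newaxis]
print(gray.shape)  # (28, 28, 1)

# An RGB image of the same size would already carry three channels
rgb = np.zeros((28, 28, 3))
print(rgb.shape[-1])  # 3
```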
Let’s build the architecture
```python
model4 = Sequential()
model4.add(Conv2D(filters=64, kernel_size=2, padding='same',
                  activation='relu', input_shape=(28, 28, 1)))
model4.add(MaxPooling2D(pool_size=2))
model4.add(Flatten())
model4.add(Dense(10, activation='softmax'))
model4.compile(loss='categorical_crossentropy',
               optimizer='adam',
               metrics=['accuracy'])
```
A little bit different: there are Conv2D, MaxPooling2D, and Flatten.
These are the three most common layers to use in a CNN.
Calling model4.summary() will show you what is inside the network:
```
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 64)        320
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 64)        0
_________________________________________________________________
flatten_1 (Flatten)          (None, 12544)             0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                125450
=================================================================
```
The Conv2D layer turns your 28x28x1 image into 28x28x64. Just imagine this as 64 hidden layer cells at each position.
MaxPooling reduces the width and height so you don’t need to compute every cell; it shrinks the output to 14x14x64.
Finally, Flatten simply flattens the MaxPooling output into a hidden layer of 12,544 cells.
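You can double-check the Param # column of the summary with a little arithmetic (each count includes one bias term per output unit):

```python
# Conv2D: one 2x2x1 kernel plus a bias, for each of the 64 filters
kernel_size, in_channels, filters = 2, 1, 64
conv_params = (kernel_size * kernel_size * in_channels + 1) * filters
print(conv_params)  # 320

# Dense: every flattened cell connects to every class, plus one bias per class
flattened = 14 * 14 * 64  # output of MaxPooling, flattened
classes = 10
dense_params = flattened * classes + classes
print(dense_params)  # 125450
```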
Let’s check it out
```python
model4.fit(x_train, y_train, epochs=10, validation_split=0.1)
```
The result is…
```
Epoch 10/10
54000/54000 [==============================] - 42s 774us/step - loss: 0.1890 - acc: 0.9318 - val_loss: 0.2660 - val_acc: 0.9083
```
Oh yeah! It is more than 90%. A single CNN Layer can do this.
How about the test data
```python
_, test_acc = model4.evaluate(x_test, y_test)
print(test_acc)
```
Wow, it also gives you 90.25% accuracy.
Changing the architecture into a more suitable one really works, and I always suggest you do so.
That concludes my version of the Neural Network tutorial.
The process of building a Neural Network is pretty much like that: follow my three steps and you will do just fine.
On a traditional dataset, like something from your company database, you can follow my steps from the very beginning and gradually complicate the network. For images or text, though, it is usually better to jump straight to the most suitable architecture. But still, keep the first step as simple as possible.
Thanks for reading this. I hope this article can help you build your Neural Network better.