Conv net - Image Classification Tensorflow Keras Example

May 23, 2019



Kaggle is a company whose business model consists of having data scientists from around the world compete to build the best-performing model for a given problem. In other words, for a fee, Kaggle hosts competitions for businesses that want to crowdsource solutions to their data problems.

In Kaggle competitions, you are given both a training set and a testing set. The features of the testing set are provided but the labels are hidden. The goal is to train a model on the training set and use it to predict the target label of the testing set. The predictions are stored in a submission file and uploaded to Kaggle for evaluation. Contestants can then view how their model fared against the other competitors and tweak it accordingly. When the competition comes to an end, a third set, whose features and labels the contestants never had access to, is used to determine the winner. More often than not, this will penalize the teams whose model tended to overfit (i.e. high variance).

There is a famous example where Netflix offered a 1 million dollar prize only to end up with a final winning model that was too complicated for them to actually put into production. That aside, Kaggle’s contests have managed to produce some good results. For example, Allstate, the insurance company, posted a challenge where given attributes of drivers, the model approximates the probability of a car crash. The 202 competitors ended up improving Allstate’s model by a whopping 271%.

In the following article, we’ll walk through a Kaggle competition with the goal of determining whether a given image contains a cactus. The competition can be found here.


In machine learning, whenever you are working with images, you should automatically think of convolutional neural networks. Fortunately, Keras, a high-level API that runs on top of TensorFlow, abstracts away a substantial amount of the complexity in constructing neural networks.

import cv2  
import os  
import pandas as pd  
import numpy as np  
from matplotlib import pyplot as plt  
from keras.models import Sequential  
from keras.layers import Flatten, Conv2D, MaxPool2D, Activation, Dense, Dropout  
from keras.optimizers import Adam  
from keras.preprocessing.image import ImageDataGenerator

If you’d like to follow along, go ahead and download the training and testing sets from Kaggle and copy/unzip them into your working directory.

train_directory = 'train'  
test_directory = 'test'

In addition, we’re given a file with the id of every image and whether it contains a cactus or not.

df = pd.read_csv('train.csv')  

Let’s have a look at what we’re working with.

img = cv2.imread('train/0004be2cfeaba1c0361d39e2b000257b.jpg')  

Every image has a height of 32 pixels and a width of 32 pixels. The third dimension refers to the color. A value of 1 would imply it’s a grayscale image where the brightness of each pixel ranges from 0 to 255. A value of 3 means that it’s a color image with three channels (red, green and blue), each of which can range from 0 to 255. Note that cv2.imread returns the channels in BGR order rather than RGB.
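To make the shape concrete, here is a minimal sketch in plain Python that builds a tiny 2x2 image as nested lists instead of loading a file. The helper names image_shape and to_grayscale are hypothetical and exist only for illustration, and the grayscale weights are the commonly used BT.601 luma coefficients, which is an assumption on our part rather than something the competition specifies.

```python
def image_shape(image):
    """Return (height, width, channels) for an image stored as nested lists."""
    return (len(image), len(image[0]), len(image[0][0]))


def to_grayscale(image):
    """Collapse each [R, G, B] pixel into a single brightness value in 0..255."""
    return [
        [[round(0.299 * r + 0.587 * g + 0.114 * b)] for r, g, b in row]
        for row in image
    ]


# A hypothetical 2x2 RGB image: each pixel is [red, green, blue].
rgb_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

gray_image = to_grayscale(rgb_image)
print(image_shape(rgb_image))   # third dimension is 3 for a color image
print(image_shape(gray_image))  # third dimension is 1 for grayscale
```

The real images in this competition would have a shape of (32, 32, 3) under the same convention.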


The Keras ImageDataGenerator object can be used to apply data augmentation. Performing data augmentation is a form of regularization, enabling our model to generalize better. During the training phase, each new batch of data is randomly adjusted according to the parameters supplied to ImageDataGenerator.

train_datagen = ImageDataGenerator(  

Let’s dissect what each of these arguments means.

  • rescale: rescales the pixels such that their brightness ranges from 0 to 1
  • validation_split: portion of images set aside for validation
  • shear_range: randomly displaces each point in a fixed direction
  • zoom_range: randomly zooms inside pictures. If you pass a float, then [lower, upper] = [1-zoom_range, 1+zoom_range]
  • horizontal_flip: randomly flips the image horizontally
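The effect of a few of these arguments can be sketched in plain Python. This is not how Keras implements them; the helpers rescale_pixels, zoom_bounds and horizontal_flip are hypothetical names used only to illustrate the arithmetic, and the flip here is applied deterministically whereas Keras applies it at random.

```python
def rescale_pixels(row, factor=1 / 255):
    """Multiply every pixel value by `factor`, as the rescale option does."""
    return [value * factor for value in row]


def zoom_bounds(zoom_range):
    """Expand a single float into the [lower, upper] zoom interval."""
    return [1 - zoom_range, 1 + zoom_range]


def horizontal_flip(image):
    """Mirror an image left to right by reversing each row of pixels."""
    return [list(reversed(row)) for row in image]


print(rescale_pixels([0, 51, 255]))             # values now lie between 0 and 1
print(zoom_bounds(0.25))                        # interval a zoom factor is drawn from
print(horizontal_flip([[1, 2, 3], [4, 5, 6]]))  # each row reversed
```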

More often than not, the images you’ll be working with will either be placed in folders named after their respective classes or put inside a single folder alongside a CSV or JSON file which maps each image to its label. For example, in the first scenario, all images that contain a cactus are placed in a directory named cactus and all the images that don’t contain a cactus are placed in a separate directory called no_cactus. In this case, we are given a CSV alongside the images. We can use the flow_from_dataframe method to associate each image with its label.
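The two layouts can be contrasted with a small standard-library sketch. The file names, the CSV contents and the helpers labels_from_paths and labels_from_csv are all hypothetical; the point is only to show where the label comes from in each scenario.

```python
import csv
import io
from pathlib import PurePosixPath


def labels_from_paths(paths):
    """Scenario 1: the parent folder name is the class label."""
    return {PurePosixPath(p).name: PurePosixPath(p).parent.name for p in paths}


def labels_from_csv(csv_text, id_col="id", label_col="has_cactus"):
    """Scenario 2: a CSV file maps each image id to its label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_col]: row[label_col] for row in reader}


# Hypothetical file names and CSV rows, for illustration only.
folder_paths = ["cactus/a.jpg", "no_cactus/b.jpg"]
csv_text = "id,has_cactus\na.jpg,1\nb.jpg,0\n"

print(labels_from_paths(folder_paths))
print(labels_from_csv(csv_text))
```

In both cases the label ends up as a string keyed by file name, which is the form a binary class_mode expects.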

df['has_cactus'] = df['has_cactus'].astype(str)
train_generator = train_datagen.flow_from_dataframe(
    dataframe = df,
    directory = train_directory,
    subset = 'training',
    x_col = 'id',
    y_col = 'has_cactus',
    target_size = (32,32),
    class_mode = 'binary'
)
val_generator = train_datagen.flow_from_dataframe(
    dataframe = df,
    directory = train_directory,
    subset = 'validation',
    x_col = 'id',
    y_col = 'has_cactus',
    target_size = (32,32),
    class_mode = 'binary'
)

Next, we can go about constructing our model. The last layer in our network will have a single neuron since we’re performing binary classification. Convolution and max pooling are used in the hidden layers to try and learn the underlying pattern (i.e. what does a cactus look like).

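model = Sequential()
model.add(Conv2D(32, (3,3), activation = 'relu', input_shape = (32,32,3)))
model.add(Conv2D(32, (3,3), activation = 'relu'))
model.add(MaxPool2D((2,2)))  # assumed placement of the max pooling described above
model.add(Conv2D(64, (3,3), activation = 'relu'))
model.add(Conv2D(64, (3,3), activation = 'relu'))
model.add(MaxPool2D((2,2)))  # assumed placement
model.add(Conv2D(128, (3,3), activation = 'relu'))
model.add(Flatten())  # flatten the feature maps so the Dense layers receive a vector
model.add(Dense(512, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))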
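To get a feel for how convolution and max pooling shrink a 32x32 input, here is a rough sketch of the size arithmetic. It assumes no padding and a stride of 1 for the convolutions (the Keras defaults for Conv2D) and non-overlapping 2x2 pooling windows. The function names and the example layer sequence are hypothetical.

```python
def conv_output_size(size, kernel=3, stride=1):
    """Spatial size after a convolution with no padding ('valid')."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping max pooling (stride equals pool size)."""
    return size // pool


def trace_sizes(size, layers):
    """Apply a list of 'conv' / 'pool' steps and record the size after each one."""
    sizes = [size]
    for layer in layers:
        size = conv_output_size(size) if layer == "conv" else pool_output_size(size)
        sizes.append(size)
    return sizes


# Hypothetical stack: two 3x3 convolutions followed by one 2x2 max pool.
print(trace_sizes(32, ["conv", "conv", "pool"]))
```

Each 3x3 convolution trims two pixels from the height and width, while each 2x2 pooling step halves them.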

We use binary_crossentropy as our loss function since this is a binary classification problem, measure the performance of our model using accuracy, and use Adam to minimize the loss function.

model.compile(
    loss = 'binary_crossentropy',
    optimizer = Adam(),
    metrics = ['accuracy']
)
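For intuition, the loss and the metric can be written out by hand. The sketch below is a plain Python approximation and not the Keras implementation; the clipping constant eps and the 0.5 decision threshold are assumptions chosen for illustration.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(1 for y, p in zip(y_true, y_pred) if (1 if p > threshold else 0) == y)
    return hits / len(y_true)


labels = [1, 0, 1]
probabilities = [0.9, 0.2, 0.4]  # hypothetical sigmoid outputs
print(binary_crossentropy(labels, probabilities))
print(accuracy(labels, probabilities))
```

Confident wrong predictions are penalized heavily by the loss, whereas accuracy only counts whether the thresholded prediction was right.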

In the context of machine learning, we compute the gradient at every training step. If we’re using mini-batch gradient descent, then in one step, x examples are processed, where x is equal to the batch size. For example, if you have 2,000 images and use a batch size of 10, an epoch consists of 2,000 images / (10 images / step) = 200 steps.
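The same arithmetic as a short sketch. The function name steps_per_epoch is our own choice here, and rounding up with ceil is what covers a final, partially filled batch when the sample count is not a multiple of the batch size.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(2000, 10))  # 200, matching the example above
print(steps_per_epoch(2005, 10))  # 201, the last batch holds only 5 images
```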

Normally, we’d pass the batch size as an argument to the fit function. However, since the Keras data generator is meant to loop infinitely, Keras has no way of determining when one epoch ends and the next begins. Thus, we use steps_per_epoch and validation_steps, which are simply equal to ceil(num_samples / batch_size).

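# Note: fit_generator is deprecated in recent Keras releases, where model.fit accepts generators directly
history = model.fit_generator(
    train_generator,
    steps_per_epoch = 2000,
    epochs = 10,
    validation_data = val_generator,
    validation_steps = 64
)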

We load the test samples making sure to normalize the data such that the brightness of each pixel ranges from 0 to 1. Then, we use our model to predict whether the image contains a cactus.

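ids = []
X_test = []
for image in os.listdir(test_directory):
    path = os.path.join(test_directory, image)
    ids.append(image)  # assumption: ids are the file names, as in train.csv
    # cv2.imread returns BGR; convert to RGB to match the training generator
    X_test.append(cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB))
X_test = np.array(X_test)
X_test = X_test.astype('float32') / 255
predictions = model.predict(X_test)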
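With a single sigmoid output neuron, the predictions come back as one probability per image, wrapped in its own row. The sketch below uses a hypothetical nested list in place of the real array to show how those rows can be flattened and, if hard labels are wanted, thresholded. The helper names and the 0.5 cutoff are assumptions.

```python
def flatten_predictions(predictions):
    """Turn [[p0], [p1], ...] into [p0, p1, ...]."""
    return [row[0] for row in predictions]


def to_labels(probabilities, threshold=0.5):
    """Convert probabilities into 0/1 labels using a fixed cutoff."""
    return [1 if p > threshold else 0 for p in probabilities]


raw = [[0.98], [0.03], [0.61]]  # hypothetical model outputs
flat = flatten_predictions(raw)
print(flat)
print(to_labels(flat))
```

On the real NumPy array, predictions.ravel() would do the same flattening.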

Finally, we create the submission file.

submission = pd.read_csv('sample_submission.csv')  
submission['has_cactus'] = predictions  
submission['id'] = ids

We set index to False; otherwise, pandas would write the row index as an extra first column in the file.

submission.to_csv('submission.csv', index = False)

Cory Maklin
