In this kernel I will use several machine learning models, including Random Forest, Logistic Regression, Extra Trees, K-Nearest Neighbors, a fully connected neural network, and a convolutional neural network, to classify 69 classes of fruit from images. Transfer learning will also be used, based on a VGG16 model pre-trained on ImageNet. The data come from a Kaggle dataset: the training set contains 34,641 images and the test set contains 11,640 images.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import keras
import tensorflow as tf
import os
import glob
import platform
import cv2
from sklearn import svm
from sklearn.externals import joblib
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.layers import LSTM, Input
from keras.optimizers import RMSprop, Adamax
Using TensorFlow backend.

Reading the data

In [2]:
# Reading all pictures
# Each picture -> array of height x width x 3 dimensions
img_height = 48
img_width = 48
train_fruit_images = []
train_labels = [] 
# Loop over folder inside Training
whatos = platform.system()
for fruit_dir_path in glob.glob("Training/*"):
    if whatos == 'Windows':
        fruit_label = fruit_dir_path.split("\\")[-1]
    else: # Linux and Mac (Unix)
        fruit_label = fruit_dir_path.split("/")[-1]
    # Loop over each pic in each folder inside Training
    for image_path in glob.glob(os.path.join(fruit_dir_path, "*.jpg")):
        image = cv2.imread(image_path, cv2.IMREAD_COLOR)
        
        image = cv2.resize(image, (img_height, img_width))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB for display
        
        train_fruit_images.append(image)
        train_labels.append(fruit_label)
train_fruit_images = np.array(train_fruit_images)
train_labels = np.array(train_labels)
In [3]:
label_to_id_dict = {v:i for i,v in enumerate(np.unique(train_labels))}
id_to_label_dict = {v: k for k, v in label_to_id_dict.items()}
train_label_ids = np.array([label_to_id_dict[label] for label in train_labels])
In [4]:
# Same for test set, reading all pictures
# Each picture -> array of height x width x 3 dimensions
test_fruit_images = []
test_labels = [] 
# Loop over folder inside Validation
whatos = platform.system()
for fruit_dir_path in glob.glob("Validation/*"):
    if whatos == 'Windows':
        fruit_label = fruit_dir_path.split("\\")[-1]
    else: # Linux and Mac (Unix)
        fruit_label = fruit_dir_path.split("/")[-1]
    # Loop over each pic in each folder inside Validation
    for image_path in glob.glob(os.path.join(fruit_dir_path, "*.jpg")):
        image = cv2.imread(image_path, cv2.IMREAD_COLOR)
        
        image = cv2.resize(image, (img_height, img_width))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB for display
        
        test_fruit_images.append(image)
        test_labels.append(fruit_label)
test_fruit_images = np.array(test_fruit_images)
test_labels = np.array(test_labels)
In [5]:
test_label_ids = np.array([label_to_id_dict[label] for label in test_labels])

Plotting some training samples

In [46]:
# Plotting some training samples
nplot=10
fig, axes = plt.subplots(nplot, nplot, sharex=True, sharey=True, figsize=(9,9))
train_index = np.arange(0,train_fruit_images.shape[0])
for i in range(nplot):
    for j in range(nplot):
        axes[i,j].imshow(train_fruit_images[np.random.choice(train_index,1,replace=False)[0]])
        axes[i,j].set_xticklabels([])
        axes[i,j].set_yticklabels([])
        #axes[i,j].axis('off')
plt.show()
In [45]:
# Examples of training samples for one fruit
nplot = 10
offset = 492*5
fig, axes = plt.subplots(nplot, nplot, sharex=True, sharey=True, figsize=(9,9))
for i in range(nplot):
    for j in range(nplot):
        axes[i,j].imshow(train_fruit_images[i*nplot+j+offset])
        axes[i,j].set_xticklabels([])
        axes[i,j].set_yticklabels([])
        #axes[i,j].axis('off')
plt.show()

Train-Test sets

In [8]:
# Train-Test
X_train = train_fruit_images
Y_train = train_label_ids
X_test = test_fruit_images
Y_test = test_label_ids

# Normalizing features between 0 and 1
X_train = X_train/255
X_test = X_test/255

# Flattening each image features to 1D-array
X_train_flat = X_train.reshape(X_train.shape[0], img_height*img_width*3)
X_test_flat = X_test.reshape(X_test.shape[0], img_height*img_width*3)

# One-hot encode of the Output
Y_train = keras.utils.to_categorical(Y_train, len(label_to_id_dict))
Y_test = keras.utils.to_categorical(Y_test, len(label_to_id_dict))

# Shape of input and output
print('Original Sizes:', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
print('Flattened:', X_train_flat.shape, X_test_flat.shape)
Original Sizes: (34641, 48, 48, 3) (11640, 48, 48, 3) (34641, 69) (11640, 69)
Flattened: (34641, 6912) (11640, 6912)

Prediction Models

Random Forest

In [9]:
error_nestimators = []
for n_estimators in [10, 20, 30, 40, 50, 60, 70, 80]:
    #print('Number of Trees: ', n_estimators)
    rf_clf = RandomForestClassifier(n_estimators=n_estimators, oob_score=True)
    rf_clf.fit(X_train_flat, train_label_ids)
    fit_score = rf_clf.score(X_train_flat, train_label_ids)
    test_score = rf_clf.score(X_test_flat, test_label_ids)
    #print('Fitting score: ', fit_score)
    #print('Testing score: ', test_score)
    error_nestimators.append([n_estimators, fit_score, test_score, rf_clf.oob_score_])
error_nestimators = np.array(error_nestimators)
C:\Users\mcnguyen\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\ensemble\forest.py:453: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates.
  warn("Some inputs do not have OOB scores. "
C:\Users\mcnguyen\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\ensemble\forest.py:458: RuntimeWarning: invalid value encountered in true_divide
  predictions[k].sum(axis=1)[:, np.newaxis])
C:\Users\mcnguyen\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\ensemble\forest.py:453: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates.
  warn("Some inputs do not have OOB scores. "
C:\Users\mcnguyen\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\ensemble\forest.py:458: RuntimeWarning: invalid value encountered in true_divide
  predictions[k].sum(axis=1)[:, np.newaxis])
In [10]:
error_maxfeatures = []
for max_features in ['sqrt','log2', None]:
    #print('Max features: ', max_features)
    rf_clf = RandomForestClassifier(n_estimators=30, max_features=max_features)
    rf_clf.fit(X_train_flat, train_label_ids)
    fit_score = rf_clf.score(X_train_flat, train_label_ids)
    test_score = rf_clf.score(X_test_flat, test_label_ids)
    #print('Fitting score: ', fit_score)
    #print('Testing score: ', test_score)
    error_maxfeatures.append([max_features, fit_score, test_score])
error_maxfeatures = np.array(error_maxfeatures)
In [11]:
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(8,4))
ax1.plot(error_nestimators[:,0], error_nestimators[:,1], marker='o', label='Fitting')
ax1.plot(error_nestimators[:,0], error_nestimators[:,2], marker='o', label='Testing')
ax1.plot(error_nestimators[:,0], error_nestimators[:,3], marker='o', label='OOB')
ax1.set_xlabel('# of trees')
ax1.set_ylabel('Score')
ax1.legend()

ax2.plot([0,1,2], error_maxfeatures[:,1], marker='o', markersize=10, label='Fitting')
ax2.plot([0,1,2], error_maxfeatures[:,2], marker='o', markersize=10, label='Testing')
ax2.set_xlabel('max_features')
#ax2.set_ylabel('Score')
ax2.set_xlim(0,2)
ax2.set_xticks([0,1,2])
ax2.set_xticklabels(['Sqrt','Log2','All'])
ax2.legend()

plt.show()

The best Random Forest classifier is obtained with 60 trees: the fitting score is 1.0, the testing score is about 0.954, and the out-of-bag (OOB) score is 0.993.
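
As a quick check, here is a minimal sketch of refitting the forest at that setting, reusing the arrays defined earlier; exact scores will vary slightly from run to run because of the bootstrap sampling inside the forest.

# Refit the best configuration found above (60 trees, OOB estimate enabled).
rf_best = RandomForestClassifier(n_estimators=60, oob_score=True, n_jobs=-1)
rf_best.fit(X_train_flat, train_label_ids)
print('Fitting score:', rf_best.score(X_train_flat, train_label_ids))
print('Testing score:', rf_best.score(X_test_flat, test_label_ids))
print('OOB score:', rf_best.oob_score_)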

Extra Trees or Extremely Randomized Trees

In [12]:
error_nestimators = []
for n_estimators in [10, 20, 30, 40, 50, 60, 70, 80]:
    #print('Number of Trees: ', n_estimators)
    et_clf = ExtraTreesClassifier(n_estimators=n_estimators)
    et_clf.fit(X_train_flat, train_label_ids)
    fit_score = et_clf.score(X_train_flat, train_label_ids)
    test_score = et_clf.score(X_test_flat, test_label_ids)
    error_nestimators.append([n_estimators, fit_score, test_score])
error_nestimators = np.array(error_nestimators)
In [13]:
fig, ax1 = plt.subplots(1,1, figsize=(4,4))
ax1.plot(error_nestimators[:,0], error_nestimators[:,1], marker='o', label='Fitting')
ax1.plot(error_nestimators[:,0], error_nestimators[:,2], marker='o', label='Testing')
ax1.set_xlabel('# of trees')
ax1.set_ylabel('Score')
ax1.legend()
plt.show()

The best result is obtained with 80 trees, with a fitting score of 1.0 and a testing score of 0.952, very slightly below the Random Forest classifier. This is reasonable since Random Forest and Extra Trees are very similar methods. However, both models overfit considerably.

In [42]:
nplotx=6
nploty=6
model = rf_clf
fig, axes = plt.subplots(nploty, nplotx, sharex=True, sharey=True, figsize=(10,10))
test_index = np.arange(0,test_fruit_images.shape[0])
for i in range(nploty):
    for j in range(nplotx):
        ind=np.random.choice(test_index,1,replace=False)[0]
        axes[i,j].imshow(test_fruit_images[ind])
        axes[i,j].set_xticklabels([])
        axes[i,j].set_yticklabels([])
        axes[i,j].set_title(str(test_labels[ind])+'\n'+'Predicted: '+str(id_to_label_dict[model.predict(X_test_flat[ind:ind+1])[0]]), fontsize=8)
plt.tight_layout()
plt.show()

Logistic Regression

In [14]:
lg_error = []
for C in [10000, 100, 1.0, 0.01, 0.0001]:
    lg_clf = LogisticRegression(C=C)
    lg_clf.fit(X_train_flat, train_label_ids)
    fit_score = lg_clf.score(X_train_flat, train_label_ids)
    test_score = lg_clf.score(X_test_flat, test_label_ids)
    lg_error.append([C, fit_score, test_score])
lg_error = np.array(lg_error)
In [15]:
fig, ax1 = plt.subplots(1,1, figsize=(4,4))
ax1.plot(np.log10(1/lg_error[:,0]), lg_error[:,1], marker='o', label='Fitting')
ax1.plot(np.log10(1/lg_error[:,0]), lg_error[:,2], marker='o', label='Testing')
ax1.set_xlabel('Regularization strength, log10(1/C)')
ax1.set_ylabel('Score')
ax1.legend()
plt.show()

For C = 1 we get a fitting score of 1.0 and a testing score of 0.878, so the model is quite overfit. Increasing the regularization strength (C is the inverse regularization strength) does not reduce the overfitting significantly, and for very large regularization strength both scores drop sharply, to about 0.78 for fitting and 0.66 for testing.

In [58]:
C=1
lg_clf = LogisticRegression(C=C)
lg_clf.fit(X_train_flat, train_label_ids)
nplotx=6
nploty=6
model = lg_clf
fig, axes = plt.subplots(nploty, nplotx, sharex=True, sharey=True, figsize=(10,10))
test_index = np.arange(0,test_fruit_images.shape[0])
for i in range(nploty):
    for j in range(nplotx):
        ind=np.random.choice(test_index,1,replace=False)[0]
        axes[i,j].imshow(test_fruit_images[ind])
        axes[i,j].set_xticklabels([])
        axes[i,j].set_yticklabels([])
        axes[i,j].set_title(str(test_labels[ind])+'\n'+'Predicted: '+str(id_to_label_dict[model.predict(X_test_flat[ind:ind+1])[0]]), fontsize=8)
plt.tight_layout()
plt.show()

KNN

In [18]:
knn_error = []
for kn in [3, 5, 10, 20]:
    knn_clf = KNeighborsClassifier(n_neighbors=kn)
    knn_clf.fit(X_train_flat, train_label_ids)
    fit_score = knn_clf.score(X_train_flat, train_label_ids)
    test_score = knn_clf.score(X_test_flat, test_label_ids)
    knn_error.append([kn, fit_score, test_score])
knn_error = np.array(knn_error)
In [19]:
fig, ax1 = plt.subplots(1,1, figsize=(4,4))
ax1.plot(knn_error[:,0], knn_error[:,1], marker='o', label='Fitting')
ax1.plot(knn_error[:,0], knn_error[:,2], marker='o', label='Testing')
ax1.set_xlabel('# neighbor')
ax1.set_ylabel('Score')
ax1.legend()
plt.show()

We get a testing score of about 0.919 with k = 3. Increasing the number of neighbors decreases both the fitting and testing scores without reducing the overfitting. Like the models above, KNN is also quite overfit on this fruit classification problem.

In [44]:
kn = 3
knn_clf = KNeighborsClassifier(n_neighbors=kn)
knn_clf.fit(X_train_flat, train_label_ids)
nplotx=6
nploty=6
model = knn_clf
fig, axes = plt.subplots(nploty, nplotx, sharex=True, sharey=True, figsize=(10,10))
test_index = np.arange(0,test_fruit_images.shape[0])
for i in range(nploty):
    for j in range(nplotx):
        ind=np.random.choice(test_index,1,replace=False)[0]
        axes[i,j].imshow(test_fruit_images[ind])
        axes[i,j].set_xticklabels([])
        axes[i,j].set_yticklabels([])
        axes[i,j].set_title(str(test_labels[ind])+'\n'+'Predicted: '+str(id_to_label_dict[model.predict(X_test_flat[ind:ind+1])[0]]), fontsize=8)
plt.tight_layout()
plt.show()

Neural Networks

First we will use a fully connected neural network model, built with the Keras package using TensorFlow as the backend.

In [20]:
#1st very simple model
model_dense = Sequential()
model_dense.add(Dense(256, activation='relu', input_shape=(X_train_flat.shape[1],)))
# Dropout layer to reduce variance
model_dense.add(Dropout(0.1))
model_dense.add(Dense(64, activation='relu'))
model_dense.add(Dropout(0.1))
model_dense.add(Dense(Y_train.shape[1], activation='softmax'))

# Summary of model
model_dense.summary()

# Compiling model
model_dense.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])

history_model_dense = model_dense.fit(X_train_flat, Y_train, batch_size=128, epochs=30, validation_data=(X_test_flat, Y_test))
score = model_dense.evaluate(X_test_flat, Y_test, verbose=0)
print('Test loss: ', score[0])
print('Test accuracy: ', score[1])
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 256)               1769728   
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                16448     
_________________________________________________________________
dropout_2 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 69)                4485      
=================================================================
Total params: 1,790,661
Trainable params: 1,790,661
Non-trainable params: 0
_________________________________________________________________
Train on 34641 samples, validate on 11640 samples
Epoch 1/30
34641/34641 [==============================] - 16s 463us/step - loss: 3.3793 - acc: 0.1643 - val_loss: 1.8294 - val_acc: 0.4604
Epoch 2/30
34641/34641 [==============================] - 15s 423us/step - loss: 1.5446 - acc: 0.5141 - val_loss: 0.8711 - val_acc: 0.7344
Epoch 3/30
34641/34641 [==============================] - 16s 451us/step - loss: 1.0293 - acc: 0.6619 - val_loss: 0.7397 - val_acc: 0.7644
Epoch 4/30
34641/34641 [==============================] - 16s 457us/step - loss: 0.7844 - acc: 0.7410 - val_loss: 0.7558 - val_acc: 0.7485
Epoch 5/30
34641/34641 [==============================] - 16s 460us/step - loss: 0.6277 - acc: 0.7897 - val_loss: 0.5245 - val_acc: 0.8458
Epoch 6/30
34641/34641 [==============================] - 16s 461us/step - loss: 0.5372 - acc: 0.8219 - val_loss: 0.6815 - val_acc: 0.7773
Epoch 7/30
34641/34641 [==============================] - 16s 467us/step - loss: 0.4763 - acc: 0.8419 - val_loss: 0.6501 - val_acc: 0.8162
Epoch 8/30
34641/34641 [==============================] - 16s 464us/step - loss: 0.4218 - acc: 0.8598 - val_loss: 0.3244 - val_acc: 0.8988
Epoch 9/30
34641/34641 [==============================] - 16s 471us/step - loss: 0.3747 - acc: 0.8751 - val_loss: 0.3495 - val_acc: 0.8810
Epoch 10/30
34641/34641 [==============================] - 16s 468us/step - loss: 0.3426 - acc: 0.8883 - val_loss: 0.3525 - val_acc: 0.8862
Epoch 11/30
34641/34641 [==============================] - 16s 465us/step - loss: 0.3196 - acc: 0.8930 - val_loss: 0.4301 - val_acc: 0.8757
Epoch 12/30
34641/34641 [==============================] - 16s 469us/step - loss: 0.3003 - acc: 0.9011 - val_loss: 0.4192 - val_acc: 0.8686
Epoch 13/30
34641/34641 [==============================] - 16s 467us/step - loss: 0.2853 - acc: 0.9056 - val_loss: 0.3304 - val_acc: 0.8853
Epoch 14/30
34641/34641 [==============================] - 17s 483us/step - loss: 0.2736 - acc: 0.9096 - val_loss: 0.2724 - val_acc: 0.9097
Epoch 15/30
34641/34641 [==============================] - 16s 472us/step - loss: 0.2501 - acc: 0.9173 - val_loss: 0.2971 - val_acc: 0.9087
Epoch 16/30
34641/34641 [==============================] - 16s 474us/step - loss: 0.2541 - acc: 0.9168 - val_loss: 0.3231 - val_acc: 0.9118
Epoch 17/30
34641/34641 [==============================] - 16s 473us/step - loss: 0.2371 - acc: 0.9216 - val_loss: 0.2658 - val_acc: 0.9162
Epoch 18/30
34641/34641 [==============================] - 16s 472us/step - loss: 0.2297 - acc: 0.9246 - val_loss: 0.3594 - val_acc: 0.8915
Epoch 19/30
34641/34641 [==============================] - 17s 478us/step - loss: 0.2249 - acc: 0.9284 - val_loss: 0.2352 - val_acc: 0.9253
Epoch 20/30
34641/34641 [==============================] - 16s 462us/step - loss: 0.2063 - acc: 0.9312 - val_loss: 0.4710 - val_acc: 0.8558
Epoch 21/30
34641/34641 [==============================] - 16s 459us/step - loss: 0.2056 - acc: 0.9319 - val_loss: 0.2172 - val_acc: 0.9327
Epoch 22/30
34641/34641 [==============================] - 16s 461us/step - loss: 0.2032 - acc: 0.9338 - val_loss: 0.3023 - val_acc: 0.9165
Epoch 23/30
34641/34641 [==============================] - 16s 455us/step - loss: 0.1991 - acc: 0.9365 - val_loss: 0.3586 - val_acc: 0.8959
Epoch 24/30
34641/34641 [==============================] - 16s 461us/step - loss: 0.1919 - acc: 0.9366 - val_loss: 0.2664 - val_acc: 0.9081
Epoch 25/30
34641/34641 [==============================] - 16s 464us/step - loss: 0.1902 - acc: 0.9388 - val_loss: 0.3147 - val_acc: 0.9007
Epoch 26/30
34641/34641 [==============================] - 16s 462us/step - loss: 0.1812 - acc: 0.9413 - val_loss: 0.3008 - val_acc: 0.9225
Epoch 27/30
34641/34641 [==============================] - 16s 460us/step - loss: 0.1823 - acc: 0.9417 - val_loss: 0.2238 - val_acc: 0.9258
Epoch 28/30
34641/34641 [==============================] - 16s 465us/step - loss: 0.1794 - acc: 0.9413 - val_loss: 0.2798 - val_acc: 0.9240
Epoch 29/30
34641/34641 [==============================] - 16s 462us/step - loss: 0.1696 - acc: 0.9444 - val_loss: 0.3950 - val_acc: 0.8874
Epoch 30/30
34641/34641 [==============================] - 16s 460us/step - loss: 0.1669 - acc: 0.9456 - val_loss: 0.5108 - val_acc: 0.8695
Test loss:  0.51076778593
Test accuracy:  0.869501718213
In [22]:
# Deeper model
model_dense = Sequential()
model_dense.add(Dense(256, activation='relu', input_shape=(X_train_flat.shape[1],)))
# Dropout layer to reduce variance
model_dense.add(Dropout(0.2))
model_dense.add(Dense(128, activation='relu'))
# Batch Normalization to stabilize fitting and partially to reduce variance
model_dense.add(BatchNormalization())
model_dense.add(Dense(128, activation='relu'))
model_dense.add(BatchNormalization())
model_dense.add(Dense(128, activation='relu'))
model_dense.add(Dropout(0.5))
model_dense.add(Dense(Y_train.shape[1], activation='softmax'))

# Summary of model
model_dense.summary()

# Compiling model
model_dense.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])

history_model_dense = model_dense.fit(X_train_flat, Y_train, batch_size=128, epochs=30, validation_data=(X_test_flat, Y_test))
score = model_dense.evaluate(X_test_flat, Y_test, verbose=0)
print('Test loss: ', score[0])
print('Test accuracy: ', score[1])
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_9 (Dense)              (None, 256)               1769728   
_________________________________________________________________
dropout_6 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_10 (Dense)             (None, 128)               32896     
_________________________________________________________________
batch_normalization_2 (Batch (None, 128)               512       
_________________________________________________________________
dense_11 (Dense)             (None, 128)               16512     
_________________________________________________________________
batch_normalization_3 (Batch (None, 128)               512       
_________________________________________________________________
dense_12 (Dense)             (None, 128)               16512     
_________________________________________________________________
dropout_7 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_13 (Dense)             (None, 69)                8901      
=================================================================
Total params: 1,845,573
Trainable params: 1,845,061
Non-trainable params: 512
_________________________________________________________________
Train on 34641 samples, validate on 11640 samples
Epoch 1/30
34641/34641 [==============================] - 23s 661us/step - loss: 2.9822 - acc: 0.2182 - val_loss: 2.3002 - val_acc: 0.2962
Epoch 2/30
34641/34641 [==============================] - 17s 501us/step - loss: 1.1285 - acc: 0.6380 - val_loss: 1.0442 - val_acc: 0.6326
Epoch 3/30
34641/34641 [==============================] - 18s 522us/step - loss: 0.6843 - acc: 0.7740 - val_loss: 0.7480 - val_acc: 0.7739
Epoch 4/30
34641/34641 [==============================] - 15s 441us/step - loss: 0.5015 - acc: 0.8323 - val_loss: 0.5920 - val_acc: 0.8014
Epoch 5/30
34641/34641 [==============================] - 15s 440us/step - loss: 0.4056 - acc: 0.8665 - val_loss: 1.1983 - val_acc: 0.6556
Epoch 6/30
34641/34641 [==============================] - 15s 420us/step - loss: 0.3470 - acc: 0.8854 - val_loss: 0.3440 - val_acc: 0.8881
Epoch 7/30
34641/34641 [==============================] - 15s 428us/step - loss: 0.3033 - acc: 0.8989 - val_loss: 1.4205 - val_acc: 0.6898
Epoch 8/30
34641/34641 [==============================] - 15s 428us/step - loss: 0.2761 - acc: 0.9088 - val_loss: 0.4021 - val_acc: 0.8885
Epoch 9/30
34641/34641 [==============================] - 15s 426us/step - loss: 0.2444 - acc: 0.9187 - val_loss: 0.2635 - val_acc: 0.9155
Epoch 10/30
34641/34641 [==============================] - 15s 427us/step - loss: 0.2225 - acc: 0.9249 - val_loss: 0.5415 - val_acc: 0.8418
Epoch 11/30
34641/34641 [==============================] - 15s 430us/step - loss: 0.2081 - acc: 0.9323 - val_loss: 0.4251 - val_acc: 0.8862
Epoch 12/30
34641/34641 [==============================] - 15s 426us/step - loss: 0.1954 - acc: 0.9342 - val_loss: 0.5040 - val_acc: 0.8667
Epoch 13/30
34641/34641 [==============================] - 15s 445us/step - loss: 0.1849 - acc: 0.9407 - val_loss: 0.4804 - val_acc: 0.8707
Epoch 14/30
34641/34641 [==============================] - 18s 533us/step - loss: 0.1836 - acc: 0.9413 - val_loss: 0.5151 - val_acc: 0.8649
Epoch 15/30
34641/34641 [==============================] - 16s 470us/step - loss: 0.1761 - acc: 0.9414 - val_loss: 0.2769 - val_acc: 0.9210
Epoch 16/30
34641/34641 [==============================] - 17s 487us/step - loss: 0.1575 - acc: 0.9481 - val_loss: 0.3382 - val_acc: 0.9055
Epoch 17/30
34641/34641 [==============================] - 15s 444us/step - loss: 0.1587 - acc: 0.9489 - val_loss: 0.5532 - val_acc: 0.8351
Epoch 18/30
34641/34641 [==============================] - 16s 452us/step - loss: 0.1471 - acc: 0.9525 - val_loss: 0.3038 - val_acc: 0.9124
Epoch 19/30
34641/34641 [==============================] - 15s 441us/step - loss: 0.1471 - acc: 0.9529 - val_loss: 0.3027 - val_acc: 0.9230
Epoch 20/30
34641/34641 [==============================] - 15s 447us/step - loss: 0.1357 - acc: 0.9557 - val_loss: 0.3647 - val_acc: 0.9074
Epoch 21/30
34641/34641 [==============================] - 15s 442us/step - loss: 0.1349 - acc: 0.9544 - val_loss: 0.3163 - val_acc: 0.9252
Epoch 22/30
34641/34641 [==============================] - 16s 451us/step - loss: 0.1328 - acc: 0.9582 - val_loss: 0.3490 - val_acc: 0.9169
Epoch 23/30
34641/34641 [==============================] - 15s 433us/step - loss: 0.1268 - acc: 0.9594 - val_loss: 0.3489 - val_acc: 0.9109
Epoch 24/30
34641/34641 [==============================] - 15s 435us/step - loss: 0.1294 - acc: 0.9594 - val_loss: 0.3509 - val_acc: 0.9236
Epoch 25/30
34641/34641 [==============================] - 15s 441us/step - loss: 0.1206 - acc: 0.9603 - val_loss: 0.5572 - val_acc: 0.8753
Epoch 26/30
34641/34641 [==============================] - 16s 449us/step - loss: 0.1232 - acc: 0.9607 - val_loss: 0.4504 - val_acc: 0.8967
Epoch 27/30
34641/34641 [==============================] - 15s 440us/step - loss: 0.1274 - acc: 0.9600 - val_loss: 0.3171 - val_acc: 0.9218
Epoch 28/30
34641/34641 [==============================] - 16s 452us/step - loss: 0.1168 - acc: 0.9630 - val_loss: 0.3332 - val_acc: 0.9256
Epoch 29/30
34641/34641 [==============================] - 18s 516us/step - loss: 0.1191 - acc: 0.9615 - val_loss: 0.3838 - val_acc: 0.9027
Epoch 30/30
34641/34641 [==============================] - 18s 525us/step - loss: 0.1152 - acc: 0.9618 - val_loss: 0.2728 - val_acc: 0.9362
Test loss:  0.272828678756
Test accuracy:  0.93616838488

Great! With a slightly deeper network, we achieve about 0.936 for testing and 0.962 for fitting. There is still overfitting, as in the model above, but it is less pronounced. Dropout already helps, yet it does not remove the overfitting completely. Early stopping can reduce it further: in the log above, stopping around epoch 9 would give a well generalized network with both fitting and testing scores close to 0.92. We can expect an even better model from a convolutional neural network (CNN), so let's try one.
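
The early stopping mentioned above can be done with a Keras callback. A minimal sketch, assuming the model and data defined in the cells above (the patience value is an illustrative choice, not a tuned one):

# Sketch: stop training once validation accuracy stops improving.
# On older Keras the metric key is 'val_acc'; newer versions use 'val_accuracy',
# and restore_best_weights is only available from Keras 2.2.3 onwards.
from keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_acc', patience=5, restore_best_weights=True)
history_model_dense = model_dense.fit(X_train_flat, Y_train, batch_size=128, epochs=30,
                                      validation_data=(X_test_flat, Y_test),
                                      callbacks=[early_stop])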

In [50]:
nplotx=6
nploty=6
model = model_dense
fig, axes = plt.subplots(nploty, nplotx, sharex=True, sharey=True, figsize=(10,10))
test_index = np.arange(0,test_fruit_images.shape[0])
for i in range(nploty):
    for j in range(nplotx):
        ind=np.random.choice(test_index,1,replace=False)[0]
        axes[i,j].imshow(test_fruit_images[ind])
        axes[i,j].set_xticklabels([])
        axes[i,j].set_yticklabels([])
        axes[i,j].set_title(str(test_labels[ind])+'\n'+'Predicted: '+str(id_to_label_dict[model.predict(X_test_flat[ind:ind+1]).argmax()]), fontsize=8)
plt.tight_layout()
plt.show()
In [23]:
# CNN
# Stacking model
model_cnn = Sequential()
model_cnn.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu', input_shape=(img_height, img_width, 3)))
model_cnn.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model_cnn.add(MaxPool2D(pool_size=(2,2)))
#model_cnn.add(BatchNormalization())
model_cnn.add(Dropout(0.25))
model_cnn.add(Flatten())
model_cnn.add(Dense(128, activation='relu'))
model_cnn.add(Dropout(0.25))
model_cnn.add(Dense(Y_train.shape[1], activation='softmax'))

# Compiling model
model_cnn.compile(loss='categorical_crossentropy',
                  optimizer=Adamax(),
                  metrics=['accuracy']
                 )
model_cnn.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 46, 46, 32)        896       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 44, 44, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 22, 22, 64)        0         
_________________________________________________________________
dropout_8 (Dropout)          (None, 22, 22, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 30976)             0         
_________________________________________________________________
dense_14 (Dense)             (None, 128)               3965056   
_________________________________________________________________
dropout_9 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_15 (Dense)             (None, 69)                8901      
=================================================================
Total params: 3,993,349
Trainable params: 3,993,349
Non-trainable params: 0
_________________________________________________________________
In [24]:
history_model_cnn = model_cnn.fit(X_train, Y_train, batch_size=128,
             epochs=5, validation_data=(X_test, Y_test))
score = model_cnn.evaluate(X_test, Y_test)
print('Test loss: ', score[0])
print('Test accuracy: ', score[1])
Train on 34641 samples, validate on 11640 samples
Epoch 1/5
34641/34641 [==============================] - 377s 11ms/step - loss: 1.9384 - acc: 0.5004 - val_loss: 0.4559 - val_acc: 0.8774
Epoch 2/5
34641/34641 [==============================] - 394s 11ms/step - loss: 0.3615 - acc: 0.8906 - val_loss: 0.2464 - val_acc: 0.9228
Epoch 3/5
34641/34641 [==============================] - 390s 11ms/step - loss: 0.1901 - acc: 0.9401 - val_loss: 0.2027 - val_acc: 0.9390
Epoch 4/5
34641/34641 [==============================] - 384s 11ms/step - loss: 0.1261 - acc: 0.9578 - val_loss: 0.1966 - val_acc: 0.9388
Epoch 5/5
34641/34641 [==============================] - 373s 11ms/step - loss: 0.0957 - acc: 0.9673 - val_loss: 0.1636 - val_acc: 0.9504
11640/11640 [==============================] - 38s 3ms/step
Test loss:  0.163625853475
Test accuracy:  0.950429553265
In [25]:
history_model_cnn = model_cnn.fit(X_train, Y_train, batch_size=128,
             epochs=5, validation_data=(X_test, Y_test))
score = model_cnn.evaluate(X_test, Y_test)
print('Test loss: ', score[0])
print('Test accuracy: ', score[1])
Train on 34641 samples, validate on 11640 samples
Epoch 1/5
34641/34641 [==============================] - 393s 11ms/step - loss: 0.0803 - acc: 0.9720 - val_loss: 0.1628 - val_acc: 0.9525
Epoch 2/5
34641/34641 [==============================] - 481s 14ms/step - loss: 0.0695 - acc: 0.9758 - val_loss: 0.1345 - val_acc: 0.9574
Epoch 3/5
34641/34641 [==============================] - 352s 10ms/step - loss: 0.0634 - acc: 0.9760 - val_loss: 0.1232 - val_acc: 0.9645
Epoch 4/5
34641/34641 [==============================] - 354s 10ms/step - loss: 0.0563 - acc: 0.9788 - val_loss: 0.2071 - val_acc: 0.9498
Epoch 5/5
34641/34641 [==============================] - 363s 10ms/step - loss: 0.0502 - acc: 0.9807 - val_loss: 0.1619 - val_acc: 0.9592
11640/11640 [==============================] - 37s 3ms/step
Test loss:  0.161873538924
Test accuracy:  0.959192439863
In [29]:
# deeper CNN
# Stacking model
model_cnn2 = Sequential()
model_cnn2.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', input_shape=(img_height, img_width, 3)))
model_cnn2.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model_cnn2.add(MaxPool2D(pool_size=(2,2)))
#model_cnn2.add(Dropout(0.1))
model_cnn2.add(BatchNormalization())

model_cnn2.add(Conv2D(filters=128, kernel_size=(5,5), activation='relu'))
model_cnn2.add(Conv2D(filters=128, kernel_size=(5,5), activation='relu'))
model_cnn2.add(MaxPool2D(pool_size=(2,2)))
model_cnn2.add(Flatten())
model_cnn2.add(Dropout(0.25))
model_cnn2.add(Dense(128, activation='relu'))
model_cnn2.add(Dropout(0.25))
model_cnn2.add(Dense(Y_train.shape[1], activation='softmax'))

# Compiling model
model_cnn2.compile(loss='categorical_crossentropy',
                  optimizer=Adamax(),
                  metrics=['accuracy']
                 )
model_cnn2.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_9 (Conv2D)            (None, 46, 46, 64)        1792      
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 44, 44, 64)        36928     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 22, 22, 64)        0         
_________________________________________________________________
batch_normalization_6 (Batch (None, 22, 22, 64)        256       
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 18, 18, 128)       204928    
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 14, 14, 128)       409728    
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 6272)              0         
_________________________________________________________________
dropout_13 (Dropout)         (None, 6272)              0         
_________________________________________________________________
dense_20 (Dense)             (None, 128)               802944    
_________________________________________________________________
dropout_14 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_21 (Dense)             (None, 69)                8901      
=================================================================
Total params: 1,465,477
Trainable params: 1,465,349
Non-trainable params: 128
_________________________________________________________________
In [30]:
history_model_cnn2 = model_cnn2.fit(X_train, Y_train, batch_size=128,
             epochs=10, validation_data=(X_test, Y_test))
score = model_cnn2.evaluate(X_test, Y_test)
print('Test loss: ', score[0])
print('Test accuracy: ', score[1])
Train on 34641 samples, validate on 11640 samples
Epoch 1/10
34641/34641 [==============================] - 1303s 38ms/step - loss: 1.2130 - acc: 0.6688 - val_loss: 0.4442 - val_acc: 0.8788
Epoch 2/10
34641/34641 [==============================] - 1294s 37ms/step - loss: 0.1993 - acc: 0.9358 - val_loss: 0.2528 - val_acc: 0.9143
Epoch 3/10
34641/34641 [==============================] - 1294s 37ms/step - loss: 0.1204 - acc: 0.9596 - val_loss: 0.1816 - val_acc: 0.9385
Epoch 4/10
34641/34641 [==============================] - 1294s 37ms/step - loss: 0.0765 - acc: 0.9727 - val_loss: 0.2453 - val_acc: 0.9407
Epoch 5/10
34641/34641 [==============================] - 1298s 37ms/step - loss: 0.0695 - acc: 0.9754 - val_loss: 0.1948 - val_acc: 0.9486
Epoch 6/10
34641/34641 [==============================] - 1302s 38ms/step - loss: 0.0522 - acc: 0.9810 - val_loss: 0.1663 - val_acc: 0.9586
Epoch 7/10
34641/34641 [==============================] - 1287s 37ms/step - loss: 0.0547 - acc: 0.9799 - val_loss: 0.0932 - val_acc: 0.9702
Epoch 8/10
34641/34641 [==============================] - 1287s 37ms/step - loss: 0.0325 - acc: 0.9865 - val_loss: 0.1250 - val_acc: 0.9625
Epoch 9/10
34641/34641 [==============================] - 1290s 37ms/step - loss: 0.0406 - acc: 0.9844 - val_loss: 0.2156 - val_acc: 0.9447
Epoch 10/10
34641/34641 [==============================] - 1294s 37ms/step - loss: 0.0349 - acc: 0.9859 - val_loss: 0.1893 - val_acc: 0.9526
11640/11640 [==============================] - 126s 11ms/step
Test loss:  0.189316277101
Test accuracy:  0.952577319547

Great! We achieve 0.952 for testing and 0.986 for fitting. There is still a bit of overfitting, and early stopping would reduce it further: stopping after 7 epochs gives a well generalized CNN with fitting and testing scores of 0.980 and 0.970. Clearly, a CNN is much better suited to image classification than the other models.
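
A convenient way to pick that stopping point is to plot the history object returned by fit. A minimal sketch, using the history from the cell above (on newer Keras the keys are 'accuracy'/'val_accuracy' instead of 'acc'/'val_acc'):

# Sketch: training vs. validation accuracy per epoch; the epoch where the
# validation curve flattens or starts dropping is a natural early-stop point.
fig, ax = plt.subplots(figsize=(5,4))
ax.plot(history_model_cnn2.history['acc'], marker='o', label='Training')
ax.plot(history_model_cnn2.history['val_acc'], marker='o', label='Validation')
ax.set_xlabel('Epoch')
ax.set_ylabel('Accuracy')
ax.legend()
plt.show()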

In [51]:
nplotx=6
nploty=6
fig, axes = plt.subplots(nploty, nplotx, sharex=True, sharey=True, figsize=(10,10))
test_index = np.arange(0,test_fruit_images.shape[0])
for i in range(nploty):
    for j in range(nplotx):
        ind=np.random.choice(test_index,1,replace=False)[0]
        axes[i,j].imshow(test_fruit_images[ind])
        axes[i,j].set_xticklabels([])
        axes[i,j].set_yticklabels([])
        axes[i,j].set_title(str(test_labels[ind])+'\n'+'Predicted: '+str(id_to_label_dict[model_cnn.predict(X_test[ind:ind+1]).argmax()]), fontsize=8)
plt.tight_layout()
plt.show()

Transfer learning

In [35]:
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
In [36]:
vgg_base = VGG16( include_top=False, weights='imagenet', input_shape=(img_height, img_width, 3))
# Do not change the weights
for layer in vgg_base.layers:
    layer.trainable = False
In [59]:
x = Flatten()(vgg_base.output)
x = Dense(256, activation='relu')(x)
prediction = Dense(Y_train.shape[1], activation='softmax')(x)
vgg_model = Model(inputs=vgg_base.input, outputs=prediction)
vgg_model.summary()
vgg_model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 48, 48, 3)         0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 48, 48, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 48, 48, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 24, 24, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 24, 24, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 24, 24, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 12, 12, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 12, 12, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 12, 12, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 12, 12, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 6, 6, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 6, 6, 512)         1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 6, 6, 512)         2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 6, 6, 512)         2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 3, 3, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 3, 3, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 3, 3, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 3, 3, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
flatten_6 (Flatten)          (None, 512)               0         
_________________________________________________________________
dense_23 (Dense)             (None, 256)               131328    
_________________________________________________________________
dense_24 (Dense)             (None, 69)                17733     
=================================================================
Total params: 14,863,749
Trainable params: 149,061
Non-trainable params: 14,714,688
_________________________________________________________________
In [60]:
history_vgg_model = vgg_model.fit(X_train, Y_train, batch_size=128, epochs=10, validation_data=(X_test, Y_test))
Train on 34641 samples, validate on 11640 samples
Epoch 1/10
34641/34641 [==============================] - 1077s 31ms/step - loss: 1.4144 - acc: 0.7254 - val_loss: 0.6654 - val_acc: 0.8241
Epoch 2/10
34641/34641 [==============================] - 1079s 31ms/step - loss: 0.2658 - acc: 0.9484 - val_loss: 0.3779 - val_acc: 0.8924
Epoch 3/10
34641/34641 [==============================] - 1088s 31ms/step - loss: 0.1107 - acc: 0.9775 - val_loss: 0.2651 - val_acc: 0.9216
Epoch 4/10
34641/34641 [==============================] - 1091s 31ms/step - loss: 0.0590 - acc: 0.9876 - val_loss: 0.2363 - val_acc: 0.9286
Epoch 5/10
34641/34641 [==============================] - 1087s 31ms/step - loss: 0.0379 - acc: 0.9907 - val_loss: 0.1970 - val_acc: 0.9387
Epoch 6/10
34641/34641 [==============================] - 1083s 31ms/step - loss: 0.0283 - acc: 0.9917 - val_loss: 0.2168 - val_acc: 0.9314
Epoch 7/10
34641/34641 [==============================] - 1082s 31ms/step - loss: 0.0218 - acc: 0.9925 - val_loss: 0.2022 - val_acc: 0.9353
Epoch 8/10
34641/34641 [==============================] - 1081s 31ms/step - loss: 0.0186 - acc: 0.9929 - val_loss: 0.1861 - val_acc: 0.9447
Epoch 9/10
34641/34641 [==============================] - 1084s 31ms/step - loss: 0.0167 - acc: 0.9929 - val_loss: 0.1947 - val_acc: 0.9390
Epoch 10/10
34641/34641 [==============================] - 1083s 31ms/step - loss: 0.0156 - acc: 0.9930 - val_loss: 0.1902 - val_acc: 0.9453

Transfer learning gives good results, though not as good as the CNN above, since we train only the new top layers and keep all VGG16 parameters (pre-trained on ImageNet) fixed. Better results could be obtained by also fine-tuning the convolutional layers, but that would take considerably longer and, more importantly, the current dataset is quite small (~35k samples compared with the ~15 million parameters of VGG16) for fitting all of them.
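
For reference, here is a minimal sketch of such fine-tuning, unfreezing only the top convolutional block of VGG16 rather than all layers; the optimizer and learning rate are illustrative choices, not tuned values.

# Sketch: fine-tune only block5 of VGG16, keeping earlier blocks frozen.
# A small learning rate avoids destroying the pre-trained weights.
from keras.optimizers import SGD
for layer in vgg_base.layers:
    layer.trainable = layer.name.startswith('block5')
vgg_model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(lr=1e-4, momentum=0.9),
                  metrics=['accuracy'])
history_finetune = vgg_model.fit(X_train, Y_train, batch_size=128, epochs=5,
                                 validation_data=(X_test, Y_test))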