Creating your own Object Detector

This post serves as a guide to creating your own object detector with the TensorFlow Object Detection API.

Overview

  1. Gather and annotate train/validation images
  2. Download models from TensorFlow
  3. Import annotated images as TFRecords
  4. Create Label Map
  5. Train model
  6. Write Inference Graph
  7. Test Model

Gather and Annotate Images

For this project, I gathered images of road signs from Google. It is important to collect images that vary in size, lighting, clarity, and noise so that your model generalizes well. I used the LabelImg software (downloadable here) to annotate the images and saved the annotations as .xml files.

LabelImg software used to annotate images
The xml file stores the image path, image size, label, and coordinates for the bounding box.
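For reference, a LabelImg annotation looks roughly like this (the file name, sizes, and coordinates below are made-up examples, not from my dataset):

```xml
<annotation>
  <folder>train</folder>
  <filename>stop01.jpg</filename>
  <path>/content/train/stop01.jpg</path>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>stop</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>120</xmin>
      <ymin>85</ymin>
      <xmax>310</xmax>
      <ymax>270</ymax>
    </bndbox>
  </object>
</annotation>
```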

Download Models from TensorFlow

Next, we download the object detection model we want from TensorFlow. This is the pre-trained model we will fine-tune on our images. TensorFlow offers different models to choose from based on your needs; I decided to use the ssd_mobilenet_v2_coco model. This step can be completed by opening up your notebook and copying the code below:

#Downloading models from tensorflow github
#move to object_detection folder
!git clone https://github.com/tensorflow/models/
%cd ./models/research/object_detection

#get the model we want to train on (I chose ssd_mobilenet_v2_coco)
!wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
!tar -xvzf ssd_mobilenet_v2_coco_2018_03_29.tar.gz
!rm -rf ssd_mobilenet_v2_coco_2018_03_29.tar.gz

We also need to add "models", "models/research", and "models/research/slim" to our environment's Python path. Creating these direct paths makes it possible for Python to find and import the object detection modules.
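In a Colab notebook, this can be sketched as follows (the /content/models path assumes you cloned the repository into /content, as in the earlier step):

```python
import os

# Repository root from the git clone step (Colab's default working directory).
repo_root = '/content/models'

# Directories the Object Detection API expects on the Python path.
extra_paths = [repo_root,
               os.path.join(repo_root, 'research'),
               os.path.join(repo_root, 'research', 'slim')]

# Append them to PYTHONPATH so object_detection and slim import cleanly.
existing = os.environ.get('PYTHONPATH', '')
os.environ['PYTHONPATH'] = os.pathsep.join(
    ([existing] if existing else []) + extra_paths)
```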

The program should now have all of the files and software needed to execute successfully.

Importing Annotated Images

We now want to take the images we annotated and import them into our notebook. The function below converts our .xml annotation files into a single table that we can later save as a .csv:

import glob
import xml.etree.ElementTree as ET

import pandas as pd

#function to convert a folder of LabelImg xml files into one DataFrame
def xml_to_csv(path):
   xml_list = []
   for xml_file in glob.glob(path + '/*.xml'):
       tree = ET.parse(xml_file)
       root = tree.getroot()
       for member in root.findall('object'):
           value = (root.find('filename').text,
                    int(root.find('size')[0].text),   # width
                    int(root.find('size')[1].text),   # height
                    member[0].text,                   # class label
                    int(member[4][0].text),           # xmin
                    int(member[4][1].text),           # ymin
                    int(member[4][2].text),           # xmax
                    int(member[4][3].text))           # ymax
           xml_list.append(value)
   column_name = ['filename', 'width', 'height', 'class',
                  'xmin', 'ymin', 'xmax', 'ymax']
   xml_df = pd.DataFrame(xml_list, columns=column_name)
   return xml_df

Read in xml files from your directory. Mine are stored in ‘train’ and ‘val’ folders.
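If the member[4][...] indexing above looks opaque: in a LabelImg file, an object's children are name, pose, truncated, difficult, bndbox, so member[4] is the bounding box. A self-contained check (the XML string is a made-up example):

```python
import xml.etree.ElementTree as ET

# Minimal made-up LabelImg-style annotation to illustrate the indexing.
sample = """<annotation>
  <filename>stop01.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>stop</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox><xmin>120</xmin><ymin>85</ymin><xmax>310</xmax><ymax>270</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(sample)
member = root.findall('object')[0]
print(member[0].text)      # name -> 'stop'
print(member[4][0].text)   # xmin -> '120'
print(member[4][3].text)   # ymax -> '270'
```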

Create TFRecord Files

Since we expect our model to train on large amounts of data, it is a good idea to convert your files into TFRecords. TFRecords (TensorFlow Records) are stored in TensorFlow’s binary format, which improves performance when training. (For more on TFRecords, read here.)

Assign a unique ID to each distinct label in your training set, as shown below. I had 39 different classes of images, so I assigned 39 different IDs, one per label.

def class_text_to_int(row_label):
    if row_label == 'stop':
        return 1
    elif row_label == 'pedestrianCrossing':
        return 2
    elif row_label == 'oneWay':
        return 3
    elif row_label == 'rightLaneMustTurn':
        return 4
    elif row_label == 'speedLimit35':
        return 5
    #continue with all 39 classes
    else:
        return None
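Equivalently, the mapping can be kept in a dictionary, which is easier to maintain for 39 classes (the label names below are from my dataset; extend the dictionary to cover all of yours):

```python
# Map each class label to its unique ID (extend to all 39 classes).
LABEL_IDS = {
    'stop': 1,
    'pedestrianCrossing': 2,
    'oneWay': 3,
    'rightLaneMustTurn': 4,
    'speedLimit35': 5,
}

def class_text_to_int(row_label):
    # Returns None for labels not in the map, like the if/elif version.
    return LABEL_IDS.get(row_label)
```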

Once you have successfully created the TFRecords, it is time to create the labelmap.pbtxt file that the model will use to map the IDs of detected objects back to label names.  We are going to write this file into the training folder, as shown by the first line below.

%%writefile ./object_detection/training/labelmap.pbtxt

item {
  id: 1
  name: 'stop'
}

item {
  id: 2
  name: 'pedestrianCrossing'
}

item {
  id: 3
  name: 'oneWay'
}

item {
  id: 4
  name: 'rightLaneMustTurn'
}

item {
  id: 5
  name: 'speedLimit35'
}

##keep doing this for the rest of the classes
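Writing 39 of these blocks by hand is tedious; one way to sketch generating the file contents from a label list instead (the labels shown are a subset of mine, and you would still write the result to the training folder as above):

```python
# Generate labelmap.pbtxt entries from an ordered list of class names.
labels = ['stop', 'pedestrianCrossing', 'oneWay',
          'rightLaneMustTurn', 'speedLimit35']  # ...extend to all 39

# IDs start at 1, matching the hand-written file above.
entries = [f"item {{\n  id: {i}\n  name: '{name}'\n}}\n"
           for i, name in enumerate(labels, start=1)]
labelmap = '\n'.join(entries)

# In the notebook, write this string to ./object_detection/training/labelmap.pbtxt
print(labelmap)
```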

Once you have your label map, it is time to work with the model file. This file will slightly change depending on which model you choose, but overall, the parts that you need to specify remain the same. Again, we are going to write this file into the training folder.

%%writefile ./object_detection/training/ssd_mobilenet_v2_coco.config
You will need to copy the config for this section from the TensorFlow GitHub repository (depending on the model you choose), and change these next 4 sections to fit your dataset. It may be easiest to use Ctrl+F to find these sections of code in the file.
#change the number of classes in your dataset (I have 39)
#don't need to change anything else here

num_classes: 39
box_coder {
  faster_rcnn_box_coder {
    y_scale: 10.0
    x_scale: 10.0
    height_scale: 5.0
    width_scale: 5.0
  }
}

Change the fine_tune_checkpoint to where the model.ckpt file is stored. Typically this will be in your specific model's folder (mine is in object_detection/ssd_mobilenet_v2_coco_2018_03_29 because I am using the SSD MobileNet model).

#change this to be where your model.ckpt file is located
fine_tune_checkpoint: "/content/models/research/object_detection/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt"

Update this section to the number of steps/iterations you want your model to run. For a smaller, more uniform dataset, fewer steps may suffice; for a larger, more diverse dataset, a larger number of steps may be better. The goal is to find the number of steps that produces the smallest loss.

#update the number of steps you'd like to train your model
num_steps: 10000
data_augmentation_options {
 random_horizontal_flip {
 }
}

Change the training input path to be where your train.record file is located. Also change the label map path to be where your labelmap.pbtxt file is located (in training folder).

train_input_reader: {
 tf_record_input_reader {
   #change this to where train.record is located
   input_path: "/content/models/research/train.record"
 }
 #change this to where labelmap.pbtxt is located
 label_map_path: "/content/models/research/object_detection/training/labelmap.pbtxt"
}

For the evaluation configuration, set the number of examples to how many test images you have.

eval_config: {
 #change this to the number of test images you have
 num_examples: 160
 max_evals: 10
}

Lastly, change the validation input path to be where val.record is located. Set the validation label map path to be where labelmap.pbtxt is located (same as for training).

eval_input_reader: {
 tf_record_input_reader {
   input_path: "/content/models/research/val.record"
 }
 label_map_path: "/content/models/research/object_detection/training/labelmap.pbtxt"
 shuffle: false
 num_readers: 1
}

Train Model

Now that the configuration file is ready to go, it is time to train the model! This step can be time-consuming, depending on how many steps you chose. While training, the model periodically saves checkpoints to the training directory. Once training finishes, the training directory should look something like this:

Sample contents of the training directory after training the model. In this example, I trained with Faster R-CNN for only 2,000 steps.

We want to use the most recent checkpoint. In this case that is model.ckpt-2000, because I trained for 2,000 steps.

Create Inference Graph

After training the model, we need to create an inference graph. This can be done (regardless of the model chosen) with the my_inference_graph.py file on my GitHub by executing the code below (change model.ckpt-2000 to your most recent checkpoint):

#change trained_checkpoint_prefix to the most recent model saved in the training directory
!python3 my_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path training/ssd_mobilenet_v2_coco.config \
--trained_checkpoint_prefix training/model.ckpt-2000 \
--output_directory ./inference_graph

Test Model

Once you create the inference graph, your model is ready to be tested. Update the code below to reflect your dataset and where your files/test images are stored.

import os

# Name of the directory containing the inference graph we exported
MODEL_NAME = 'inference_graph'
# Test image name
IMAGE_NAME = 'stopyield.jpg'

# Grab path to current working directory
CWD_PATH = os.getcwd()
# Path to frozen detection graph .pb file, which contains the model that is used
# for object detection.

PATH_TO_CKPT = os.path.join(CWD_PATH ,MODEL_NAME,'frozen_inference_graph.pb')

# Path to label map file

PATH_TO_LABELS = os.path.join(CWD_PATH,'training','labelmap.pbtxt')

# Path to image
#my image is stored in /content/, which is why I used that instead of CWD_PATH
PATH_TO_IMAGE = os.path.join('/content/', IMAGE_NAME)

# Number of classes the object detector can identify
NUM_CLASSES = 39

After making these changes and running the remaining code, you should successfully be able to detect objects in your test image!

Output image after training the object detection model

Feel free to run this code by downloading the files from my GitHub repository.
