This post serves as a step-by-step guide to creating your own object detector with the TensorFlow Object Detection API.
Before beginning, make sure to have Jupyter Notebook installed.
Overview
- Gather and annotate train/validation images
- Download models from TensorFlow
- Import annotated images as TFRecords
- Create Label Map
- Train model
- Write Inference Graph
- Test Model
Gather and Annotate Images
For this project, I gathered images of road signs from Google. It is important to grab images that differ in size, lighting, clarity, and noise, so that the model learns to generalize. I used the LabelImg software (downloaded here) to annotate the images and saved the annotations as .xml files.
Download Models from TensorFlow
Now we want to download an object detection model from TensorFlow. This is the pretrained model we will fine-tune on our images. TensorFlow offers different models to choose from based on your needs; I decided to use the ssd_mobilenet_v2_coco model. This step can be completed by opening up your notebook and copying the code below:
```
#Downloading models from tensorflow github
!git clone https://github.com/tensorflow/models/

#move to the object_detection folder
%cd ./models/research/object_detection

#get the model we want to train on (I chose ssd_mobilenet_v2_coco)
!wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
!tar -xvzf ssd_mobilenet_v2_coco_2018_03_29.tar.gz
!rm -rf ssd_mobilenet_v2_coco_2018_03_29.tar.gz
```
We also need to add environment variables for "models", "models/research", and "models/research/slim". We create these so that Python has direct paths to these directories and can import the Object Detection API from anywhere.
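These variables can be set from inside the notebook itself; here is a minimal sketch, assuming the repository was cloned into the current working directory as `models` by the clone step above:

```python
import os
import sys

# Assumed location: the repo cloned as ./models by the git clone step above.
repo = os.path.abspath('models')
paths = [repo,
         os.path.join(repo, 'research'),
         os.path.join(repo, 'research', 'slim')]

# Let this notebook's Python interpreter import the API directly.
for p in paths:
    if p not in sys.path:
        sys.path.append(p)

# Scripts launched from the notebook with ! read PYTHONPATH instead.
os.environ['PYTHONPATH'] = os.pathsep.join(
    [os.environ.get('PYTHONPATH', '')] + paths)
```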
The program should now have all of the files and software needed to execute successfully.
Importing Annotated Images
We now want to take the images we annotated and import them into our notebook. The code below defines a function that collects our .xml annotation files into a pandas DataFrame, which we can then save as a .csv:
```python
#function to convert xml files to csv
import glob
import xml.etree.ElementTree as ET

import pandas as pd

def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),   # width
                     int(root.find('size')[1].text),   # height
                     member[0].text,                   # class label
                     int(member[4][0].text),           # xmin
                     int(member[4][1].text),           # ymin
                     int(member[4][2].text),           # xmax
                     int(member[4][3].text))           # ymax
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class',
                   'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df
```
Read in xml files from your directory. Mine are stored in ‘train’ and ‘val’ folders.
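To see exactly what the converter pulls out of each annotation, here is a self-contained sketch that applies the same indexing (`member[0]` for the class name, `member[4]` for the bounding box) to a minimal LabelImg-style annotation held in memory; the filename and coordinates are made up:

```python
import xml.etree.ElementTree as ET

# A minimal Pascal VOC annotation, in the layout LabelImg saves.
SAMPLE = """<annotation>
    <filename>stop1.jpg</filename>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>stop</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>200</xmax><ymax>220</ymax></bndbox>
    </object>
</annotation>"""

root = ET.fromstring(SAMPLE)
member = root.find('object')
row = (root.find('filename').text,
       int(root.find('size')[0].text),   # width
       int(root.find('size')[1].text),   # height
       member[0].text,                   # class name
       int(member[4][0].text),           # xmin
       int(member[4][1].text),           # ymin
       int(member[4][2].text),           # xmax
       int(member[4][3].text))           # ymax
print(row)  # ('stop1.jpg', 640, 480, 'stop', 100, 120, 200, 220)
```

Note that the positional indexing (`member[4]` is the `bndbox` element) only works if your annotations keep this child order, which LabelImg's output does.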
Create TFRecord Files
Since we expect our model to train on large amounts of data, it is a good idea to convert your files into TFRecords. TFRecords (TensorFlow Records) are stored in TensorFlow’s binary format, which improves performance when training. (For more on TFRecords, read here.)
Assign a unique ID to each distinct label in your training set, as shown below. I had 39 different classes of images, so I assigned 39 different IDs, one per label.
```python
def class_text_to_int(row_label):
    if row_label == 'stop':
        return 1
    elif row_label == 'pedestrianCrossing':
        return 2
    elif row_label == 'oneWay':
        return 3
    elif row_label == 'rightLaneMustTurn':
        return 4
    elif row_label == 'speedLimit35':
        return 5
    #continue with all 39 classes
    else:
        return None
```
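With 39 classes, a long elif chain is easy to get wrong. An equivalent sketch builds the mapping from a single label list; the first five names are the ones shown above, the rest are yours to fill in:

```python
# Order matters: IDs must match labelmap.pbtxt, and IDs start at 1
# (0 is reserved in the TF object detection label map format).
LABELS = ['stop', 'pedestrianCrossing', 'oneWay',
          'rightLaneMustTurn', 'speedLimit35']  # ...add the rest of your 39 classes

CLASS_IDS = {name: i + 1 for i, name in enumerate(LABELS)}

def class_text_to_int(row_label):
    return CLASS_IDS.get(row_label)  # None for unknown labels

print(class_text_to_int('oneWay'))  # 3
```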
Once you have successfully created the TFRecords, it is time to create the labelmap.pbtxt file that the model will use to classify the objects it has detected in an image. We are going to write this file into the training folder, as shown by the first line below.
```
%%writefile ./object_detection/training/labelmap.pbtxt
item {
  id: 1
  name: 'stop'
}
item {
  id: 2
  name: 'pedestrianCrossing'
}
item {
  id: 3
  name: 'oneWay'
}
item {
  id: 4
  name: 'rightLaneMustTurn'
}
item {
  id: 5
  name: 'speedLimit35'
}
##keep doing this for the rest of the classes
```
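Typing 39 item blocks by hand is tedious, and any mismatch with your ID function will silently mislabel detections. As a sketch, the same file contents can be generated from a label list (the first five labels from above; `make_label_map` is a hypothetical helper, not part of the API):

```python
LABELS = ['stop', 'pedestrianCrossing', 'oneWay',
          'rightLaneMustTurn', 'speedLimit35']  # ...add the rest of your 39 classes

def make_label_map(labels):
    # IDs start at 1 to stay in sync with class_text_to_int.
    return '\n'.join("item {\n  id: %d\n  name: '%s'\n}" % (i + 1, name)
                     for i, name in enumerate(labels))

# Print it, then paste (or %%writefile) into training/labelmap.pbtxt.
print(make_label_map(LABELS))
```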
Once you have your label map, it is time to work with the model file. This file will slightly change depending on which model you choose, but overall, the parts that you need to specify remain the same. Again, we are going to write this file into the training folder.
%%writefile ./object_detection/training/ssd_mobilenet_v2_coco.config
```
#change the number of classes in your dataset (I have 39)
#don't need to change anything else here
num_classes: 39
box_coder {
  faster_rcnn_box_coder {
    y_scale: 10.0
    x_scale: 10.0
    height_scale: 5.0
    width_scale: 5.0
  }
}
```
Change the fine_tune_checkpoint to where the model.ckpt file is stored. Typically this will be inside the folder of the specific model you downloaded (mine is in object_detection/ssd_mobilenet_v2_coco_2018_03_29 because I am using the ssd mobilenet model).
```
#change this to be where your model.ckpt file is located
fine_tune_checkpoint: "/content/models/research/object_detection/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt"
```
Update this section to the number of steps/iterations you want your model to run. For a smaller, more uniform dataset, you may need fewer steps; for a larger, more diverse dataset, a larger number of steps may be better. The goal is to find the number of steps that produces the smallest loss without overfitting.
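One rough way to pick num_steps is from how many full passes (epochs) over the training set you want, since each step processes one batch (the batch size is set elsewhere in the same config file). A sketch with hypothetical numbers:

```python
import math

def num_steps_for(epochs, num_images, batch_size):
    # One step = one batch; round up so the last partial batch counts.
    return epochs * math.ceil(num_images / batch_size)

# Hypothetical: 605 training images, batch size 24, 40 passes over the data.
print(num_steps_for(40, 605, 24))  # 1040
```

Treat the result as a starting point and adjust based on the loss curve you observe during training.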
```
#update the number of steps you'd like to train your model
num_steps: 6050
data_augmentation_options {
  random_horizontal_flip {
  }
}
```
Change the training input path to be where your train.record file is located. Also change the label map path to be where your labelmap.pbtxt file is located (in training folder).
```
train_input_reader: {
  tf_record_input_reader {
    #change this to where train.record is located
    input_path: "/home/hannah/Software/python/models/research/new train.record"
  }
  #change this to where labelmap.pbtxt is located
  label_map_path: "/home/hannah/Software/python/models/research/object_detection/training/labelmap.pbtxt"
}
```
For the evaluation configuration, set the number of examples to how many test images you have.
```
eval_config: {
  #change this to the number of test images you have
  num_examples: 81
  max_evals: 10
}
```
Lastly, change the validation input path to be where val.record is located. Set the validation label map path to be where labelmap.pbtxt is located (same as for training).
```
eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/hannah/Software/python/models/research/new val.record"
  }
  label_map_path: "/home/hannah/Software/python/models/research/object_detection/training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
```
Train Model
Now that the configuration file is ready to go, it is time to train the model! This step can be time-consuming, depending on how many steps you chose. While training, the model saves checkpoints to the training directory at regular intervals. Once training is done, the training directory should look something like this:
We want to use the most recent checkpoint. In this case it is model.ckpt-6053, because training ran for 6053 steps.
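Rather than reading the step number off the directory listing, you can find the newest checkpoint programmatically. A minimal sketch over hypothetical filenames (`latest_checkpoint_prefix` is an illustrative helper, not part of TensorFlow):

```python
import re

def latest_checkpoint_prefix(filenames):
    # Checkpoints come in groups like model.ckpt-6053.index / .meta / .data-*;
    # find the highest step number and return the shared prefix.
    steps = {int(m.group(1))
             for f in filenames
             for m in [re.match(r'model\.ckpt-(\d+)\.', f)] if m}
    return 'model.ckpt-%d' % max(steps)

files = ['checkpoint',
         'model.ckpt-5000.index', 'model.ckpt-5000.meta',
         'model.ckpt-6053.index', 'model.ckpt-6053.meta']
print(latest_checkpoint_prefix(files))  # model.ckpt-6053
```

In practice, TensorFlow's own `tf.train.latest_checkpoint('training/')` performs the same lookup using the `checkpoint` bookkeeping file.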
Create Inference Graph
After training the model, we need to export an inference graph. This can be done (regardless of the model chosen) with the my_inference_graph.py file on my GitHub; execute the code below, changing the model.ckpt step number to match your latest checkpoint:
```
!python3 my_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/ssd_mobilenet_v2_coco.config \
    --trained_checkpoint_prefix training/model.ckpt-6053 \
    --output_directory ./inference_graph
```
Test Model
Once you create the inference graph, your model is ready to be tested. Update the code below to reflect your dataset and where your files/test images are stored.
```python
import os

# Name of the directory containing the object detection model we're using
MODEL_NAME = 'inference_graph'

#test image name
IMAGE_NAME = 'stoponeway.jpg'

# Grab path to current working directory
CWD_PATH = os.getcwd()

# Path to frozen detection graph .pb file, which contains the model that is used
# for object detection.
PATH_TO_CKPT = os.path.join(CWD_PATH, MODEL_NAME, 'frozen_inference_graph.pb')

# Path to label map file
PATH_TO_LABELS = os.path.join(CWD_PATH, 'training', 'labelmap.pbtxt')

# Path to image
#my image is stored in /content/, which is why I used that instead of CWD_PATH
PATH_TO_IMAGE = os.path.join('/content/', IMAGE_NAME)

# Number of classes the object detector can identify
NUM_CLASSES = 39
```
After making these changes and running the remaining code, you should successfully be able to detect objects in your test image!
Feel free to run this code by downloading the files from my GitHub repository.
References:
https://github.com/italojs/traffic-lights-detector
https://towardsdatascience.com/creating-your-own-object-detector-ad69dda69c85
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10