What is Emotion Recognition and Why is It Important?
One of the main aspects of our robot is its social interaction with humans. An important component of human-robot interaction is the robot's ability to understand a human's emotions and alter its responses appropriately. Emotion recognition lets the robot estimate the emotional state of the person it is talking to, giving it a basic understanding of emotion.
DLIB Usage and Installation
DLIB is an open-source machine learning library which we used to identify landmark points on the face. Given an image, DLIB returns an array containing the coordinates of facial features such as the eyes or the corners of the mouth.
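As a sketch of what our program does with that output: the landmark coordinates get flattened into a single array. Running the real predictor requires a trained model file, so a small stub stands in here for a real DLIB shape object (which exposes `num_parts` and `part(i)` with `.x`/`.y` coordinates):

```python
# Flattening a detected face shape into an [x, y, x, y, ...] array.
# StubShape mimics the interface of a dlib landmark result so the
# idea can be shown without a trained model file.

class StubPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

class StubShape:
    """Stand-in for a dlib landmark detection (normally 68 points)."""
    def __init__(self, points):
        self._points = [StubPoint(x, y) for x, y in points]
        self.num_parts = len(self._points)

    def part(self, i):
        return self._points[i]

def shape_to_flat_array(shape):
    """Flatten every landmark into one [x, y, x, y, ...] list."""
    coords = []
    for i in range(shape.num_parts):
        p = shape.part(i)
        coords.extend([p.x, p.y])
    return coords

shape = StubShape([(10, 20), (30, 40)])
print(shape_to_flat_array(shape))  # [10, 20, 30, 40]
```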
To install DLIB, execute the following commands (alternatively, the Python bindings can usually be installed with pip install dlib):
- git clone https://github.com/davisking/dlib.git
- cd dlib
- mkdir build && cd build
- cmake ..
- cmake --build . --config Release
- cd .. && python setup.py install
Make sure to add DLIB to your PYTHONPATH by adding a line like the following to your .bashrc (adjust the path to wherever you cloned DLIB):
- export PYTHONPATH=$PYTHONPATH:/path/to/dlib
Then test the module:
python -c "import dlib"
If the module is successfully imported, then you have installed DLIB properly.
Scikit-learn Usage and Installation
Scikit-learn is an open-source machine learning library. Using it, we trained our program to recognize emotion through support vector machines (SVMs). Given two parallel input arrays, the SVM learns a mapping from each element of the first array (a feature vector) to the element at the same index in the second array (an emotion label). Given a database of pictures of people displaying particular emotions, we can map landmark coordinates to emotions, training the program step by step to identify each one.
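The two-parallel-arrays pattern described above looks like this in scikit-learn; the feature values and labels below are invented for illustration, and the real program uses facial feature vectors instead:

```python
# Minimal sketch of the SVM training/prediction pattern: index i of
# `features` is mapped to index i of `labels` by fit().
from sklearn import svm

features = [
    [0.9, 0.1],  # hypothetical "wide mouth" measurements -> happy
    [0.8, 0.2],
    [0.1, 0.9],  # hypothetical "raised brow" measurements -> surprised
    [0.2, 0.8],
]
labels = ["happy", "happy", "surprised", "surprised"]

clf = svm.SVC(kernel="linear")
clf.fit(features, labels)

print(clf.predict([[0.85, 0.15]]))  # -> ['happy']
```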
You can install Scikit-learn by following the instructions at this link:
How We Implement Emotion Recognition
Although we have the coordinates of each landmark and software capable of machine learning, a couple of steps remain before our program can truly recognize emotion.
Using OpenCV, we acquired the input video feed from our laptop webcam. For each frame, we ran DLIB to produce an array containing every landmark in an [x, y, x, y, x, y, ...] format, then split that array into an array of (x, y) coordinate pairs. Next, we computed a normalization factor to scale every distance to a common standard: by looking at points that stay roughly the same distance from the center of the face regardless of the user's distance to the camera, we scaled the distances between points accordingly. This let us create a coordinate graph with its origin at the center of the face. Each landmark's x and y distance to the nose point was then calculated, allowing us to label every point with a coordinate such as (1, 1) or (-4, -3).
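The normalization step above can be sketched as follows. The landmark indices and coordinates are invented for illustration; the idea is only that every point is re-expressed relative to a nose origin and divided by a reference distance that stays stable for a given face:

```python
# Nose-centred, scale-normalized landmark coordinates, so measurements
# are comparable regardless of how far the user sits from the camera.
import math

def normalize_landmarks(points, nose_idx, ref_idx):
    """points: list of (x, y) pairs.

    Returns the points translated so the nose is the origin and scaled
    by the nose-to-reference distance (the normalization factor).
    """
    nx, ny = points[nose_idx]
    rx, ry = points[ref_idx]
    scale = math.hypot(rx - nx, ry - ny)
    return [((x - nx) / scale, (y - ny) / scale) for x, y in points]

pts = [(100, 100), (104, 103), (90, 80)]  # index 0 plays the nose here
print(normalize_landmarks(pts, nose_idx=0, ref_idx=1))
# -> [(0.0, 0.0), (0.8, 0.6), (-2.0, -4.0)]
```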
From this point, we measured the distances between certain landmarks (our feature vectors).
- For happiness, we chose the corners of the mouth, which grow farther apart when the user smiles.
- For surprise, we looked at the distance between the eyebrows and eyes, because people generally raise their eyebrows when surprised.
- For disgust, we compared the distances between the nose and the eyes, trying to capture the wrinkled nose.
- For anger, we looked at how much the eyebrows narrow, in order to capture an angry expression.
- For fear, we measured the distance between the top and bottom of the eyes, as wide eyes generally indicate fear.
- For sadness, we selected the distance between the inner eyebrow and the eye as an indicator of how drooped the user's eyebrows are.
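Computing these distance features can be sketched like this. The landmark names and coordinates below are hypothetical stand-ins; the real program indexes into DLIB's landmark array:

```python
# Turning normalized landmarks into per-emotion distance features.
import math

def dist(a, b):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Hypothetical normalized landmarks for one frame.
lm = {
    "mouth_left": (-2.0, -1.0), "mouth_right": (2.0, -1.0),
    "brow_left": (-1.5, 2.5),
    "eye_top_left": (-1.5, 1.5), "eye_bottom_left": (-1.5, 1.1),
}

features = [
    dist(lm["mouth_left"], lm["mouth_right"]),        # happiness: mouth width
    dist(lm["brow_left"], lm["eye_top_left"]),        # surprise: brow height
    dist(lm["eye_top_left"], lm["eye_bottom_left"]),  # fear: eye openness
]
print(features)  # approximately [4.0, 1.0, 0.4]
```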
We then compared these distances to a set of pre-generated distances recorded from a database of pictures, each displaying a clear emotion. Finally, we predict the user's emotion by using the SVM to match the user's feature vectors to the most appropriate emotion.
The data set of prerecorded emotions was used to train the SVM via the test.py program: we took the feature vectors of each picture and, based on its labelled emotion, mapped those feature vectors to that emotion.
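That training step can be sketched as pairing each picture's feature vector with its labelled emotion and fitting the SVM on the two parallel arrays. The data below is invented for illustration:

```python
# Sketch of the training pass: build parallel X (feature vectors) and
# y (emotion labels) arrays from a labelled picture set, then fit.
from sklearn import svm

dataset = [  # (feature vector, labelled emotion) per training picture
    ([4.2, 1.0, 0.3], "happy"),
    ([4.0, 1.1, 0.3], "happy"),
    ([2.0, 2.0, 0.6], "surprise"),
    ([2.1, 1.9, 0.7], "surprise"),
]
X = [fv for fv, _ in dataset]
y = [emotion for _, emotion in dataset]

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

print(clf.predict([[4.1, 1.0, 0.3]]))  # -> ['happy']
```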
In order to run our program, follow the steps below:
- Start the ROS master node.
- rosrun Scott's rospyopenni.py
- rosrun Aurash’s image_converter
- rosrun my DLIBlandmarking.py
Further Reading and Future Plans
Emotion detection is not perfect. While our method gives a good indication of emotion, accuracy is not 100%, and the program is still under development. As of now, we are mainly working on improving accuracy. In theory, this can be achieved in a few different ways.
- Improved Data set –
Currently, the database of pictures E-Motion utilizes contains a relatively small set of faces. Increasing the number and variety of faces would improve the program's baseline, so searching for a larger, more diverse database is a priority.
- Improved Feature Vectors –
The landmarks chosen to indicate each emotion were our best estimates, but they could be improved; it is possible we did not choose the best indicators of emotion.
- Altering the size of the emotion array –
In order to use an SVM for emotion detection, an array of data must be mapped to an array of emotions. A larger array of data produces a tighter fit: arrays that are too small tend to create inaccurate predictions, while arrays that are too large cause the fit to be too tight. Currently, the amount of data has to be artificially inflated in order to create a large enough array. Adjusting this amount may improve accuracy.
- Explore sklearn.svm –
Lastly, the Scikit-learn SVM class exposes many parameters and methods. I haven't fully explored the class and am unsure what some parameters pertain to, but adjusting its parameters may offer a way to improve accuracy.
Feel free to contact me with any questions, problems, or concerns regarding the emotion detection code, dataset, etc.
I can be reached via email at firstname.lastname@example.org