Facial Emotion Recognition in Browser using TensorflowJS
Story behind this one
I had implemented a facial emotion recognition model with a team during my summer project at BCS, IITK, and I wanted to mention it on my website. Initially, I just wrote the details of the project in a frame-like card. But recently I thought, why not implement facial emotion recognition on the website for real-time use? And this led me to build facial emotion recognition into my homepage.
Main Ingredients:
- Saved Keras Model (.h5 format) — The saved model trained on the FER2013 dataset. For more details, you can visit this GitHub repo.
- TensorflowJS — The TensorFlow version that can be used for training and running deep learning models in your browser or with NodeJS.
- Blazeface — For face detection.
- MediaDevices.getUserMedia() — To access webcam.
Converting Keras Model to TensorflowJS model
For converting a Keras model to a TensorflowJS model, use
# bash
tensorflowjs_converter --input_format keras \
path/to/my_model.h5 \
path/to/tfjs_target_dir
For more details visit the official documentation.
NOTE: While saving the Keras model, make sure you run the Python training code only once before saving. If you train the model more than once, the weights get renamed; for example, a weight originally named conv becomes conv_1. If this happens, the converted model will not work and throws the error: Error: Provided weight data has no target variable: conv_1/kernel. The best way to solve this is to retrain the model exactly once and then save it. If you are working on Colab, use Restart runtime, run the training code only once, and save the model. For a better understanding of this problem, visit this GitHub issue.
Once the model is converted, save the resulting files in a model folder in the root directory of the site code.
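For reference, the converter produces a model.json file plus one or more binary weight shards; the exact shard names may differ, but the folder should look roughly like this:
model/
├── model.json            (layer topology and weights manifest)
└── group1-shard1of1.bin  (binary weights; name and count may vary)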
The code for the website (HTML, CSS, JavaScript) is given at the end. The JavaScript code is explained below.
Explaining the JavaScript Code
HTML elements you will need
const video = document.getElementById('webcam');
const instruction = document.getElementById('caminstruct');
const liveView = document.getElementById('liveView');
const enableWebcamButton = document.getElementById('webcamButton');
const instructionText = document.getElementById("camiText");
const webcam_canvas = document.getElementById('webcam_canvas');
- video: HTML video element to stream the webcam data.
- instruction: HTML div element that shows the instructions along with the Access Webcam button.
- liveView: parent div element.
- enableWebcamButton: button to ask for webcam permission.
- instructionText: element used to display the instructions.
- webcam_canvas: HTML canvas element which is used for cropping the image.
Some other important variables:
const cam_ctx = webcam_canvas.getContext('2d'); //Canvas Context
const width = 640; //Stream width
const height = 480; // Stream height
var model = undefined; // Variable to store blazeface model
var model_emotion = undefined; // Variable to store emotion recognition model
var control = false; // Variable to control recursion of window.requestAnimationFrame
Flowchart:
1. Check if webcam can be accessed
function getUserMediaSupported() {
return (navigator.mediaDevices &&
navigator.mediaDevices.getUserMedia);
}
This returns true if the webcam can be accessed, and only then do we proceed to the next step; otherwise, a warning is shown in instructionText.
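A minimal sketch of how this check might gate the rest of the flow (the exact wiring isn't shown in the article, so treat it as an assumption; enableCam is the click handler described in step 3):
if (getUserMediaSupported()) {
  // Wire up the permission button only when getUserMedia is available.
  enableWebcamButton.addEventListener('click', enableCam);
} else {
  instructionText.innerText = 'getUserMedia() is not supported by your browser.';
}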
2. Load Models
Use blazeface.load() to load the face detection model. Once this model is loaded, we check whether the emotion model has been loaded as well; we proceed only when both models have loaded, otherwise we wait.
Use tf.loadLayersModel('model/model.json', false) to load the emotion model from storage. Here 'model/model.json' is the location of the model's .json file, which contains the details of the layers. The weights are stored in separate files that are fetched during prediction. As with face detection, we wait for both models to be loaded.
Once both models are loaded, the enableWebcamButton is displayed.
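A sketch of what the loading logic might look like; both loaders return Promises, and the helper name and the way the button is revealed are assumptions:
blazeface.load().then(function (loadedModel) {
  model = loadedModel;
  showButtonIfReady();
});
tf.loadLayersModel('model/model.json').then(function (loadedEmotionModel) {
  model_emotion = loadedEmotionModel;
  showButtonIfReady();
});
function showButtonIfReady() {
  // Reveal the "Access Webcam" button only when both models are ready.
  if (model !== undefined && model_emotion !== undefined) {
    enableWebcamButton.style.display = 'block';
  }
}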
3. Ask for Webcam Permission
Click on enableWebcamButton to request camera permission. The click is handled by the enableCam function. Here the constraints object {audio: false, video: {width: 640, height: 480}} is used so that only video is streamed and not audio. navigator.mediaDevices.getUserMedia() is called to ask for webcam permission. If permission is granted, the video source is set to the webcam stream through video.srcObject = stream; and the instruction element is hidden using instruction.style.display = "none" so that the webcam feed is displayed in the video element. Then we proceed further.
If any error occurs or permission is denied, it is handled by errorCallback, which displays the error in instructionText.
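Putting the above together, a hedged sketch of what enableCam might look like (the article doesn't reproduce the function body here, so the details are assumptions):
function enableCam() {
  const constraints = { audio: false, video: { width: 640, height: 480 } };
  navigator.mediaDevices.getUserMedia(constraints)
    .then(function (stream) {
      video.srcObject = stream;            // stream the webcam into the video element
      instruction.style.display = 'none';  // hide the instructions overlay
    })
    .catch(function (error) {
      // errorCallback in the article: surface the problem to the user
      instructionText.innerText = 'Error accessing the webcam: ' + error.message;
    });
}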
4. Call predictWebcam()
video.addEventListener('loadeddata', predictWebcam) calls predictWebcam() once data is loaded in the video element.
5. Read Video Frame
The video frame is drawn on the canvas through cam_ctx.drawImage(video, 0, 0, width, height), and then the frame (image) is read from the canvas through const frame = cam_ctx.getImageData(0, 0, width, height).
Note that this step of reading the frame isn't strictly necessary (an alternative is discussed in the next step). However, we do need to draw the frame on the canvas for cropping purposes.
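In code, one capture step looks roughly like this (a sketch based on the calls quoted above):
cam_ctx.drawImage(video, 0, 0, width, height);            // draw the current video frame
const frame = cam_ctx.getImageData(0, 0, width, height);  // read the pixel data back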
6. Face Detection
model.estimateFaces(frame) is used to detect faces in the image. The estimateFaces function comes predefined with the blazeface module.
We don't necessarily need to pass the frame read from the canvas using getImageData(); estimateFaces can also be called with the video element or the canvas element. For example, model.estimateFaces(video) or model.estimateFaces(webcam_canvas) (provided the video frame has been drawn on the canvas) will also work.
We proceed further only if a single face is detected in the image.
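Since estimateFaces returns a Promise of the detected faces, the detection step might look like the following sketch (the single-face guard mirrors the description above):
model.estimateFaces(frame).then(function (predictions) {
  if (predictions.length === 1) {
    // Exactly one face found: continue with cropping and emotion prediction.
  }
});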
7. Crop Face
estimateFaces() gives the list of all the faces detected through predictions. It contains the bounding boxes and the landmarks. The bounding boxes are rectangular and elongated in length, as can be seen below:
Hence, I have used the landmarks for cropping, taking the nose position as the center of the face and the distance between the ears as the face width and height.
Cropping is done using cam_ctx.getImageData(topLeft_x, topLeft_y, width, height), where width and height here are the face dimensions computed from the landmarks. This method of cropping is very effective. One can also crop with the TensorflowJS function cropAndResize(), but that needs normalised coordinates, which in my case led to wrong bounding boxes.
The cropped image is stored in frame2.
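A hedged sketch of landmark-based cropping; the landmark indices follow the order documented for Blazeface (eyes, nose, mouth, ears), and the sizing and variable names here are assumptions rather than the article's exact code:
const landmarks = predictions[0].landmarks;
const nose = landmarks[2];      // nose tip
const rightEar = landmarks[4];  // right ear tragion
const leftEar = landmarks[5];   // left ear tragion
const faceSize = Math.abs(leftEar[0] - rightEar[0]);  // ear-to-ear distance as face width/height
const topLeft_x = Math.round(nose[0] - faceSize / 2);
const topLeft_y = Math.round(nose[1] - faceSize / 2);
const frame2 = cam_ctx.getImageData(topLeft_x, topLeft_y, Math.round(faceSize), Math.round(faceSize));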
8. Image to Tensor
The converted Keras model needs a TensorFlow Tensor to work on. Hence the image is converted to a tensor using tf.browser.fromPixels(). This results in a tensor of dimensions [face_height, face_width, 3].
9. Resize Image and convert to B/W
The emotion model needs a black-and-white image of size [48, 48], so the tensor is resized using the TensorflowJS function tf.image.resizeBilinear(). The image is then converted to black and white by taking the mean of the RGB channels with the mean() function, giving a tensor of dimensions [48, 48]. Finally, the tensor is converted to the float datatype through the toFloat() function.
10. Make Tensor with same shape as required by emotion model
The emotion model needs input of shape [1, 48, 48, 1], but the tensor we currently have has shape [48, 48]. The shape is expanded using the expandDims() function at axis = 0 and axis = -1 (the last axis). This results in the desired shape, and the tensor is stored in image_tensor.
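Steps 8–10 can be chained into a single preprocessing expression; this is a sketch of the chain as described, not necessarily the article's exact code:
const image_tensor = tf.browser.fromPixels(frame2)  // [face_height, face_width, 3]
  .resizeBilinear([48, 48])                         // resize to [48, 48, 3]
  .mean(2)                                          // average the RGB channels -> [48, 48]
  .toFloat()                                        // float datatype
  .expandDims(0)                                    // [1, 48, 48]
  .expandDims(-1);                                  // [1, 48, 48, 1]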
11. Emotion recognition
The model_emotion.predict() function is called with image_tensor as input to get the model output, result. The output is then converted to an array through predictedValue = result.arraySync(). However, note that predictedValue is a nested array, and the actual result (the class scores) is in predictedValue['0'].
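A sketch of the prediction step (same calls as above; the shape in the comments assumes a 7-class FER2013 output, which is an assumption on my part):
const result = model_emotion.predict(image_tensor);  // output tensor, e.g. shape [1, 7]
const predictedValue = result.arraySync();           // nested array, e.g. [[p0, p1, ..., p6]]
const scores = predictedValue['0'];                  // the per-emotion probabilities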
12. Display the predictions
The predictions (result) of the emotion model are displayed to the user by changing the width of a div element. For example, for the first class:
document.getElementById("angry").style.width = 100*predictedValue['0'][0]+"%";
After all this, predictWebcam is called again through window.requestAnimationFrame(predictWebcam);
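Repeating that for every emotion and rescheduling the next frame might look like the following sketch; the element ids and the FER2013 label order (angry, disgust, fear, happy, sad, surprise, neutral) are assumptions beyond the single "angry" example shown above:
const emotions = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral'];
emotions.forEach(function (emotion, i) {
  document.getElementById(emotion).style.width = 100 * predictedValue['0'][i] + '%';
});
if (control === true) {
  window.requestAnimationFrame(predictWebcam);  // schedule the next frame
}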
Use of control:
Left alone, the webcam would be used continuously: window.requestAnimationFrame(predictWebcam); would keep calling predictWebcam recursively even when the user has scrolled to another part of the web page. To prevent this, the global variable control is used: requestAnimationFrame() is called only if control === true. Whenever the user scrolls elsewhere, control is changed to false and the camera permissions are reset through the resetEverything() function, so that the user can use the emotion recognition again in the same way when they scroll back to it.
function resetEverything() {
  control = false;
  console.log("Stopping Everything.");
  const stream = video.srcObject;
  const tracks = stream.getTracks();
  // Stop every track so the browser releases the webcam.
  tracks.forEach(function (track) {
    track.stop();
  });
}
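The article doesn't show how scrolling toggles control; one possible way is an IntersectionObserver on the liveView container (purely an assumption, not the author's code):
const observer = new IntersectionObserver(function (entries) {
  entries.forEach(function (entry) {
    if (!entry.isIntersecting && control === true) {
      // The demo has scrolled out of view: stop the loop and release the camera.
      resetEverything();
    }
  });
});
observer.observe(liveView);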
Designing the Website
HTML Structure
CSS File
JavaScript Code
And in this way, the entire code and structure for real-time emotion recognition in the browser was done.
Enjoy !!