Facial Emotion Recognition in Browser using TensorflowJS
Story behind this one
I had implemented a facial emotion recognition model with a team during my summer project at BCS, IITK, and I wanted to mention it on my website. Initially, I just wrote the details of the project in a frame-like card. But recently I thought, why not implement facial emotion recognition on the website for real-time use? And this led me to build facial emotion recognition into my homepage.
Main Ingredients:
- Saved Keras Model (.h5 format) — The saved model trained on the FER2013 dataset. For more details, you can visit this GitHub repo.
- TensorflowJS — The TensorFlow version that can be used for training and running deep learning models in your browser or with NodeJS.
- Blazeface — For face detection.
- MediaDevices.getUserMedia() — To access webcam.
Converting Keras Model to TensorflowJS model
For converting a Keras model to a TensorflowJS model, use
# bash
tensorflowjs_converter --input_format keras \
path/to/my_model.h5 \
path/to/tfjs_target_dir
For more details visit the official documentation.
NOTE: While saving the Keras model, make sure you run the Python training code only once before saving. If you train the model more than once, the weights get renamed; for example, a weight originally named conv becomes conv_1. If this happens, the converted model will not work and throws the error: Error: Provided weight data has no target variable: conv_1/kernel. The best way to solve this is to retrain the model exactly once and then save it. If you are working on Colab, use Restart runtime, run the training code only once, and save the model. For a better understanding of this problem, visit this GitHub issue.
Once the model is converted, save the resulting files in a model folder in the root directory of the site code.
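For reference, the converter produces a model.json file plus one or more binary weight shards; the exact shard names may differ, but the folder should look roughly like this:
model/
├── model.json            (layer topology and weights manifest)
└── group1-shard1of1.bin  (binary weights; name and count may vary)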
The code for the website (HTML, CSS, JavaScript) is given at the end. The JavaScript code is explained below.
Explaining the JavaScript Code
HTML elements you will need
const video = document.getElementById('webcam');
const instruction = document.getElementById('caminstruct');
const liveView = document.getElementById('liveView');
const enableWebcamButton = document.getElementById('webcamButton');
const instructionText = document.getElementById("camiText");
const webcam_canvas = document.getElementById('webcam_canvas');
- video: HTML video element to stream the webcam data.
- instruction: HTML div element that shows the instructions along with the Access Webcam button.
- liveView: parent div element.
- enableWebcamButton: button to ask for webcam permission.
- instructionText: element used to display the instructions.
- webcam_canvas: HTML canvas element which is used for cropping the image.
Some other important variables:
const cam_ctx = webcam_canvas.getContext('2d'); //Canvas Context
const width = 640; //Stream width
const height = 480; // Stream height
var model = undefined; // Variable to store blazeface model
var model_emotion = undefined; // Variable to store emotion recognition model
var control = false; // Variable to control recursion of window.requestAnimationFrame
Flowchart:
1. Check if webcam can be accessed
function getUserMediaSupported() {
return (navigator.mediaDevices &&
navigator.mediaDevices.getUserMedia);
}
This returns true if the webcam can be accessed, and only then do we proceed to the next step; otherwise, a warning is shown in instructionText.
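A minimal sketch of how this check might gate the rest of the flow (the exact wiring isn't shown in the article, so treat it as an assumption; enableCam is the click handler described in step 3):
if (getUserMediaSupported()) {
  // Wire up the permission button only when getUserMedia is available.
  enableWebcamButton.addEventListener('click', enableCam);
} else {
  instructionText.innerText = 'getUserMedia() is not supported by your browser.';
}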
2. Load Models
Use blazeface.load() to load the face detection model. Once this model is loaded, we check whether the emotion model has been loaded as well; we proceed only when both models have loaded, otherwise we wait.
Use tf.loadLayersModel('model/model.json', false) to load the emotion model from storage. Here 'model/model.json' is the location of the model's .json file, which contains the details of the layers. The weights are stored in separate files that are fetched during prediction. As with face detection, we wait for both models to be loaded.
Once both models are loaded, the enableWebcamButton is displayed.
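A sketch of what the loading logic might look like; both loaders return Promises, and the helper name and the way the button is revealed are assumptions:
blazeface.load().then(function (loadedModel) {
  model = loadedModel;
  showButtonIfReady();
});
tf.loadLayersModel('model/model.json').then(function (loadedEmotionModel) {
  model_emotion = loadedEmotionModel;
  showButtonIfReady();
});
function showButtonIfReady() {
  // Reveal the "Access Webcam" button only when both models are ready.
  if (model !== undefined && model_emotion !== undefined) {
    enableWebcamButton.style.display = 'block';
  }
}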
3. Ask for Webcam Permission
Click on enableWebcamButton to request camera permission. The click is handled by the enableCam function. Here the constraints object {audio: false, video: {width: 640, height: 480}} is used so that only video is streamed and not audio. navigator.mediaDevices.getUserMedia() is called to ask for webcam permission. If permission is granted, the video source is set to the webcam stream through video.srcObject = stream; and the instruction element is hidden using instruction.style.display = "none" so that the webcam feed is displayed in the video element. Then we proceed further.
If any error occurs or permission is denied, it is handled by errorCallback, which displays the error in instructionText.
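Putting the above together, a hedged sketch of what enableCam might look like (the article doesn't reproduce the function body here, so the details are assumptions):
function enableCam() {
  const constraints = { audio: false, video: { width: 640, height: 480 } };
  navigator.mediaDevices.getUserMedia(constraints)
    .then(function (stream) {
      video.srcObject = stream;            // stream the webcam into the video element
      instruction.style.display = 'none';  // hide the instructions overlay
    })
    .catch(function (error) {
      // errorCallback in the article: surface the problem to the user
      instructionText.innerText = 'Error accessing the webcam: ' + error.message;
    });
}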
4. Call predictWebcam()
video.addEventListener('loadeddata', predictWebcam) calls predictWebcam() once data is loaded in the video element.
5. Read Video Frame
The video frame is drawn on the canvas through cam_ctx.drawImage(video, 0, 0, width, height), and then the frame (image) is read from the canvas through const frame = cam_ctx.getImageData(0, 0, width, height).
Note that this step of reading the frame isn't strictly necessary (an alternative is discussed in the next step). However, we do need to draw the frame on the canvas for cropping purposes.
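In code, one capture step looks roughly like this (a sketch based on the calls quoted above):
cam_ctx.drawImage(video, 0, 0, width, height);            // draw the current video frame
const frame = cam_ctx.getImageData(0, 0, width, height);  // read the pixel data back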
6. Face Detection
model.estimateFaces(frame) is used to detect faces in the image. The estimateFaces function comes predefined with the blazeface module.
We don't necessarily need to pass the frame read from the canvas using getImageData(); estimateFaces can also be called with the video element or the canvas element. For example, model.estimateFaces(video) or model.estimateFaces(webcam_canvas) (provided the video frame has been drawn on the canvas) will also work.
We proceed further only if a single face is detected in the image.
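Since estimateFaces returns a Promise of the detected faces, the detection step might look like the following sketch (the single-face guard mirrors the description above):
model.estimateFaces(frame).then(function (predictions) {
  if (predictions.length === 1) {
    // Exactly one face found: continue with cropping and emotion prediction.
  }
});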
7. Crop Face
estimateFaces() gives the list of all the faces detected through predictions. It contains the bounding boxes and the landmarks. The bounding boxes are rectangular and elongated in length, as can be seen below:
Hence, I have used the landmarks for cropping, taking the nose position as the center of the face and the distance between the ears as the face width and height.
Cropping is done using cam_ctx.getImageData(topLeft_x, topLeft_y, width, height), where width and height here are the face dimensions computed from the landmarks. This method of cropping is very effective. One can also crop with the TensorflowJS function cropAndResize(), but that needs normalised coordinates, which in my case led to wrong bounding boxes.
The cropped image is stored in frame2.
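A hedged sketch of landmark-based cropping; the landmark indices follow the order documented for Blazeface (eyes, nose, mouth, ears), and the sizing and variable names here are assumptions rather than the article's exact code:
const landmarks = predictions[0].landmarks;
const nose = landmarks[2];      // nose tip
const rightEar = landmarks[4];  // right ear tragion
const leftEar = landmarks[5];   // left ear tragion
const faceSize = Math.abs(leftEar[0] - rightEar[0]);  // ear-to-ear distance as face width/height
const topLeft_x = Math.round(nose[0] - faceSize / 2);
const topLeft_y = Math.round(nose[1] - faceSize / 2);
const frame2 = cam_ctx.getImageData(topLeft_x, topLeft_y, Math.round(faceSize), Math.round(faceSize));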
8. Image to Tensor
The converted Keras model needs a TensorFlow Tensor to work on. Hence the image is converted to a tensor using tf.browser.fromPixels(). This results in a tensor of dimensions [face_height, face_width, 3].
9. Resize Image and convert to B/W
The emotion model needs a black-and-white image of size [48, 48], so the tensor is resized using the TensorflowJS function tf.image.resizeBilinear(). The image is then converted to black and white by taking the mean of the RGB channels with the mean() function, giving a tensor of dimensions [48, 48]. Finally, the tensor is converted to the float datatype through the toFloat() function.
10. Make Tensor with same shape as required by emotion model
The emotion model needs input of shape [1, 48, 48, 1], but the tensor we currently have has shape [48, 48]. The shape is expanded using the expandDims() function at axis = 0 and axis = -1 (the last axis). This results in the desired shape, and the tensor is stored in image_tensor.
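Steps 8–10 can be chained into a single preprocessing expression; this is a sketch of the chain as described, not necessarily the article's exact code:
const image_tensor = tf.browser.fromPixels(frame2)  // [face_height, face_width, 3]
  .resizeBilinear([48, 48])                         // resize to [48, 48, 3]
  .mean(2)                                          // average the RGB channels -> [48, 48]
  .toFloat()                                        // float datatype
  .expandDims(0)                                    // [1, 48, 48]
  .expandDims(-1);                                  // [1, 48, 48, 1]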
11. Emotion recognition
The model_emotion.predict() function is called with image_tensor as input to get the model output, result. The output is then converted to an array through predictedValue = result.arraySync(). However, note that predictedValue is a nested array, and the actual result (the class scores) is in predictedValue['0'].
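A sketch of the prediction step (same calls as above; the shape in the comments assumes a 7-class FER2013 output, which is an assumption on my part):
const result = model_emotion.predict(image_tensor);  // output tensor, e.g. shape [1, 7]
const predictedValue = result.arraySync();           // nested array, e.g. [[p0, p1, ..., p6]]
const scores = predictedValue['0'];                  // the per-emotion probabilities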
12. Display the predictions
The predictions (result) of the emotion model are displayed to the user by changing the width of a div element. For example, for the first class:
document.getElementById("angry").style.width = 100*predictedValue['0'][0]+"%";
After all this, predictWebcam is called again through window.requestAnimationFrame(predictWebcam);
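Repeating that for every emotion and rescheduling the next frame might look like the following sketch; the element ids and the FER2013 label order (angry, disgust, fear, happy, sad, surprise, neutral) are assumptions beyond the single "angry" example shown above:
const emotions = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral'];
emotions.forEach(function (emotion, i) {
  document.getElementById(emotion).style.width = 100 * predictedValue['0'][i] + '%';
});
if (control === true) {
  window.requestAnimationFrame(predictWebcam);  // schedule the next frame
}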
Use of control:
Left alone, the webcam would be used continuously: window.requestAnimationFrame(predictWebcam); would keep calling predictWebcam recursively even when the user has scrolled to another part of the web page. To prevent this, the global variable control is used: requestAnimationFrame() is called only if control === true. Whenever the user scrolls elsewhere, control is changed to false and the camera permissions are reset through the resetEverything() function, so that the user can use the emotion recognition again in the same way when they scroll back to it.
function resetEverything() {
  control = false;
  console.log("Stopping Everything.");
  const stream = video.srcObject;
  const tracks = stream.getTracks();
  // Stop every track so the browser releases the webcam.
  tracks.forEach(function (track) {
    track.stop();
  });
}
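The article doesn't show how scrolling toggles control; one possible way is an IntersectionObserver on the liveView container (purely an assumption, not the author's code):
const observer = new IntersectionObserver(function (entries) {
  entries.forEach(function (entry) {
    if (!entry.isIntersecting && control === true) {
      // The demo has scrolled out of view: stop the loop and release the camera.
      resetEverything();
    }
  });
});
observer.observe(liveView);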
Designing the Website
HTML Structure
CSS File
JavaScript Code
And in this way, the entire code and structure for real-time emotion recognition in the browser was done.
Enjoy !!