Computer Vision - Face Detection

COMPUTER VISION

What is it...?

Computer vision aims to mimic the abilities of human vision by electronically perceiving and understanding images.

It is a broad term that covers many domains, such as gesture recognition, optical character recognition and face detection.

In this article, we will focus on face detection and try to understand the key ideas that allow us to detect human faces in real time.


It all begins with a pixel!

As shown below, a digital representation of an image comprises a large number of pixels depending on the resolution of the image.

Each pixel represents the smallest unit containing the information about how the image will be rendered on a digital device.

Each pixel can be represented by 4 bytes ( 1 byte each for red, green, blue and alpha ).
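
To make the 4-bytes-per-pixel layout concrete, here is a minimal sketch that stores a tiny image the way the browser's ImageData does, i.e. as a flat Uint8ClampedArray in RGBA order. The helper names pixelOffset and intensityAt are made up for this illustration, and the grayscale weights are the common Rec. 601 luma coefficients, which is our assumption rather than something the article prescribes.

```javascript
// A tiny 2x2 image stored as a flat byte array: 4 bytes (R, G, B, A) per pixel,
// row by row. This mirrors the layout of ImageData.data in the browser.
const width = 2;
const height = 2;
const pixels = new Uint8ClampedArray(width * height * 4);

// Index of the first byte (red) of the pixel at column x, row y.
function pixelOffset(x, y, imageWidth) {
  return (y * imageWidth + x) * 4;
}

// Write an opaque orange pixel at (1, 0).
const offset = pixelOffset(1, 0, width);
pixels[offset] = 255;     // red
pixels[offset + 1] = 128; // green
pixels[offset + 2] = 0;   // blue
pixels[offset + 3] = 255; // alpha (fully opaque)

// Collapse a pixel into one grayscale intensity (assumed Rec. 601 weights).
function intensityAt(data, x, y, imageWidth) {
  const i = pixelOffset(x, y, imageWidth);
  return Math.round(0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2]);
}
```

A single intensity per pixel is what the feature computations later in this article operate on.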


What Face Detection is...?

It is essentially processing a raw image to :

  • Detect the presence of human faces, if any.
  • Extract info about the coordinates and size of those human faces.

How do we do it...?

That's what this article is all about.

Before we dive into the implementation details, let’s discuss the framework that most of the modern face detection systems use... the Viola-Jones Object Detection Framework.


VIOLA JONES OBJECT DETECTION FRAMEWORK

The training is slow, but the detection is fast.

This framework introduced three key ideas:

  1. Integral image representation
  2. Construction of classifier by Adaptive Boosting
  3. Combining successively more complex classifiers in a cascade structure

Let's see what each of these is...


INTEGRAL IMAGE REPRESENTATION

Features instead of Pixels

This framework focuses on using features rather than raw pixels for computation.

What is a feature...?

Let's see an example to understand what a feature is...

As explained here, all human faces share some similar properties:

  • The eye region is darker than the upper cheeks.

  • The nose bridge region is brighter than the eyes.

Using features has an obvious benefit: they can encode critical domain knowledge that would be hard to learn from a limited amount of training data.

These features are evaluated by computing the difference between the sums of the pixel intensities in the dark and light regions.

Value = Σ (pixels in black area) - Σ (pixels in white area)

This step is a key operation, as it is repeated many times with regions of varying sizes and coordinates. It therefore needs to be cheap for the detector as a whole to be fast.

This is where the idea of the integral image representation comes in handy.

What is an Integral Image representation...?

It is an intermediate representation that lets us compute the sum of the intensities over any rectangular region in constant time, O(1), regardless of the size of the region, instead of O(n) for a region of n pixels.

As explained here

  • The value of the integral image at a point is the sum of all the pixels above and to the left.
  • The sum of the pixels within a rectangle can be computed with four array references.
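
The two points above can be sketched in a few lines. This is a minimal illustration, not the code of any particular library: the function names are ours, the image is assumed to be a flat array of grayscale intensities, and the integral image is padded with an extra row and column of zeros so that the four lookups need no border checks.

```javascript
// Build the integral image of a grayscale image given as a flat array
// (row by row). Entry (x, y) of the padded result holds the sum of all
// pixels above and to the left of it.
function integralImage(gray, width, height) {
  const stride = width + 1;
  const ii = new Float64Array(stride * (height + 1));
  for (let y = 1; y <= height; y++) {
    let rowSum = 0;
    for (let x = 1; x <= width; x++) {
      rowSum += gray[(y - 1) * width + (x - 1)];
      ii[y * stride + x] = ii[(y - 1) * stride + x] + rowSum;
    }
  }
  return { ii, stride };
}

// Sum of the pixels in the w x h rectangle whose top-left corner is (x, y),
// computed with exactly four array references.
function rectSum(integral, x, y, w, h) {
  const { ii, stride } = integral;
  return ii[(y + h) * stride + (x + w)]
       - ii[y * stride + (x + w)]
       - ii[(y + h) * stride + x]
       + ii[y * stride + x];
}

// A two-rectangle feature: the upper region is treated as the "black" area
// and the region directly below it as the "white" area, following
// Value = sum(black) - sum(white).
function twoRectFeature(integral, x, y, w, h) {
  const black = rectSum(integral, x, y, w, h);
  const white = rectSum(integral, x, y + h, w, h);
  return black - white;
}
```

Note that the cost of rectSum does not depend on w or h, which is exactly why the same feature can be evaluated at many scales cheaply.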


CONSTRUCTION OF CLASSIFIER BY ADAPTIVE BOOSTING

What is a classifier...?

As explained here, a classifier is a function that takes the values of various features of an example (the image to be tested, in our case) and predicts the class that the example belongs to (whether or not it contains a human face, in our case).

A classifier is usually the output of a machine learning algorithm run on some training data, and its efficiency and accuracy depend on the quality of that training data.

  • A classifier is called a weak classifier if it is only slightly better than random guessing, so it cannot be used alone to predict the class to which an example belongs. It is generally computationally cheap.
  • In contrast, a classifier is called a strong classifier if it can be used alone to classify the example. It is generally computationally intensive.

There can be a large number of rectangle features associated with each image sub-window... in fact, far more than the number of pixels in it.

As Viola and Jones hypothesized, a relatively small number of these features can be combined to form an effective classifier.

The big question - Which features to select?

Viola and Jones used a variant of AdaBoost both to select the features and to train the classifier.

What is AdaBoost... ?

Adaptive Boosting a.k.a AdaBoost is a learning algorithm which is used to boost the performance of a simple classifier.

Below is a conceptual overview of how adaptive boosting works:

  • The training data ( a collection of positive and negative samples i.e. the images with and without a human face ) is fed into a weak classifier.
  • After each round of learning, the examples ( training data images ) are re-weighted to emphasize those which were incorrectly classified by the previous classifier.
  • This process is repeated until we get the required accuracy of the classifier.
  • The final result is a strong classifier which is a linear combination of a number of weighted weak classifiers followed by a threshold.
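
The loop described above can be sketched as follows. Please treat this as a toy: it is the textbook discrete AdaBoost with single-feature threshold classifiers ("stumps"), not the exact variant used by Viola and Jones, and all the names (bestStump, trainAdaBoost, predict) are ours. Each sample is assumed to be an array of feature values and each label is +1 (face) or -1 (no face).

```javascript
// Predict +1 or -1 from one feature, a threshold and a polarity.
function predictStump(stump, sample) {
  return stump.polarity * (sample[stump.feature] - stump.threshold) >= 0 ? 1 : -1;
}

// Exhaustively pick the stump with the lowest weighted error.
function bestStump(samples, labels, weights) {
  let best = { error: Infinity };
  for (let feature = 0; feature < samples[0].length; feature++) {
    for (const candidate of samples) {
      for (const polarity of [1, -1]) {
        const stump = { feature, threshold: candidate[feature], polarity };
        let error = 0;
        samples.forEach((sample, i) => {
          if (predictStump(stump, sample) !== labels[i]) error += weights[i];
        });
        if (error < best.error) best = Object.assign(stump, { error });
      }
    }
  }
  return best;
}

function trainAdaBoost(samples, labels, rounds) {
  let weights = new Array(samples.length).fill(1 / samples.length);
  const ensemble = [];
  for (let round = 0; round < rounds; round++) {
    const stump = bestStump(samples, labels, weights);
    const error = Math.max(stump.error, 1e-10); // guard against division by zero
    const alpha = 0.5 * Math.log((1 - error) / error); // weight of this weak classifier
    ensemble.push({ stump, alpha });
    // Increase the weights of misclassified examples, decrease the others...
    weights = weights.map((w, i) =>
      w * Math.exp(-alpha * labels[i] * predictStump(stump, samples[i])));
    // ...and normalize so that the weights sum to 1 again.
    const total = weights.reduce((sum, w) => sum + w, 0);
    weights = weights.map((w) => w / total);
  }
  return ensemble;
}

// Strong classifier: weighted vote of the weak classifiers, then a threshold at 0.
function predict(ensemble, sample) {
  const score = ensemble.reduce(
    (sum, { stump, alpha }) => sum + alpha * predictStump(stump, sample), 0);
  return score >= 0 ? 1 : -1;
}
```

Because each stump looks at a single feature, picking the best stump in every round doubles as feature selection, which is the property the framework relies on.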

Thankfully, for our purposes (human face detection), we do not need to train our own classifier.

Instead, we will be using the classifier training data provided by OpenCV, an open source library.


THE ATTENTIONAL CASCADE

The key insight is that smaller ( and therefore more efficient ) boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances.

The idea is to use simpler classifiers to reject the majority of negative sub-windows, thereby focusing attention on only the promising regions of the image.

The end result is better overall efficiency, because far fewer sub-windows ever reach the computationally expensive classifiers.
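
Here is a minimal sketch of the early-rejection idea. The stages are plain functions ordered from cheap to expensive, and the fields contrast and score on each sub-window are hypothetical stand-ins for real feature computations.

```javascript
// Run a sub-window through the stages in order and stop at the first rejection.
function passesCascade(stages, subWindow) {
  for (const stage of stages) {
    if (!stage(subWindow)) return false; // rejected early, later stages never run
  }
  return true;
}

// Count how often each stage runs to see the effect of early rejection.
const evaluations = [0, 0];
const stages = [
  (w) => { evaluations[0]++; return w.contrast > 10; }, // cheap first stage
  (w) => { evaluations[1]++; return w.score > 0.9; },   // expensive later stage
];

const windows = [
  { contrast: 2, score: 0.95 },  // rejected by the first stage
  { contrast: 50, score: 0.2 },  // rejected by the second stage
  { contrast: 40, score: 0.97 }, // passes every stage
];

const faces = windows.filter((w) => passesCascade(stages, w));
console.log(faces.length, evaluations);
```

In this toy run the expensive stage is evaluated for only two of the three windows. In a real image, where the vast majority of sub-windows contain no face, the saving is far larger.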


SEEING IT IN ACTION...

Here is a video that visualizes the detection process of OpenCV's face detector.

The algorithm uses the Viola-Jones method :

  • An integral image is calculated.
  • Some calculations are done on all the areas defined by the black and white rectangles to analyze the differences between the dark and light regions of a face.
  • The sub-window (in red) is scanned across the image at various scales to detect if there is a potential face within the window.
  • If not, it continues scanning.
  • If it passes all stages in the cascade file, it is marked with a red rectangle.
  • In the post-processing stage, all the potential faces are checked for overlaps.
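
To give a feel for the overlap check in the last step, here is a small sketch that measures how much two detected rectangles overlap, using the intersection-over-union ratio. This is a common measure and our own illustration. It is not a description of how OpenCV groups its detections internally.

```javascript
// Rectangles are assumed to be plain objects of the form { x, y, width, height }.
// Returns a value between 0 (no overlap) and 1 (identical rectangles).
function overlapRatio(a, b) {
  const left = Math.max(a.x, b.x);
  const top = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.width, b.x + b.width);
  const bottom = Math.min(a.y + a.height, b.y + b.height);
  const intersection = Math.max(0, right - left) * Math.max(0, bottom - top);
  const union = a.width * a.height + b.width * b.height - intersection;
  return union === 0 ? 0 : intersection / union;
}
```

Detections whose ratio exceeds some chosen threshold can then be treated as the same face and merged.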

HOW DO I USE FACE-DETECTION IN MY WEB PROJECT...?

We have survived the theory, so let's move on to the implementation.

Depending on the nature of the requirement, we can go with one of the below options:

I. IMPLEMENT YOUR OWN CLOUD

The most flexible and powerful way would be to manage your own cloud solution.

  • You get to control the training of the classifiers and every aspect of it.
  • You may use an open source library like OpenCV, which provides more than 2500 optimized computer vision and machine learning algorithms and has C++, C, Python, Java and MATLAB interfaces. You can also use it from a Node server via node-opencv.
  • However, this approach has its own maintenance overheads, which become critical once you scale your app.
  • Another potential drawback is that the image needs to be transmitted over the network to the server. In simple scenarios, e.g. tagging of friends, we might go with a client side solution only.

II. USE A THIRD PARTY SERVICE

  • You can easily get face detection up and running by delegating all the maintenance and setup chores to a third party service provider, like the Google Cloud Vision API or the one provided by Microsoft.
  • These providers already have extensive training data backing their classifiers. They also support advanced features such as Explicit Content Detection.
  • These services are usually paid.
  • This solution is recommended until the pay-as-you-go charges outweigh the cost of setting up and training your own classifier, which usually takes some time to fine tune.
  • The common potential drawback of the image transmitted over network remains.

III. USE A CLIENT SIDE JS LIBRARY

  • The computation is delegated to the client, so the image does not need to be transmitted over the network to detect faces.
  • This solution is particularly valuable when we need to support simple requirements like tagging of friends, or focussing on the area of the image that contains the face.
  • One of the popular libraries to support this is trackingjs.
  • Simplest to implement.

USING TRACKINGJS TO IMPLEMENT FACE DETECTION

You can install this library by simply typing

npm install tracking

Once installed, this library has the following files in the build directory

  • The files in the data folder contain the classifiers. You need to include the appropriate classifier based on what you need to detect.
  • So basically, we need to include the tracking library and one ( or more depending on use case ) of the classifiers and we are all set.

The classifier data looks as shown below.

Below is a code snippet from the examples provided on the trackingjs site.

Using this library involves the following steps:

  • Require trackingjs and appropriate classifier ( from the data directory )
  • Instantiate a tracking.ObjectTracker with the name of the classifier to use.
  • Set appropriate callback on the tracker for the track event.
  • Trigger the tracker by simply invoking tracking.track with the element ( can be image / video / canvas ) and the tracker as the arguments.

    window.onload = function() {
      var img = document.getElementById('img');
      // We need to detect faces, so instantiate a new ObjectTracker with 'face' as the argument.
      // IMPORTANT: the matching classifier from the data directory must be included as well.
      var tracker = new tracking.ObjectTracker('face');

      // Set up the listener for the track event
      tracker.on('track', function(event) {
        // event.data holds one rectangle per detected face
        event.data.forEach(function(rect) {
          // Do something with the faces identified in the image.
          // plotRectangle is a helper you need to define yourself, e.g. to draw a box over the image.
          plotRectangle(rect.x, rect.y, rect.width, rect.height);
        });
      });

      // Start tracking on the image element
      tracking.track(img, tracker);
    };
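
For a requirement like focussing on the area of the image with the face, we usually care about a single rectangle. Here is a small sketch of a helper that picks the largest one from the rectangles delivered in event.data. The helper name largestFace is ours and is not part of trackingjs. It only assumes the x, y, width and height fields used in the snippet above.

```javascript
// Return the rectangle with the largest area, or null if nothing was detected.
function largestFace(rects) {
  return rects.reduce(function (best, rect) {
    if (best === null) return rect;
    return rect.width * rect.height > best.width * best.height ? rect : best;
  }, null);
}
```

Inside the track callback this could be used as largestFace(event.data), followed by whatever cropping or zooming the app needs.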

So that was a beginner's introduction to real-time face detection on the web.

Hope you found it interesting. Thanks for reading.


ActionCable and WebSockets - Part 1 (The Basics)

One of the best things about Rails is how quickly it lets you develop web apps by providing some sensible conventions.

And with Rails 5, it lets you build real-time web apps with ease.

Introducing ActionCable... a real-time framework for communication over WebSockets.

But before we proceed any further, let's spend some time discussing how we got to ActionCable... or more specifically, to WebSockets.

When the web started becoming more dynamic with ajax and js advances, we, as developers, started finding ways to make our applications more real-time.

POLLING

One of the earliest solutions that came up was Polling.

The client sends requests over HTTP at regular intervals... a simple and robust solution to implement!

The interval plays a critical role here:

To give the client a real-time experience, the polling interval needs to be small enough for the user to believe that the app is almost live, give or take some network latency.

But there were problems:

When we tried to make it more real-time by reducing the polling interval, our servers didn't like it, mainly because of the way polling... or rather HTTP, works.

Here's a sample implementation from IBM

[Figure: polling]

HTTP is a stateless protocol. The server and client are aware of each other only during the current request. Afterward, both of them forget about each other.

Which means... ?

Every request carries additional data in the form of headers, which is transmitted over the network each time, and this makes the communication inefficient.

According to this Google whitepaper, header sizes of 700-800 bytes are typical.

Assuming 800 bytes,

For 1k clients polling every second, the header traffic alone = 800 * 1000 = 800,000 bytes per second ≈ 0.763 MiB/s... For 10k clients... about 7.63 MiB/s.
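
The arithmetic is easy to replay for other numbers. The sketch below uses two helper names of our own and treats 1 MiB as 1024 * 1024 bytes, which is the convention that gives the figures quoted above.

```javascript
// Bytes of header data sent per second when every client polls at a fixed rate.
function headerBytesPerSecond(clients, headerBytes, pollsPerSecond) {
  return clients * headerBytes * pollsPerSecond;
}

// Convert bytes to MiB (1 MiB = 1024 * 1024 bytes).
function toMiB(bytes) {
  return bytes / (1024 * 1024);
}

const oneK = headerBytesPerSecond(1000, 800, 1);
const tenK = headerBytesPerSecond(10000, 800, 1);
console.log(toMiB(oneK).toFixed(3)); // "0.763"
console.log(toMiB(tenK).toFixed(2)); // "7.63"
```

Keep in mind that this counts headers only. Whatever payload the responses carry comes on top of it.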


LONG POLLING

This is more like looping via setTimeout instead of setInterval: the client sends the next request only after the previous one has completed.

The server receives the request and holds it open, responding only when it has a response available.
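
The control flow can be sketched without any real networking. In the snippet below, sendRequest is a hypothetical transport function (in a browser it would wrap XMLHttpRequest or fetch): it sends one request and invokes the callback once the server finally responds. The other two parameters are also our own invention for the sake of the example.

```javascript
// Long polling: issue a request, wait for the server to answer, handle the
// answer and only then issue the next request.
function longPoll(sendRequest, onMessage, shouldContinue) {
  sendRequest(function (message) {
    onMessage(message);
    // Unlike setInterval-style polling, the next request is tied to the
    // completion of the previous one, not to a clock.
    if (shouldContinue()) {
      longPoll(sendRequest, onMessage, shouldContinue);
    }
  });
}
```

There is never more than one outstanding request per client, and a request that the server keeps waiting costs nothing on the wire until there is something to say.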

Here's a sample implementation from IBM

[Figure: long-polling]

But that was not the solution either:

It quickly falls apart once the data begins to change frequently... at that point it behaves much like regular polling.


SERVER SENT EVENTS

Server-sent event support was added to Rails in 4.0, through ActionController::Live.
Here's a nice intro from tenderlove

A persistent unidirectional connection is made between the server and the client. The client subscribes to the server events via onmessage callbacks.

Here's a nice write-up on server sent events

[Figure: server-sent-events]

That seemed promising for a while, but the world stopped for IE users. As shown here, no version of IE implements the EventSource interface, which is required for server-sent events. Whoosh! End of story.


WEB SOCKETS

Websockets work by maintaining a persistent bi-directional channel.

After the initial handshake, the HTTP connection is upgraded to a WebSocket connection. The data frames can then pass to and fro between client and the server until one of the sides closes it.
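
One detail of that handshake is easy to try out in Node. As per the WebSocket protocol specification (RFC 6455), the client sends a random Sec-WebSocket-Key header with its upgrade request, and the server proves that it understood the request by replying with a Sec-WebSocket-Accept header derived from it. This goes a little beyond what the article covers, and the function name acceptKeyFor is ours. The sample key and the expected result are the ones given in the RFC.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for computing the accept value.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + GUID))
function acceptKeyFor(clientKey) {
  return crypto
    .createHash('sha1')
    .update(clientKey + WEBSOCKET_GUID)
    .digest('base64');
}

console.log(acceptKeyFor('dGhlIHNhbXBsZSBub25jZQ=='));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

Once the client has verified this value, both sides stop speaking HTTP on that connection and start exchanging WebSocket frames.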

More info here

Here's a nice write-up on WebSockets vs REST

[Figure: web sockets]


... Coming back to ActionCable...

Let's get our head around some common terms that we would be using.

  • Consumer - The client of a WebSocket connection is called the consumer.
  • Channel - A channel encapsulates a logical unit of work, similar to what a controller does in a regular MVC setup.
  • Subscriber - Each consumer can, in turn, subscribe (and therefore, will become a subscriber) to multiple cable channels.

As seen in the conceptual diagram below, from here

[Figure: action cable conceptual diagram]

** This is just a bird's-eye view. We will be covering the details as we proceed.

  • An action cable server can run as a separate server or can be mounted on the Rails App server itself.
  • The action cable server need not be a threaded server only. More info here (socket hijacking)
  • A client (Browser) interacts with rails app over HTTP / HTTPS.
  • The connection between client and ActionCable server is upgraded from HTTP to WS / WSS ( WebSocket Protocol ).
  • The Rails app publishes its broadcasts to a pub/sub queue ( the default implementation uses Redis for this ).
  • The action cable server processes the queue.

Follow the link below to read the next part, which focuses on implementing a sample chat application using ActionCable.

ActionCable and WebSockets – Part 2 The Implementation


ActionCable and WebSockets - Part 2 (The Implementation)

You can read Part 1 here.

THE RAILS WAY...

When you create a new rails 5 application, rails generates some files for you:

[Figure: new rails app]

For any implementation of a websocket connection, we need both the client and the server parts of the code.

CLIENT SIDE

For the client side Rails provides app/assets/javascripts/cable.js which loads action_cable js and all files in channels directory.

[Figure: action_cable.js]

On page load, a consumer is created and exposed via App.cable.

If we dig a little into the client side code for action_cable, we find that Rails does all the heavy lifting, like instantiating subscriptions and connections, monitoring connections etc. Pretty cool... right?

function Consumer(url) {
  this.url = url;
  this.subscriptions = new ActionCable.Subscriptions(this);
  this.connection = new ActionCable.Connection(this);
}

For most practical purposes, you would not be modifying anything in this file.

SERVER SIDE

Actually, rails generates an empty Connection class.
However, for almost all practical applications, we would need some sort of authorization on the incoming connections.

Here's a good tutorial on ActionCable devise authentication
# SAMPLE IMPLEMENTATION FOR DEMO PURPOSE

module ApplicationCable
  class Connection < ActionCable::Connection::Base
    identified_by :current_user

    def connect
      self.current_user = find_verified_user
      logger.add_tags 'ActionCable', current_user.name
    end

    protected
      def find_verified_user
        # Assuming a successful authentication sets a signed cookie with the `user_id`
        if verified_user = User.find_by(id: cookies.signed[:user_id])
          verified_user
        else
          # Raises ActionCable::Connection::Authorization::UnauthorizedError
          reject_unauthorized_connection
        end
      end
  end
end

  • Here, identified_by declares a connection identifier ( current_user in this case ).
    We can use it later to retrieve, and thereby disconnect, all open connections for a given user.
  • If you implement the connect method, it will be called while handling a WebSocket open request.
  • You can call reject_unauthorized_connection if you don't want the current_user to connect.

The app/channels/application_cable/channel.rb contains your ApplicationCable::Channel where you put shared logic for your channels.
It's similar to ApplicationController for controllers.


All that came right out of the box.

Let's go ahead and implement a common use case... The Chat Application.

For a chat application, we would have these three basic requirements:

  • We should be able to subscribe to a channel,
  • Publish something on that channel
  • Receive the published message on the subscribed channel.

THE channel GENERATOR

Rails 5 provides a new channel generator which creates two new files.
One ruby file and one js file.

This generator is similar to the familiar controller generator. You specify the name of the channel ( room ) and one or more public methods which can be invoked as remote procedures ( we'll come to that in a while ).

[Figure: channel-generator]

Let's see what we have in each of these files for this particular example..

CLIENT SIDE JS CODE

# app/assets/javascripts/channels/room.coffee

App.room = App.cable.subscriptions.create "RoomChannel",
  connected: ->
    # Called when the subscription is ready for use on the server

  disconnected: ->
    # Called when the subscription has been terminated by the server

  received: (data) ->
    # Called when there's incoming data on the websocket for this channel

  speak: ->
    @perform 'speak'

  • Rails created a subscription for the RoomChannel.

Please note that the name is exactly the same as the name of the channel class on the server.

  • Then it provides empty implementations for three callbacks: connected, disconnected and received.
  • Then we have a speak method which invokes the perform method with the string 'speak' as its argument. Again, that name is important.

We'll see later why this naming is important. The good thing is that Rails did all that for us, and we don't need to worry about it unless we override the defaults.

SERVER SIDE RUBY CODE

# app/channels/room_channel.rb
class RoomChannel < ApplicationCable::Channel
  def subscribed
    # stream_from "some_channel"
  end

  def unsubscribed
    # Any cleanup needed when channel is unsubscribed
  end

  def speak
  end
end

For the channel class in ruby,

  • We have empty implementations for two callbacks, subscribed and unsubscribed.
  • Also, we have an empty implementation for our speak method.

I think you would appreciate that we have all this structure ready, and we have only run one or two generators so far.

DEMO FROM DHH

Here's the famous ActionCable demo from DHH. We'll use snippets from it to understand what each portion of the code does.

SERVER SIDE CODE

# app/channels/room_channel.rb
class RoomChannel < ApplicationCable::Channel
  def subscribed
    stream_from "room_channel"
  end

  def speak(data)
    Message.create! content: data['message']
  end
end

  • The subscribed callback is called whenever a consumer subscribes to the channel, e.g. when you open the page in a new tab.
    There's a corresponding unsubscribed callback as well, which is invoked when the subscription ends, e.g. when the connection is closed.
  • The stream_from method is called with the name of the broadcasting pubsub queue ('room_channel' in this case). This name is important and should be the same as the one on which the broadcast is invoked.
  • The speak method is invoked as a remote procedure from the client. Here, we are just creating a new message with the passed args. You may broadcast from here itself if you want to.

However, in practical applications we might have a large number of subscribers, and it makes sense to handle the broadcast asynchronously in a background job.

# app/models/message.rb
class Message < ApplicationRecord
  # That's a rails 5 equivalent of after_commit, on: :create
  after_create_commit :broadcast_self

  private
    def broadcast_self
      MessageBroadcastJob.perform_later(self)
    end
end

And here's the code for the message broadcast job.

class MessageBroadcastJob < ApplicationJob
  queue_as :default

  def perform(message)
    # You may render JSON or HTML itself if you want to reuse your views.
    ActionCable.server.broadcast 'room_channel', message: render_message(message)
  end

  private
  def render_message(message)
    # RAILS5_THING: Controller can render partial without being in scope of the controller.
    ApplicationController.renderer
                      .render(partial: 'messages/message', locals: { message: message })
  end
end

Notice how we invoke the broadcast on a given named pubsub queue ( the one we passed as an argument to stream_from) with the hash that we want to broadcast.
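
If the idea of a named pubsub queue feels abstract, the following toy version may help. It is only a conceptual sketch in plain JavaScript with names of our own choosing, and it is in no way how ActionCable or Redis implement it: subscribing to a name plays the role of stream_from, and broadcast delivers a payload to everyone who subscribed to that same name.

```javascript
class InMemoryPubSub {
  constructor() {
    this.streams = new Map(); // stream name -> list of subscriber callbacks
  }

  // Register interest in a named stream (the role stream_from plays).
  subscribe(streamName, callback) {
    if (!this.streams.has(streamName)) this.streams.set(streamName, []);
    this.streams.get(streamName).push(callback);
  }

  // Deliver the payload to every subscriber of that exact stream name and
  // return how many subscribers were reached.
  broadcast(streamName, payload) {
    const callbacks = this.streams.get(streamName) || [];
    callbacks.forEach((callback) => callback(payload));
    return callbacks.length;
  }
}
```

A broadcast to a name nobody streams from reaches no one, which is why the two strings have to match exactly.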

CLIENT SIDE CODE

Here's a JS equivalent of the CoffeeScript that was used in the demo.

(function() {
  // Subscribe to a channel (RoomChannel)
  // And specify event handlers for various events and any custom actions(speak).
  App.room = App.cable.subscriptions.create("RoomChannel", {
    received: function(data) {
      // Do something when some data is published on the channel
      $('#messages').append(data.message);
    },
    speak: function(message) {
      // We link the client-side `speak` method to `RoomChannel#speak(data)`.
      // This is possible because the server-side channel instance will automatically
      // expose the public methods declared on the class (minus the callbacks),
      // so that these can be reached as remote procedure calls
      // via a subscription's `perform` method.
      return this.perform('speak', { message: message });
    }
  });

  $(document).on('keypress', '[data-behavior="room_speaker"]', function(event) {
    if (event.keyCode === 13) {
      // Respond to some trigger based on which you want to
      // Invoke the speak method on the subscription created above.
      App.room.speak(event.target.value);
      event.target.value = '';
      event.preventDefault();
    }
  });

}).call(this);

I think this snippet explains why the exact string 'speak' was important.

That's because the server side Ruby instance exposes this method, so it can be invoked as a remote procedure call over the WebSocket connection.
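
To see why the strings matter, it helps to look at what travels over the socket. The sketch below builds frames in the shape that, to the best of our understanding, the ActionCable JS client sends in Rails 5: JSON with a command, an identifier naming the channel class and, for perform, a data string carrying the action name. The two function names are ours, and it is worth comparing the output with the frames in your browser's network tab before relying on it.

```javascript
// Frame sent when subscribing to a channel.
function subscribeFrame(channelName) {
  return JSON.stringify({
    command: 'subscribe',
    identifier: JSON.stringify({ channel: channelName }),
  });
}

// Frame sent by perform(action, data): the action name rides along with the data.
function performFrame(channelName, action, data) {
  const payload = Object.assign({}, data, { action: action });
  return JSON.stringify({
    command: 'message',
    identifier: JSON.stringify({ channel: channelName }),
    data: JSON.stringify(payload),
  });
}

console.log(performFrame('RoomChannel', 'speak', { message: 'hello' }));
```

The server uses the channel name in the identifier to find the class (RoomChannel) and the action in the data to find the public method (speak), so a typo in either string means the call never reaches your code.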


RUNNING THIS DEMO TO EXPLORE LOGS

Here's the animation showing the working of the demo app.

[Figure: demo-animation]

WHEN A CLIENT CONNECTS

Initial handshake and upgrade of HTTP to WebSocket

[Figure: protocol-upgrade]

The client subscribes to a channel

[Figure: client-subscribe]

The server logs also show the initial HTTP upgrade along with the subscription to the channel.

[Figure: subscribe-server-logs]

WHEN A CLIENT MAKES AN RPC AND THE MESSAGE IS BROADCAST

The client invokes the channel's speak method, which in turn results in a broadcast along the channel, as seen in the returned frame.

[Figure: RPC-client]

The server logs show the invocation of RoomChannel#speak, followed by the message being persisted in the db and broadcast along the channel.

[Figure: server-broadcast]


SECURING AGAINST CROSS SITE WEBSOCKET HIJACKING

WebSockets support cross-domain requests... which means they are also vulnerable to the security threats that come with this behavior.

Here's more info on this topic.

Rails does all the heavy lifting for you... and all you need to do is some configurations and you are good to go...

  • Action cable only allows requests from origins configured by action_cable.allowed_request_origins in your config file.
  • In case of development, this defaults to "http://localhost:3000"
  • You can turn off this check using disable_request_forgery_protection in your config file.

Rails.application.config
                 .action_cable
                 .allowed_request_origins = ['http://rubyonrails.com', /http:\/\/ruby.*/]

# You can disable this check by :
# Rails.application.config.action_cable.disable_request_forgery_protection = true
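
Conceptually, the check compares the Origin header of the incoming request against each configured entry, where an entry can be an exact string or a regular expression. Here is a sketch of that idea in JavaScript. The function name is ours and this is an illustration of the matching rule, not the Rails source.

```javascript
// Returns true if the origin equals one of the allowed strings or matches
// one of the allowed regular expressions.
function originAllowed(origin, allowedOrigins) {
  return allowedOrigins.some((allowed) =>
    allowed instanceof RegExp ? allowed.test(origin) : allowed === origin);
}

const allowedOrigins = ['http://rubyonrails.com', /http:\/\/ruby.*/];

console.log(originAllowed('http://rubyonrails.com', allowedOrigins)); // true
console.log(originAllowed('http://evil.example', allowedOrigins));    // false
```

Note that a pattern like the one above is not anchored, so it matches anything that contains http://ruby. For a real app, a stricter pattern or a list of exact strings is the safer choice.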

Message queues via redis... but I don't need them in development...

Yes... the rails community also thought so.

If you look at config/cable.yml, you'll see that the adapter for development is async instead of redis.

Which means...?

In development mode, the message queue is maintained in memory and will be lost once the server is shut down (which is the desired behavior in most development scenarios).

However, if you want, you may use a Redis server as well by uncommenting the redis gem in the Gemfile and configuring config/cable.yml

development:
  adapter: redis
  url: redis://localhost:6379/1

In production, the generated config/cable.yml uses the redis adapter by default, so broadcasts travel through Redis before they are sent to the subscribers over the WebSocket.


So that was a brief intro to ActionCable, which fills the gap for real-time features in Rails.

Hope you found it interesting. Thanks for reading.