Ray Casting Algorithm

If you’ve been struggling with determining whether a point lies inside or outside a polygon, perhaps on Google Maps, read on. This post is specific to Android and Node.js, but the algorithm can easily be implemented in any other language or platform, as the algorithm itself is a piece of cake once understood.

Read more

Computer Vision - Face Detection


What is it...?

Computer Vision is the discipline of mimicking the abilities of human vision by electronically perceiving and understanding an image.

It is a broad term and includes a lot of domains like Gesture Recognition, Optical Character Recognition, Face detection and a lot more.

In this article, we will be focussing on face detection and try to understand the key ideas that allow us to detect human faces in real time.

It all begins with a pixel!

As shown below, a digital representation of an image comprises a large number of pixels depending on the resolution of the image.

Each pixel represents the smallest unit containing the information about how the image will be rendered on a digital device.

Each pixel can be represented by 4 bytes ( 1 byte each for red, green, blue and alpha ).
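This 4-bytes-per-pixel layout can be sketched with a flat RGBA buffer, the same layout used by, for example, the HTML canvas ImageData API (function names here are my own):

```javascript
// Compute the byte offset of pixel (x, y) in a flat RGBA buffer of an
// image that is `width` pixels wide: 4 bytes (R, G, B, A) per pixel.
function pixelOffset(x, y, width) {
  return (y * width + x) * 4;
}

// Read back the 4 channel values of a pixel.
function getPixel(buffer, x, y, width) {
  var i = pixelOffset(x, y, width);
  return { r: buffer[i], g: buffer[i + 1], b: buffer[i + 2], a: buffer[i + 3] };
}

// A tiny 2x2 image: 4 pixels * 4 bytes = 16 bytes.
var image = new Uint8Array(16);
image.set([255, 0, 0, 255], pixelOffset(1, 0, 2)); // opaque red at (1, 0)
```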

What Face Detection is...?

It is essentially processing a raw image to:

  • Detect the presence of human faces, if any.
  • Extract info about the coordinates and size of those human faces.

How do we do it...?

That's what this article is all about.

Before we dive into the implementation details, let’s discuss the framework that most of the modern face detection systems use... the Viola-Jones Object Detection Framework.


The training is slow, but the detection is fast.

This framework introduced three key ideas:

  1. Integral image representation
  2. Construction of classifier by Adaptive Boosting
  3. Combining successively more complex classifiers in a cascade structure

Let's see what each of these are...


Features instead of Pixels

This framework focuses on computing with features rather than raw pixels.

What is a feature...?

Let's see an example to understand what a feature is...

As explained here, all human faces share some similar properties:

  • The eye region is darker than the upper cheeks.

  • The nose bridge region is brighter than the eyes.

Using features has an obvious benefit: they can encode critical domain knowledge that would be difficult to learn from a limited quantity of training data.

The way these features are used involves computing the difference between the sums of pixel intensities in the dark and light regions:

Value = Σ (pixels in black area) - Σ (pixels in white area)

The above step is a key operation, as it is repeated many times with regions of varying sizes and coordinates. It therefore needs to be efficient for the overall detection to be efficient.

This is where the idea of Integral image representation comes in handy.

What is an Integral Image representation...?

An intermediate representation which allows us to compute the sum of intensities over any rectangular area, independently of the size of the region, with a computational complexity of O(1) instead of O(n).

As explained here

  • The value of the integral image at a point is the sum of all the pixels above and to the left.
  • The sum of the pixels within a rectangle can be computed with four array references.
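The two properties above can be sketched in a few lines (a minimal summed-area table; function names are mine, not from any particular library):

```javascript
// Build an integral (summed-area) image: ii[y][x] holds the sum of all
// pixels above and to the left of (x, y), inclusive.
function integralImage(pixels) {
  var h = pixels.length, w = pixels[0].length;
  var ii = [];
  for (var y = 0; y < h; y++) {
    ii.push(new Array(w).fill(0));
    for (var x = 0; x < w; x++) {
      ii[y][x] = pixels[y][x]
        + (x > 0 ? ii[y][x - 1] : 0)
        + (y > 0 ? ii[y - 1][x] : 0)
        - (x > 0 && y > 0 ? ii[y - 1][x - 1] : 0);
    }
  }
  return ii;
}

// Sum of the rectangle with top-left (x0, y0) and bottom-right (x1, y1),
// inclusive, using only four array references -- O(1) per rectangle.
function rectSum(ii, x0, y0, x1, y1) {
  var a = (x0 > 0 && y0 > 0) ? ii[y0 - 1][x0 - 1] : 0;
  var b = (y0 > 0) ? ii[y0 - 1][x1] : 0;
  var c = (x0 > 0) ? ii[y1][x0 - 1] : 0;
  return ii[y1][x1] - b - c + a;
}
```

A two-rectangle feature value then costs just two `rectSum` calls: `rectSum(darkRegion) - rectSum(lightRegion)`, no matter how large the regions are.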


What is a classifier...?

As explained here, a classifier is a function that takes the values of various features of an example (the image to be tested, in our case) and predicts the class that the example belongs to (whether it contains a human face or not, in our case).

A classifier is backed by training data (usually the output of some machine learning algorithm), and its efficiency and accuracy depend on the quality of that training data.

  • A classifier is called a weak classifier if it cannot be used alone to reliably predict the class an example belongs to. It is generally computationally economical.
  • On the contrary, a classifier is called a strong classifier if it can be used alone to classify an example. It is generally computation intensive.

There can be a very large number of rectangle features associated with each image sub-window... in fact, far more features than pixels.

Viola and Jones hypothesized that a relatively small number of these features could be combined to form an effective classifier.

The big question - Which features to select?

Viola and Jones used a variant of AdaBoost both to select the features and to train the classifier.

What is AdaBoost... ?

Adaptive Boosting, a.k.a. AdaBoost, is a learning algorithm used to boost the performance of simple classifiers.

Below is a conceptual overview of how adaptive boosting works:

  • The training data ( a collection of positive and negative samples i.e. the images with and without a human face ) is fed into a weak classifier.
  • After each round of learning, the weights of the examples ( training data images ) are re-normalized to emphasize those which were incorrectly classified by the previous classifier.
  • This process is repeated until we get the required accuracy of the classifier.
  • The final result is a strong classifier which is a linear combination of a number of weighted weak classifiers followed by a threshold.
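The loop above can be sketched with a toy version of AdaBoost. This is not the Viola-Jones trainer: the "weak classifiers" here are one-dimensional threshold stumps standing in for rectangle features, and the `x - 0.5` threshold candidates assume roughly integer-spaced toy data.

```javascript
// Toy AdaBoost: samples are numbers, labels are +1 / -1, and each weak
// classifier is a threshold "stump": sign(polarity * (x - threshold)).
function trainAdaBoost(xs, ys, rounds) {
  var n = xs.length;
  var w = new Array(n).fill(1 / n); // uniform initial weights
  var ensemble = [];

  for (var r = 0; r < rounds; r++) {
    // Pick the stump with the lowest weighted error on the current weights.
    var best = null;
    xs.forEach(function (x) {
      var t = x - 0.5; // candidate threshold just below each sample
      [1, -1].forEach(function (polarity) {
        var err = 0;
        for (var i = 0; i < n; i++) {
          var h = polarity * (xs[i] - t) > 0 ? 1 : -1;
          if (h !== ys[i]) err += w[i];
        }
        if (best === null || err < best.err) {
          best = { t: t, polarity: polarity, err: err };
        }
      });
    });

    var err = Math.max(best.err, 1e-10);         // avoid division by zero
    var alpha = 0.5 * Math.log((1 - err) / err); // weight of this weak classifier
    ensemble.push({ t: best.t, polarity: best.polarity, alpha: alpha });

    // Re-weight: emphasize the examples this stump got wrong.
    var sum = 0;
    for (var i = 0; i < n; i++) {
      var h = best.polarity * (xs[i] - best.t) > 0 ? 1 : -1;
      w[i] *= Math.exp(-alpha * ys[i] * h);
      sum += w[i];
    }
    for (var j = 0; j < n; j++) w[j] /= sum; // normalize
  }
  return ensemble;
}

// The strong classifier: a thresholded weighted vote of the weak ones.
function predict(ensemble, x) {
  var score = ensemble.reduce(function (s, c) {
    return s + c.alpha * (c.polarity * (x - c.t) > 0 ? 1 : -1);
  }, 0);
  return score > 0 ? 1 : -1;
}
```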

Thankfully, for our purposes (human face detection), we do not need to train our own classifier.

Instead, we will be using the classifier training data provided by the open source OpenCV library.


The Cascade Structure

The key insight is that smaller ( and therefore more efficient ) boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances.

The idea is to use simpler classifiers to reject a majority of negative sub-windows, thereby focussing attention on only the promising regions of the image.

The end result is overall efficiency, due to the reduction in the number of sub-windows reaching the computationally expensive classifiers.
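The cascade idea fits in a few lines. The stages below are made up for illustration (real stages are boosted classifiers, not score thresholds), but the control flow is the point: a window rejected by an early stage never reaches the expensive ones.

```javascript
// A cascade: each stage is a predicate over a window; cheap stages come
// first. A window is reported as a face only if every stage accepts it.
function makeCascade(stages) {
  return function classify(window) {
    for (var i = 0; i < stages.length; i++) {
      if (!stages[i](window)) return false; // early rejection: later
                                            // (expensive) stages never run
    }
    return true;
  };
}

// Hypothetical stages of increasing cost and strictness; the counters
// show how much work the cascade saves.
var evaluations = { cheap: 0, expensive: 0 };
var cascade = makeCascade([
  function cheap(w)     { evaluations.cheap++;     return w.score > 0.2; },
  function expensive(w) { evaluations.expensive++; return w.score > 0.9; },
]);

var windows = [{ score: 0.1 }, { score: 0.3 }, { score: 0.95 }];
var faces = windows.filter(cascade);
```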


Here is a video that visualizes the detection process of OpenCV's face detector.

The algorithm uses the Viola-Jones method:

  • An integral image is calculated.
  • Some calculations are done on all the areas defined by the black and white rectangles to analyze the differences between the dark and light regions of a face.
  • The sub-window (in red) is scanned across the image at various scales to detect if there is a potential face within the window.
  • If not, it continues scanning.
  • If it passes all stages in the cascade file, it is marked with a red rectangle.
  • In the post-processing stage, all the potential faces are checked for overlaps.


We have survived the theory; let's come to the implementation part.

Depending on the nature of the requirement, we can go with one of the below options:


The most flexible and powerful option is to manage your own cloud solution.

  • You get to control the training of the classifiers and every aspect of it.
  • You may use an open source library like OpenCV, which implements more than 2500 machine learning algorithms and has C++, C, Python, Java and MATLAB interfaces... which means you can easily interface it with a Node server via node-opencv.
  • However, this approach has its own maintenance overheads which become critical once you scale your app.
  • Another potential drawback is that the image needs to be transmitted over the network to the server. For simple scenarios, e.g. tagging of friends, we might go with a client-side-only solution instead.


The second option is to use a managed third-party service:

  • You may easily get face detection up and running by delegating all the maintenance and setup chores to a third-party service provider such as the Google Cloud Vision API or the one provided by Microsoft.
  • These providers already have an exhaustive training data backing up their classifiers. They also support advanced features such as Explicit Content Detection.
  • These services are usually paid.
  • This solution is highly recommended until the pay-as-you-use charges outgrow the cost of setting up and training your own classifier, which usually takes some time to fine-tune.
  • The common potential drawback of the image transmitted over network remains.


The third option is a purely client-side solution:

  • The computation is delegated to the client, so the image is not transmitted over the network to detect faces.
  • This solution is particularly valuable when we need to support simple requirements like tagging of friends, or focussing on an area of the image with the face etc.
  • One of the popular libraries to support this is trackingjs
  • Simplest to implement.


You can install this library by simply typing

npm install tracking

Once installed, this library provides its files in the build directory:

  • The files in the data folder contain the classifiers. You need to include the appropriate classifier based on what you need to detect.
  • So basically, we need to include the tracking library and one ( or more, depending on the use case ) of the classifiers, and we are all set.
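The include step might look like this in the page (the exact paths depend on where you serve the build directory from, so treat them as an assumption):

```html
<!-- the core tracking library -->
<script src="build/tracking-min.js"></script>
<!-- the face classifier data; swap in another file from build/data
     (eyes, mouth, ...) depending on what you need to detect -->
<script src="build/data/face-min.js"></script>
```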

The classifier data itself is simply a large array of numbers describing the trained cascade.

Below is a code snippet from the examples provided on the trackingjs site.

Using this library involves the following steps:

  • Require trackingjs and appropriate classifier ( from the data directory )
  • Instantiate a tracking.ObjectTracker with the name of the classifier to use.
  • Set appropriate callback on the tracker for the track event.
  • Trigger the tracker by simply invoking tracking.track with the element ( can be image / video / canvas ) and the tracker as the arguments.
    window.onload = function() {
      var img = document.getElementById('img');

      // We need to detect faces... instantiate a new ObjectTracker with 'face' as an argument.
      // IMPORTANT: We need to include the appropriate classifier.
      var tracker = new tracking.ObjectTracker('face');

      // Set up an appropriate listener for the track event
      tracker.on('track', function(event) {
        event.data.forEach(function(rect) {
          // Do something with the faces identified in the image.
          plotRectangle(rect.x, rect.y, rect.width, rect.height);
        });
      });

      // Invoke the track
      tracking.track(img, tracker);
    };
So that was a beginner's introduction to real-time face detection on the web.

Hope you found it interesting. Thanks for reading.

Voronoi: A solution to proximity based problems

In computational geometry, a Voronoi diagram is a partitioning of a plane containing n points into n convex polygons such that each polygon contains exactly one generating point, and every point in a given polygon is closer to its generating point than to any other. A Voronoi diagram is sometimes also known as a Dirichlet tessellation. GeoVoronoi is an extension of Voronoi optimized for geo locations.


To dive deeper into Voronoi geometry, read chapter seven of Computational Geometry: Algorithms and Applications. Or maybe play with this cool demo.

If you're working on any proximity-based problem, a Voronoi diagram can be a lot of help. In layman's language, a Voronoi diagram divides a 2-D plane with n sites into n zones, one for each site, such that any point inside a zone is nearer to the site that owns that zone than to any other site. In the above Voronoi diagram, 'a' is a site and 'v(a)' is its zone.

In my particular use case, I've been creating a proximity-based mobile app that needs to know which site is nearest to the device on each location update. The zone information provided by the Voronoi diagram saves me from computing the distance to every site on every location update. Now I just need to check which zone the device is currently inside, and I poll among the zones only when the device moves out of the current zone. This speeds the process up considerably.
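The "which zone is the device inside" check is just a point-in-polygon test, e.g. the ray casting technique from the earlier post. A minimal even-odd sketch, assuming each polygon is an array of `[x, y]` vertices:

```javascript
// Ray casting point-in-polygon test: shoot a horizontal ray from the
// point and count how many polygon edges it crosses. An odd count
// means the point is inside.
function pointInPolygon(point, polygon) {
  var inside = false;
  for (var i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    var xi = polygon[i][0], yi = polygon[i][1];
    var xj = polygon[j][0], yj = polygon[j][1];
    // Does edge (i, j) straddle the ray, and is the crossing to the right?
    var crosses = ((yi > point[1]) !== (yj > point[1])) &&
      (point[0] < (xj - xi) * (point[1] - yi) / (yj - yi) + xi);
    if (crosses) inside = !inside;
  }
  return inside;
}
```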

I'm extending this JavaScript implementation of the Voronoi diagram. Although it works perfectly on a normal X-Y plane, it doesn't quite fit geo locations. So I've written a geo wrapper over it, namely GeoVoronoi, which basically does the following patch-work:

  1. Longitudes don't stay linear near the international date line. They wrap around, going '..., 178, 179, -179, -178, ...'. GeoVoronoi takes care of this.
  2. Auto-boxing super area of sites with optimal padding. (Bbox in code).
  3. Keeping latitude-longitudes sanitized in terms of range and decimal places.
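Point 1 can be illustrated with a small sketch (my own illustration, not the GeoVoronoi source): shift the negative longitudes by 360 so the values become continuous for plane-based algorithms. It assumes the sites genuinely cluster around the date line rather than spanning the whole globe.

```javascript
// Longitudes jump from 179 to -179 across the international date line.
// If a set of sites spans the date line, shift the negative longitudes
// by 360 so the values become continuous on a plane.
function unwrapLongitudes(lngs) {
  var min = Math.min.apply(null, lngs);
  var max = Math.max.apply(null, lngs);
  if (max - min <= 180) return lngs.slice(); // no date-line crossing
  return lngs.map(function (lng) { return lng < 0 ? lng + 360 : lng; });
}
```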


Input an array of sites (latitude-longitude pairs) and a voronoi-graph will be returned to you. The information inside the voronoi graph can be used to do calculations, to determine zones and even to draw the Voronoi diagram on a plane, a Google Map, etc.

I've created an npm package for geovoronoi here. To install this package use the following command:
npm install geovoronoi

Running geovoronoi on your sites generates the voronoi-graph data. The resulting voronoiGraph object contains the following three major objects and a key-value pair for execution time:

  1. cells: This contains the information for all the polygons (zones) along with their owner sites. A polygon is represented by halfedges (called so because most of them are shared between two adjacent polygons). Each half-edge has start and end vertices called 'va' and 'vb' respectively. The set of start vertices of all the half-edges gives all the vertices of the polygon. The information in cells is enough for basic calculations.
  2. edges: This contains information about start and end points of all the edges in the resulting Voronoi diagram.
  3. vertices: This contains information about all the vertices in the resulting Voronoi diagram.
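Assuming the cell structure described above (half-edges with 'va'/'vb' vertices), collecting a zone's polygon ring might look like this; the sample cell below is made up for illustration:

```javascript
// Collect a polygon's vertices from its cell: the 'va' (start) vertex of
// each half-edge, taken in order, traces out the polygon ring.
function polygonVertices(cell) {
  return cell.halfedges.map(function (he) { return he.va; });
}

// A made-up square zone around site (5, 5).
var cell = {
  site: { x: 5, y: 5 },
  halfedges: [
    { va: { x: 0, y: 0 },   vb: { x: 10, y: 0 } },
    { va: { x: 10, y: 0 },  vb: { x: 10, y: 10 } },
    { va: { x: 10, y: 10 }, vb: { x: 0, y: 10 } },
    { va: { x: 0, y: 10 },  vb: { x: 0, y: 0 } },
  ],
};
```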

Please refer to this README for the data inside cells.

If you found this post interesting, go explore more about Voronoi diagrams. Leave a comment about your use case. You might also want to take a look at my post about ray casting; it will help in determining whether a point lies inside a polygon or not.

Ruby on Rails Caching And JavaScript Techniques

Cross posted from darthsid

While implementing caching in a recent Rails project I came across some typical caching issues. On a lot of pages the content is the same for all users, but certain components in them have user-specific actions. As an example, I have a page listing all public messages that users have posted (similar to the public timeline in Twitter), but actions on those messages are user specific (e.g. only the owner or an admin can delete a message). Also, most of these actions use Ajax, and the Rails authenticity token in them also gets cached, resulting in subsequent failures if the session changes. Another issue was that the timestamps on most pages are fuzzy, and they become irrelevant if a page stays cached for too long. I could have created separate caches for each user, but if the user base really grows, managing the caches would become a nightmare, and that would still not solve the authenticity token and timestamp problems. The simplest solution was to use JavaScript, more specifically jQuery.
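The fuzzy-timestamp part of the problem, for instance, goes away if the cached page carries absolute times and the browser renders the relative ones. A minimal sketch (function name and wording are my own, not the project's code):

```javascript
// Render "x minutes ago" style timestamps on the client, so a cached
// page can carry absolute times and still show fresh fuzzy ones.
function timeAgoInWords(timestamp, now) {
  var seconds = Math.floor((now - timestamp) / 1000);
  if (seconds < 60) return 'less than a minute ago';
  var minutes = Math.floor(seconds / 60);
  if (minutes < 60) return minutes + (minutes === 1 ? ' minute ago' : ' minutes ago');
  var hours = Math.floor(minutes / 60);
  if (hours < 24) return hours + (hours === 1 ? ' hour ago' : ' hours ago');
  return Math.floor(hours / 24) + ' days ago';
}
```

With jQuery, a page-load hook could then rewrite every element that carries an absolute time in, say, a `data-time` attribute (a hypothetical markup convention): `$('.timestamp').each(function () { $(this).text(timeAgoInWords(Date.parse($(this).data('time')), Date.now())); });`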
Read more

Configuring TinyMCE SpellChecker with rails application

Last month I spent a lot of time configuring TinyMCE's spellchecker with my Rails application. I finally got it working and thought it would make a good post. To configure this, I took some help from gusto's blog and Google (as usual). So, let's do it now without spending more time.

First of all, create a Rails app and install the TinyMCE Rails plugin:
1) rails tinymce
2) script/plugin install git://github.com/kete/tiny_mce.git
3) rake tiny_mce:scripts:install

We will be using the aspell utility to check spellings, so install aspell first. You can do it with “apt-get install aspell” or “port install aspell”.

Once that is done, add the following two lines to your layout (application.html.erb):
1) <%= javascript_include_tiny_mce_if_used %>
2) <%= tiny_mce if using_tiny_mce? %>

Now consider that I have a Users controller and a User model with a text field ‘about_me’, and I want to use TinyMCE with the spellchecker for that field.

To convert the about_me textarea (on the new-user form) to TinyMCE, add the following code to the users controller:

**Please cross-check that you have added (enabled) ’spellchecker’ in :plugins and added the spellchecker button (I have added it in :theme_advanced_buttons2).

Now, if you go to the new-user form, you will see that the about_me text area is replaced by a TinyMCE editor with a spellchecker button.

Now, add two more lines for the spellchecker to the above TinyMCE editor configuration:

1) :spellchecker_languages => “+English=en” (the English language)
2) :spellchecker_rpc_url => “/users/spellchecker” (the Rails URL where spellchecking will be done)

So our final configuration will look like this:

Now download Spelling.rb, save it as spelling.rb in your Rails lib directory, and change ASPELL_PATH at line 7 according to your aspell installation.

Once that is done, include this module (”include Spelling”) in application.rb or your controller (in our case, the users controller).

Now, create an action named spellchecker in your controller (‘users’):

Now, type something in the TinyMCE editor and click the spellchecker button. It will highlight misspelled words.


And when you click on a highlighted misspelled word, it will show suggestions.