30 Aug 2011

Geo/Proximity Search with MySQL

Search posted by waseem

In location based web applications we come across a problem that requires us to fetch some information from our database that is at some certain distance from a particular location. Generally this problem can be defined as following.

Give me all the things near location

Queries like “Fetch all the nearest friends”, “Fetch all the nearest restaurants” or “Fetch all the closest software developers” are essentially same as above.

If we analyze above problem, we see that we have three keywords in it viz. things, near and location. things is the data-set in which we are looking for information. It is usually a table in our database. It could be a users or restaurants table. The keyword near is the amount of distance within which we need to look into our data-set for information. It is given by a user using our application or already defined by us. Normally it is a few kilometers or miles. The third keyword, location is the place from which we need to measure the nearness of records in our data-set. This is also given by users using our application. It is identified by latitude and longitude on earth’s surface.

If we analyze above problem from engineering point of view, we can define it as following.

Give me all the things
order by distance from location
where distance < kilometers

We are actually looking for records sorted by their distance from location. And we are considering only those records that have distance less than specified by user in kilometers or miles. Out of four keywords in the problem, we know three already viz. things — a table in our database, location and kilometers — provided by user. The only thing that we do not know is the distance between location and records in our data-set. Lets find a way to calculate the distance.

We have latitude and longitude of all the records in our data-set. For now lets assume that we have a hotels table in our database that has following structure.

id name latitude longitude
1 Le Meridien 6.4420085 3.415344
2

We also have location of the user in the form of latitude and longitude.

The distance between two points on the surface of a sphere is given by Haversine Formula. Distance is given by following formula.

In above formula,

d = Distance between two points.
R = Radius of the sphere on which the points are. In our case it is earth with 3959 Miles or 6371 Kilometers.
lat1 and long1 are co-ordinates of the first point and lat2 and long2 are co-ordinates of second point in radians.

The corresponding MySQL statement to calculate distance between locations orig and dest:

  6371 * 2 * ASIN(SQRT(POWER(SIN(RADIANS(orig.lat - ABS(dest.lat))), 2) + COS(RADIANS(orig.lat)) * COS(RADIANS(ABS(dest.lat))) * POWER(SIN(RADIANS(orig.long - dest.long)), 2)))

And our complete MySQL query to get all the hotels that are within 10 kilometers from location identified by orig.

SELECT id, name,
6371 * 2 * ASIN(SQRT(POWER(SIN(RADIANS(orig.lat - ABS(hotels.latitude))), 2) + COS(RADIANS(orig.lat)) * COS(RADIANS(ABS(hotels.latitude))) * POWER(SIN(RADIANS(orig.long - hotels.longitude)), 2))) AS distance
FROM hotels
HAVING distance < 10
ORDER BY distance LIMIT 10;

Benchmark

If we run EXPLAIN on the previous query we do not get very pleasant results.

mysql> EXPLAIN query \G
           id: 1
  select_type: SIMPLE
        table: hotels
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10574
        Extra: Using filesort
1 row in set (0.00 sec)

This query scanned the whole table! It went through all the 10574 records in the table one by one and found the distance of that record from the location and then sorted all the records by its distance from it. This is not scalable for large data-sets.

Optimization

If we analyze our original problem once again.

Give me all the things
order by distance from location
where distance < kilometers

Earlier we scanned whole table to find out the distance between each record and location specified by the user. But we are looking for records that are within the kilometers as specified by the user. If somehow we can find a relationship that describes how co-ordinates change with change in distance, we can get the co-ordinates of hotels that may be farthest from the location specified by the user. These farthest co-ordinates will form a boundary within which all other records will reside. Then we need to look only for the records that are within this boundary. It turns out to be that we do have relationship between co-ordinates and distance.

1˚ of latitude ~= 111.04 Kms
1˚ of longitude ~= cos(latitude) * 111.04 Kms

We calculate the co-ordinates of the bounding box using above relationship between distance and co-ordinate change. We do this for location from orig and allowed distance given by dist

long1 = orig.long – dist / ABS(COS(RADIANS(orig.lat)) * 111.04)
long2 = orig.long + dist / ABS(COS(RADIANS(orig.lat)) * 111.04)

lat1 = orig.lat – dist / (111.04)
lat2 = orig.lat + dist / (111.04)

The bounding box will encompass the circle whose center and radius is given by user. Following figure gives us an idea how it will look like.

The black dot is center of the circle it is also the location of the user. Radius of the circle is allowed kilometers withing which we need to look. Since the box is larger than the circle, it will encompass some records that do not fall within the circle. This may give use some false hits. The red dots denote false hits. The green dots denote correct hits.

For faster lookups, we define indexes on latitude and longitude columns in our table.

CREATE INDEX `index_hotels_on_latitude_and_longitude` ON `hotels` (`latitude`, `longitude`)

We also change our original query to find the records within above box whose boundaries are defined by lat1, long1 and lat2, long2.

SELECT id, name,
6371 * 2 * ASIN(SQRT(POWER(SIN(RADIANS(orig.lat - ABS(hotels.latitude))), 2) + COS(RADIANS(orig.lat)) * COS(RADIANS(ABS(hotels.latitude))) * POWER(SIN(RADIANS(orig.long - hotels.longitude)), 2))) AS distance
FROM hotels
WHERE hotels.latitude BETWEEN lat1 AND lat2 AND hotels.longitude BETWEEN long1 AND long2
HAVING distance < 10
ORDER BY distance LIMIT 10;

If we run EXPLAIN on above query.

mysql> EXPLAIN query \G
           id: 1
  select_type: SIMPLE
        table: listings
         type: range
possible_keys: index_hotels_on_latitude_and_longitude
          key: index_hotels_on_latitude_and_longitude
      key_len: 10
          ref: NULL
         rows: 825
        Extra: Using where, Using filesort
1 row in set (0.00 sec)

We see that number of rows now scanned are far more less than earlier. This time it scanned only 825 rows. In fact these are the number of records that are within the bounding box. So the number of rows that are to be scanned depend upon the number of records falling within the box. If our application has records that are concentrated around a place, number of scanned rows will be higher for query executed around that place.

Using MySQL Spatial Data types

MySQL natively supports Spatial Data Types. Lets use these data types to solve our problem.

ALTER TABLE hotels ADD location POINT NOT NULL;

UPDATE hotels SET location = POINT(hotels.latitude, hotels.longitude);

ALTER TABLE hotels ADD SPATIAL KEY (location);

First we add a location column in our hotels table of POINT type. We also update the location column to have existing co-ordinates of our records. Finally we add a spatial key on the location column.

As of now MySQL does not have any standard GIS function that calculates distance between two points on the surface of a sphere. Lets define our own.

DELIMITER $$
 CREATE FUNCTION distance (a POINT, b POINT) RETURNS double DETERMINISTIC
   BEGIN
     RETURN 6371 * 2 * ASIN(SQRT(POWER(SIN(RADIANS(ABS(X(a)) - ABS(X(b)))), 2) + COS(RADIANS(ABS(X(a)))) * COS(RADIANS(ABS(X(b)))) * POWER(SIN(RADIANS(Y(a) - Y(b))), 2)));
   END  $$
DELIMITER ;

We use the same haversine formula to calculate distance between point a and point b.

Since we are now using Point data type to store locations of hotels in our database, we will use MySQL’s GIS functions to specify bounding box.

SET @bbox = 'POLYGON((lat1 long1, lat1 long2, lat2 long2, lat2 long1, lat1 long1))'

Above defines a polygon that has four vertices defined by (lat1 long1), (lat1 long2), (lat2 long2) and (lat2 long1). The last coordinate explicitly specifies that it is a closed polygon. The closing point is same as the first vertice of the polygon. Variables lat1, lat2, long1 and long2 are the same that we calculated earlier.

Now we update our SQL query to use POINT data types.

SELECT id, name, distance(hotels.location, POINT(orig.lat, orig.long)) AS cdist
FROM hotels
WHERE INTERSECTS(hotels.location, PolygonFromText(@bbox))
HAVING cdist < 10
ORDER BY cdist LIMIT 10;

Above, we are using INTERSECTS function provided by MySQL to find out whether location of a hotel falls within the bounnding box or not. PolygonFromText simply returs a polygon defined in a string. Lets run EXPLAIN on above query.

mysql> EXPLAIN query \G
           id: 1
  select_type: SIMPLE
        table: hotels
         type: range
possible_keys: location
          key: location
      key_len: 34
          ref: NULL
         rows: 70
        Extra: Using where; Using temporary; Using filesort

The number of rows scanned is significantly decreased. We can see that storing coordinates in POINT data type has improved the performance.


Following are some pointers that might be helpful exploring MySQL and Proximity searching.



Did you like this? Share it:
09 Apr 2010

Introduction to Active Scaffold Part I

Active Scaffold, tutorials posted by payal

I originally wrote this article for fifth issue of Rails Magazine. In addition to print version, you can download all issues of Rails Magazine in pdf for free. Not only that, they also allow authors to self-publish their articles on blogs etc. after 30 days of issue release. I hope rails beginners will particularly find this post useful.


In this two-part series, I will trying to explain Active Scaffold from the ground up using a real world example.

Introduction

ActiveScaffold is a rails plugin that generates an ajaxified management interface based on the database tables.  It is an incredibly powerful, configuration-driven framework that automatically provides all CRUD operations. It is easily and highly configurable. Let us have a walkthrough of active scaffold with details on configurability and customization options it provides using a sample application. The demo application which we are going to develop is a management site for a sports club.

Configuration

Here our example is about building an administration panel for a sports club “mysportsclub”. It consists of sports, players, coaches and equipments. There are various kinds of sports in the club. Each player can have membership for any number of sports. Each sport will have one trained coach for guidance and various equipments which will be issued to the players. Let’s begin with the implementation of “mysportsclub”

Let’s setup our rails application “mysportsclub”. Run the following commands:

rails mysportsclub

Install the latest version of the plugin which is compatible with latest stable rails (2.3.x):

script/plugin install git://github.com/activescaffold/active_scaffold.git

Note: Latest active scaffold version works only with rails version higher than 2.2. Also, install render_component plugin since it is now removed from rails 2.3 but used by active scaffold.

Layout:

This is the standard active scaffold layout admin.html.erb:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
   <title>My Sports Club</title>
   <%= javascript_include_tag :defaults %>
   <%= active_scaffold_includes %>
</head>
<body>
   <%= yield %>
</body>
</html>

Model/Migrations:

script\generate model sport name:string description:text coach_id:integer
script\generate model player name:string date:birth_date
script\generate model equipment name:string sport_id:integer
script\generate model coach name:string
script\generate model players_sport player_id:integer sport_id:integer

rake db:migrate

Add the following code to the respective models. This code defines the associations for the various models:

sports.rb

class Sport < ActiveRecord::Base
  belongs_to :coach
  has_many	:equipments, :dependent => :destroy

  has_many	:players_sports, :dependent => :destroy
  has_many	:players, :through	=> :players_sports

  validates_presence_of	:name
  validates_uniqueness_of	:name
end

coach.rb

class Coach < ActiveRecord::Base
  has_one	:sport
  validates_presence_of	:name
end

player.rb

class Player < ActiveRecord::Base
  has_many	:players_sports, :dependent => :destroy
  has_many	:sports, :through	=> :players_sports

  validates_presence_of	:name
end

equipment.rb

class Equipment < ActiveRecord::Base
  belongs_to	:sport
  validates_presence_of	:name
end

players_sport.rb

class PlayersSport < ActiveRecord::Base
  belongs_to	:player
  belongs_to	:sport
end

Note I

Don’t forget to add the plural form of “equipment” as “equipments” in the config/intializers/inflections.rb.

 ActiveSupport::Inflector.inflections do |inflect|
   inflect.plural "equipment", "equipments"
 end

Active Scaffold will throw an exception if we do not define the above inflection rule.  The controller generated for the ‘Equipment’ model will be ‘EquipmentsController’. But since active scaffold generates the controller name automatically using its inbuilt function: “active_scaffold_controller_for(class_name_of_the_model)”. It will not be able to guess the correct controller name and would throw an exception while we try to access the nested equipments:

Controllers:

script\generate controller sports
script\generate controller coaches
script\generate controller equipments
script\generate controller players

Add the following code to your controllers:

sports_contrpller.rb

class SportsController < ApplicationController
  active_scaffold :sport do |config|
    config.columns = [:name, :description, :coach, :equipments]
  end
end

coaches_controller.rb

class CoachesController < ApplicationController
  active_scaffold :coach do |config|
    config.columns = [:name, :sport]
  end
end

players_controller.rb

class PlayersController < ApplicationController
  active_scaffold :player do |config|
    config.columns = [:name]
  end
end

equipments_controller.rb

class EquipmentsController < ApplicationController
  active_scaffold :equipment	do |config|
    config.columns = [:name, :sport]
  end
end

players_sports_controller.rb

class PlayersSportsController < ApplicationController
  active_scaffold :players_sport do |config|
    config.columns = [:sport, :player]
  end
end

routes.rb

map.root :controller => 'sports', :action => 'index'

map.resources :sports, :active_scaffold => true
map.resources :players, :active_scaffold => true
map.resources :players_sports, :active_scaffold => true
map.resources :coaches, :active_scaffold => true
map.resources :equipments, :active_scaffold => true

The above code will generate a complete interface for our club.

Following code for your layout will provide a navigation bar in your view.

admin.html.erb

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <%= stylesheet_link_tag 'admin.css' %>
    <%= javascript_include_tag :defaults %>
    <%= active_scaffold_includes %>
    <title><%= "Admin" %></title>
  </head>
  <body>
    <div>
      <div>
        <h2>Agent Website Admin Panel</h2>
        <div>
          <ul class="links">
            <li>
              <%= link_to 'Sports', sports_path %>
            </li>
            <li>
              <%= link_to 'Players', players_path %>
            </li>
            <li>
              <%= link_to 'Coaches', coaches_path %>
            </li>
            <li>
              <%= link_to 'Equipments', equipments_path %>
            </li>
            <li>
              <%= link_to 'PlayersSports', playerssports_path %>
            </li>
          </ul>
          <div class="clear">
          </div>
        </div>
      </div>
      <div id="content">
        <div id="main">
          <%= yield %>
        </div>
      </div>
    </div>
  </body>
</html>

There are various customization options available. As per our requirement we can customize the columns, links, fields, actions etc. We would discuss them in the concluding part of the series.


Payal Gupta is working working as software engineer with Vinsol. She has over two years of industry
experience.


Did you like this? Share it:
25 Jul 2006

Basic Ruby/Rails Tutorials

General, Ruby On Rails resources, rails, ruby, tutorials, www posted by Ritu
A