What every web startup ought to know about public data sets

By manik February 26, 2009
  • Question 1 : Gimme a hot startup idea.

Answer : Amazon has released 1 terabyte (and growing) of public data sets, which can be integrated with AWS.

  • Question 2 : What the heck are public data sets? And I asked you for a startup idea, not a company's PR news.

Answer : In simple words, it's a centralized repository of public-domain, non-proprietary scientific, demographic and medical data, available as Linux and Windows snapshots.

  • Question 3 : Yes, that was quite simple (arghhhh….)

Answer : Okay, there are 4 categories of information you can find in these repositories:

1. Biology      2. Chemistry      3. Economics      4. Encyclopaedic

Let me take an example of Encyclopaedic data.

Suppose you want to build a music mashup and you need album data. In these data sets you will find the DBpedia Knowledge Base, with data on 57,000 music albums. DBpedia has 2.6 million things readily accessible to you. And that is just the encyclopaedic data. There is also human genome data (55 GB), US census information, economics, business and industry summary data, and labor statistics (inflation, employment, pay), and it is going to grow as more organizations add public data over time.
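To make the music mashup idea a bit more concrete, here is a minimal sketch of how you might pull album entries out of a DBpedia-style dump. It assumes the data arrives as N-Triples text (one "subject predicate object ." statement per line) and that albums are typed with the class URI shown below. The class URI, the function name album_uris and the tiny sample are all assumptions for illustration, so check them against the actual snapshot before relying on them.

```python
import re

# Assumed URIs for illustration; verify them against the real data set.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ALBUM_CLASS = "http://dbpedia.org/ontology/Album"

# Matches a simple N-Triples line whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def album_uris(lines):
    """Yield the subject URI of every triple that types a resource as an album."""
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip comments, blank lines and triples with literal objects
        subject, predicate, obj = match.groups()
        if predicate == RDF_TYPE and obj == ALBUM_CLASS:
            yield subject


# Tiny made-up sample standing in for a multi-gigabyte dump file.
sample = [
    f"<http://dbpedia.org/resource/Abbey_Road> <{RDF_TYPE}> <{ALBUM_CLASS}> .",
    f"<http://dbpedia.org/resource/Berlin> <{RDF_TYPE}> <http://dbpedia.org/ontology/City> .",
    '<http://dbpedia.org/resource/Abbey_Road> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Abbey Road"@en .',
    "# a comment line",
]

albums = list(album_uris(sample))
print(albums)
```

Because album_uris is a generator that looks at one line at a time, you could hand it an open file object instead of the sample list and it would stream through a large dump without loading everything into memory.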

  • Question 4 : Sounds interesting. Can machines read this data?

Answer : I think yes. For example, the entire English section of Wikipedia can be dumped into a machine-readable format and into a PostgreSQL database. According to ReadWriteWeb, “This is like a network of libraries for robots”.

  • Question 5 : Man, this is awesome. I was thinking of a location-aware + Google Maps + airlines mashup, and you have given me the entire US Department of Transportation aviation data. Just imagine what it can do for the research, analytics and knowledge process industries.

Answer : What is your next question?

  • Question 6 : How do I get started?

Answer : Read the “How it works?” section. If you are a developer, it's a great opportunity to build something interesting. If you are a business, there should be some developers who can help you integrate these massive libraries with your next small startup with a big vision.

