This is a highly commendable effort. I have been the lead programmer of an India-based travel search engine, so I understand the value of this data.
Another problem to be solved is building a cache of seat availability. Basically, Indian Railways cannot provide availability fast enough, and it affects everybody: from end consumers using their websites, to OTA sites, to travel-search players.
I can't understand why nobody in the Railways IT department (i.e. CRIS) has thought of a PUSH-based cache. The way it could work: each reservation/booking pushes an update onto the state of the seat availability cache, which would be a very fast cache (probably based on memcached).
If they allow me, I can code the cache for FREE for them, provided they also stop being so possessive about their data and allow apps like yours and mine to flourish, which ultimately benefits the traveler.
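A minimal sketch of that push idea in Python, assuming the reservation system can emit an event after every committed booking or cancellation. A plain dictionary guarded by a lock stands in for memcached here; BookingEvent, SeatAvailabilityCache and the sample train number are hypothetical names for illustration, not anything CRIS actually exposes.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class BookingEvent:
    """Hypothetical event pushed by the reservation system on each booking or cancellation."""
    train_no: str
    journey_date: str   # e.g. "2013-05-01"
    travel_class: str   # e.g. "3A"
    seats: int          # positive for a booking, negative for a cancellation


class SeatAvailabilityCache:
    """In-memory stand-in for a memcached-style cache that is kept current by pushes,
    so availability reads never have to query the reservation backend."""

    def __init__(self):
        self._lock = threading.Lock()
        self._available = {}

    def load_quota(self, train_no, journey_date, travel_class, seats):
        # Seed the cache once with the total quota for a train/date/class.
        with self._lock:
            self._available[(train_no, journey_date, travel_class)] = seats

    def push(self, event):
        # Called by the booking system after every committed booking/cancellation.
        # Events for keys that were never seeded are ignored in this sketch.
        key = (event.train_no, event.journey_date, event.travel_class)
        with self._lock:
            if key in self._available:
                self._available[key] -= event.seats

    def availability(self, train_no, journey_date, travel_class):
        # Reads are a plain lookup; None means "not cached".
        with self._lock:
            return self._available.get((train_no, journey_date, travel_class))


cache = SeatAvailabilityCache()
cache.load_quota("12951", "2013-05-01", "3A", 64)
cache.push(BookingEvent("12951", "2013-05-01", "3A", 2))   # 2 seats booked
cache.push(BookingEvent("12951", "2013-05-01", "3A", -1))  # 1 seat cancelled
print(cache.availability("12951", "2013-05-01", "3A"))     # 63
```

In a real deployment the push would presumably be an atomic decrement/increment on a shared cache (memcached, for example, has incr/decr commands), so that search sites read availability from the cache instead of hitting the booking system.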
After noticing a comment on the site, I checked the "Disclaimer" pages on the 2 official sites (neither allows storing this data in any form). However, the 3rd source, http://indiarailinfo.com/, is not affiliated with the government.
Are there any NGOs/organizations that could push for such data to be made public in India? I'll try getting in touch with CIS (http://www.cis-india.org/).
Thanks for informing me that it is not legal to scrape http://www.indianrail.gov.in. I am rewriting the code to scrape only http://indiarailinfo.com, which does not explicitly forbid scraping. Is it legal to scrape such a site (without overloading it)? Is it legal to share that code?
I'm not sure if there are any general rules/guidelines in India for this. However, IndiaRailInfo doesn't mention any copyright/disclaimer on its homepage, so I see no reason why it should be a problem.
Re: the source code, as long as it's not used to scrape copyrighted/private data, it shouldn't be a problem. Mention this clause in the disclaimer/license. Will you be putting it on GitHub?
It's probably to prevent link scrapers from putting further load on the already poorly performing site! If they allow people to freely use the data, there's a good chance that a lot of sites will mushroom to provide railways data.
That holds true for all open data, right? Yet others (like NY MTA: http://mta.info/developers/) still provide open data (in GTFS).
Agreed about the load part though.
Appreciate the effort. I asked this the other day on HN in a related thread: is crawling the official sites legal? And how frequently is it going to be updated? I just want to make sure before using this in an idea I have for a Windows Phone app.
This is awesome. I'm doing this for the Netherlands, where there is a public API (but no schedule API, just a trip planner).
Plans for GTFS? As awful as it seems, it is a nice format and has lots of tools. In half an hour, with OpenTripPlanner, OpenStreetMap and GTFS data (and a computer with 16GB RAM), you can have a trip planner ready to go.
On a side note: if someone decides to build a train timetable site, it would be highly useful to add local taxi hire company numbers (near the destination station), as people would love to pre-book taxis as well.
Brilliant. I'm excited about attempting to use the data to produce an Indian timetable site on top of the frontend I've built for Irish rail and bus: http://getthere.ie/
Excellent and highly useful database. It's always helpful to get this type of data into the public domain. This way independent developers can build useful apps for the community.
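If GTFS is the target, scraped timetable rows map fairly directly onto GTFS's stop_times.txt. A rough Python sketch, assuming each train's stops come as (station code, arrival, departure, day) tuples, similar to the "Day" column printed in Indian timetables; the function names, trip_id and station codes below are hypothetical. The one non-obvious step is that GTFS writes times past midnight of the first service day as hours of 24 or more instead of wrapping to 00:00.

```python
import csv
import io

# Core columns of GTFS stop_times.txt.
FIELDS = ["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence"]


def to_gtfs_time(hhmm, day_offset):
    """Convert 'HH:MM' on day (1 + day_offset) of the journey into a GTFS time.
    GTFS allows hours >= 24 for trips that run past midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return f"{hours + 24 * day_offset:02d}:{minutes:02d}:00"


def stop_times_csv(trip_id, stops):
    """stops: list of (station_code, arrival 'HH:MM', departure 'HH:MM', day),
    where day 1 is the day the train leaves its origin.
    Simplification: assumes arrival and departure at a station fall on the same day."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for seq, (station, arr, dep, day) in enumerate(stops, start=1):
        writer.writerow({
            "trip_id": trip_id,
            "arrival_time": to_gtfs_time(arr, day - 1),
            "departure_time": to_gtfs_time(dep, day - 1),
            "stop_id": station,
            "stop_sequence": seq,
        })
    return out.getvalue()


print(stop_times_csv("T1", [
    ("NDLS", "16:55", "16:55", 1),
    ("CNB", "21:35", "21:40", 1),
    ("HWH", "09:55", "09:55", 2),  # next morning becomes 33:55:00
]))
```

A complete feed would also need agency.txt, stops.txt, routes.txt, trips.txt and a calendar file, which is where tools like OpenTripPlanner pick it up.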