I've started parsing open rail data for a website that is proving more annoying to work with than I'd like, but with the domain trains.live, I feel it's a good enough name/domain that I should give it a proper go.
However, it has made me realise just how few of our trains actually run precisely to timetable, with nearly everything being at least a minute or two out.
[Attachment: 2023-04-06 11_25_58-adam@ns394998_ ~.png]
There's also some really suspiciously named locations, such as 'Doncaster Decoy'. A quick Google told me that it dates back to the Second World War: Doncaster did a lot of building, and a few decoy sites were made so Germany would bomb the wrong places. At least, I'm going to presume it wasn't actually named 'Doncaster Decoy' during WWII...
Furthermore, there are 5 methods of referring to the same place, which I presume comes from decades of upgrading various digital systems, each with their own way of doing it, although usually a location doesn't use all of them. For example, the below is for Peterborough. Its 3ALPHA code is what's often just called the 'station code' that you can usually type into rail websites to quickly refer to a station if you know it: KGX is King's Cross, ZCW is... uh, Canada Water, and FPK is Finsbury Park, for example. STANOX is what the part of the system I am using uses. I have to quickly search a local database I keep (it's a massive list of the below data) for each data push I get, so I can match it to the actual full station (or rail depot / signal block) name. I also need to download an updated version each time a station closes or opens, or a new signal that is used as a timekeeping point is added. At least from what I can tell, I don't get pushes for all signals, just major ones before some stations and major junctions, which I presume are used for timing data across the network.
[Attachment: 2023-04-06 11_30_18-thefoxbox.xyz _ localhost _ raildata _ TIPLOCDATA _ phpMyAdmin 5.0.4deb2+deb11u1.png]
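To give a rough idea of what that lookup step looks like, here's a minimal sketch: load the local reference table into a dict once, then resolve each movement push to a full location name. The STANOX code and field names here are illustrative, not the real Network Rail schema.

```python
# Hypothetical sketch of the STANOX lookup described above. The local
# reference table (thousands of rows, re-downloaded whenever stations or
# timing-point signals change) is loaded into a dict for fast matching.
# "12345" is a made-up STANOX for illustration.

STANOX_LOOKUP = {
    "12345": {"tiploc": "PBRO", "3alpha": "PBO", "name": "Peterborough"},
}

def resolve_location(movement: dict) -> str:
    """Map a movement message's STANOX code to a full location name."""
    stanox = movement.get("loc_stanox", "")
    entry = STANOX_LOOKUP.get(stanox)
    return entry["name"] if entry else f"unknown STANOX {stanox}"

print(resolve_location({"loc_stanox": "12345"}))  # Peterborough
print(resolve_location({"loc_stanox": "99999"}))  # unknown STANOX 99999
```

A dict lookup keeps it O(1) per push, which matters when the feed is relaying hundreds of thousands of movements a day.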
One day's worth of data, from a random day I pulled a few weeks ago, was 388,820 lines of train movements, which took up 69.6MB of database space. I really understand now why rail websites don't tend to have historical data, or if they do, it's only a month or so of history, as it takes up so much space. I can probably massively reduce that 69MB: I was keeping every single bit of every movement relayed, and I can probably ditch a lot of the data I never need to refer to.
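The trimming could be as simple as whitelisting the fields that ever get queried and dropping the rest before storage. The field names below are my guesses at what a movement message might carry, not the actual feed schema.

```python
# Sketch of trimming each movement message before it hits the database.
# Only whitelisted fields survive; field names are illustrative guesses.

KEEP = ("train_id", "loc_stanox", "event_type",
        "planned_timestamp", "actual_timestamp", "variation_status")

def trim_movement(msg: dict) -> dict:
    """Return only the fields worth storing from one movement message."""
    return {k: msg[k] for k in KEEP if k in msg}

raw = {
    "train_id": "172A47MB06",
    "loc_stanox": "12345",
    "event_type": "ARRIVAL",
    "planned_timestamp": "1680779100000",
    "actual_timestamp": "1680779220000",
    "variation_status": "LATE",
    "division_code": "20",             # never queried -> dropped
    "train_service_code": "21700001",  # never queried -> dropped
}
trimmed = trim_movement(raw)
print(len(raw), "->", len(trimmed), "fields")  # 8 -> 6 fields
```

Even dropping a couple of fields per row adds up quickly across 388,820 rows a day.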
Safe to say, building a website using the open access data Network Rail provides is a much bigger task than I expected. It definitely has hints of systems built upon systems over decades of software, with the many ways of referring to locations.
I basically have to glue many different data sources together to get a coherent output. What I posted above is just one of the databases of info I need, plus a very basic real-time parsing of the movement data. I also need to start getting and parsing data for train cancellations, activations and changes in service, and also download and learn to parse the giant timetable itself, and then use the data above to amend the timetable data with live data as the trains progress around the network.
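That final "glue" step, amending the timetable with live data, could look something like the sketch below: overlay actual times from movement messages onto the planned calling points, keyed by STANOX. All the data shapes here are invented for illustration.

```python
# Very rough sketch of merging live movement data onto planned timetable
# stops. Keys and structures are made up; the real timetable and feed
# formats are far messier.

def overlay_live(schedule: list[dict], movements: list[dict]) -> list[dict]:
    """Return the planned stops with actual times filled in where known."""
    actuals = {m["loc_stanox"]: m["actual_time"] for m in movements}
    merged = []
    for stop in schedule:
        stop = dict(stop)  # don't mutate the input timetable
        # None means no movement received for this location yet
        stop["actual_time"] = actuals.get(stop["loc_stanox"])
        merged.append(stop)
    return merged

plan = [{"loc_stanox": "11111", "planned_time": "11:25"},
        {"loc_stanox": "22222", "planned_time": "11:40"}]
live = [{"loc_stanox": "11111", "actual_time": "11:27"}]
for stop in overlay_live(plan, live):
    print(stop)
```

In practice cancellations, activations and service changes would all need to feed into the same merge, which is where most of the complexity lives.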