It would be really great if you could document the file format used by the gm.cat file.
The problem with documenting the file format is that I understood it and wrote the program five years ago, then extracted the songs and then forgot about it. Anyway, this is what I can recall from the code.
The file starts with an index array that lets you seek to a particular song within the file. Each element of the array is a pair of little-endian uint32s: the first is the offset of a song within the file, and the second is the length of that song in bytes. (This is best followed with a hex dump of the file in front of you.) Normally the sum of the offset and the length matches the offset of the next song. The file does not store the number of songs explicitly (the program doing the decoding is supposed to know that), but the songs start at the first byte after the index array, so the first offset equals the number of songs times 8 (the size of two uint32s); dividing it by 8 gives the song count.
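As a rough illustration of the index layout just described, here is a minimal Python sketch. The function name read_index is made up for this example, and the sanity checks are my own additions, not something taken from the original program.

```python
import struct

def read_index(data: bytes) -> list[tuple[int, int]]:
    """Return the (offset, length) pairs stored at the start of the file."""
    if len(data) < 8:
        raise ValueError("file too short to contain an index entry")
    # The first offset points just past the index array,
    # so it equals 8 times the number of songs.
    first_offset = struct.unpack_from("<I", data, 0)[0]
    if first_offset % 8 != 0 or first_offset > len(data):
        raise ValueError("first offset is not a plausible index size")
    count = first_offset // 8
    # Each entry is two little-endian uint32s: offset, then length.
    return [struct.unpack_from("<II", data, i * 8) for i in range(count)]
```

With the whole file read into data, the bytes of song i would then be data[offset:offset + length] for the i-th pair.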
The rest of the file contains the songs, as indexed in the header. Each song has three parts: the title, the subsequences and the tracks. They are packed together one after another.
The title is easy: It has a length byte and then that many characters that make up the name of the song.
The subsequence part starts again with a single byte that is the number of subsequences, and then has that many subsequences. Each subsequence starts with a little-endian uint32 representing the subsequence size (including the uint32) and then has the subsequence data, in standard MIDI format: first the delta-time, then the command, then the command data.
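In standard MIDI files the delta-time is stored as a variable-length quantity: seven bits per byte, most significant group first, with the top bit set on every byte except the last. Assuming gm.cat follows that convention (I have not re-checked this against the file), a delta-time could be decoded roughly like this; read_vlq is a name invented for the sketch.

```python
def read_vlq(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a MIDI variable-length quantity; return (value, next position)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        # Each byte contributes its low seven bits.
        value = (value << 7) | (byte & 0x7F)
        # A clear top bit marks the last byte of the quantity.
        if not byte & 0x80:
            return value, pos
```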
The track part is similar. It also starts with a byte stating the number of tracks, followed by that many tracks. The only difference is that tracks, unlike subsequences, have a leading channel byte. So a track is one channel byte (the MIDI channel to which output for that track will be sent), then a little-endian uint32 with the size of the track (including the uint32 itself but excluding the leading channel byte), and then the track data, again in standard MIDI format.
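Putting the three parts of a song together, a sketch in Python might look like the following. It works on the bytes of one song as located through the index. The names Song, _read_block and parse_song are invented here, the title is kept as raw bytes because I do not know its character encoding, and the size check is my own addition.

```python
import struct
from dataclasses import dataclass

@dataclass
class Song:
    title: bytes                     # raw bytes; the encoding is not known
    subsequences: list[bytes]
    tracks: list[tuple[int, bytes]]  # (channel, track data)

def _read_block(song: bytes, pos: int) -> tuple[bytes, int]:
    """Read a uint32 size (which counts its own 4 bytes) and the data after it."""
    size = struct.unpack_from("<I", song, pos)[0]
    if size < 4 or pos + size > len(song):
        raise ValueError("implausible block size")
    return song[pos + 4:pos + size], pos + size

def parse_song(song: bytes) -> Song:
    # Title: one length byte, then that many characters.
    title_len = song[0]
    title = song[1:1 + title_len]
    pos = 1 + title_len

    # Subsequences: a count byte, then size-prefixed blocks.
    n_subs = song[pos]
    pos += 1
    subs = []
    for _ in range(n_subs):
        data, pos = _read_block(song, pos)
        subs.append(data)

    # Tracks: a count byte, then a channel byte before each size-prefixed block.
    n_tracks = song[pos]
    pos += 1
    tracks = []
    for _ in range(n_tracks):
        channel = song[pos]
        data, pos = _read_block(song, pos + 1)
        tracks.append((channel, data))
    return Song(title, subs, tracks)
```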
In the program I posted, the function gmext_song is responsible for loading the song data; everything from the start to the comment that reads 'write output file' is all that is needed to read a particular song into memory.
That is all there is to the file format. However, to decode the track data properly (as a MIDI sequencer would have to do), there are some caveats about the MIDI commands; all of this is taken from the source code of the program I posted:
* Commands 0x80/0x90 (note off/note on) have their second data byte (the velocity, i.e. the note volume) modified.
* Command 0xC0 (program change) treats some values of its data byte specially.
* Command 0xB0 (control change) also treats some data values specially.
* Commands 0xFD, 0xFE, 0xFF have a special meaning in the decoding.
Command 0xFE is the 'insert subsequence' command. It has a single data byte which is the number of the subsequence to be inserted replacing the 0xFE command. This way, subsequences are parts of the song which are factored out of the track data (possibly to reduce storage size) and get inserted at runtime.
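To illustrate the idea, here is a hedged Python sketch of the insertion step. It works on events that have already been decoded into (delta-time, command, data) tuples, which is a representation I made up for the example. It also assumes that the delta-time of the 0xFE command itself delays the first inserted event; I have not verified that against the program. It ignores 0xFD/0xFF handling and nested 0xFE commands entirely.

```python
Event = tuple[int, int, bytes]  # (delta-time, command byte, data bytes)

def expand(track: list[Event], subsequences: list[list[Event]]) -> list[Event]:
    """Replace every 0xFE event with the events of the subsequence it names."""
    out: list[Event] = []
    carry = 0  # delta-time waiting to be attached to the next emitted event
    for delta, command, data in track:
        if command == 0xFE:
            # Assumption: the delay in front of 0xFE applies to whatever comes next.
            carry += delta
            for sub_delta, sub_command, sub_data in subsequences[data[0]]:
                out.append((sub_delta + carry, sub_command, sub_data))
                carry = 0
        else:
            out.append((delta + carry, command, data))
            carry = 0
    return out
```

A real decoder would more likely do this on the fly while reading the raw bytes, but the flattened list shows what "inserted replacing the 0xFE command" amounts to.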
I do not remember what commands 0xFD and 0xFF do. Judging from the program, they must signal the end of the current subsequence or track, since they return control from the decoding function; this means they can be used to append padding delta-time at the end of a subsequence. I vaguely remember that they did occur as the last command in a sequence (I should double-check this), and it is possible that one of them meant a normal ending while the other meant restart (used in the theme song to get automatic repetition).
I'd like to help add a native decoder to OpenTTD so it could play the gm.cat file natively.
If you are interested in a native decoder, I can write it for you, if you just tell me in which format you want the data to be stored in memory.