One of my guilty pleasures as a programmer is writing extractors for weird/old/undocumented file formats. I’m not very good at it, but I try. My latest victim was the “.MTF” format used by Delphine Software’s 1998 Darkstone, the first RPG I ever owned.
All the data files except for the movies are stored in these obscure binary archives.
I searched Google in vain for any information on the format. No, this is not the same thing as Microsoft Tape Format. The only thing I found to help was a utility called dsxtract.exe, which extracted all the MP2 music files from MUSIC.MTF. It didn’t run on Windows 7 x64 (and of course the source code is nowhere to be found), but DOSBox did the trick.
Now let’s take a look at MUSIC.MTF in a hex editor.
Ugh, not even an ASCII header. We can see that starting at byte 9, there’s an ASCII string, so the first 8 bytes are probably integers.
First things first: on x86, integers are usually stored little-endian, i.e. least significant byte first. This means that if you blindly paste the first 4 bytes (19 00 00 00) into calc.exe and convert them from hex to decimal, you’ll get 419430400, a number that has no obvious meaning. The trick is to reverse the order of the bytes: 00000019 is 25 in decimal. Does “25” make any sense?
Well, when I ran dsxtract.exe, it produced 25 MP2 tracks. So it would appear that this first integer is the number of entries in the archive.
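If you’d rather not reverse bytes by hand, Python’s `struct` module does it for you: `"<I"` reads an unsigned 32-bit little-endian integer, `">I"` a big-endian one (which is effectively what calc.exe assumes). A minimal sketch using the first four bytes of MUSIC.MTF, 19 00 00 00; treating the count as unsigned is my assumption, the format doesn’t say.

```python
import struct

# First 4 bytes of MUSIC.MTF, as they appear in the hex editor
raw = bytes([0x19, 0x00, 0x00, 0x00])

# "<I" = little-endian unsigned 32-bit; ">I" = big-endian (what calc.exe assumes)
(little,) = struct.unpack("<I", raw)
(big,) = struct.unpack(">I", raw)

# int.from_bytes does the same reversal without a format string
same = int.from_bytes(raw, "little")

print(little, big, same)  # 25 419430400 25
```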
So let’s look at the next integer: 0D 00 00 00, which is 13 in decimal. Hmm, OK?
What about that next string? “MUSIC\22.MP2”. That looks like a path name. And it’s 12 characters long. Hmm, almost 13… wait! The next byte is 00, the null character, so this is a 13-byte null-terminated string, and the integer before it was its length! Every string at the beginning of this file has 12 characters followed by 00, and each one is prefixed with the number 13. To confirm this hypothesis, I also looked at DATA.MTF, which has paths of various lengths, and each was prefixed with the number of characters + 1. So, there we go.
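Reading such a length-prefixed string is short in Python. This is a minimal sketch with a hypothetical `read_path` helper; it assumes the prefix is an unsigned 32-bit little-endian integer that counts the trailing 00 (as observed above) and decodes the path as ASCII.

```python
import io
import struct

def read_path(stream):
    """Read a 4-byte length prefix, then that many bytes holding a null-terminated path."""
    prefix = stream.read(4)
    if len(prefix) != 4:
        raise EOFError("truncated length prefix")
    (length,) = struct.unpack("<I", prefix)  # counts the characters plus the 00 terminator
    raw = stream.read(length)
    if len(raw) != length:
        raise EOFError("truncated path")
    # Keep everything before the first null byte
    return raw.split(b"\x00", 1)[0].decode("ascii")

# 0D 00 00 00, then "MUSIC\22.MP2", then 00
sample = io.BytesIO(struct.pack("<I", 13) + b"MUSIC\\22.MP2\x00")
print(read_path(sample))  # MUSIC\22.MP2
```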
Between “MUSIC\22.MP2” and the next string, there are 8 bytes:
75 02 00 00 93 FC 15 00
This is most likely two little-endian integers: 629 and 1440915. They might indicate where this file is located within the archive. Let’s see what we have at offset 629:
FF FD 90 04 53 33 11 11 11 11 11 11 11 11 11 24
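The same `struct` call splits those 8 bytes into two integers, and a seek lets you peek at the data they point to. A sketch with hypothetical helper names (`split_location`, `peek`); `peek` accepts any seekable binary stream, so the in-memory stand-in below could be replaced by a file opened with `open(..., "rb")`.

```python
import io
import struct

def split_location(tail):
    """Interpret 8 bytes as two little-endian unsigned 32-bit integers."""
    return struct.unpack("<II", tail)

def peek(stream, offset, count=16):
    """Return up to `count` bytes at absolute `offset`, formatted as hex."""
    stream.seek(offset)
    return stream.read(count).hex(" ").upper()

offset, size = split_location(bytes.fromhex("75 02 00 00 93 FC 15 00"))
print(offset, size)  # 629 1440915

# Stand-in for MUSIC.MTF: 629 filler bytes, then the first bytes shown above
fake_archive = io.BytesIO(bytes(629) + bytes.fromhex("FF FD 90 04 53 33"))
print(peek(fake_archive, offset, 4))  # FF FD 90 04
```

`bytes.hex` with a separator argument needs Python 3.8 or later.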
Well, I know nothing of the MP2 file format, but if this is the start of “MUSIC\22.MP2” then it probably looks similar to any other MP2 file. Let’s look at one of the extracted files in a hex editor:
FF FD 90 04 55 22 11 11 11 11 11 11 11 11 11 24
Sweet! The first four bytes are identical. Now let’s look at the file size of 22.MP2: 1 440 915 bytes. That’s exactly the second number following the path name, so it must be the size of the entry.
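Why do both files start with FF FD 90 04? MP2 (MPEG Audio Layer II) data is a sequence of frames, each beginning with a 4-byte header whose first 11 bits are all ones (the frame sync). This sketch, a hypothetical helper that decodes only the version, layer and CRC fields of that header, shows that FF FD fits:

```python
def describe_mpeg_header(header):
    """Decode the first fields of an MPEG audio frame header."""
    if len(header) < 4:
        raise ValueError("an MPEG audio frame header is 4 bytes long")
    b0, b1 = header[0], header[1]
    # Frame sync: all 8 bits of byte 0 plus the top 3 bits of byte 1 are set
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return "no MPEG frame sync"
    versions = {0b11: "MPEG-1", 0b10: "MPEG-2", 0b00: "MPEG-2.5", 0b01: "reserved"}
    layers = {0b11: "Layer I", 0b10: "Layer II", 0b01: "Layer III", 0b00: "reserved"}
    version = versions[(b1 >> 3) & 0b11]
    layer = layers[(b1 >> 1) & 0b11]
    # Protection bit: 1 means the frame carries no CRC
    crc = "no CRC" if b1 & 1 else "CRC"
    return f"{version}, {layer}, {crc}"

print(describe_mpeg_header(bytes.fromhex("FF FD 90 04")))  # MPEG-1, Layer II, no CRC
```

Both the bytes at offset 629 and the start of dsxtract’s output decode this way; the bytes that differ come after the 4-byte header.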
Note that I make this look very easy; in fact it took me about 3 hours to figure this out. Anyway.
At this point we can write the spec down for Darkstone’s MTF file format:
4 bytes – numFiles: integer; number of files in the archive.
This is followed by [numFiles] entries, each structured like so:
4 bytes – pathLength: integer; length in bytes of the next string, including the terminating null.
[pathLength] bytes – null-terminated ASCII string: path of the entry.
4 bytes – offset: integer; absolute offset of the entry’s data in the archive.
4 bytes – size: integer; size of the entry’s data in bytes.
Then comes the data itself. All integers are 32-bit little-endian.
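Here is the spec above as a minimal Python reader. It is a sketch, not my C# code: `MtfEntry`, `read_u32` and `read_index` are hypothetical names, and the integers are assumed to be unsigned.

```python
import io
import struct
from typing import BinaryIO, List, NamedTuple

class MtfEntry(NamedTuple):
    path: str
    offset: int  # absolute offset of the data in the archive
    size: int    # size of the data in bytes

def read_u32(stream: BinaryIO) -> int:
    """Read one little-endian unsigned 32-bit integer."""
    data = stream.read(4)
    if len(data) != 4:
        raise EOFError("unexpected end of index")
    return struct.unpack("<I", data)[0]

def read_index(stream: BinaryIO) -> List[MtfEntry]:
    """Parse numFiles, then numFiles (pathLength, path, offset, size) records."""
    num_files = read_u32(stream)
    entries = []
    for _ in range(num_files):
        path_length = read_u32(stream)
        raw = stream.read(path_length)
        if len(raw) != path_length:
            raise EOFError("truncated path")
        path = raw.split(b"\x00", 1)[0].decode("ascii")
        offset = read_u32(stream)
        size = read_u32(stream)
        entries.append(MtfEntry(path, offset, size))
    return entries

# Tiny synthetic archive: one entry, index = 4 + 4 + 13 + 8 = 29 bytes, then 3 data bytes
demo = (struct.pack("<I", 1) + struct.pack("<I", 13) + b"MUSIC\\22.MP2\x00"
        + struct.pack("<II", 29, 3) + b"abc")
print(read_index(io.BytesIO(demo)))
```

On a real archive you would call it as `with open("MUSIC.MTF", "rb") as f: entries = read_index(f)`.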
I then wrote a little C# application that read an entry, fetched the corresponding data, put it in a file of the same name, and proceeded to the next entry until they were all done. That almost worked. It would throw exceptions while reading DATA.MTF, because some of the specified sizes there are invalid and result in reading past the end of the file. Ugh.
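One detail when writing each entry to “a file of the same name”: the paths inside the archive use backslashes (MUSIC\22.MP2), which are ordinary filename characters on Linux or macOS. This hypothetical `output_path` helper splits them with `PureWindowsPath` so the same code works on any OS, and, as an extra precaution of my own, refuses drive letters, leading backslashes or `..` components that would escape the output folder.

```python
from pathlib import Path, PureWindowsPath

def output_path(out_dir, archive_path):
    """Map an archive path like MUSIC\\22.MP2 to a file under out_dir."""
    win = PureWindowsPath(archive_path)  # splits on backslashes on every OS
    # Refuse drive letters, leading backslashes and ".." so nothing lands outside out_dir
    if win.anchor or ".." in win.parts:
        raise ValueError(f"unsafe path in archive: {archive_path!r}")
    return Path(out_dir).joinpath(*win.parts)

def write_entry(out_dir, archive_path, data):
    """Create parent folders as needed and write one entry's bytes."""
    target = output_path(out_dir, archive_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```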
So I had to resort to a more involved approach. Instead of processing one entry at a time, I first read all the entries (path, offset, size) into a list, sort it by offset, and then go over that list to fetch the corresponding data. For each entry, I check whether the size makes sense; if it doesn’t, I use the next entry’s offset to calculate the “real” size. Note that the real size could actually be smaller than that (there may be unused bytes between entries), but I suppose that’s the best I can do.
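The sort-then-fix pass can be sketched like this. It is a hedged reconstruction, not my C# code: `fix_sizes` is a hypothetical name, and it assumes a size “doesn’t make sense” only when it runs past the end of the archive, in which case the entry is cut at the next larger offset (or at the end of the file for the last entry). One thing to know if you port this to Python: `read()` silently returns fewer bytes at end of file instead of throwing, so without a check like this you would quietly get truncated files.

```python
import bisect
from typing import List, NamedTuple

class Entry(NamedTuple):
    path: str
    offset: int
    size: int

def fix_sizes(entries: List[Entry], archive_length: int) -> List[Entry]:
    """Sort entries by offset and repair sizes that run past the end of the archive."""
    ordered = sorted(entries, key=lambda e: e.offset)
    offsets = [e.offset for e in ordered]
    fixed = []
    for e in ordered:
        if e.offset + e.size <= archive_length:
            fixed.append(e)  # size looks plausible, keep it
            continue
        # First offset strictly greater than this one (entries may share an offset)
        i = bisect.bisect_right(offsets, e.offset)
        end = offsets[i] if i < len(offsets) else archive_length
        end = min(end, archive_length)
        # The real data may be shorter (padding), but this is the best estimate
        fixed.append(e._replace(size=max(0, end - e.offset)))
    return fixed

entries = [Entry("B", 150, 20), Entry("A", 100, 9999), Entry("C", 170, 500)]
for e in fix_sizes(entries, archive_length=200):
    print(e.path, e.offset, e.size)
# A 100 50
# B 150 20
# C 170 30
```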
This worked well. Extracting these archives reveals the many file formats used by Darkstone (a lot more fun in store!):
- MP2 – well-known sound format, used for all music and speech
- WAV – well-known sound format, used for most sounds
- DAT – could be anything; used by a few relatively small files: “SND.DAT”, “LANGUAGE.DAT”, etc.
- AND – looks related to 3D models, I don’t know.
- B3D – maybe one of the known formats using that extension? I don’t know.
- BRM – no clue
- CLD – very repetitive (FC 01 FC 01 FC C0 FC 00 01 FC 01 FC C0 FC 01 FC 01 FC C0…), but I don’t know.
- MBR – obscure binary format, I haven’t got a clue
- MDL – idem
- SKA – idem
- O3D – seems to be used for meshes; maybe it’s Objective-3D, which apparently no one knows anything about.
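A rough inventory like the list above can be produced from the parsed paths by counting extensions. A small sketch with a hypothetical `count_extensions` helper and made-up sample paths; it treats the paths as Windows-style (backslash-separated) and upper-cases the extension.

```python
from collections import Counter
from pathlib import PureWindowsPath

def count_extensions(paths):
    """Count archive entries by file extension; entries without one go under "(none)"."""
    return Counter(
        PureWindowsPath(p).suffix.lstrip(".").upper() or "(none)" for p in paths
    )

sample = ["MUSIC\\22.MP2", "MUSIC\\23.MP2", "SND.DAT", "SOUNDS\\hit.wav"]
print(count_extensions(sample).most_common())
# [('MP2', 2), ('DAT', 1), ('WAV', 1)]
```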
Here’s the full listing if anyone wants to use it. This must be compiled with the /UNSAFE compiler option: