I've done a few formats myself. Nothing complicated. But once you've done one, all the others are essentially the same. You need the length of the data, the data itself, and then likely a version and magic bytes for identification. With those few details you can do essentially anything.
For example, one format I use just concatenates multiple files into a single one; I use it to group video timeline seeker images into one file - it is faster than using an archive or tar/gzip. Another is a format that concatenates AES-GCM chunks into a single file, which lets me survive interrupted writes and also supports seeking and streaming reads.
These things are quite useful, but there is no general use case (like gzip/tar). Usually some specific functionality is needed, so they always have to be written from scratch.
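To make the idea concrete, here is a minimal sketch (in Go) of that kind of concatenation container: magic bytes, a version, then length-prefixed entries stacked back to back. The magic string, field widths and byte order here are illustrative choices of mine, not the actual layout described above.

    package seekpack

    import (
        "encoding/binary"
        "io"
    )

    // Hypothetical identifier and version for the container; pick your own.
    var magic = []byte("IMG1")

    const version = 1

    // WriteContainer writes the header, then each blob as a 4-byte
    // little-endian length followed by the blob bytes.
    func WriteContainer(w io.Writer, blobs [][]byte) error {
        if _, err := w.Write(magic); err != nil {
            return err
        }
        if _, err := w.Write([]byte{version}); err != nil {
            return err
        }
        for _, b := range blobs {
            var lenBuf [4]byte
            binary.LittleEndian.PutUint32(lenBuf[:], uint32(len(b)))
            if _, err := w.Write(lenBuf[:]); err != nil {
                return err
            }
            if _, err := w.Write(b); err != nil {
                return err
            }
        }
        return nil
    }

Reading it back is the mirror image: check the magic and version, then loop reading a 4-byte length followed by that many bytes.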
z500 23 hours ago [-]
> For example, one format I use is just to concatenate multiple files into a single one, I use it to group video timeline seeker images into one file - it is faster than using archive or tar/gzip
I did something like this when I was moving my files onto a new computer like 25 years ago, and all I had was a floppy drive. Just continuously dump the data onto a floppy until space runs out and ask for another one until there are no more files.
gethly 22 hours ago [-]
Floppy disks..ah, good times :)
mring33621 1 days ago [-]
I'd buy the AES-GCM chunks one for a dollar!
gethly 23 hours ago [-]
I spent quite a lot of time on that one, for obvious reasons, but in general it is not too hard. GCM is an authenticated mode with a built-in integrity tag, unlike CTR, which is a plain stream mode. So all you need is a fixed block size where you store the header and the data. The nonce is 12 bytes and the GCM tag is 16 bytes, so that is a fixed 28 bytes of overhead. After some experimenting, a 64 KB block size seemed to work best, despite it being quite a large chunk of data.

Since you then know each chunk holds exactly 64 KB of data, you just stack them one after another. The hard part is handling reads: you need to know which chunk to seek into, decrypt it, and then seek to the correct position within it to stream/read the right data - and once you reach the end of the chunk, move on to the next one. It is a bit tricky but perfectly doable, and it has been working for me for probably 3 years now. One caveat is to handle the last chunk properly, as it will not be a full 64 KB but whatever was left in the buffer at the end of the data. This is important for appending to existing files.
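A rough sketch of how such chunked writes can look in Go, under my own assumptions: 64 KiB of plaintext per chunk, each chunk stored on disk as nonce || ciphertext || tag (plaintext plus 28 bytes). Whether the 28 bytes are counted inside or outside the 64 KB above isn't stated; this sketch puts them outside.

    package gcmchunks

    import (
        "crypto/aes"
        "crypto/cipher"
        "crypto/rand"
        "io"
    )

    const (
        plainChunkSize = 64 * 1024 // assumed plaintext bytes per chunk
        overhead       = 12 + 16   // nonce + GCM tag added per chunk on disk
    )

    // WriteChunks encrypts r chunk by chunk and writes the sealed chunks
    // back to back to w. Only the last chunk may be shorter than 64 KiB.
    func WriteChunks(w io.Writer, r io.Reader, key []byte) error {
        block, err := aes.NewCipher(key)
        if err != nil {
            return err
        }
        gcm, err := cipher.NewGCM(block) // 12-byte nonce, 16-byte tag by default
        if err != nil {
            return err
        }
        buf := make([]byte, plainChunkSize)
        for {
            n, rerr := io.ReadFull(r, buf)
            if n > 0 {
                nonce := make([]byte, gcm.NonceSize())
                if _, err := rand.Read(nonce); err != nil {
                    return err
                }
                // Layout per chunk: nonce || ciphertext || tag.
                sealed := gcm.Seal(nonce, nonce, buf[:n], nil)
                if _, err := w.Write(sealed); err != nil {
                    return err
                }
            }
            if rerr == io.EOF || rerr == io.ErrUnexpectedEOF {
                return nil
            }
            if rerr != nil {
                return rerr
            }
        }
    }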
mring33621 21 hours ago [-]
I've been just re-encrypting to CTR and streaming from that. You can stream OK from a big, single GCM file, but random access has to be faked by always restarting at 0...
gethly 20 hours ago [-]
The problem with CTR is that it is not a block-based cipher, which means you cannot append to an existing file. For example, with multipart file uploads this would just not work. CTR also lacks integrity checking; it only XORs the bytes.
And yeah, like I said, random access is possible but you have to write your own "driver" for it.
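For what it's worth, the "driver" arithmetic mostly boils down to mapping a plaintext offset to a chunk. A tiny sketch, using the same assumed sizes as the earlier example:

    package gcmseek

    const (
        plainPerChunk = 64 * 1024               // assumed plaintext per chunk
        diskPerChunk  = plainPerChunk + 12 + 16 // plus nonce and GCM tag on disk
    )

    // Locate maps a plaintext offset to where its chunk starts on disk and
    // how many plaintext bytes to skip after decrypting that one chunk.
    func Locate(plainOffset int64) (diskOffset, skip int64) {
        chunk := plainOffset / plainPerChunk
        return chunk * diskPerChunk, plainOffset % plainPerChunk
    }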
ktpsns 1 days ago [-]
The amount of energy put into reversing games is incredible. This is real passion combined with expertise. Similarly skilled people unlocked Photoshop or MS Office decades ago (and certainly still do where possible). Having shifted my focus to OSS a few decades ago, this gives me nostalgic feelings, but I am also happy not to have to regularly fight against software vendors and their ideas of software distribution.
vivzkestrel 23 hours ago [-]
What about Splinter Cell Conviction? 15 years and nobody has figured out its .unr map file format, which uses a custom Unreal Engine 2.x. It even has a tool that lets you unpack its UMD files: https://github.com/wcolding/UMDModTemplate The library on GitHub requires this unumd tool: https://www.gildor.org/smf/index.php/topic,458.msg15196.html... The same tool also works for Blacklist. I would like to change the type of enemy spawned in the map, but I cannot find any assistance on it. UEExplorer doesn't work because it is some kind of custom map file.
justsomehnguy 1 days ago [-]
> # Why??????
> The CPU wasn't terribly slow for the time, but wasting cycles would have been noticed.
> Compressing data means you save space on the disc...
While wasting cycles isn't a good thing, it's even worse to waste those cycles by not using them at all because you are waiting for sloooow media.
And while you could invent a compressed format for every asset type you have, it is much easier to just compress the whole thing and let the compressor do the magic.
NB: I still somewhat remember the original SC and it was like 'future is now' with all those glorious shadows and sunshine blooming.
burnt-resistor 21 hours ago [-]
Neat.
I've been authoring IFF/LBM and PCX en/decode libraries recently because the existing implementations are half-assed, cherry-picking a few features rather than supporting these formats fully and robustly.
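The core chunk walk of an IFF/LBM reader is small; the robustness the existing libraries skip is in everything around it (truncated chunks, odd-length padding, nested FORMs, unknown IDs). A minimal sketch in Go, assuming a well-formed FORM container:

    package iffwalk

    import (
        "encoding/binary"
        "fmt"
        "io"
    )

    // WalkChunks prints each chunk ID and size inside a FORM body.
    func WalkChunks(r io.ReadSeeker) error {
        var hdr [12]byte // "FORM" + uint32 size + form type, e.g. "ILBM"
        if _, err := io.ReadFull(r, hdr[:]); err != nil {
            return err
        }
        if string(hdr[:4]) != "FORM" {
            return fmt.Errorf("not an IFF FORM file")
        }
        for {
            var ch [8]byte // chunk ID + big-endian uint32 length
            if _, err := io.ReadFull(r, ch[:]); err != nil {
                if err == io.EOF {
                    return nil
                }
                return err
            }
            size := binary.BigEndian.Uint32(ch[4:])
            fmt.Printf("chunk %s, %d bytes\n", ch[:4], size)
            // Chunk bodies are padded to an even length.
            skip := int64(size)
            if size%2 == 1 {
                skip++
            }
            if _, err := r.Seek(skip, io.SeekCurrent); err != nil {
                return err
            }
        }
    }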