There's a quite "legendary" Game Boy Advance game out there (Klonoa - Densetsu no Star Medal) that never got an English translation because it uses some sort of in-house compression, created by Namco so the game could fit on a GBA cartridge. AFAIK no one has ever been able to crack it open and release code to de/compress it.
A while ago I had a "bounty" of USD 100 for anyone who could do it (just the decompression and re-compression, not translation), but there aren't many people who want to fiddle with low-level GBA coding.
If the bounty were high enough, there are people out there who do this sort of thing professionally and would probably jump at the opportunity.
EDIT: You are missing `csv+zstd` ? It should obsolete `csv+gzip` at all speeds and compression levels.
There is a Pareto-optimality frontier here - I ran my testing back in 2016 https://code.ivysaur.me/compression-performance-test/ but the numbers are now a little obsolete (e.g. zstd and brotli have both seen a lot of improvements since).
EDIT: indeed, it's missing in `to_csv` - seems like an oversight.
There was, however, no machine learning or optimization involved. Instead, he called it "prospecting" and just generated a new one from scratch each time until he found something interesting.
I'm particularly proud of this meta approach, and I actually think it could become huge: the same thing can be done for hyperparameter optimization in machine learning tasks.
Hyperparameter optimization is currently focused on minimizing cross-validation error, but using this concept you could have weights on accuracy, training time, and prediction time (very similar to compression, where the three dimensions are size, write time, and read time), and then, given a new unknown dataset, you could predict which model/hyperparameters to use.
Maybe this should be patented ;)
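To make the idea concrete, here's a hypothetical sketch (the candidate names and numbers are made up for illustration): score each model candidate with user-chosen weights on accuracy, training time, and prediction time, mirroring compression's size/write-time/read-time trade-off.

```python
# Hypothetical sketch: score model candidates with user-chosen weights on
# accuracy, training time, and prediction time -- the same three-way
# trade-off as compression's size / write time / read time.
def score(candidate: dict, w_acc: float = 1.0,
          w_train: float = 0.1, w_pred: float = 0.1) -> float:
    # Higher is better: reward accuracy, penalise time costs.
    return (w_acc * candidate["accuracy"]
            - w_train * candidate["train_seconds"]
            - w_pred * candidate["predict_seconds"])

# Made-up candidates standing in for real cross-validation results.
candidates = [
    {"name": "small_rf", "accuracy": 0.91, "train_seconds": 2.0,  "predict_seconds": 0.05},
    {"name": "big_gbm",  "accuracy": 0.94, "train_seconds": 60.0, "predict_seconds": 0.40},
    {"name": "linear",   "accuracy": 0.88, "train_seconds": 0.5,  "predict_seconds": 0.01},
]
best = max(candidates, key=score)
print(best["name"])  # with these weights the cheap linear model wins
```

Crank `w_train` and `w_pred` down to zero and you recover plain accuracy-maximizing model selection; the interesting part is the region in between.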
There is already a substantial field of Machine Learning/Meta Learning which focuses on exactly this. For example, this paper [1] from NeurIPS 2015 does exactly what you suggest.
[1]: https://papers.nips.cc/paper/5872-efficient-and-robust-autom...
Off the cuff, this seems like it competes with an alternative of simply running every considered compression algorithm and choosing the optimal one. I guess this would be advantageous if the RF classifier is meaningfully faster to run than the compression algorithms themselves. Is it?
It was really just the example that made me wonder why I have to consider which compression would be best for files with my characteristics - but not saying it was best practice to begin with haha!
(I use PostgreSQL, due to my ignorance.)
See this recent timescaledb post (since you mentioned Postgres) which goes over an array of techniques used in column-store databases that general-purpose compressors on a csv file full of data would not be able to match:
https://blog.timescale.com/blog/building-columnar-compressio... discussion: https://news.ycombinator.com/item?id=21412596
Gwern was kind enough to assist by sending over the exact version numbers of all the Haskell libraries it depends on, and answering some questions about deployment. The version numbers turned out to be crucial to getting everything running.
IMO https://www.gwern.net/ is the ideal combination of style + ease of use (for the writer) + effective ways of organizing knowledge.
The whole thing is hosted out of an S3 bucket, so there's no server to manage and zero downtime. I've wondered if it'd be possible to use github pages for this purpose, since that would make it completely free. But it only takes a couple hours of work to get everything up and running. The biggest delay is waiting for haskell to compile all the libraries.
I've got an easy-to-use library for arithmetic encoding in Java, it would be easy to port to other languages: https://github.com/comperical/MirrorEncode
I'm pretty ignorant on the topic, so I get that that may be off, but if so, why wouldn't that be a valid solution to compressing data?
The problem is that you have to encode/send the DNN itself, otherwise your receiver won't know how to decode the data. If you are not smart, the added codelength of the DNN will likely blow away your savings. If you are smart, this leads to a whole formulation of machine learning called MDL:
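The two-part code idea behind MDL can be stated concretely. In this toy sketch (function names are illustrative, not from any library; zlib stands in for a proper coder), the cost of a model is the bits to transmit the model itself plus the bits to transmit the data's residuals under that model:

```python
import pickle
import zlib

def description_length(model_params, residuals: bytes) -> int:
    """Toy two-part MDL cost: bits to send the model, plus bits to send
    the data given the model. A bigger model only wins if it shrinks the
    residuals by more than its own encoded size."""
    model_bits = len(zlib.compress(pickle.dumps(model_params)))
    data_bits = len(zlib.compress(residuals))
    return model_bits + data_bits

# A tiny model with the same residuals always beats a huge one:
cheap = description_length([1.0, 2.0], b"x" * 1000)
bloated = description_length(list(range(10000)), b"x" * 1000)
print(cheap < bloated)
```

This is exactly the trap with shipping a DNN as the decoder: its `model_bits` term is enormous, so the residual savings have to be enormous too.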
More seriously, the generalized compression algorithm can be one of the keys or even one of the definitions of generalized artificial intelligence
And Kolmogorov complexity was present via a subtle reference in the very last episode ...
This is using machine learning to predict, for given heterogeneous tabular data, which compression algorithm will yield the highest compression ratio.
There are multiple other ways to do this approximation.
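One cheap approximation (a sketch of the general idea, not what the article does): estimate a codec's ratio by compressing only a small sample of the data instead of all of it.

```python
import zlib

def estimated_ratio(data: bytes, sample_size: int = 4096) -> float:
    """Estimate a codec's compression ratio from a small prefix sample.
    Much cheaper than compressing everything, at the cost of accuracy
    when the data's character changes past the sample."""
    sample = data[:sample_size]
    if not sample:
        return 1.0
    return len(zlib.compress(sample)) / len(sample)

data = b"timestamp,value\n" + b"2019-01-01,42\n" * 50_000
print(round(estimated_ratio(data), 3))
```

Sampling a prefix is the crudest variant; sampling random blocks, or using entropy estimates instead of an actual compressor, are the obvious refinements.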