Rars: a Rust RAR implementation, mostly written by LLMs

(bitplane.net)

94 points · davidsong · 12d ago · 85 comments

I spent a fortnight using Claude to create specs for every version of RAR, then another using gpt-5.5 to write compressors in Rust.

It's not fast and it's not pretty, but it works.


spprashant · 12d ago

I have never attempted something so ambitious with AI, but this feels spot on in terms of experience. As you cede more control to the model, you will find yourself losing control over things like code quality and performance.

luckystarr · 12d ago

You have to make performance part of the spec. It will then create benchmarks and plan differently. If you omit this, you get what you get.
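One hedged reading of "make performance part of the spec" is to give the agent a measurable target it can re-check after every change. Below is a minimal throughput harness using only `std::time::Instant` and `std::hint::black_box`; the `throughput_mib_per_s` name and the identity workload in the usage are placeholders, not anything taken from rars.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Time `iterations` runs of `work` over `input` and return throughput in MiB/s.
/// Hypothetical helper: `work` stands in for a compressor or decompressor.
pub fn throughput_mib_per_s<F: Fn(&[u8]) -> Vec<u8>>(work: F, input: &[u8], iterations: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box discourages the optimizer from skipping the work or its result.
        black_box(work(black_box(input)));
    }
    // Clamp to 1 ns so a trivially fast run cannot divide by zero.
    let elapsed = start.elapsed().max(Duration::from_nanos(1));
    let mib = input.len() as f64 * f64::from(iterations) / (1024.0 * 1024.0);
    mib / elapsed.as_secs_f64()
}
```

A spec could then state a floor on a fixed corpus and have the benchmark fail below it; the actual number is a per-project choice.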

dclavijo · 12d ago

Funnily enough, a couple of days ago I did the same with Unique and a single skill.md (repo: https://github.com/daedalus/uniq-reconstruction). After that succeeded I tried RAR, but failed. Kudos.

esafak · 12d ago

> It’s sloppy, it’s slow, it’s almost two megabytes in size and somewhat worse than WinRAR on compression.

As mathematicians say, optimization is left as an exercise to the reader. You did the hard part.

59nadir · 12d ago

I mean, not really...? A vibecoded mess that runs badly, that's not really the hard part for something like compression/decompression tools.

esafak · 12d ago

What's your easy way of reverse engineering every previous version of this file format, if you don't think that was the hard part?


rebolek · 12d ago

> "For the last 15 months or so my hobby has been shouting at Claude"

How can you shout at Claude when it’s

1) foobaring, bamblabooing and fghrtawing all the time without telling you what’s going on

2) when it finally interacts, it’s asking for a permission you gave it 30 seconds ago: "yes, and do not ever ask me again until the heat death of the Universe"

3) and after all of that, it just spits out: "you’re out of tokens, give up your liver or wait until Trump’s next war"

wolttam · 12d ago

> foobaring, bamblabooing and fghrtawing all the time without telling you what’s going on

Oh man, now I have to plug my tool[0]... it doesn't hide anything, but by default tries to provide a pleasant interface (ctrl+o to toggle details similar to CC, but less janky?)

Disclaimer: It's way simpler than Claude Code or even pi (on purpose)

[0]: https://codeberg.org/mlow/lmcli

davidsong (OP) · 12d ago

The proper way to work with Claude or Codex is, IMO, to load up the context with a discussion about what you're doing and why. You go back and forth, pushing back on its opinions and shaping the context until the tokens are ready to flow into the right shape. Every angle you miss is an opportunity for them to slop out all over the place, and, until Codex was mature, the longer you ran the task for, the more it'd spread out and lose shape.

Re-shaping the context sometimes involves severe pressures like "wtf is this ugly crap?" or "did I just spot you laying a turd in my codebase again?" and other strong forms of disapproval, mixed with "hmm not sure I like the sound of that"s, to "yea that's much better" to pull it back in the other direction.

The trick is to shape the flow before the tide comes in and you end up like King Canute

xphos · 12d ago

Would it really take 5 years to develop RAR compression and decompression? That seems like an extreme overestimate. I don't know much about the compressor or decompressor, but that seems really high.

q3k · 12d ago

Yeah, sounds closer to a 5 week thing, if you know what you're doing.

davidsong (OP) · 12d ago

Well, it is every version of RAR: documenting the quirks of RAR 1.4, 1.5, 2.0, 2.9, 3.0, 4.0, 5.0 and 7.0, multiple compression strategies, PPMd, RARVM, compression levels, encryption, multi-volume support, a huge test corpus, round trips for compatibility... The spec docs are linked.


self_awareness · 12d ago

Five weeks is a decompressor for one version. If this supports multiple versions of RAR, then writing the decompressors alone is probably a year of work.

Sembiance · 12d ago

Neat. FYI, your “home” links are broken. And here are 150,000 more unique RAR files if you need to test whether your compressor produces the same byte-for-byte output when you unrar and then re-rar these: https://discmaster.textfiles.com/search?format=rar&dedup=ded...

davidsong (OP) · 3d ago

Thanks. I'm downloading all these now and will do a proper pass and compare outputs for correctness once complete :)

Lockal · 12d ago

Knowing how compressors work, even official RAR may produce archives with a different hash, even between minor updates.

Imustaskforhelp · 12d ago

Kudos, this is a really cool project (even if it might be AI-generated). I have starred the repo (third star here).

One thing I have been curious about: is there any way to stop a RAR compression midway and continue it later?

For example, if I have a compression running on a large file, could this project let me shut down the computer mid-compression and continue after starting it again?

I would really love it if you could add this functionality!

davidsong (OP) · 12d ago

Thanks!

I guess you could save the state to a file on SIGINT, flush what's been written and pick it back up again if the state file exists when you restart, and use the CRCs of the files to abort if things have changed. I don't fancy doing that for so many versions of RAR, but it would be a cool feature to add to an `xz` fork. I like the idea.
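The reply above describes resumable compression in one sentence. Below is a minimal Rust sketch of only the bookkeeping part, under stated assumptions: `Checkpoint`, its 12-byte little-endian layout and `resume_offset` are hypothetical names (not part of rars or xz), signal handling is omitted, and the CRC is taken over an in-memory buffer for brevity. A real resumable compressor would also have to persist its internal state (match window, entropy-coder and model state), which this does not attempt.

```rust
/// Standard CRC-32 (IEEE, reflected polynomial 0xEDB88320), bitwise for clarity.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set, else 0
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical state file written when compression is interrupted.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Checkpoint {
    pub input_crc: u32,  // CRC-32 of the input when the checkpoint was taken
    pub bytes_done: u64, // input bytes already compressed and flushed
}

impl Checkpoint {
    /// Serialize as 12 little-endian bytes: CRC, then offset.
    pub fn to_bytes(&self) -> [u8; 12] {
        let mut out = [0u8; 12];
        out[..4].copy_from_slice(&self.input_crc.to_le_bytes());
        out[4..].copy_from_slice(&self.bytes_done.to_le_bytes());
        out
    }

    /// Parse the 12-byte layout; None if the length is wrong.
    pub fn from_bytes(raw: &[u8]) -> Option<Checkpoint> {
        if raw.len() != 12 {
            return None;
        }
        let mut crc = [0u8; 4];
        let mut offset = [0u8; 8];
        crc.copy_from_slice(&raw[..4]);
        offset.copy_from_slice(&raw[4..]);
        Some(Checkpoint { input_crc: u32::from_le_bytes(crc), bytes_done: u64::from_le_bytes(offset) })
    }
}

/// No state file: start at 0. Matching CRC: resume at the saved offset.
/// Changed input: abort instead of producing a corrupt archive.
pub fn resume_offset(saved: Option<&Checkpoint>, input: &[u8]) -> Result<u64, &'static str> {
    match saved {
        None => Ok(0),
        Some(cp) if cp.input_crc != crc32(input) => Err("input changed since checkpoint; aborting"),
        Some(cp) if cp.bytes_done > input.len() as u64 => Err("checkpoint offset is past end of input"),
        Some(cp) => Ok(cp.bytes_done),
    }
}
```

On restart, `resume_offset(saved.as_ref(), &input)` yields `Ok(0)` for a fresh run, `Ok(n)` to continue from byte `n`, or an error when the input's CRC no longer matches, which is the abort-if-changed step.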

Imustaskforhelp · 12d ago

Thanks! It would still be interesting to see this added to xz, but given that LLMs were able to create the rars project, I suppose it might not be that difficult to add it to the RAR format eventually. Starting with xz might make the most sense if you like the idea, though.

Another idea for the RAR format that I'd love to hear your opinion on: archives are sometimes split into multiple parts, .part01, .part02, .part03 and so on.

I have found that when you try to unrar them, it requires all of the parts.

It would be really beneficial, IMO, to be able to unrar just .part01 without requiring the contents of .part02, .part03, etc.

But from my very limited understanding, you also need some of the contents (I think the last part) of all the files for decompression to work.

Would it be possible to make it so that you don't need all the parts themselves, just some kind of patch for the end? I'm not sure whether compression algorithms allow that, but it felt like something that might be possible, albeit difficult, with the RAR format.

I'd be curious to hear your opinion on it. Thanks for responding, and I'd be really interested in seeing the xz fork you mentioned!
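Part of the answer to the question above is structural: when a file's packed data is split across volumes, the pieces have to be joined in order before anything can be decompressed, and in a solid archive later files also depend on the decompressed data of earlier ones. Below is a minimal sketch of just the joining step, assuming each volume contributes one contiguous slice of the entry's packed bytes; `VolumeSlice` and `reassemble` are hypothetical and do not reflect RAR's actual header layout or the rars API.

```rust
/// Hypothetical description of one volume's piece of a split entry.
pub struct VolumeSlice<'a> {
    pub volume_index: u32,       // 1 for .part01, 2 for .part02, ...
    pub data: &'a [u8],          // packed bytes this volume carries for the entry
    pub continues_in_next: bool, // true when the entry does not end in this volume
}

/// Join an entry's packed bytes in volume order.
/// Fails if a volume in the middle is missing or the last piece says more data follows.
pub fn reassemble(mut pieces: Vec<VolumeSlice<'_>>) -> Result<Vec<u8>, String> {
    pieces.sort_by_key(|p| p.volume_index);
    let mut expected = match pieces.first() {
        Some(p) => p.volume_index,
        None => return Err("no volumes supplied".to_string()),
    };
    let mut out = Vec::new();
    for piece in &pieces {
        if piece.volume_index != expected {
            return Err(format!("missing volume {}", expected));
        }
        out.extend_from_slice(piece.data);
        expected += 1;
    }
    if pieces.last().map_or(false, |p| p.continues_in_next) {
        return Err(format!("entry continues in volume {}, which was not supplied", expected));
    }
    Ok(out)
}
```

Under this model, a file whose data sits entirely inside .part01 would not need the later parts, while a file that spills over cannot be rebuilt from .part01 alone.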

slopinthebag · 12d ago

How do we know it's actually correct?

davidsong (OP) · 12d ago

I compressed thousands of files and went through libarchive's and Sembiance's test data, at least for the decompressor side. I recompressed the files and round-tripped them against 7-Zip, unrar and every later version of WinRAR. It failed a lot at the start, and Codex burned a lot of tokens instrumenting the binaries and dividing and conquering until things settled down and round-trips worked properly.

I can't really say it works in every case as I honestly didn't spend that much time on it. But it works in the majority of cases. There's likely some nasty bugs hiding in there.
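The testing described above boils down to a round-trip harness: compress each corpus file, decompress it with an independent implementation, and compare the recovered bytes with the original, rather than comparing archive bytes, which can legitimately differ between compressors. Below is a minimal sketch with a toy run-length codec standing in for a real RAR compressor; `rle_compress`, `rle_decompress` and `round_trip_failures` are hypothetical and do not call rars, unrar or 7-Zip.

```rust
/// Toy run-length encoder: emits (count, byte) pairs with count in 1..=255.
pub fn rle_compress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1;
        while i + run < input.len() && input[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Inverse of `rle_compress`; returns None on malformed input.
pub fn rle_decompress(packed: &[u8]) -> Option<Vec<u8>> {
    if packed.len() % 2 != 0 {
        return None;
    }
    let mut out = Vec::new();
    for pair in packed.chunks_exact(2) {
        if pair[0] == 0 {
            return None; // valid output never contains zero-length runs
        }
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    Some(out)
}

/// Round-trip every corpus entry and return the indices whose
/// decompressed bytes differ from the original (or fail to decode).
pub fn round_trip_failures<C, D>(corpus: &[Vec<u8>], compress: C, decompress: D) -> Vec<usize>
where
    C: Fn(&[u8]) -> Vec<u8>,
    D: Fn(&[u8]) -> Option<Vec<u8>>,
{
    corpus
        .iter()
        .enumerate()
        .filter(|(_, original)| {
            let packed = compress(original.as_slice());
            decompress(packed.as_slice()).as_deref() != Some(original.as_slice())
        })
        .map(|(i, _)| i)
        .collect()
}
```

In practice the `compress` and `decompress` closures would wrap the tool pairs being cross-checked; a non-empty result lists the corpus entries to investigate.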

perching_aix · 12d ago

By using it.

slopinthebag · 12d ago

Thus all software that can be used is correct?

You know what I meant: How can we have confidence that this implementation of RAR is functionally identical to what it's based on? What would give me the confidence to use it in a critical piece of infrastructure?


repelsteeltje · 12d ago

It works == it's correct?


TacticalCoder · 12d ago

It could be correct but way too slow in edge cases (unlikely with Rust, but you never know), leak temporary files, have security holes, etc.

There's much more to the correctness of a piece of software than "produces the same output as the original on x test cases".

I'm not saying it's a bad implementation and, if anything, LLMs are much better at translating/porting existing code (and finding bugs) than at writing things unheard of.

You're basically saying, if I may make a pun: "rust me bro, it's correct".


trembolram · 12d ago

The same guy reimplemented Amiga LZX in Rust. https://bitplane.net/dev/rust/amiga-lzx/

He seems to like doing this kind of stuff.

themafia · 12d ago

> But, it works, and the world now has a free software RAR implementation.

Does it? How are you legally intending to use copyright to license this machine output? How would you know it's not encumbered in any way?

falcor84 · 12d ago

In all seriousness, why should anyone care?

I always found software IP to be absurd, but this is a particularly absurd situation. We're talking here about a small utility tool implemented from scratch and open sourced, with no apparent intent to make any money from it.

Are you concerned about the "encumbrance" of using "unlicensed" tools to manipulate .doc, or .pdf, or .mp3 files?! Well I'm not, and if anyone ever tried to sue me for improper access to their proprietary formats, I'll show them some old testament impropriety.

Georgelemental · 12d ago

Judges tend to frown on old testament impropriety. And corporations tend to frown on employees who draw the ire of judges

davidsong (OP) · 12d ago

I generally don't anyway. Since the WTFPL came out I've been licensing under that with a warranty clause (don't blame me).

My main goal here was an experiment to see how far I could push the technique, and learn things along the way. Regardless of whether people dare to use it commercially or not, we have interoperability for the foreseeable future. As an archivist/computing historian I think that's important.

perching_aix · 12d ago

Really unsure why this is getting downvoted, to my understanding this is a massive, unsettled concern.

It wasn't even a disasm/pseudocode to formal spec flow, and then a separate human implementation. The same human has been in the loop throughout, and large parts of it were generated directly.

It's basically guaranteed tainted.

Edit: I should have skimmed a bit more patiently, there was in fact no "disasm/pseudocode + the human getting tainted" part to this apparently.

ameliaquining · 12d ago

I read the post you're replying to as saying "this is copyright-encumbered and nonfree because it's a derivative work of everything in Claude's and GPT-5.5's training corpus", which is an argument I find fairly tiresome. (Realistically, if courts actually rule that this is the case, this tiny little project will be the least of anyone's concerns.)

"This is copyright-encumbered and nonfree because it's a derivative work of the legacy RAR binaries" is a different argument (and seems like it depends on details of the setup that were somewhat glossed over in the post).


charcircuit · 12d ago

The human wasn't looking at the copyrighted code and was giving high level steering instructions. If you look at the spec generated it doesn't look like a derivative work of the copyrighted material. The program was generated from the spec. It seems mostly fine from my perspective.


gibspaulding · 12d ago

> Really unsure why this is getting downvoted

Because it’s a boring argument that we’re not going to make progress on until it is actually tested in court.

Also, if/when this is tested, the court’s options seem to be (a) say yeah, this is fine, or (b) cause unending havoc that, if followed through on, would destroy the economy (a precedent that any org whose proprietary code made it into AI training data could sue any org that was using code generated by that model? Do the math on how many suits that is.)

cactusplant7374 · 12d ago

> and it almost earned me an OpenAI ban

Were you flagged for a cybersecurity violation?

gibspaulding · 12d ago

> Well, it turned out that at some time during spec investigation, Claude needed to understand authenticity verification which is a paid feature. With a context full of reverse engineering tools it cracked WinRAR and bypassed product registration, then dutifully documented its crimes in the spec. The docs, when viewed, triggered OpenAI’s alarms and stopped it dead in its tracks. I squashed this out of the git history, and decided not to implement the feature at all.

You can draw your own conclusions as to what this says about the state of agentic development.

p0w3n3d · 12d ago

First, isn't the RAR compression algorithm proprietary somehow?

Second, why compress to RAR if you can compress to 7z?

WithinReason · 12d ago

One thing it has built in that other formats don't is optional redundancy (recovery records), so it can recover data from a damaged archive.
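The redundancy idea can be illustrated with the simplest possible scheme: keep one XOR parity block alongside N equal-length data blocks, and any single lost block can be rebuilt by XORing the parity with the survivors. This is only an illustration; RAR's real recovery records are more elaborate, and `parity_block` and `recover_block` are hypothetical names.

```rust
/// XOR all equal-length blocks together into one parity block.
pub fn parity_block(blocks: &[Vec<u8>]) -> Vec<u8> {
    let len = blocks.first().map_or(0, |b| b.len());
    let mut parity = vec![0u8; len];
    for block in blocks {
        assert_eq!(block.len(), len, "blocks must be the same length");
        for (p, &b) in parity.iter_mut().zip(block) {
            *p ^= b;
        }
    }
    parity
}

/// Rebuild the single missing block (marked None) from the survivors plus parity.
/// Returns None if zero or more than one block is missing.
pub fn recover_block(blocks: &[Option<Vec<u8>>], parity: &[u8]) -> Option<Vec<u8>> {
    if blocks.iter().filter(|b| b.is_none()).count() != 1 {
        return None; // single parity can repair exactly one lost block
    }
    let mut rebuilt = parity.to_vec();
    for block in blocks.iter().flatten() {
        for (r, &b) in rebuilt.iter_mut().zip(block) {
            *r ^= b;
        }
    }
    Some(rebuilt)
}
```

Single parity tolerates exactly one lost block; codes such as Reed-Solomon spend more redundancy to repair several.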

p0w3n3d · 12d ago

Wow, yeah, I remember using it (when moving files on floppies). I'm all for it then, though I wonder whether this will get sued down.

rvz · 12d ago

> but it works.

Are you sure "it works?"

sntran · 12d ago

Good luck with keeping it online. Somebody built `rar-stream` with Rust, and its GitHub is no longer there.

npn · 12d ago

Rar is proprietary. Good luck.

snvzz · 11d ago

Clean room is a thing.

First use some model to make a specification. Then another to implement it from the specification.

hayd · 12d ago

https://law.stackexchange.com/a/83552

I suppose the question is whether the author had ever entered into a contract limiting reverse engineering...

npn · 12d ago

If they read the source code of unrar, or, in this case, used genAI (which obviously included unrar source code in its training set), then yes. You can check the agreement for the unrar source code release.

unixhero · 12d ago

Rar means weird in Norwegian and adorable in Swedish. Just as an anecdote.

hackyhacky · 12d ago

It means hello in dinosaur

vedaba · 12d ago

Those almost sound like antonyms, which is ironic given how closely related the two languages are.

mhitza · 12d ago

Rare, in Romanian.

periodjet · 12d ago

Finally, a sane and enjoyable read about a coding project. Feel like it’s been months since we had one of these that wasn’t filled to the brim with bluesky/mastodon-flavored whining about AI.

Kudos to the author. A fun read, thank you for sharing.

RIMR · 12d ago

For everyone out there whining about AI, there's one of you whining about people being anti-AI.

Maybe just cut the unprompted whining?

perching_aix · 12d ago

Would be great, but then it's a saturation game, and the other side doesn't have any compelling reason to hold back the same way. So it's contingent on how fair the platform is, and what nonverbal, out of band options remain.

HN is better than most in this regard thanks to community flagging, but even then there's a lot of it. Ultimately, it'd seem that the ratio you're describing skews a whole lot more towards the anti-AI sentiment side than towards the anti-anti-AI one (or towards a stalemate). Or rather, that the latter sentiment is not common enough to necessarily thwart such comments. And so you see it reflected verbally instead.
