Apparently every important browser has supported it for well over a decade: https://caniuse.com/mdn-api_window_stop
Here's a screenshot illustrating how window.stop() is used - https://gist.github.com/simonw/7bf5912f3520a1a9ad294cd747b85... - everything after <!-- GWTAR END is tar compressed data.
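For anyone who hasn't clicked through: as I understand it, the core of the trick is an inline script at the end of the visible document that aborts its own in-flight download before the appended archive bytes arrive, with pieces of the archive pulled back later via Range requests. A minimal sketch, not the actual Gwtar header (all the offset bookkeeping is omitted):

    <!doctype html>
    <html>
    <body>
      <p>Normal page content here.</p>
      <script>
        // Abort the document's own network request so the browser never
        // finishes downloading the archive data appended after the marker.
        window.stop();
      </script>
    </body>
    </html>
    <!-- GWTAR END
    ...compressed tar data appended here, fetched later via Range requests...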
Posted some more notes on my blog: https://simonwillison.net/2026/Feb/15/gwtar/
But could be very interesting for use cases where the main logic lives on the server and people try to manually implement some download- and/or lazy-loading logic.
Still probably bad unless you're explicitly working on init and redirect scripts.
I made my own bundler skill that lets me publish artifacts https://claude.ai/public/artifacts/a49d53b6-93ee-4891-b5f1-9... that can be decomposed back into the files, but it is just a compressed base64 chunk at the end.
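Unpacking a trailing chunk like that is only a few lines of client-side code; a rough sketch, with the marker name and gzip compression as placeholder assumptions rather than what the bundler actually emits:

    // Read the page's own source, slice out the trailing chunk after a
    // marker comment, base64-decode it, and gunzip it back into the bundle.
    async function readBundle() {
      const src   = await (await fetch(location.href)).text();
      const b64   = src.split('<!-- BUNDLE:')[1].split('-->')[0].trim(); // hypothetical marker
      const bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0));
      const gunzipped = new Blob([bytes]).stream()
        .pipeThrough(new DecompressionStream('gzip'));
      return new Response(gunzipped).arrayBuffer(); // split into files per the bundle's own index
    }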
I guess the next question is: if it does work in environments that let you share a single file, will they disable the ability once they find out people are using it?
PHP has a similar feature called __halt_compiler(), which I've used for a similar purpose. Or sometimes just to put documentation at the end of a file without needing a comment block.
Hell, html is probably what word processor apps should be saving everything as. You can get pixel-level placement of any element if you want that.
Yes, they're both approximately the same in terms of size on disk, and even network traffic for a fully loaded page, but one is a much better browser experience.
> You can get pixel-level placement of any element if you want that.
You may well be able to, but it is largely anathema to the goals of html.
But not being able to "just" load the file into a browser locally seems to defeat a lot of the point.
In this case I wonder if the format can be further optimized. For example, .js files are supported for local loading, and albeit a very inefficient way to load assets, that could overcome this local-disk limitation; nobody reads the HTML source code anyway, so it won't need to win any code beauty contests. I'll look into this theory later and ping the author in case it works.
As a final wish-list item, it would be great to have multiple versions/crawls of the same URL with deduplication of static assets (images, fonts), but this is likely stretching this format too far.
Multiple versions or multiple pages (maybe they can be the same thing?) would be nice, but it's also unclear how to do that. An iframe wrapper?
I considered and rejected deduplication and compression. Those can be done by the filesystem/server transparent to the format. (If there's an image file duplicated across multiple pages, then it should be trivial for any filesystem or server to detect or compress those away.)
> An iframe wrapper?
The way Archive.org does this navigation between multiple versions is quite pleasant to use. Don't know for sure but might be an iframe added on top.
I certainly could be missing something (I've thought about this problem for all of a few minutes here), but surely you could host "warcviewer.html" and "warcviewer.js" next to "mycoolwarc.warc" and "mycoolwarc.cdx" with little to no loss of convenience, and call it a day?
Would W3C Web Bundles and HTTP SXG Signed Exchanges solve for this use case?
WICG/webpackage: https://github.com/WICG/webpackage#packaging-tools
"Use Cases and Requirements for Web Packages" https://datatracker.ietf.org/doc/html/draft-yasskin-wpack-us...
As far as I know, we do not have any hash verification beyond that built into TCP/IP or HTTPS etc. I included SHA hashes just to be safe and forward compatible, but they are not checked.
There's something of a question here of what hashes are buying you and what the threat model is. In terms of archiving, we're often dealing with half-broken web pages (any of whose contents may themselves be broken) which may have gone through a chain of a dozen owners, where we have no possible web of trust to the original creator, assuming there is even one in any meaningful sense, and where our major failure modes tend to be total file loss or partial corruption somewhere during storage. A random JPG flipping a bit during the HTTPS range request download from the most recent server is in many ways the least of our problems in terms of availability and integrity.
This is why I spent a lot more time thinking about how to build FEC in, like with appending PAR2. I'm vastly more concerned about files being corrupted during storage or the chain of transmission or damaged by a server rewriting stuff, and how to recover from that instead of simply saying 'at least one bit changed somewhere along the way; good luck!'. If your connection is flaky and a JPEG doesn't look right, refresh the page. If the only Gwtar of a page that disappeared 20 years ago is missing half a file because a disk sector went bad in a hobbyist's PC 3 mirrors ago, you're SOL without FEC. (And even if you can find another good mirror... Where's your hash for that?)
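(For what it's worth, if a future viewer did want to check the embedded SHA hashes client-side, it's only a few lines; a hypothetical sketch, not something the current format or any viewer actually does:)

    // Hypothetical verification of one asset against an embedded hash
    // (SHA-256 assumed here; the format just records whatever was computed).
    async function verify(url, expectedHex) {
      const buf    = await (await fetch(url)).arrayBuffer();
      const digest = await crypto.subtle.digest('SHA-256', buf);
      const hex    = [...new Uint8Array(digest)]
        .map(b => b.toString(16).padStart(2, '0')).join('');
      return hex === expectedHex;
    }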
> Would W3C Web Bundles and HTTP SXG Signed Exchanges solve for this use case?
No idea. It sounds like you know more about them than I do. What threat do they protect against, exactly?
Browsers check SRI integrity hashes if they're there.
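For anyone unfamiliar, it's just an attribute on the tag; the filename and hash below are placeholders:

    <!-- The browser refuses the resource if its hash doesn't match the
         integrity attribute; applies to <script> and <link> elements. -->
    <script src="app.js"
            integrity="sha384-(base64 hash of app.js goes here)"
            crossorigin="anonymous"></script>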
There's HTTP-in-RDF, and the Memento protocol. VCR.py and similar tools can replay HTTP sessions, but SSL socket patching, the TLS cookie, or adding a cert for e.g. an archiving HTTPS proxy is necessary.
Browser DevTools can export HAR HTTP archives.
If all of the resource origins are changed to one hostname for archival, that bypasses same-origin controls on JS and cookies, such that the archived page runs all the scripts in the same origin the archive is served from? Also, browsers have restrictions on even inline JS scripts served from file:/// URLs.
FWIU Web Bundles and SXG were intended to preserve the unique origins of resources in order to safely and faithfully archive for interactive offline review.
Similar to the window.stop() approach, the request for the main HTML file would be truncated, while the rest of that response would be the assets blob that the service worker would then serve up.
The service worker file could be a dataURI to keep this in one file.
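The worker side would presumably look something like this (a sketch under my assumptions, not a real implementation; how the asset map gets populated from the blob is the interesting part being hand-waved):

    // sw.js (sketch): answer sub-resource requests from an in-memory map
    // that was filled by parsing the assets blob appended to the page.
    const assets = new Map();   // pathname -> { body: ArrayBuffer, type: string }

    self.addEventListener('fetch', event => {
      const hit = assets.get(new URL(event.request.url).pathname);
      if (hit) {
        event.respondWith(
          new Response(hit.body, { headers: { 'Content-Type': hit.type } })
        );
      }
      // otherwise the request falls through to the network as usual
    });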
Works locally, but it does need to decompress everything up front.
How does it bypass the security restrictions which break SingleFileZ/Gwtar in local viewing mode? It's complex enough that I'm not following where the trick is, and you only mention single-origin with regard to a minor detail (forms).
Of course, since it's on an HTTP server, it could easily handle doing multiple requests of different files, but sometimes that's inconvenient to manage on the server and a single file would be easier.
Maybe this is downstream of Gwern choosing to use MediaWiki for his website?
> Maybe this is downstream of Gwern choosing to use MediaWiki for his website?
This has nothing at all to do with the choice of server. The benefit of being a single-file, with zero configuration or special software required by anyone who ever hosts or rehosts a Gwtar in the future, would be true regardless of what wiki software I run.
(As it happens, Gwern.net has never used MediaWiki, or any standard dynamic CMS. It started as Gitit, and is now a very customized Hakyll static site with a lot of nginx options. I am surprised you thought that because Gwern.net looks nothing like any MediaWiki installation I have seen.)
- an executable header
- which then FUSE-mounts an embedded, read-only, heavily compressed filesystem
- whose contents are delivered when requested (the entire dwarfs/squashfs isn't uncompressed at once)
- allowing you to pack as many of the dependencies as you wish to carry in your archive (so, just like an AppImage, any dependency which isn't packed can be found "live")
- and doesn't require any additional, custom infrastructure to run/serve
Neat!
https://gwern.net/doc/philosophy/religion/2010-02-brianmoria...
I will try it on Chrome tomorrow.
Beyond that, depending on how badly the server is tampering with stuff, of course it could break the Gwtar; but then, that is true of any web page whatsoever (never mind archiving), which is why they should be very careful when doing so, and generally shouldn't.
Now you might wonder about 're-archiving': if the IA serves a Gwtar (perhaps archived from Gwern.net), and it injects its header with the metadata and timeline snapshot etc, is this IA Gwtar now broken? If you use a SingleFile-like approach to load it, properly force all references to be static and loaded, and serialize out the final quiescent DOM, then it should not be broken and it should look like you simply archived a normal IA-archived web page. (And then you might turn it back into a Gwtar, just now with a bunch of little additional IA-related snippets.) Also, note that the IA, specifically, does provide endpoints which do not include the wrapper, like APIs or, IIRC, the 'if_/' fragment. (Besides getting a clean copy to mirror, it's useful if you'd like to pop up an IA snapshot in an iframe without the header taking up a lot of space.)
I find it easier to just mass delete assets I don't want from the "pageTitle_files/" directory (js, images, google-analytics.js, etc).
If you really just want the text content you could just save markdown using something like https://addons.mozilla.org/firefox/addon/llmfeeder/.
Yes I have. I tried MAFF, MHT, SingleFile and some others over the years. MAFF was actually my go-to for many years because it was just a zip container. It felt future-proof for a long time until it wasn't (I needed to manually extract contents to view once the supporting extension was gone).
I seem to recall that MHT caused me a little more of a conversion problem.
It was my concern for future-proofing that eventually led me back to "Save As..".
My first choice is "Save as..." these days because I just want easy long-term access to the content. The content is always the key, and picking and choosing which assets to get rid of is fairly easy with this. Sometimes it's just all the JS/trackers/ads, etc.
If "Save as..." fails, I'll try 'Reader Mode' and attempt "Save as.." again (this works pretty well on many sites). As a last resort I'll use SingleFile (which I like too - I tested it on even DOS browsers from the previous century and it passed my testing).
A locally saved SingleFile can be loaded into FF and I can always perform a "Save As..." on it if I wanted to for some reason (e.g. smaller file, js-trackers, cleaner HTML, etc).
I prefer it because it can save without packing the assets into one HTML file. Then it's easy to delete or hardlink common assets.
Yes. A web browser can't just read a .zip file as a web page. (Even if a web browser decided to try to download, and decompress, and open a GUI file browser, you still just get a list of files to click.) Therefore, far from satisfying the trilemma, it just doesn't work.
And if you fix that, you still generally have to choose between staying single-file and efficiency. (You can serve a split-up HTML from a single ZIP file with some server-side software, which gets you efficiency, but now it's no longer single-file; and vice versa. Because if it's just a ZIP, how does it stop downloading and only download the parts you need?)
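(That partial download is the piece a plain ZIP never gets: once the browser decides to fetch the ZIP, it fetches all of it. With the appended-archive approach, the page can ask the server for just the bytes one asset occupies; a sketch, with the archive filename and offsets invented for illustration:)

    // Fetch one embedded asset's byte span out of the single archive file.
    async function loadAsset(start, end) {
      const resp = await fetch('page.gwtar.html', {        // the archive itself
        headers: { Range: `bytes=${start}-${end}` }         // this asset's span
      });
      if (resp.status !== 206) throw new Error('server ignored the Range header');
      return URL.createObjectURL(await resp.blob());        // e.g. assign to an <img src>
    }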
What if a web server on localhost happens to handle the request? Why not request from a guaranteed-inaccessible place like http://0.0.0.0/ or http://localhost:0/ (port zero)?
great job
I don't know if anyone else gets "unemployed megalomaniacal lunatic" vibes, but I sure do.
The Lighthaven retreat in particular was exceptionally shady, possibly even scam-adjacent; I was shocked that he participated in it.
It's almost as if someone charged you $$ for the privilege of reading it, and you now feel scammed, or something?
Perhaps you can request a refund. Would that help?