If there's anyone I would trust to explore these avenues, it's him and the maintainers doing god's work in the nodejs repo these past few years.
I do not think it is wise to brag that your solution to a problem is extremely painful but that you were impervious to all the pain. Others will still feel it. This code takes bandwidth to host and space on devices, and for maintainers it permanently doubles the work associated with evolving the filesystem APIs. If someone else comes along with the same kind of thinking, they might just double those doubled costs, and someone else might 8x them, all because nobody could feel the pain they were passing on to others.
> Bundle a full application into a Single Executable.
Embed a zip file into the executable, or something. Node has sort of supported this since v25; see --build-sea. Bun and Deno have supported this for longer.
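For reference, the experimental SEA flow looks roughly like this (a sketch; the flag name and the postject fuse string are copied from the Node docs' example, so check the docs for your version):

    # describe the entry point and the blob to generate
    echo '{ "main": "app.js", "output": "sea-prep.blob" }' > sea-config.json
    node --experimental-sea-config sea-config.json
    # copy the node binary and inject the blob into it
    cp "$(command -v node)" myapp
    npx postject myapp NODE_SEA_BLOB sea-prep.blob \
        --sentinel-fuse NODE_SEA_FUSE_fce680ab2cc467b6e072b8b5df1996b2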
> Run tests without touching the disk.
This must be left to the host system to decide. Maybe I want them to touch the disk and leave traces useful for debugging. I'd go with tmpfile / tmpdir; whoever cares knows to mount them as tmpfs, which sits in RAM. (Or a ramdisk under Windows.)
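A minimal sketch of that approach with Node's built-ins (fs.rmSync assumes Node >= 14.14):

    const fs = require('node:fs');
    const os = require('node:os');
    const path = require('node:path');

    // create a unique scratch dir under the OS temp dir; on Linux, point
    // TMPDIR at a tmpfs mount if you want it to live in RAM
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'mytest-'));
    fs.writeFileSync(path.join(dir, 'fixture.txt'), 'hello');

    // ... run the code under test against `dir` ...

    fs.rmSync(dir, { recursive: true, force: true }); // or keep it for debugging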
> Sandbox a tenant’s file access. In a multi-tenant platform, you need to confine each tenant to a directory without them escaping
This looks like the wrong tool, again. Run your Node app in a container (like you are already doing) and mount every tenant's directory as a separate mount point into your container. (Similar with BSD jails.) This seems like the only problem that is not trivial to solve without a "VFS", but I'm not very certain that such a VFS would be as well-audited as Docker, or nsenter and unshare. The amount of work necessary to implement it is too much for the niche benefit it would provide.
> Load code generated at runtime.

See tmpfs for a trivial answer. For a less trivial answer, I don't see how Node's code loader is bound to a filesystem. If it can import via https, it isn't bound to the disk. Just use ESM loader hooks and register() your loader, assuming you're running Node ≥ 20.6.
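A minimal sketch of that, serving modules from a hypothetical in-memory map (resolve/load and register() are the documented node:module customization API; 'virtual:greeting' is invented for the example):

    // in-memory-loader.mjs
    const virtualModules = new Map([
      ['virtual:greeting', 'export const hello = () => "generated at runtime";'],
    ]);

    export function resolve(specifier, context, nextResolve) {
      if (virtualModules.has(specifier)) return { url: specifier, shortCircuit: true };
      return nextResolve(specifier, context);
    }

    export function load(url, context, nextLoad) {
      if (virtualModules.has(url)) {
        return { format: 'module', source: virtualModules.get(url), shortCircuit: true };
      }
      return nextLoad(url, context);
    }

    // main.mjs
    import { register } from 'node:module';
    register('./in-memory-loader.mjs', import.meta.url);
    const { hello } = await import('virtual:greeting');
    console.log(hello()); // "generated at runtime"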
While the large code changes were retained, they were often split up into a set of semantically meaningful commits for purposes of review and maintenance.
With AI blowing up the line counts on PRs, it's a skill set that more developers need to mature. It's good for their own review to take the mass of changes, ask themselves how they would want to systematically review it in parts, then split the PR up into meaningful commits: e.g. interfaces, docs, subsets of changed implementations, etc.
Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write? I couldn't care less if it improves (best case scenario) my open source codebase; I simply don't enjoy the imbalance.
Well, the process you’re describing is mature and intentionally slows things down. The LLM push has almost the opposite philosophy. Everyone talks about going faster and no one believes it is about higher quality.
As a side note, the OpenJS executive director mentioned it's OK to use AI assistance on Node.js contributions:
> I checked with legal and the foundation is fine with the DCO on AI-assisted contributions. We’ll work on getting this documented.
[1]: https://github.com/nodejs/node/pull/61478#issuecomment-40772...

It is great to have a legal perspective on compliance of LLM-generated code with DCO terms, and I feel safer knowing that at least it doesn't expose Node.js to legal risk. However, it doesn't address the well-known unresolved ethical concerns over the sourcing of the code produced by LLM tooling.
Speed-code all your SaaS apps, but slow iteration speeds are better for a runtime, because once you add something you can basically never remove it. You can't iterate. You get literally one shot, and if you add an awkward or trappy API, everyone is now stuck with it forever. And what if this "must have" feature turns out to be kind of a dud, because everyone converged on a much more elegant solution a few years later? Congratulations, we now have to maintain this legacy feature forever and everyone has to migrate their codebase to some new solution.
Much better to let dependencies and competing platforms like bun or deno do all the innovating. Once everyone has tried and refined all the different ways of solving this particular problem, and all the kinks have been worked out, and all the different ways to structure the API have been tried, you can take just the best of the best ideas and add them to the runtime. It will be late, but because of that it will be stable and not a train wreck.
But I know what you're thinking. "You can't do that. Just look at what happens to platforms that iterate slowly, like C or C++ or Java. They're toast." Oh wait, never mind, they're among the most popular platforms out there.
It's not an AI issue. Node.js itself carries a lot of legacy code, and many projects depend on that code. When Deno and Bun were in early development, AI wasn't involved.

Yes, you can speed up the development a bit, but it will never reach the quality of newer runtimes.

It's like comparing C to C++. Those languages are from different eras (relative to each other).
I can't help but wonder if this matter could result in an io.js-like fork, splitting Node into two worlds: safe-but-slow-moving and AI-all-the-things. It would be historically interesting, as the GP poster was, I seem to recall, the initial creator of the io.js fork.
If and when there is evidence that AI is actually increasing the speed of improvement (and not just churn), it would make sense to permit it. Unless and until such evidence emerges, the risks greatly outweigh the benefits, at least for a foundational codebase like this.
That sort of statement might also be sarcasm in another context. I personally use AI a lot, but I also recognize that there are a lot of projects out there suffering from low-quality slop pull requests, devs that kinda check out and don't care much about the actual code as long as it appears to be running, alongside most LLMs struggling a lot with longer-term maintenance if not carefully managed. So I guess it depends a lot on how AI is used and how much ideological opposition to that there is. In a really testable codebase it could actually work out pretty well, though.
If the submitter picks (a), they assert that they wrote the code themselves and have the right to submit it under the project's license. If (b), the code was taken from another place with clear license terms compatible with the project's license. If (c), the contribution was written by someone else who asserted (a) or (b) and is submitted without changes.

Since LLM-generated output is based on public code but lacks attribution and the license of the original, it is not possible to pick (b). (a) and (c) cannot be picked based on the submitter's disclaimer in the PR body.
On a more serious note, I think that this will be thoroughly reviewed before it gets merged, and Node has an entire security team that oversees these.
> Many contributions contain routine, non-copyrightable material, and developers still sign off on them.
> Compilers change code in ways developers do not always track. Template generators create output from their own logic. Stack Overflow answers are often copied into codebases without much thought about licensing.
Who reviewed and approved the PR?
I like the idea of it mocking the file system for tests, but I feel like that should probably be part of the test suite, not Node.
The example towards the end that stores data in a sqlite provider and then saves it as a JSON file is mind-boggling to me. Especially for a system that's supposed to be about not saving to the disk. Perhaps it's just a bad example, but I'm really trying to figure out how this isn't just adding complexity.
node -e "new Function('console.log(\"hi\")')()"
or, more to the point:

    node -e "fetch('https://unpkg.com/cowsay/build/cowsay.umd.js').then((r) => r.text()).then(c => new Function(c + 'console.log(exports.say({ text: \"like this\"}))')())"

that one is particularly bad, because umd messes with the global object - so this works:

    node -e "fetch('https://unpkg.com/cowsay/build/cowsay.umd.js').then((r) => r.text()).then(c => new Function(c)()).then(() => console.log(exports.say({ text: 'oh no'})))"

I had to laugh, because the post you're replying to STRONGLY reminds me of this story, https://news.ycombinator.com/item?id=31778490 , in which some people on the GNOME project objected to thumbnails in the file-open dialog box because it might be a "security issue" (even though thumbnails were available in the normal file browser, something those commenters probably should have known about, but didn't, but they just had to chime in anyway).
My current flow is to literally embed the JavaScript in the binary, then on start, write the JavaScript code to `/tmp/{random}` and point Node.js to execute the code at that destination.
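A rough sketch of that flow (getEmbeddedSource() is hypothetical; it stands in for however your binary exposes the embedded JavaScript):

    const fs = require('node:fs');
    const os = require('node:os');
    const path = require('node:path');
    const crypto = require('node:crypto');
    const { spawnSync } = require('node:child_process');

    const source = getEmbeddedSource(); // hypothetical: read the embedded JS
    const target = path.join(os.tmpdir(), crypto.randomBytes(8).toString('hex') + '.js');
    fs.writeFileSync(target, source);
    spawnSync(process.execPath, [target], { stdio: 'inherit' }); // run it with node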
A virtualized filesystem also allows for a safer "plugin" story for Node.js - where JavaScript plugins can be prevented from accessing the real filesystem.
Just my opinion, probably not a popular one. But I will be avoiding an upgrade to Node.js after 24.14 for a while if this is becoming an acceptable precedent.
I do see some genuine benefits to a VFS, though, bad application decisions aside, but they are exceedingly minor.
As an aside, I think JavaScript would benefit from an in-memory database. This would be more of a language enhancement than a Node.js enhancement. Imagine the extended application capabilities of an object/array store native to the language that takes queries using JS logic to return one or more objects/records. No SQL language and no third-party databases for stuff that you don't want to keep in offline storage on a disk.
> I think JavaScript would benefit from an in-memory database.
That database would probably look a lot like a JSON object. What are you suggesting, that a global JSON object does not solve?

The more structures you have in a given application, and the larger those structures become in their schemas, the more valuable a uniform storage and retrieval solution becomes.
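For illustration, a hypothetical sketch of what "queries using JS logic" could look like (every name here is invented, not a proposed API):

    // a toy in-memory store: find() takes a JS predicate instead of SQL
    class MemDB {
      #records = [];
      insert(record) { this.#records.push(record); return record; }
      find(predicate) { return this.#records.filter(predicate); }
    }

    const db = new MemDB();
    db.insert({ type: 'artist', name: 'Ella', active: 1935 });
    db.insert({ type: 'artist', name: 'Miles', active: 1944 });
    console.log(db.find(r => r.type === 'artist' && r.active < 1940)); // [Ella]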
isn't that just global state, or do you mean you want that to be persistent?
I get it, I've implemented things for tests, I'm just wondering if this shouldn't be solved at an OS level.
--- update
Let's put this another way: my code effectively does child_process.spawn('something-that-reads-and-writes-a-file'), and now I'm back to the same issue. To test it I need a virtual file system, and Node providing one won't help, because the spawned process won't see it.
I do think it's more painful to distribute files when you're distributed as a single binary vs scripts, since the latter has to figure out bundling of files anyway.
But still - it does exist
A ZIP file embedded into the executable should be an obvious read-only VFS implementation. Bring your assets with you, maybe even build them with the standard zip utility.
It should take relatively few LOCs, provided that libzip is already linked into the executable anyway.
That’s so dehumanizing, I would happily write such code.
See https://pnpm.io/motivation
Also, while popularity isn't necessarily a great indicator of quality, a quick comparison shows that the community has decided on pnpm:
https://web.archive.org/web/20161003115800/https://blog.mozi...
Combined with a hackable IDE like Atom (Pulsar) made with the same tech, it’s a pretty great dev experience for web devs.
Python has had shared packages for a long time, and those are fine up to a point, but circa 2017 I was working at a place where we had data scientists making models using different versions of Tensorflow and stuff, and venvs are essential to that. We were building unusually complex systems and having worse problems than other people, but if you do enough development you will have trouble with shared packages.
The node model of looking for packages in the local directory has some appeal and avoids the need for “activation”, but I like writing Python-based systems that define one or more command-line programs that I can use in any directory I want. For instance, if I want to publish one of my Vite projects, I have a ‘transporter’ written in Python that looks for a Vite project in the current directory and uploads it to S3, updates metadata, invalidates CloudFront and all that. I have to activate it, which is a minor hassle, but then I can go to different Vite projects and publish them.
You can’t import or require() a module that only exists in memory.
You can convert it into a data url and import that, can't you?

There's Docker, OverlayFS, FUSE, ZFS or Btrfs snapshots?
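On the data url point above, a minimal sketch that works with Node's ESM loader (which accepts data: imports):

    // evaluate a module that only exists in memory, via a data: URL
    const source = 'export const hello = () => "hi from memory";';
    const mod = await import('data:text/javascript,' + encodeURIComponent(source));
    console.log(mod.hello()); // top-level await, so run this in an ESM context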
Do you not trust your OS to do this correctly, or do you think you can do better?
A lot of this stuff existed 5, 10, 15 years ago...
Somehow there's been a trend for every effing program to grow and absorb the features and responsibilities of every other program.
Actually, I have a brilliant idea: what if we used nodejs, and added html display capabilities, and browser features? After all, Cursor has already proven you can vibecode a browser, so why not just do it?
I'm just tired at this point
¹E.g. if you've got music, and it's sorted `artist/album/track<n>.extension`, and two artists collaborate on an album, which one gets the album in their folder? What if you want to sort all songs in the display by publication date? Even if they use the files on your filesystem without moving them, some sort of metadata database will be needed for efficient display & search.
This is the biggest takeaway for me for AI. It's not even that nobody wants to do these things, it's that by the time you finish your tasks you have no time to do them, because your manager / scrum master / powers that be want you to work on the next task.
No.
The alternative is that you work on the same number of features and use the time gained to make those features as robust as you know they could be, except you have other pressing matters to attend to. That's weighing the ability of AI against the reality of neglect.
Sure you can. Function() exists and require.cache exists. This is _intentionally_ exploitable.
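A minimal sketch of the Function() half, doing the CommonJS wrapper by hand (no disk involved):

    // evaluate CommonJS-style source that only exists in memory
    const source = 'module.exports = () => "hi from memory";';
    const mod = { exports: {} };
    new Function('module', 'exports', 'require', source)(mod, mod.exports, require);
    console.log(mod.exports()); // "hi from memory"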
From https://github.com/jupyterlite/jupyterlite/issues/949#issuec... :
> Ideally, the virtual filesystem of JupyterLite would be shared with the one from the virtual terminal.
emscripten-core/emscripten > "New File System Implementation": https://github.com/emscripten-core/emscripten/issues/15041#i... :
> [ BrowserFS, isomorphic-git/lightningfs, ]
pyodide/pyodide: "Native file system API" #738: https://github.com/pyodide/pyodide/issues/738 re: [Chrome,] Filesystem API :
> jupyterlab-git [should work with the same VFS as Jupyter kernels and Terminals]
pyodide/pyodide: "ENH Add API for mounting native file system" #2987: https://github.com/pyodide/pyodide/pull/2987
- https://github.com/yarnpkg/berry/issues/7065
- https://github.com/nodejs/node/issues/62012
This is because yarn patches fs in order to introduce virtual file path resolution of modules in the yarn cache (which are zips), which is quite brittle and was broken by a seemingly unrelated change in 25.7.
The discussion in issue 62012 is notable - it was suggested yarn just wait for vfs to land. This is interesting to me in two ways: firstly, the node team seems quite happy for non-trivial amounts of the ecosystem to just be broken, and suggests relying on what I'm assuming will be an experimental API when it does land; secondly, it implies a lot of confidence that this feature will land before LTS.
yarn/node relations specifically are... complicated, as is on display on the issue tracker of corepack (a yarn project which got bundled into the official nodejs distribution).
> secondly, it implies a lot of confidence that this feature will land before LTS.
This confidence is somewhat concerning. Will it get reviewed at all, or has the "trust the LLM" mandate arrived at Node now too?
Not spamming, not affiliated, just trying to help others avoid so much needless suffering.
I expect yarn to get a real competitor sooner rather than later, one that will replace it; and I do wonder if it is this vfs module that will enable it.
The sqlar schema is missing some of the info that's being stored atm, but there's nothing stopping you from adding your own fields/tables on top of the format; if anything, the docs encourage it. It is just a sqlite database at the end of the day.
What I really want is a way of swapping FS with VFS in a Node.js program harness. Something like
node --use-vfs --vfs-cache=BIG_JSON_FILE
So basically Node never touches the disk and loads everything from memory.

Not saying vfs is bad, just that it's not impossible to set that up in a few lines of code. My idea for a simple version of a vfs in node is to use a RAM disk/RAMfs - would that work?
Basically an "fs-core" that everything ultimately goes through, and which can be switched out/layered with another implementation. Think express-style routing but for the filesystem.
That'll keep things simple in node's codebase while handing more power to users.
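A hypothetical sketch of what that layering could look like (names invented here, not a real Node API):

    // hypothetical fs-core: layers are tried in order, first hit wins
    const realFs = require('node:fs');
    const layers = [];

    function use(layer) { layers.push(layer); }

    function readFile(p) {
      for (const layer of layers) {
        const hit = layer.readFile?.(p);
        if (hit !== undefined) return hit;
      }
      return realFs.readFileSync(p, 'utf8'); // fall through to the real fs
    }

    // an in-memory overlay, mounted express-style
    use({ readFile: (p) => p.startsWith('/virtual/') ? 'in-memory contents' : undefined });

    console.log(readFile('/virtual/hello.txt')); // "in-memory contents"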
The node.js codebase and standard library has a very high standard of quality, hope that doesn't get washed out by sloppy AI-generated code.
OTOH, Matteo is an excellent engineer and the community owes a lot to him. So I guess the code is solid :).
I miss those days when you could tweak all kinds of software GUIs yourself: change icons, menus, shortcut keys, etc.
(I know, I know, it's ugly and has its own set of problems)
These arguments don't even make sense, they look LLM generated. I can't even formulate a disagreement against this nonsense.
By far the most critical issue is the over-reliance on third-party NPM packages for even fundamental needs like connecting to a database.
Databases are third-party tech; I don’t think it’s unreasonable to use a third-party NPM module to connect to them.
Java also has a JIT compiling JS engine that can be sandboxed and given a VFS:
https://www.graalvm.org/latest/security-guide/sandboxing/
N.B. there's a NodeJS compatible mode, but you can't use VFS+sandboxing and NodeJS compatibility together because the NodeJS mode actually uses the real NodeJS codebase, just swapping out V8. For combining it all together you'd want something like https://elide.dev which reimplemented some of the Node APIs on top of the JVM, so it's sandboxable and virtualizable.
I'm not saying Node should support every db in existence but the ones I listed are critical infrastructure at this point.
When using Postgres in Node you either rely on the old pg, which pulls in 13 dependencies[1], or postgres[2], which is much better and has zero deps but is mostly maintained by a single guy.
In my opinion, the pg repo and packages are an example of how OSS stuff should be maintained: clean repo, clean code, well-maintained readme, and a clear focus on keeping things simple instead of overcomplicating.
Node.js, on the other hand, is not owned or controlled by one entity. It is not beholden to the whims of investors or a large corporation. I have contributed to Node.js in the past and I was really impressed by its rock-solid governance model and processes. I think this is an under-appreciated feature when evaluating tech options.
Open 80, closed 492.