I was shocked to learn that was 15 years ago when I looked up the link to share, but if you are interested in the topic of "how to implement a database" it may be worth a look.
For what it is worth, it was listed (not by me) on the C2 Wiki on the "Programs to Read" page, where it was described as "[A] database written in Java with good unit tests and ShortMethods." [1]
Both statements are true: for a complete working example of a production database (it supported a commercial product for at least 10 years), it is actually a pretty accessible and well-documented code base.
The project is called AxionDB and can be found at [2].
[1] http://wiki.c2.com/?ProgramsToRead
[2] http://axion.tigris.org/source/browse/axion/
* LMDB is legendary for its performance, but its source code is kind of hard to read. Found there's a pretty decent port to Java which looks a lot more readable [1].
* LMDB doesn't support SQL. But there's this SQL implementation: Apache Calcite [2]. Creating a DB using lmdbjava + Calcite sounds like an interesting project.
1: https://github.com/lmdbjava/lmdbjava/tree/master/src/main/ja...
It's been a very long time so I'm not sure I can name many specific examples, but IIRC some of the topics that became very clear were things like:
* the importance of the order in which JOINs and WHEREs are applied (to limit the number of rows being accessed per step)
* the relationship between columns that are selected and those that appear in WHERE and ORDER BY clauses.
* the value of tables with a small number of columns (limiting data that must be read per row)
* the cost/complexity of variable-width columns (VARCHAR vs a fixed-length string), again because of the time and complexity required to do something like "skip ahead three rows" in a data file
* the behavior of CLOB/BLOB types (stored external to the "main" table content)
* the various types of transaction isolation levels and the conditions in which it becomes hard or impossible to guarantee isolation without table or row locking
* etc.
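To make the fixed-width versus variable-width point concrete, here is a minimal Python sketch. It is not AxionDB's code; the record layouts and function names are invented for illustration. With fixed-width rows, row n sits at byte offset n * row size, so "skip ahead three rows" is a single seek. With length-prefixed variable-width rows, you have to read every earlier row's length prefix to find where the next one starts.

```python
import io
import struct

# Hypothetical fixed-width layout: 4-byte int id + 16-byte space-padded name.
FIXED_ROW = struct.Struct("<i16s")


def write_fixed(buf, rows):
    for row_id, name in rows:
        # Names longer than 16 bytes are truncated by the "16s" format.
        buf.write(FIXED_ROW.pack(row_id, name.encode("utf-8").ljust(16)))


def read_fixed(buf, n):
    """Jump straight to row n: its offset is n * row size."""
    buf.seek(n * FIXED_ROW.size)
    row_id, raw = FIXED_ROW.unpack(buf.read(FIXED_ROW.size))
    return row_id, raw.decode("utf-8").rstrip()


def write_variable(buf, rows):
    # Hypothetical variable-width layout: 4-byte id, 2-byte name length, name bytes.
    for row_id, name in rows:
        data = name.encode("utf-8")
        buf.write(struct.pack("<iH", row_id, len(data)))
        buf.write(data)


def read_variable(buf, n):
    """Reach row n by walking over every earlier row's length prefix."""
    buf.seek(0)
    for _ in range(n):
        _, length = struct.unpack("<iH", buf.read(6))
        buf.seek(length, io.SEEK_CUR)  # skip the name we don't need
    row_id, length = struct.unpack("<iH", buf.read(6))
    return row_id, buf.read(length).decode("utf-8")
```

Real engines usually soften the variable-width cost with per-page slot directories or offset tables, but the basic trade-off is the one shown here.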
The references at http://axion.tigris.org/readings.html describe some of the theoretical concepts we seemed to think were important or useful at the time. Also, there's a post at http://heyrod.com/articles/radio-blog/pleasures-of-profiling... that describes a session of performance-tuning on the index implementation that gets into some of the nitty-gritty implementation topics. (But a lot of the issues addressed there were an artifact of Java's primitive vs. object representation, which has been reduced by things like generics.)
The more we used HSQL the more it became clear that it was more like a SQL-parser wrapping a simple key/value store than a full-on ACID database. We created AxionDB precisely because we needed the kinds of capabilities you mention and at the time HSQL did not provide them and wasn't remotely architected to support them.
To be honest though, creating a moderately robust RDBMS from "scratch" turned out not to be the most ambitious or complex part of the overarching project that spawned AxionDB. The harder part was trying to use Java's primitive, built-in HTML-renderer to create something approximating a fully featured browser. The effort and complexity behind something like Gecko, WebKit, Edge, Blink, etc. is very easy to underestimate. It's a hard problem, made much harder by having to tackle the kinds of content you find "in the wild." Frankly, building a database was a much more straightforward problem than that.
Instead, there's just a key-value store implemented on top of a JavaScript hashmap and a filesystem.
I don't want to put more content under a terrible post, but the best resource for this material is Jennifer Widom's MOOC.
Build a working Tetris game from literal logic gates up.
Unless the project specifically needed to leverage features of the language, or a web browser, it's an incredibly poor choice for building anything with well-maintained abstractions. Or anything at all, really, when the language is covered in warts.
I imagine the author hasn't yet discovered for themselves why it's a poor choice, given only a key-value store has been implemented (using JSON.stringify, no less)
I spent a lot of my life being a JavaScript hater and after working with it for four years I can definitely see how it is really a simple starting point for a lot of concepts. Sure it has warts. What doesn’t?
Have you worked with a JS language for any extended period of time? What about that made you not even want to consider it for anything? I’m really curious, I’m not trying to set you up - I’m really not experienced enough to do so :)
Edit: to be clear because of general curiosity and because I hate bringing work home, most side projects I work on are in some other language.
Not really. The author doesn't leverage any kind of static type system, which would at least mitigate some of the warts. He's chosen Node.js instead of leveraging the browser, so there's no free visualization layer for doing anything interesting with, unfortunately.
> Have you worked with a JS language for any extended period of time?
Most of my day job is working in a really, really big JavaScript codebase. I can say with confidence that the language is something that we're absolutely stuck with, and I still have no idea why somebody would implement a pedagogical database with it (well, then again, they haven't; they've implemented a key-value store, which is trivial).
To offer an alternative, they could have chosen something boring but everywhere like Java, which is just as accessible to those with less experience. Then they'd have the possibility of doing fine-grained parallelism, file access, designing abstractions that fit within a statically typed language.
Maybe you’re right, but this page doesn’t actually teach you databases, and most of those courses don’t actually teach you programming.
At least not efficiently.
I have worked with JS for a long time, and I don’t hate it, but it’s such a terrible language and environment that the most popular part of it is literally a strict syntactical superset of it.
I don’t particularly like Typescript by the way, and I don’t think it’s really that useful, but you can’t deny that most people do.
I think 95% of all projects would be better off using something that wasn’t JavaScript for the whole stack, and I think that’s the reason so few things are built on Node. I like GraphQL as much as you do, I also think Prisma is okish, but I’d rather use Django, Flask, Java Spring or .NET Core because they are so much more efficient in the long term.
Of course on the client side, all the innovations and all the talent lie with JS, and there are some advantages to using JS for your whole stack. Those advantages end at the DB though, at least in my opinion.
This isn’t a problem, you can use Prisma for Postgres, and there are decent drivers for mssql, but I’d never advise people to use NoSQL unless they had a very specific reason for doing so, and I can’t think of one.
I don't think the JS part is the issue. Node.js does I/O, files and co. The issue is that the article doesn't teach how to build a database at all.
It doesn't explain how to efficiently persist and fetch data from a file, indexing strategies with trees, concurrent file access, locking, basic transactions... that's what I expect from a tutorial about how to build a basic DB system.
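As a rough idea of what even the first of those topics involves, here is a minimal Python sketch of an append-only data file with an in-memory index of byte offsets. It is not taken from the article; the `LogStore` class and its record format are made up for illustration, keys are assumed to be strings, and it uses a plain hash index rather than a tree. It does nothing about concurrent access, locking, or transactions.

```python
import json
import os


class LogStore:
    """Append-only data file plus an in-memory {key: byte offset} index."""

    def __init__(self, path):
        self.path = path
        self.index = {}
        # Create the file if needed, then rebuild the index with one scan.
        open(path, "ab").close()
        with open(path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                self.index[json.loads(line)["k"]] = offset

    def put(self, key, value):
        # One JSON record per line; json.dumps escapes embedded newlines.
        record = json.dumps({"k": key, "v": value}).encode("utf-8") + b"\n"
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # where this record will start
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the write to disk
        self.index[key] = offset  # newer records shadow older ones

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # one seek, one read, no full-file parse
            return json.loads(f.readline())["v"]
```

Compared with rewriting one big JSON file on every change, a write here touches only the end of the file and a read touches only one record. The costs are that old versions pile up until something compacts the file, and that the whole index has to fit in memory.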
Why choose a language plagued with warts and a half-baked ecosystem for something pedagogical? A language with all sorts of peculiarities from the '95 browser era, and not at least leverage browser technology? A language which gives you only coarse, uninteresting control over resources, and has no concept of parallelism?
Does anyone have a DB internals book recommendation that isn't boring as hell?
Fair warning: this is not for beginners. If you find the going too hard, start with a DB textbook.
Also, that site doesn't seem to include the papers themselves. But most of those papers are very famous, so if you search for the titles, you should find copies. Worst case, you might need to visit a university library.
Which one can teach how to build a DB system from scratch? I'm not talking about SQL theory or implementing a SQL parser but the actual persistence and indexing part.
https://www.amazon.com/dp/1558605088/?coliid=IYEILMZI5DVNM&c...
and
Transaction Processing: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems) ISBN-13: 978-1558601901
https://www.amazon.com/dp/1558601902/?coliid=I2GWJZ9XJ5D4JI&...
https://www.apress.com/gb/book/9781484219638
It's specific to MS SQL Server but I found the chapters on log file management, indexes & isolation levels to be accessible and enjoyable.
https://www.amazon.co.uk/Database-Management-Systems-Raghu-R...
For anyone else who is interested in learning how to build a database, can I thoroughly recommend following along with Andy Pavlo's Advanced Database Systems course from CMU [1]. Every lecture is accompanied by reading lists, notes, and assignments. What's more, I find Andy's style to be very easy to parse even on complex topics.
Even if you think you know a fair bit about this domain, you will likely learn a lot!
[1] https://www.youtube.com/playlist?list=PLSE8ODhjZXjYplQRUlrgQ...