If you implemented your own parser for the schema and the JSON, and used only an AST for validation plus span information (which can be just a pair of u16s for the start/end of a token), then you could collect your errors very, very quickly and generate the error messages once validation is finished.
Heavily optimized compilers do this for semantic analysis and type checking: the IR they use can be constructed quickly and then combined with the input text to produce helpful messages, while the guts of the algorithm work only with semantic information in a structure that's compact and easy to access/manipulate.
All that said, serde_json is incredibly convenient, and giving it up to write your own parser is a big hammer for a problem that probably doesn't need it.
I had a thought in my reply [0] on this that actually might let him eat his cake and have it too in this regard. I think you can heavily abuse serde::de::Visitor to schema validate without actually parsing (or with less parsing, at any rate). I went into more detail in my comment but I wanted to ping you (@duped).
Or, use sonic_rs::LazyValue if you want to parse json on demand as you traverse it.
struct Token {
    token_type: u8,
    type_info: u8,
    token_start: u32, // offset into the source file
    token_len: u16,
}
It's blazing fast. The lexer/parser can process millions of lines per second. The textual information is included, and the location information is included. It's been developed for embedded systems (it was written originally for a NATS implementation in the Zephyr RTOS), so it's a bit limited and there's no easy way to know where some parsing/type validation error happened, but the information is there if one wants to obtain it: https://github.com/lpereira/lwan/blob/master/src/samples/tec...
The fewer bytes you can pack data into, the more data you can process per second. Computers may be the most complex machines ever built, but the simple fact remains: having fewer things to touch means you can touch more things in the same amount of time.
- almost no cost in the non-erroring path
- no extra data structures to manage
- a lot less code
I think the post would benefit from a better analysis of why this doesn't work for them.
This idea was added after I wrote the post and wasn't taken from my own optimization efforts in `jsonschema`. Originally, the output type in `jsonschema` was actually a badly composed iterator, and I intended to simplify it to just a `Result<(), ValidationError>` for the article, but with this output there are actually far better optimizations than I originally implemented.
If I'd discovered this idea earlier, I'd probably have spent more time investigating it.
Two suggestions: it’s not immediately obvious whether subsequent benchmark result tables/rows show deltas from the original approach or from the preceding one (it’s the latter, which is more impressive). Maybe call that out the first couple of times?
Second, the “using the single Vec but mutating it” option would presumably benefit from a reserve() or with_capacity() call. Since in that approach you push to the vector in both erroring and non-erroring branches, it doesn’t have to be exact (though you could do a BFS to find maximum depth, that doesn’t strike me as a great idea) and could be up to some constant value, since a single block allocation is cheap in this context. (Additionally, the schema you are validating against defines a minimum depth whereas the default vec has a capacity of zero, so you’re guaranteed that it’s a bad choice unless you’re validating an empty, invalid object.)
But I agree with the sibling comment from @duped that actually parsing to JSON is the biggest bottleneck and simply parsing to the minimum requirements for validation would be far cheaper, although it depends on if you’ll be parsing immediately after in case it isn’t invalid (make the common case fast, assuming the common case here is absence of errors rather than presence of them) or if you really do just want to validate the schema (which isn’t that rare of a requirement, in and of itself).
(Edit: I do wonder if you can still use serde and serde_json but use the deserialize module’s `Visitor` trait/impl to “deserialize” to an `enum { Success, ValidationError(..) }` so you don’t have to write your own parser, get to use the already crazy-optimized serde code, and still avoid actually fully parsing the JSON in order to merely validate it.)
If this were in the real world, I would use a custom slab allocator, possibly from storage on the stack rather than the heap, to back the Vec (and go with the last design with no linked lists whatsoever). But a compromise would be to give something like the mimalloc crate a try!
> Two suggestions: it’s not immediately obvious whether subsequent benchmark result tables/rows show deltas from the original approach or from the preceding one (it’s the latter, which is more impressive). Maybe call that out the first couple of times?
Agree!
> Second, the “using the single Vec but mutating it” option would presumably benefit from a reserve() or with_capacity() call. Since in that approach you push to the vector in both erroring and non-erroring branches, it doesn’t have to be exact (though you could do a BFS to find maximum depth, that doesn’t strike me as a great idea) and could be up to some constant value, since a single block allocation is cheap in this context. (Additionally, the schema you are validating against defines a minimum depth whereas the default vec has a capacity of zero, so you’re guaranteed that it’s a bad choice unless you’re validating an empty, invalid object.)
Oh, this is a cool observation! Indeed, it feels like `with_capacity` would help here.
> But I agree with the sibling comment from @duped that actually parsing to JSON is the biggest bottleneck and simply parsing to the minimum requirements for validation would be far cheaper, although it depends on if you’ll be parsing immediately after in case it isn’t invalid (make the common case fast, assuming the common case here is absence of errors rather than presence of them) or if you really do just want to validate the schema (which isn’t that rare of a requirement, in and of itself).
My initial assumption was that usually the input is already parsed, e.g. validating incoming data inside an API endpoint which is then passed somewhere else in the same representation. But I think that is a fair use case too, and I was actually thinking of implementing it at some point via a generic `Json` trait which does not imply a certain representation.
> (Edit: I do wonder if you can still use serde and serde_json but use the deserialize module’s `Visitor` trait/impl to “deserialize” to an `enum { Success, ValidationError(..) }` so you don’t have to write your own parser, get to use the already crazy-optimized serde code, and still avoid actually fully parsing the JSON in order to merely validate it.)
Now that I've read the details, it feels like a really, really cool idea!
> If this were in the real world, I would use a custom slab allocator, possibly from storage on the stack rather than the heap, to back the Vec (and go with the last design with no linked lists whatsoever). But a compromise would be to give something like the mimalloc crate a try!
Nice! In the original `jsonschema` implementation the `validate` function returns `Result<(), ErrorIter>` which makes it more complex to apply that approach, but I think it still should be possible.
      node 0          node 1                    node M/N
    [N elem][.] -> [N elem][.] -> .... -> [M mod N elem][null]
And so if you want to index the `i`th element you chase the pointers from the head node:
    acc = 0
    while acc + N <= i
        acc += N
        node = node.next
    return node.array[i - acc]
Or something like that? You can do better using a B-tree and exploit the fact that you don't need keys to just store pointers in the branch nodes. This reduces the number of pointer dereferences for large arrays. For example, say you have N = 8 and there are 1024 elements in the list: you would need 127 pointer dereferences to reach the end of the array, but in a B-tree you only need about 3. (Double-check my math, I might be off by one.)
This is the typical implementation for immutable/persistent vector types. If you keep the links between leaf nodes at the bottom of the tree, you've got a B+ tree and fast iteration through it as well.
But you size the nodes dynamically, rather than using a fixed size.
I beg to disagree.
In kernels, drivers, and embedded systems they are very common.
Seems like the glaring exception to the rule!
Out of all the programmers in the world, what percentage of them do you think work in the kernel/driver/embedded spaces?
Second, linked lists are useful in a lot more places than that. Probably a better proxy would be low-level coders. You almost always want a linked list somewhere when you're dealing with memory addresses and pointers. Maybe not for the primary collections, but there are always secondary ones that need to maintain a collection with a different membership or ordering, and vectors of pointers don't have many clear advantages over intrusive linked lists for those secondary collections.
I did start to encounter some fresh grads with degrees that said "computer science" on them who couldn't answer some basic linked list questions. I was beginning to think it was a bad set of questions until I hit those kids. If you claim to know "computer science" and don't know what a linked list is, especially beyond some textbook stuff, I'm probably not interested.
In embedded world, memory often needs to be exactly controlled, and allocation failures are fatal without a more complex MMU. In kernel world, I believe the main reason is that allocations can block.
In addition, a lot of data structures might be shared across multiple cores. Linked lists can be traversed and mutated concurrently (although with a bit of care).
E.g. in a DMA-based Ethernet driver, the Ethernet MAC receives packets asynchronously from the processor, perhaps faster than the processor can ingest them. So the MAC interrupts the processor to hand it new packets, and the processor can't sit processing the packets in the interrupt context, so it needs to put them into some ordered list for processing later when it has downtime. In a true embedded system, the memory for this list is going to be fixed or statically allocated, but you still don't really want an array-style list with fixed indexing, as you'd have to manage what happens when the index wraps back around to 0, etc., so instead you just construct a linked list in that pre-allocated memory.
I wouldn't say linked lists aren't used in high-level applications; as I said, they're used all over the place whenever you have external asynchronous communication. It's just that modern high-level frameworks/libs totally abstract this away from most people writing high-level code.
From my understanding this is what led to the recent 'unpatchable' exploit in Apple's M1, but rather than trying to guess it by some heuristic, why not just give compilers the option to make that optimization?
It turns out programmers, or rather their employers, don't really care about using hardware efficiently. They care about shipping things yesterday, because that's how business deals get closed, and making software efficient is secondary to money changing hands. Performance never really matters to business people after the checks are cashed.
Multicore computers have been ubiquitous for more than a decade, yet the overwhelming majority of software built today is single-threaded microservices, wherein they spend most of their time serializing and deserializing message payloads.
This is all really to say that most performance is already being left on the table for the majority of what computers are used for today.
Benchmarking is hard, and very often is done in ideal conditions, which never materialize in real use cases.
The major slowness of linked lists comes from poor cache locality. New nodes can be allocated all over the memory space. However, if you can make sure all or part of the list lives in contiguous blocks of memory, then there's a good chance that when the CPU loads up the next node it will also grab the next 3 nodes in a cache line.
The closer these node addresses are in memory, the faster things will be.
Yes, if you have a simple choice between a vector and a linked list, then the vector is vastly superior due to locality and (for non-intrusive linked lists) allocations. So much so that vectors often win even when you're doing lots of O(n) deletions that would be O(1) with a linked list.
But that doesn't mean that linked lists are useless! A vector gives you a single ordering. What if you need multiple? What if you need a free list, which you're never going to be iterating over but will just grab off an item at a time? I find it quite common to have one "natural" order, for which I will use a vector (or equivalently, a bump allocator of fixed-size items), and then one or several auxiliary orders like the entries belonging to a particular owner or the ones that will need to be visited for some sort of cleanup or an undo list or some sort of stack or queue of pending items to process. Your common iteration will be in memory order, but that doesn't mean you won't ever want to do different iterations.
It annoys me that this is always omitted, with the attitude that linked lists are obsolete and useless because vectors be moar better faster gooder, to the point that it's a waste of time to learn how to manipulate linked lists anymore.
I guess a lot of this is probably due to the popularity of high level languages, where you're just dealing with references to everything in the first place. But in those, the arguments in favor of vectors are often not valid because that blessed golden vector of Objects is giving you no more locality than giving your Object a `next` field: the vector of Objects is represented internally as a vector of pointers to the actual Object data, so your oh so nicely cached lookups are doing memory reads that are 100% pure overhead compared to following a `next` link from an Object that fits into a cache line. In both cases, your performance is going to be limited by the locality of your Object data, which is the same whether you have a vector of pointers or Object data with an intrusive pointer.
Also, if you have an existing setup and need to introduce a new list, it is sometimes far easier to drop in a `next` (and maybe `prev`) field than to refactor everything to accommodate a new vector. Especially since the vector will move all of your data when resizing the vector, which invalidates any pointers you might be using for other purposes. If you'll be iterating that list frequently, then the vector may very well be a good idea. If it's just for error cases or slow paths, then linked lists really aren't bad at all.
I'm not trying to argue for linked lists here so much as arguing against the blanket arguments against them.
</rant>
You can very much have a single vector owning the memory, and do all other ordering over auxiliary vectors of indices. Should be cheaper and faster than holding more linked lists.
If you want to remove elements, you can very much use tombstones to flag deleted elements and then clean up after some threshold.
Cheaper in space depends on the percentage of elements in the auxiliary list. An intrusive list has space for every element to be in the list, which is wasteful if few are. A vector that grows by doubling could waste nearly half of its elements. Indexes can often be smaller than pointers, though, which favors the vector approach.
Faster is debatable. Iterating the vector of indexes is quite fast, but indirecting into the data in a random order is still slow. An intrusive linked list doesn't need to do the former, only the latter. (Then again, it also bloats up the data slightly, which matters for small items since fewer fit in cache.)
The main reason why linked lists could still be at an advantage is if you don't want allocations in your auxiliary list manipulation path. Maybe you don't want the unpredictable timing, or maybe you can't handle failure.
I agree on tombstoning, but note that you're giving up some of the vector advantages by doing so. Your processing loop now has a much less predictable branch in it. (Though really, the vector probably still wins here, since the linked list is built out of data dependent accesses!)
Sometimes these auxiliary lists need to contain things that are no longer in the main list, too, as in the case of a free list. (And if you swap such things to the end, then you've just invalidated all of your indexes.) And non-intrusive linked lists can point to variable-size elements (though they lose most of the other benefits of an intrusive linked list.)
Anyway, it's the usual "use the right tool for the right job", I just claim that linked lists are sometimes the right tool.
(Random side note: I use "indexes" instead of "indices" these days, for a silly reason—"indices" is a fine word, I have no issue with it, but it seems to encourage some people to use the non-word "indice" which is an abomination.)
In general, use vecs until you absolutely can't.
Perhaps there are a few uses for LL's in limited circumstances.
It would be interesting to see the performance of a `Vec<&str>` where you reuse the vector, but also a `Vec<u8>` where you copy the path bytes directly into the vector and don't bother doing any pointer traversals. The example path sections are all very small ('inner', 'another': 5 bytes, 7 bytes), less than the length of a pointer! Storing a whole `&str` is 16 bytes per element, and then you have to rebuild it again anyway in the invalid case.
---
This whole article is kinda bad, it's titled 'blazingly fast linked lists' which gives it some authority but the approach is all wrong. Man, be responsible if you're choosing titles like this. Someone's going to read this and assume it's a reasonable approach, but the entire section with Vec is bonkers.
Why are we designing 'blazingly fast' algorithms with rust primitives rather than thinking about where the data needs to go first? Why are we even considering vector clones or other crazy stuff? The thought process behind the naive approach and step 1 is insane to me:
1. i need to track some data that will grow and shrink like a stack, so my solution is to copy around an immutable Vec (???)
2. this is really slow for obvious reasons, how about we: pull in a whole new dependency ('imbl') that attempts to optimize for the general case using complex trees (???????????????)
You also mention:
> In some scenarios, where modifications occur way less often than clones, you can consider using Arc as explained in this video
I understand you're trying to be complete, but 'some scenarios' is doing a lot of work here. An Arc<[T]> approach is literally just the same as the naive approach but with extra atomic refcounts! Why mention it in this context?
You finally get around to mutating the vector + using it like a stack, but then comment:
> However, this approach requires more bookkeeping and somewhat more lifetime annotations, which can increase code complexity.
I have no idea why you mention 'code complexity' here (complexity introduced by rust and its lifetimes), but fail to mention how adding a dependency on 'imbl' is a negative.
To me this is a self-answering question.
The idea goes back to [0], which is similar to one of the steps in the article; before adding `push` & `pop` I just cloned it to make things work. That's what Rust beginners do.
> This feels like you had a cool conclusion for your article, 'linked lists faster than vec', but you had to engineer the vec example to be worse. Maybe I'm being cynical.
Maybe from today's point in time, I'd think the same.
> It would be interesting to see the performance of a `Vec<&str>` where you reuse the vector, but also a `Vec<u8>` where you copy the path bytes directly into the vector and don't bother doing any pointer traversals. The example path sections are all very small ('inner', 'another': 5 bytes, 7 bytes), less than the length of a pointer! Storing a whole `&str` is 16 bytes per element, and then you have to rebuild it again anyway in the invalid case.
Yeah, that makes sense to try!
> This whole article is kinda bad, it's titled 'blazingly fast linked lists' which gives it some authority but the approach is all wrong. Man, be responsible if you're choosing titles like this. Someone's going to read this and assume it's a reasonable approach, but the entire section with Vec is bonkers.
> Why are we designing 'blazingly fast' algorithms with rust primitives rather than thinking about where the data needs to go first? Why are we even considering vector clones or other crazy stuff? The thought process behind the naive approach and step 1 is insane to me:
> 1. i need to track some data that will grow and shrink like a stack, so my solution is to copy around an immutable Vec (???)
> 2. this is really slow for obvious reasons, how about we: pull in a whole new dependency ('imbl') that attempts to optimize for the general case using complex trees (???????????????)
That's clickbait-y, though none of the article's ideas claim to be a silver bullet. I mean, there are admittedly dumb ideas in the article, but I don't believe somebody would come up with a reasonable solution without trying something stupid first. However, I might have used better wording to highlight that, and mentioned that I came up with some of these ideas when I was working on `jsonschema` in the past.
> I understand you're trying to be complete, but 'some scenarios' is doing a lot of work here. An Arc<[T]> approach is literally just the same as the naive approach but with extra atomic refcounts! Why mention it in this context?
If you don't need to mutate the data and need to store it in some other struct, it might be useful, i.e. just to have cheap clones. But dang, that indeed is a whole different story.
> I have no idea why you mention 'code complexity' here (complexity introduced by rust and its lifetimes), but fail to mention how adding a dependency on 'imbl' is a negative.
Fair. Adding `imbl` wasn't a really good idea for this context at all.
Overall I think what you say is kind of fair, but I think that our perspectives on the goals of the article are quite different (which does not disregard the criticism).
Thank you for taking the time to answer!
- [0] - https://github.com/Stranger6667/jsonschema-rs/commit/1a1c6c3...