I love being able to show that study: if you properly architect your SQLite system and are willing to purchase hardware, you can go a long, long way, much further than almost all companies ever go, with your data access code needing nothing more than the equivalent of System.Data.Sqlite.
1. Only use a single connection for all access. Open the database one time at startup. SQLite operates in serialized mode by default, so the only time you need to lock is when you are trying to obtain the LastInsertRowId or perform explicit transactions across multiple rows. Trying to use the one connection per query approach with SQLite is going to end very badly.
2. Execute one time against a fresh DB: PRAGMA journal_mode=WAL
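A minimal sketch of both points using Python's built-in sqlite3 module (the users table, file path, and lock are illustrative, not from the original comment):

```python
import os
import sqlite3
import tempfile
import threading

# Open the database once at startup; every part of the app shares this one
# connection. check_same_thread=False lets other threads use it, relying on
# SQLite's default serialized threading mode.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(db_path, check_same_thread=False)
write_lock = threading.Lock()

# Run once against a fresh DB: switch to write-ahead logging.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Insert and read last_insert_rowid() under one lock, so another thread's
# insert cannot land in between and change the rowid we observe.
with write_lock:
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    new_id = conn.execute("SELECT last_insert_rowid()").fetchone()[0]

print(journal_mode, new_id)  # wal 1
```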
If at this point you are still finding that SQLite is not as fast as, or faster than, SQL Server, MySQL, et al., then I would be very surprised.
I do not think you can persist a row to disk faster with any other traditional SQL technology. SQLite has the lowest access latency that I am aware of. Something about it living in the same process as your business application seems to help a lot.
We support hundreds of simultaneous users in production with 1-10 megs of business state tracked per user in a single SQLite database. It runs fantastically.
E.g.:
    transaction {
        if (expensiveFunction(query()))
            update();
    }
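In Python's sqlite3, the same pattern might look like the sketch below; expensive_function and the state table are hypothetical stand-ins for the pseudocode above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (id INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO state (value) VALUES (1)")

def expensive_function(rows):
    # stand-in for a slow computation over the queried rows
    return sum(v for (v,) in rows) > 0

# one explicit transaction around the read, the computation, and the write;
# "with conn:" commits on success and rolls back if an exception is raised
with conn:
    rows = conn.execute("SELECT value FROM state").fetchall()
    if expensive_function(rows):
        conn.execute("UPDATE state SET value = value + 1")

result = conn.execute("SELECT value FROM state").fetchone()[0]
print(result)  # 2
```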
(My applications always get much faster when I plug them into Postgres after SQLite. But then I do the odd sort and group by, not OLAP, but because they express the computation I want, and Postgres is simply much better at that.)

What about large aggregation queries, which modern DBMSs parallelize?
Does it still scale that well if you have many concurrent read and write transactions (e.g. on the same table)?
If you can come to my company and replace our 96-core SqlServer boxes with SQLite I'll pay you any salary you ask for.
> SQLite scales almost perfectly for parallel read performance (with a little work)
They aren't using stock SQLite, they're using SQLite wrapped in Bedrock[1], and their use case is primarily read-only.
SQLite is fantastic at read-only, or read-mostly, use cases. You start to run into trouble when you want to do concurrent writes, however. I tried to use SQLite as the backend of a service a couple of years ago, and it locked up at somewhere around tens of writes per second.
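For what it's worth, some of that lock-up behavior can be softened with a busy timeout, which makes a writer wait for a competing writer instead of failing immediately with "database is locked" (a general SQLite sketch, not a claim about the parent commenter's specific service):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Wait up to 5 seconds for a competing writer before raising
# "database is locked" instead of failing on the first contention.
conn.execute("PRAGMA busy_timeout = 5000")

# Querying the pragma returns the currently configured timeout in ms.
timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
print(timeout_ms)  # 5000
```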
[1]: www.bedrockdb.com
(a) built their own transaction/caching/replication layer using Blockchain no less.
(b) paid SQLite team to add a number of custom modifications.
(c) used expensive, custom, non-ephemeral hardware.
Now, you could do all of this, or you could just use an off-the-shelf database that you don't have to write custom code to use; and if you choose a distributed one, e.g. Cassandra, it will be able to run on cheap, ephemeral hardware.
(a) They implemented a very boring transaction/caching/replication layer that is like any other DB except they borrowed the idea that "longest chain" should be used for conflict resolution.
(b) They worked with upstream to get a few patches that were unique to their use-case. Once you're in deep with any DB this really isn't that uncommon.
(c) They used a dedicated (lol non-ephemeral) white-box server that has a lower amortized cost than EC2.
(d) Bedrock isn't bound to the hardware. You could run it on EC2 and reap the benefits just the same except you'd pay more.
But what does "properly architect your sqlite system" mean and how does this compare to just spinning up a postgres service (nothing sharded or fancy otherwise)?
- https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q... - https://bedrockdb.com/
I've seen so many solutions that could be easily and reliably implemented on a single SQL database, or a small cluster of various types, turn into complex systems just to avoid the costs of scaling up the RDBMS.
SQLite's resource needs are a lot lower because the database is a flat file on the OS. Its only main resource is storage.
MySQL is an application that not only requires configuration, tweaking, tuning, and tender loving care, but also consumes constant resources, utilising processor, memory, and storage.
Not knocking SQLite - great for desktop apps or maybe local dev environments.
You can store it in localStorage and read it from there, making your reload a lot smaller. If it's not in localStorage, you can just request a fresh copy of the SQLite database from us.
I recall that F Scott Fitzgerald said "the test of a first-rate intelligence is the ability to hold two opposed ideas in mind at the same time and still retain the ability to function."
The value of new methodologies, languages, and techniques is partly that the enthusiastic proponents of them are given a chance to prove out that there is value, and so become motivated to go the extra distance to achieve the project specific outcome.
This value is destroyed if people are forced to use the technique, instead of championing its introduction. So measurement is made even harder!
My favorite thing about static typing is that it makes code more self-documenting. The reason I love Go specifically is because if you have 10 people write the same thing in Go, it's all going to come out relatively similar and use mostly built-in packages. Any API requests are going to be self-documenting, because you have to write a struct to decode them into. Any function has clear inputs and outputs; I can hover a parameter in my IDE and know exactly what's going on. You can't just throw errors away; you are always aware of them, and any functions you write should bubble them up.
Typescript addresses this somewhat, but basically offsets that complexity with more configuration files. I like Typescript in use, but I can't stand the fact that Javascript requires configuration files, transpilers, a million dependencies. Same for Python and mypy.
Yes, I could just look at class members in a dynamic language, but there's nothing that formally verifies the shape of data. It's much more annoying to piece apart. I don't use static analyzers, but my guess is that languages like Go and Rust are the most compatible with them. Go programs are the closest thing to a declarative solution to a software problem, of any modern language IMO. As we continue experimenting with GPT-generated programs, I think we're going to see much more success with opinionated languages that have fewer features and more consistency in how finished programs look.
Microservices are also great at making large applications more maintainable, but add additional devops complexity. It's harder to keep track of what's running where and requires some sort of centralized logging for requests and runtime.
    success, err := fail()
    do(success)

    success, _ := fail()

    fail()

True, if you completely ignore the function's return values, you can throw errors away, but then you wouldn't be using the language in the way that makes it powerful to me; that there are simple and clear interfaces that you interact with.
The first and third ones fail on go vet. The first one also fails to compile if you never read from err anywhere in the function.
That list reminds me of [1], which rants about this state of affairs, and [2], which puts many beliefs to the test.
[1] https://youtu.be/WELBnE33dpY
[2] https://www.oreilly.com/library/view/making-software/9780596...
I'm hoping to have a written version by the end of September.
I'm somewhat ADD and get bored easily, so not only do I need to do something more like software but I also have to stay as broad and high-level as possible within the discipline, to stave off ennui. NOTE: very much not arguing this is a good way to go through life.
The hardest part of most projects is taking unrealistic and ever changing client demands and trying to turn them into something that will actually work in reality; a process which is probably all too familiar to many software developers.
I think software 'engineering' is uniquely ambiguous in this regard, because software development as a discipline is in equal parts both design, and construction, and the design part bleeds into the 'construction' part, corrupting it (for want of a better word) in a way you would imagine that 'pure' engineering would not.
The issues are us - the developers. The machine code is mostly fine...
This seems applicable to almost everything in this whole experiment humans have going on here on planet Earth; it just doesn't seem like it. To see it, you have to have (at least) the ability and willingness to look.
If the spec is 5x more complicated than the code would be, then I'm not sure I see much of a point, because you're just creating different spaces for bugs to hide in.
We are throwing a lot of resources at a problem because we are not able to educate people well enough to understand basic performance optimizations.
You are a Data Scientist/anyone else and you don't understand your tooling? You are doing your job wrong.
It would be nice to have proper studies, but it's difficult to control the other variables ...
I'm honestly not convinced it helps that much. And it seems to cost a lot to me.
I like database and API schemas though. And I like clojure.spec and function preconditions a lot.
I just don’t get what people are working on such that it represents a time cost rather than a large time savings. I don’t mean that as a dig; I just mean I genuinely don’t know what that must look like. And I’ve written a lot more code in dynamic languages, and got my start there, so it’s not like I “grew up” writing Java or something like that.
On the other end, people who prefer weakly typed languages see problems primarily as ones of data transformation, for example from HTML form to HTTP request to SQL DB to CSV file and so on.
Both approaches are differentiated by the perspective on the problem.
That, plus type inference makes the static typing pretty painless.
There's also the case that I find the type systems of Rust, Elm, etc to be much more helpful than the type systems of C++ or Sorbet (type system for Ruby).
But my current job has very, very little that would benefit from static typing. Adding it into the mix would slow us down, both literally and figuratively.
However I do think static typing provides an enormous benefit to picking up code that is 5 years old and written by someone else. The ability to see “this is a nonnullable int32 value type” greatly reduces the amount of paths you have to go down when you have to change something or understand what’s going wrong with it. Tradeoff is you end up with a lot more code to maintain...
For example I'm using TypeScript with a GraphQL code generator. Now let's assume I add a new value to a GraphQL enum. I run codegen, then fix everything until the compiler is happy. Afterwards, all places where this enum was ever touched will take it into account correctly, including mappings, translations, all switch statements, conditions, lists where some of the other values are mentioned and so on.
This is something that's not possible in a dynamic language and it's not even possible in Java, really.
I rely on this daily.
I want my code to be clear and with certain expectations fulfilled, rather than a mystery in front of me. I'm not there to learn what could be passed into my functions - I'm there to create functionality.
In saying that, I'm interested in whether there is any accepted, peer-reviewed literature with quantifiable data on whether strongly typed languages are "better" (whatever the study might define as better, such as being faster, more scalable, etc.). From what I've heard and read, most of the better-ness that strong typing provides relates to people problems and being able to scale a team, not necessarily scaling a system or making the system better. Having learned Go and TypeScript after primarily writing Ruby and Javascript, I'm convinced of the better-ness strong typing provides, whether it's readability, better IDE intellisense, or speed (although Go, for example, is faster than Ruby and JS not just because it's strongly typed, but because it's compiled). I'm just interested in whether there's real data to support using them, instead of anecdata.
The TypeScript Tax: A Cost vs. Benefit Analysis - https://medium.com/javascript-scene/the-typescript-tax-132ff...
The author leans against TypeScript, but does cite some relevant studies on the benefits.
---
One of the cited articles:
To Type or Not to Type: Quantifying Detectable Bugs in JavaScript - http://earlbarr.com/publications/typestudy.pdf (PDF)
So I’d guess that the number of type related bugs in dynamic languages is just a little bit greater than in static languages, simply because it is harder to make that kind of mistake in a typed language. But as a category, they aren’t common mistakes in the first place.
I can confidently say that I’m a bit of an expert at writing bugs :) and of all the kinds of bugs I write, type-related bugs are probably nowhere near the top of the list.
That’s not to say that static typing isn’t better - I definitely think it is. But I can also believe that it doesn’t necessarily reduce the bug count by a huge margin. (For whatever it’s worth I think the main benefits are documentation and refactoring...)
But a lot of the time, their language is simply unable to encode certain properties as types, so by definition they don't think of some classes of bugs they do write as "type errors". Maybe in a statically typed language those would have been type errors indeed!
It's as if the tool you use sometimes reinforces your blind spots: "you don't know what you don't know".
PS: anecdote is probably irrelevant, but I've written plenty of dumb type errors with Python. Things that would have been caught by a test or an external tool, sure, or the type checker of a statically typed language could have caught for me for free, leaving the more relevant logic tests to me. I tend to write type errors left and right. Maybe I'm simply not a good Python programmer, of course!
I also believe that, especially after a codebase outscales what one person is able to (fully) overview. Static types are something a program can reason about, so they allow so much more productivity-boosting tooling to be created. This also goes way beyond simply catching type errors at compile time vs. runtime (a downside which can largely be mitigated by test coverage). Just look at what an IDE for e.g. Java can do simply in helping you navigate a big codebase. Then throw in refactoring, which in many cases can even be a completely automated operation and in many more cases is at least greatly assisted by the tools. Tools for dynamic languages can often at most guess; making good guesses is hard, so in practice you mostly get stuff which is pretty limited in its usefulness.
(Not a balanced evaluation, just cherry-picking failures. But I suppose we're at the point in the hype cycle where it's easier to find success stories being talked about.)
The hype train is huge, but I'm sure there must be lots of professionals who have already learned about its shortcomings. Not sure if proper studies exist about Kubernetes yet, though. Hopefully you'll get a PR with some content.
There used to be an often-cited paper by Boehm about the cost of catching bugs early vs. late in production, usually mentioned by advocates of testing early, where the quoted conclusion was something like "studies show it's 10 times more costly to catch bugs late in production" or something like that. This is a very well-known study, I'm likely misquoting it (the irony!) and readers here are probably familiar with it or its related mantra of early testing.
I haven't read the paper itself (I should!), but later someone claimed that a- Boehm doesn't state what people quoting him say he said, b- the relevant studies had serious methodological problems that call into question the conclusion he did state, c- there are plenty of examples where fixing bugs late in production wasn't particularly costly.
edit: I'm not arguing testing isn't necessary, in case that upset someone reading this post. I'm not really arguing anything, except that the study by Boehm that most people quote was called into question (and was probably misquoted to begin with). This doesn't prove/disprove anything, except maybe hinting at a possible Cold Shower. It does show that we in software engineering have a serious problem with backing up claims with well-designed studies and strong evidence, but this shouldn't come as a surprise to anyone reading this.
Often a fancy new thing is introduced with a very long list of pros: "fast, scalable, flexible, safe". Rarely is a list of cons included: "brittle, tough learning curve, complicated, new failure modes".
This practice always strikes me as odd, because the first law of engineering is "everything is a trade-off". So, if I am going to do my job as an engineer, I really need to understand both the "pros" and "cons". I need to understand what trade-off I'm making to get the "pros". And only then can I reason about whether the cost is justified.
I would not have expected that. Still, I prefer to use full(er) identifiers. I don't like to guess how things were abbreviated, especially when consistency isn't guaranteed. If I were using a different language and IDE, this might be better.
If you don't have more data than can fit on a reasonably large hard drive, you do not have big data and you are likely able to process it faster and cheaper on one system.
Today that threshold would be around 10TiB.
Thoughts on this one? I found the presentation to be somewhat mixed.
I found the initial comb through of the agile principles to be needlessly pedantic ("'Simplicity... is essential' isn't a principle, it's an assertion!"); anyone reading in good faith can extract the principle that's intended in each bullet of that manifesto.
The critique of user stories (~35 mins in) was more interesting; it's something we've been bumping up against recently. I think the agile response would be "if your features interact, you need a user story covering the interaction", i.e. you need to write user stories for the cross-product of your features, if they are not orthogonal.
I'm not really convinced that this is a fatal blow for user stories, and indeed in the telephony example it is pretty easy to see that you need a clarifying user story to say how the call group and DND features interact. But it does suggest that other approaches for specifying complex interactions might be better.
Maybe it would be simpler to show a chart of the relative priorities or abstract interactions? E.g. thinking about Slack's notorious "Should we send a notification" flowchart (https://slack.engineering/reducing-slacks-memory-footprint-4...), I think it's impossible (or at least unreasonably verbose) to describe this using solely user stories. I do wonder if that means it's impossible for users to understand how this set of features interact though?
Regarding the purported opposition in agile to creating artifacts like design docs, it's possible that I'm missing some conversation/context from the development of Agile, but I've never heard agile folks like Fowler, Martin, etc. argue against doing technical design; they just argue against doing too much of it too early (i.e. against waterfall design docs and for lean-manufacturing style just-in-time design) and that battle seems to have largely been won, considering what the standard best-practices were at the time the Agile manifesto was written vs. now.
Rxjs, etc.
Angular uses typescript and rxjs excessively and, while I used to like typescript, the combo has made me reconsider.
RxJS seems like an overcomplex way to do common tasks. Has FRP caught on anywhere else? Is there a usage that doesn't suck?
All research is inconclusive? Sure. I wonder what kind of type systems were in there? I guess Java and similar languages are accounted for, and yet I wouldn’t put any faith in them. ML, Swift, Haskell... now that’s something else.
1. e.g. what percentage of the gains attributed to Lisp were more likely due to the candidate pool in the 90s/2000s skewing heavily towards people who learned it at elite CS programs, especially if you're doing a challenge competition which benefits from having studied various algorithms?
I was working on a new type of locking mechanism and thought I would be smart by modelling it in spin [http://spinroot.com], which has been used for these kinds of things before.
I ended up with a model that was proven in spin, but still failed in real code.
Granted, that's anecdata with a sample size of 1, but it was still a valuable experience for me.
Shower: Out-of-date sources
This article isn't about showers, nor positive results, making the title quite confusing :)
[1] https://www.medicalnewstoday.com/articles/325725 [2] https://www.wimhofmethod.com/science
https://news.ycombinator.com/item?id=8289007
Topic: The curious case of the cyclist’s unshaven legs
From a comment (this part clearly intended to be witty, I think): Really, I thought it was weird, and probably inappropriate, to mix in so much of an outsider's amateur and unsupported opinion about science into an otherwise interesting story about leg hair drag.