After this it is up to you. The papers reference a lot of distributed systems literature. If you are interested, you can go through the resources here [4]. If you want to go a more hands-on way, I would also recommend reading the AWS DynamoDB best practices documentation [5] (you can read up on Cassandra or CouchDB as well) to see the practical considerations when using these systems. Then try to use it or any other NoSQL database in a side project and see whether it is a good fit. The data modelling will involve thinking hard about use cases and will also help you compare this to relational systems.
[1] https://static.googleusercontent.com/media/research.google.c...
[2] http://www.aosabook.org/en/nosql.html
[3] http://www.allthingsdistributed.com/files/amazon-dynamo-sosp...
[4] https://github.com/aphyr/distsys-class
[5] http://docs.aws.amazon.com/amazondynamodb/latest/developergu...
(There must be something appealing to developers about using JSON-style syntax rather than a Structured Query Language.)
There should be a solid reason to pick NoSQL in general, and when one appears, picking the right one among the available NoSQL platforms is another job.
This is ranting.
I am a Postgres proponent, but saying that PostgreSQL/MySQL/SQLite is the better choice in the vast majority of cases the parent has come across is reckless. The words were well chosen, which makes the rant less obvious.
There aren't good or bad DBs. Every DB has its strengths and respective trade-offs. As much as I like Postgres, there are many use cases for other DBs too, including NoSQL ones. I am not going to feed the troll by arguing why NoSQL can be terrific or why SQL can be a big struggle. I am on both sides; both SQL and NoSQL have their place.
It's sad that a thread about learning NoSQL gets hijacked by an unrelated top comment opposing NoSQL.
Sorry to latch on; I’m very eager to learn. Our stacks of choice are Django and Flask respectively, if that helps.
“Trains are usually a better choice. Most people don’t need planes”
A: Not trolling, but X is vastly usually better than noX.
IDK what tolling is.
And it's never about JSON, it's about latency and resilience, about being able to simply add and replace nodes, about just working in a modern distributed environment.
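To make the "add and replace nodes" point concrete, here is a minimal sketch of consistent hashing, the partitioning idea the Dynamo paper describes. The `HashRing` class, the node names, and the choice of 100 virtual nodes are my own illustrative assumptions, not any particular database's implementation.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # MD5 is used only to spread keys evenly around the ring, not for security.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes (illustration only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted (hash, node) pairs
        self._hashes = []  # the hashes alone, kept for bisect lookups
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node is placed at several positions on the ring.
        for i in range(self.vnodes):
            self._points.append((_hash(f"{node}#{i}"), node))
        self._points.sort()
        self._hashes = [h for h, _ in self._points]

    def owner(self, key):
        # The owner is the first ring position clockwise from the key's hash.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to node-d")
```

Only the keys that land on the new node change owner; everything else stays where it was, which is what makes adding capacity cheap compared with rehashing everything.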
It will not only help you understand what "SQL" and "NoSQL" data stores are, it also covers the differences between them, what problems they are designed to solve, how they try to solve them, and whether they'll help with your problems as well.
Students seem to find the Dynamo paper to be the single most enlightening resource. It does a great job of explaining Amazon's use case and how the solution fits the problem. I also reference the relevant Red Book chapter and some students value that context.
It's worth noting that students are very comfortable with relational DBMSs by this point, both in theory and in practice. It quickly becomes clear to them that NoSQL is better called "no transactions", as they know the costs and benefits of various isolation levels in a traditional RDBMS. If you don't yet have an undergraduate-level background in database systems I'd encourage you to seek that out either first or at least along the way to understanding NoSQL systems. My recommendations for how to do this as a self-learner are up on https://teachyourselfcs.com.
[0] https://en.wikipedia.org/wiki/Consensus_(computer_science)
[1] https://en.wikipedia.org/wiki/PACELC_theorem
[2] https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...
I'm still learning how to determine when I should use NoSQL instead of SQL. My best advice is to carefully consider how you plan on querying your data. If you plan on making complex queries that link multiple relationships, NoSQL is not for you.
After I've optimized my query and indexes to get from 60s down to about 4s, by running through the usual stuff and trying not to do anything too stupid, how do I get it to <200ms? Maybe the better question is how to structure the data so you don't need the complex query?
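A rough sketch of why relationship-heavy queries hurt: in a store that only offers get-by-key, the application has to do the join itself. The two dicts below stand in for two collections, and all names and values are made up for illustration.

```python
# Hypothetical data: plain dicts standing in for two key-value "collections".
users = {"u1": {"name": "Ada"}, "u2": {"name": "Linus"}}
orders = {
    "o1": {"user_id": "u1", "total": 30},
    "o2": {"user_id": "u2", "total": 12},
    "o3": {"user_id": "u1", "total": 8},
}


def totals_by_user_name(users, orders):
    """Application-side 'join': scan every order, then fetch its user by key."""
    totals = {}
    for order in orders.values():
        name = users[order["user_id"]]["name"]  # one extra lookup per order
        totals[name] = totals.get(name, 0) + order["total"]
    return totals


result = totals_by_user_name(users, orders)
print(result)
```

In SQL this would be a single JOIN with GROUP BY that the database can plan and index. Here every extra relationship adds another round of lookups in application code, so it pays to know your queries before choosing the store.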
Designing Data-Intensive Applications: http://dataintensive.net/
It's slightly dated, but it still gives a strong overview of the different paradigms. The truth is what you want to learn probably differs greatly depending on the paradigm that fits your application. NoSQL databases can broadly be categorized into document-oriented, key-value store, columnar, and graph. This video will help you understand what (at least three of) those are. Then you can focus in on books/articles about the paradigm that makes the most sense for you.
Tutorial from Felix Gessert about NoSQL https://medium.baqend.com/nosql-databases-a-survey-and-decis...
and Slides https://www.slideshare.net/felixgessert/nosql-data-stores-in...
[1] See http://dataintensive.net
Their tips are here, and I think this applies to most/all NoSQL (someone correct me if I'm wrong.) https://firebase.google.com/docs/database/web/structure-data
The tl;dr is:
- Avoid complex queries. Structure data so that you can make simple queries that execute fast.
- Avoid nesting & flatten data as much as is reasonable.
NoSQL is easier to learn & use than SQL and there's a lower barrier to entry, but the trade-off is that it's less powerful than SQL, so you have to keep your data simple too.
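Here is a small sketch of what "flatten" can look like in practice. The chat/message shape and the `flatten` helper are hypothetical, chosen only to show the idea, and plain dicts stand in for the database.

```python
# Nested layout: reading a chat's title drags all of its messages along.
nested = {
    "chats": {
        "c1": {
            "title": "Team",
            "messages": {"m1": {"text": "hi"}, "m2": {"text": "hello"}},
        },
        "c2": {"title": "Family", "messages": {"m3": {"text": "dinner?"}}},
    }
}


def flatten(nested):
    """Split chat metadata and messages into two top-level paths."""
    flat = {"chats": {}, "messages": {}}
    for chat_id, chat in nested["chats"].items():
        flat["chats"][chat_id] = {"title": chat["title"]}
        flat["messages"][chat_id] = dict(chat["messages"])
    return flat


flat = flatten(nested)

# Listing titles now touches only the small metadata records.
titles = [chat["title"] for chat in flat["chats"].values()]
print(titles)
```

With the flat layout, a "list my chats" screen is one simple read of `chats`, and messages are fetched per chat only when someone opens it.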
Isn't this contradictory?
This is referring more to schema than data. In part, what that means is to avoid nested indexes, which is subtly different from avoiding any nesting at all. In other words, if you can treat the nested data as a blob, it's probably okay, but if it's being used for a query, it's adding complexity that can cause trouble.
Some of the reasons for that are Firebase-specific: they have to do with security rules and how security can get too complicated if you're not careful with nesting.
But I'd guess it still applies to other NoSQL data... nesting data as part of the schema is like making another table, and all the complexity that comes with it. Except it's a new table you can only get to by going through the first table.
A common problem with nesting is thinking you got the order right for your use case and later finding out you sometimes want to index by the inner data rather than the outer data. If you only have A/B (B nested in A) and you need to query for As, then you're fine. When you find out you need to query for Bs, you have a problem.
Firebase even recommends duplicating data, if necessary, to have two indexes A/B and B/A, rather than trying to query for nested data.
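A minimal sketch of the A/B plus B/A idea, using in-memory dicts. `TwoWayIndex` and the user/group names are hypothetical; in a real store the two writes would also need to be applied together (for example as one multi-path update), which this sketch leaves out.

```python
from collections import defaultdict


class TwoWayIndex:
    """Keeps the same relationship under two paths so both lookups are simple."""

    def __init__(self):
        self.by_a = defaultdict(dict)  # A -> {B: True}
        self.by_b = defaultdict(dict)  # B -> {A: True}

    def link(self, a, b):
        # Every write goes to both indexes; this is the cost of duplication.
        self.by_a[a][b] = True
        self.by_b[b][a] = True

    def unlink(self, a, b):
        self.by_a[a].pop(b, None)
        self.by_b[b].pop(a, None)


idx = TwoWayIndex()
idx.link("alice", "admins")
idx.link("alice", "editors")
idx.link("bob", "editors")

groups_of_alice = sorted(idx.by_a["alice"])
members_of_editors = sorted(idx.by_b["editors"])
print(groups_of_alice, members_of_editors)
```

Reads in either direction become a single key lookup. The trade-off is that every write has to update both copies, and a missed update leaves the two indexes disagreeing.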
Then read this book for in-depth details - Designing Data-Intensive Applications : https://dataintensive.net/
and of course the original papers from Amazon and Google.
If you have more questions - contact me at HN AT NoSql dot Com