yes, that is correct. We are big fans of SQLite too.
It's not an OLTP system, so is the primary use-case for Rockset to be an OLAP system with low response times?
Rockset’s primary use-cases are: 1/ developers building low-latency operational applications, especially ones that combine real-time data sets with other structured data sets (e.g. a microservice built to relieve pressure on your OLTP system), and 2/ data scientists who want to quickly test hypotheses on different structured and semi-structured datasets without having to stand up any servers or do any ETL or data prep. (You can suspend collections/documents in Rockset when you don’t use them -- our pricing page currently only lists Active Documents pricing.)
Rockset is mutable which allows it to keep itself in sync with any data source, unlike columnar data warehouses, which are not optimized for data manipulation.
Rockset’s strong dynamic typing allows it to treat JSON as a data representation format rather than a special data type or a storage format. So, once you load JSON data into Rockset, you can access all fields at all levels without any special JSON operators or functions.
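To illustrate that contrast in plain Python (a hypothetical document, not Rockset’s actual API): once JSON is parsed into native values, nested fields are ordinary keys at every level, rather than opaque strings that need JSON-specific extraction operators.

```python
import json

# A raw JSON document as it might arrive from an event stream.
raw = '{"user": {"name": "ada", "logins": 3}, "tags": ["beta", "eu"]}'

# Parsed once, every level is a plain dict/list value -- no JSON-specific
# extraction functions are needed to reach nested fields.
doc = json.loads(raw)

name = doc["user"]["name"]          # ordinary key access at any depth
next_login = doc["user"]["logins"] + 1  # values carry usable types (int here)
print(name, next_login)
```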
Comparing Snowflake with Rockset is perhaps akin to comparing Teradata with Elasticsearch. Both useful systems but built for very different use cases.
The biggest thing Rockset has in common with Snowflake is in sharing the philosophy that data management systems have to be built ground up for the cloud to take full advantage of cloud economics. Our blog (https://rockset.com/blog/) has a few posts on these already and we will write more.
And my comment was more about the price, because at a low volume of 10M records, any database system can already do fast queries. From the whitepaper, it looks like the indexing is intensive and that's where most of the cost comes from, which explains the active/passive storage and the tiered rocksdb-cloud setup.
Interesting use-case for prototyping but I don't see how it is cost-effective for higher-scale usages. Congrats on the launch though.
Because its type system does not provide implicit type conversion, I guess. But some languages have stronger type systems with, in addition, highly qualified types (e.g. constness, ranges, signedness, or custom contracts).
> Dynamic, because variables acquire type information at runtime, and the same variable can, at different points in time
This is not a characteristic of dynamic typing; some statically typed languages support variable shadowing in the same scope. A dynamic (or hybrid) typing system is, IMHO, simply a system that provides a variant type as a first-class citizen.
Sure. Those are overly simplistic categories, but a full discussion of type systems in programming languages would be out of scope for that post.
> some statically typed languages supports variable shadowing in the same scope
Shadowing is a different issue. Shadowing refers to two different variables (which happen to have the same name), so it's orthogonal to whether the same variable is allowed to hold values of different types at different times.
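A small Python sketch of that distinction (names are illustrative): rebinding reuses one variable for values of different types, while shadowing introduces a second variable that merely shares a name in an inner scope.

```python
# Rebinding: the *same* variable holds values of different types over time.
x = 3
x = "foo"  # legal in a dynamically-typed language


# Shadowing: a *different* variable with the same name, in an inner scope.
y = 3

def inner():
    y = "foo"  # this 'y' is a new local variable; the outer 'y' is untouched
    return y

result = inner()
print(x, y, result)  # the outer 'y' was shadowed, not rebound
```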
You're welcome to your humble opinion.
Your opinion is not generally accepted, though. The most common definitions of static versus dynamic typing boil down to whether both names and values have types and those types must be compatible (static) or whether only values have types and no checking for compatibility with names is required (dynamic). That is:
a = 3
a = 'foo'
The above probably fails in a statically-typed language, because 'a' will be a name of a type incompatible with the value 'foo'. In a dynamically-typed language it's perfectly OK.

I am not sure that this is always correct. I am pretty sure that your example would be accepted in some static type systems with union types, such as https://www.cl.cam.ac.uk/~sd601/papers/mlsub-preprint.pdf
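Running that same two-line example in Python (a dynamically-typed language) shows the value, not the name, carrying the type:

```python
a = 3
t1 = type(a).__name__  # the value is an int

a = 'foo'
t2 = type(a).__name__  # the same name now refers to a str -- no error

print(t1, t2)
```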
I personally like strong, static type systems. But a lot of real world data doesn't match static schemas, which is why businesses hire data engineers to (build systems to) clean it up.
I of course recognize that a database is not very useful if you can't get your data in there.
1. We don't need you to specify the types of any fields; indeed, we don't even need to know upfront which top-level fields ("columns") may exist in the document.
2. You don't need to use special operators / functions to deal with JSON-typed values.
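A toy Python sketch of both points (a dict-based stand-in, not Rockset's implementation): documents arrive with no declared schema, the set of top-level fields is discovered from the data itself, and nested values are read with ordinary access rather than JSON-typed operators.

```python
# Documents arrive with no schema declared upfront; fields vary per document.
docs = [
    {"name": "ada", "age": 36},
    {"name": "alan", "city": "london"},   # new field, no migration needed
    {"name": "grace", "age": "unknown"},  # same field, different value type
]

# The set of "columns" is discovered from the data, not declared in advance.
fields = sorted({key for doc in docs for key in doc})
print(fields)

# Values are accessed directly; no JSON-specific operators are involved,
# and mixed types for one field coexist.
ages = [doc.get("age") for doc in docs]
print(ages)
```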