Obviously you can SELECT * INTO ... FROM ... if you're just temporarily inserting data.
I'm not sure I see the value in automatically importing arbitrary data into a schema'd database object. I think it's too complicated to be carried out by the database, and should probably be done by some other piece of software, under human supervision/guidance.
Are you suggesting that INSERT/UPDATE statements also have the ability to modify objects? That sounds like it would add complexity without much gain, as opposed to just running an ALTER TABLE query.
I'm not saying that the process of writing schema-modifying queries is painless, but I'm not sure I'm convinced that we can have our cake and eat it too.
For your example, I personally would've awk'ed the data into a predefined schema, or inserted it as JSON.
Okay, bad example. I should have specified: insert into an existing table, adding the missing columns automatically. There are a few special cases where most database engines can automatically generate schemas, and "SELECT ... INTO" is one of them. It's usually only allowed with an empty destination table.
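For what it's worth, SQLite spells this construct as CREATE TABLE ... AS SELECT. A minimal sketch (using Python's sqlite3, with made-up table names) of both the auto-generated schema and the "new tables only" limitation:

```python
import sqlite3

# The one place most engines *do* infer a schema automatically:
# creating a brand-new table from a query result. The destination
# must not already exist, which is exactly the limitation above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, name TEXT, price REAL)")
conn.execute("INSERT INTO src VALUES (1, 'widget', 9.99)")

# Schema of `dst` is inferred from the SELECT -- no hand-written DDL.
conn.execute("CREATE TABLE dst AS SELECT id, name, price FROM src")

# The inferred schema is then fixed: inserting a row with an extra
# column would fail, and missing columns are never added for you.
cols = [row[1] for row in conn.execute("PRAGMA table_info(dst)")]
print(cols)  # ['id', 'name', 'price']
```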
> I'm not sure I see the value in automatically importing arbitrary data into a schema'd database object.
There are lots of use-cases for this. You might not have them, but other people do all the time. Just about any large-scale enterprise software needs to be extensible in the field, for example. Think the likes of SAP or Siebel. I've seen similar problems crop up in CMDBs, job and ticket management software, etc... Famously, Jira is slow precisely because it is so bad at handling this kind of extensibility efficiently.
> Are you suggesting that INSERT/UPDATE statements also have the ability to modify objects?
Not necessarily, although that could be an option. What I mean is that changes to the table schema should be made using insert/update/delete statements, where the only "data" is things such as the column names, types, constraints, etc...
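The read half of this idea already exists in most engines: the schema is exposed *as rows* (information_schema in standard SQL; sqlite_master and PRAGMA table_info in SQLite). A small sketch in Python/sqlite3 of how column names, types, and constraints already come back as plain data, while the write side still needs a second language (DDL):

```python
import sqlite3

# Every engine exposes its schema as queryable rows; the proposal is
# to make those rows writable, so an ALTER becomes an ordinary
# INSERT/UPDATE/DELETE where the "data" is names, types, constraints.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

# Reading the schema is already just a query over rows...
for cid, name, coltype, notnull, default, pk in conn.execute(
        "PRAGMA table_info(users)"):
    print(name, coltype, "NOT NULL" if notnull else "", "PK" if pk else "")

# ...but writing it today still requires a separate DDL sub-language:
conn.execute("ALTER TABLE users ADD COLUMN created_at TEXT")
```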
> That sounds like it would add complexity without much gain as opposed to just running an ALTER TABLE query.
It would dramatically reduce complexity, removing an entire language from database engines, along with all the associated vendor-specific syntax, quirks, and limitations.
In fact, the same "schema" that is used in the wire protocol could be identical to the actual stored schema and to its update language. So if you get back a query result (with data) and want to create a copy of that schema elsewhere (e.g.: a local cache database), you just take the "header" from the result set and "insert" it into the destination database's schema. No conversion, no escaping, nothing.
Seriously: Try this as an exercise. Use Java or C#: run a SELECT query (with joins, views, etc...), then write the code that generates a table to cache the result in a separate, local database instance.
Do it. Sit down and give it a good go. You won't appreciate how hard this is until you do!
Now write the code to update the cache table dynamically if the source table changes. Assume that table holds a petabyte of data. (That's why it's a table: it's a cache for something far too big to fit in memory!)
Now write the code to do all of the above with foreign-key constraints.
Good luck!
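A taste of that exercise, sketched in Python/sqlite3 rather than Java/C# (table and column names are invented for illustration). The first wall you hit is that the driver's result metadata is lossy: sqlite3's cursor.description carries column *names* but no types, so the generated DDL has to guess (here, every cache column ends up typeless):

```python
import sqlite3

# Source database with a join-producing query whose result we want
# to cache in a second, independent database.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         total REAL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers VALUES (1, 'acme');
    INSERT INTO orders VALUES (10, 1, 99.5);
""")

cur = src.execute("""
    SELECT o.id AS order_id, c.name AS customer, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
""")
colnames = [d[0] for d in cur.description]  # names only; types are lost
rows = cur.fetchall()

# Generate the cache table's DDL from the result "header" by hand --
# exactly the conversion step the schema-as-data idea would eliminate.
cache = sqlite3.connect(":memory:")
cache.execute("CREATE TABLE cache ({})".format(", ".join(colnames)))
cache.executemany(
    "INSERT INTO cache VALUES ({})".format(", ".join("?" * len(colnames))),
    rows)
print(list(cache.execute("SELECT * FROM cache")))  # [(10, 'acme', 99.5)]
```

And this is the easy part: incremental updates and foreign-key constraints, as described above, are where it gets genuinely hard.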
> I'm not sure I'm convinced that we can have the cake and eat it
There is a reason NoSQL databases exist and are wildly popular. It's not that "schemaless" is truly better so often, it's more that modifying the schema in most DBMS offerings is so fiddly that it's essentially impossible to do programmatically.
> or inserted the data as JSON
NoSQL in a nutshell! You haven't solved the problem, you've given up and resorted to schemaless tables instead, exactly the "bad" example above with all of its limitations and issues.
I've seen some good attempts at solving this problem, but they barely scratch the surface.
See, I don't think I would want SQL that could modify the schema programmatically. And imho, that sounds like an invitation for trouble. It can be done, of course, but, to me, the value of the relational data model is that it enforces consistency through constraints. If I don't care about consistency, I may as well just dump the data as schemaless JSON and then periodically turn that semi-structured data into actual relational data.
Note that I said schema'd database object.
Normalizing the data model is too complex to do programmatically and sounds ill-advised. If you want the advantage of painless schemaless data, dump it as JSON.
> NoSQL in a nutshell! You haven't solved the problem, you've given up and resorted to schemaless tables instead, exactly the "bad" example above with all of its limitations and issues.
Note that I meant this as a stop-gap solution, something to build on and turn into schema-controlled data. Preferably iteratively, and hopefully not in production.
I've done this when I've had semi-structured data that I wanted to dump into the RDBMS before there was any schema. But it's a band-aid. In my experience that data is best turned into schema-controlled data ASAP, otherwise inconsistencies start to creep in. Relations that were not immediately obvious pop up, and I don't think there's any hope of letting the SQL database carry out such a task programmatically.
There are already tools that do a good job of analyzing data, which aid the construction of schemas. There are also tools that make updating them relatively painless, especially if you've already inserted the semi-structured data, as most RDBMS's have decent support for handling JSON these days.
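A minimal sketch of that stop-gap workflow in Python/sqlite3 (assuming an SQLite build with the JSON functions available, which is the norm; json_extract is SQLite's spelling, other engines differ): dump records as JSON first, then promote the fields into a proper typed table once the shape is understood.

```python
import json
import sqlite3

# Stage semi-structured records as raw JSON text, no schema yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (doc TEXT)")
records = [
    {"sku": "A1", "price": 9.5, "tags": ["sale"]},
    {"sku": "B2", "price": 12.0},
]
conn.executemany("INSERT INTO staging VALUES (?)",
                 [(json.dumps(r),) for r in records])

# Later, once the shape is understood, lift the fields out into a
# schema-controlled table using the engine's JSON functions.
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, price REAL)")
conn.execute("""
    INSERT INTO products
    SELECT json_extract(doc, '$.sku'), json_extract(doc, '$.price')
    FROM staging
""")
print(list(conn.execute("SELECT * FROM products ORDER BY sku")))
```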
Then again, I'm not sure I'm able to appreciate the language you dream of. To me, it sounds like SQL where ALTER TABLE and UPDATE/INSERT are conjoined into one query. And I guess it would also deal with simple relations (i.e., the ones already present by virtue of arrays, associative arrays, etc.), in which, admittedly, I see little point beyond having what used to be two queries become one. I imagine it would mostly just slow down the query parser and make things hard for the optimizer. Not that I don't appreciate that it might be convenient, but the convenience doesn't sound worth the added complexity. Besides, with dumb relations you end up with denormalized data, ripe for inconsistencies: i.e., the whole problem with NoSQL data modelling in the first place.
I wonder if such a job isn't better placed on something other than the RDBMS. In my experience such tooling exists and already does a decent job, without putting an extraneous responsibility on the already very complicated RDBMS.
I don't think I would feel comfortable putting schema generation in the hands of Java/C# applications either. Instead, I would go the route of dumping the semi-structured data straight into the RDBMS as JSON if I was unable to predefine the schema. I would do this in a development environment, then analyze the data, define the schema to structure it, and then move it into production, where the data would be queried with regular SQL.
Speaking entirely from my own experience, schemaless data is nothing but a pain that results in much more work in the long run. For me it has been the data modelling equivalent of peeing your pants: at first it's nice and warm, but pretty soon it gets cold and unpleasant.
Much like there's a reason we do not make computers program themselves, giving them the responsibility of designing the data model seems even less wise. It's a task that requires intelligence.