- The format should be kept simple to encourage more people to build tools on top of it, and so that users will be more likely to work with it.
- We should deal with the emergent complexity of bad validation by making tools more complicated and having them detect errors on their end.
If users are going to use a validation tool to work with data, then they can also use a helper tool to generate data. And if the goal is to make it easier to build on top of the data, import it, etc., then allowing developers to do less work validating everything makes it easier for them to build things.
I'm going over the various threads on this page, and half of the critics here are saying that user data should be user-facing, while the other half are saying that separate tools/validators should be used when submitting data. I don't know how to reconcile those two ideas, particularly given a few comments I'm seeing that validation should be primarily client-side, embedded in tools.
Again, no strong opinions, and I'll freely admit I'm not familiar enough with OSM's data model to really have an opinion on whether simplification is necessary. But one of the good things about user-facing data should be that you can confidently manipulate it without requiring a validator. If you need a validator, then why not also just use a tool to generate/translate the data?
To me, "just use a tool" doesn't seem like a convincing argument for making a data structure more error prone, at least not if the idea is that people should be able to work directly with that data structure.
----
> you'll need to create a new version of the file format and update all tools reading it even if they won't handle the new type, instead of just having old tools silently ignoring the new format that they don't understand.
Again, not sure that I understand the full scope of the problem here, and I'm not trying to make a strong claim, but extensible/backwards-compatible file formats exist. And again, I don't really see how validation solves this problem: you're just as likely to end up with a validator in your pipeline that rejects extensions as invalid, or a renderer that doesn't know how to handle a data extension that used to be invalid or impossible.
Wouldn't it be nicer to have a clear definition of what's possible, one that everyone is aware of and can reason about without inspecting the entire validation stack? Wouldn't it be nice to not finish a big mapping project and only then find out that it has errors when you submit it? Or to know that if your viewer supports vWhatever of the spec, it is guaranteed to actually work, and won't fall over when it encounters a novel extension to the data format that it doesn't understand or didn't think was possible? Personally, I'd rather know right off the bat what a program supports than have to intuit it by watching how it behaves and looking around for missing data.
Part of what's nice about doing extensions explicitly, rather than implicitly through assumptions about data shape, is that it's easier to identify what is and isn't an extension.
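To make that concrete, here's a minimal sketch of what I mean by explicit extensions, loosely modeled on how PNG distinguishes critical from ancillary chunks. Everything here is hypothetical (the field names `required_extensions`/`optional_extensions` and the format itself are made up for illustration, not any real OSM format): a reader fails loudly on a required extension it doesn't implement, and *knowingly* skips optional ones, rather than silently misreading data shaped in a way it didn't anticipate.

```python
import json

# Extensions this hypothetical reader actually implements
SUPPORTED_EXTENSIONS = {"geometry-v2"}

def load_document(text):
    """Parse a document whose header explicitly declares its extensions.

    Required extensions must be understood, or we refuse the file outright.
    Optional extensions we don't know are skipped and reported, so the
    caller can tell the difference between "ignored" and "never there".
    """
    doc = json.loads(text)
    for ext in doc.get("required_extensions", []):
        if ext not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"unsupported required extension: {ext}")
    ignored = [ext for ext in doc.get("optional_extensions", [])
               if ext not in SUPPORTED_EXTENSIONS]
    return doc["data"], ignored

data, ignored = load_document(
    '{"required_extensions": ["geometry-v2"],'
    ' "optional_extensions": ["fancy-labels"],'
    ' "data": {"nodes": []}}'
)
```

The point isn't this particular scheme; it's that an old tool reading a new file knows exactly what it's ignoring, and you can reason about compatibility from the declaration alone instead of from the behavior of the whole validation stack.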