Better HN

zmmmmm 3y ago

Actually, I think genomics/bioinformatics is a counterpoint there. One of the things I like about the field is that nearly every file format is under-engineered. It's TSV all the way down, and if you need compression, you gzip it. If you need to index it, you sort it (literally, often with the Unix sort command) and block-gzip it. Anything more engineered arose specifically because the above failed and something more was actually needed.
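A minimal Python sketch of the sort-then-block-gzip idea, assuming a made-up TSV whose first two columns are chromosome and position. This is not the real BGZF/tabix format (those add per-block headers and a separate binary index file); it only shows why sorting plus independently compressed blocks lets you pull out one region without decompressing everything. All function names here are hypothetical.

```python
import bisect
import gzip
import zlib


def sort_records(lines):
    """Sort TSV lines by (chrom, numeric pos), roughly `sort -k1,1 -k2,2n`."""
    def key(line):
        chrom, pos = line.split("\t")[:2]
        return (chrom, int(pos))
    return sorted(lines, key=key)


def block_gzip(sorted_lines, lines_per_block=2):
    """Write each block of lines as its own gzip member.

    Concatenated gzip members still form one valid gzip file, so ordinary
    gzip tools can decompress the whole thing. The returned index holds the
    (chrom, first_pos) key and the byte offset at which each block starts.
    """
    data = bytearray()
    starts, offsets = [], []
    for i in range(0, len(sorted_lines), lines_per_block):
        block = sorted_lines[i:i + lines_per_block]
        chrom, pos = block[0].split("\t")[:2]
        starts.append((chrom, int(pos)))
        offsets.append(len(data))
        data += gzip.compress("".join(block).encode())
    return bytes(data), starts, offsets


def read_block(data, offset):
    """Decompress only the gzip member that begins at `offset`."""
    d = zlib.decompressobj(wbits=31)  # 16 + 15: expect a gzip header
    # Decompression stops at the end of this member; later bytes are ignored.
    return d.decompress(data[offset:]).decode()


def query(data, starts, offsets, chrom, start, end):
    """Return lines on `chrom` with start <= pos <= end, reading few blocks."""
    # The block just before the first block starting at or after
    # (chrom, start) may still contain matching records, so begin there.
    i = max(bisect.bisect_left(starts, (chrom, start)) - 1, 0)
    hits = []
    while i < len(starts) and starts[i] <= (chrom, end):
        for line in read_block(data, offsets[i]).splitlines(keepends=True):
            c, p = line.split("\t")[:2]
            if c == chrom and start <= int(p) <= end:
                hits.append(line)
        i += 1
    return hits


if __name__ == "__main__":
    records = ["chr2\t50\tC\n", "chr1\t500\tA\n", "chr2\t10\tB\n",
               "chr1\t100\tX\n", "chr2\t70\tD\n"]
    data, starts, offsets = block_gzip(sort_records(records))
    print(query(data, starts, offsets, "chr2", 40, 70))
    # prints ['chr2\t50\tC\n', 'chr2\t70\tD\n']
```

Because every block is an ordinary gzip member, gzip.decompress (or zcat) still reads the whole file in one go, which is the appeal: plain tools keep working, and the index is just a sorted list of block starts.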

The downside is that it's a giant hellscape of unstructured, poorly specified formats where data types are barely specified at all, or, if they are, most of the schema is published in some rambling blog post by some rando scientist. You'll spend most of your time understanding a format by empirically reverse-engineering the data you're trying to work with.
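What that reverse engineering often looks like in practice, as a rough Python sketch: read an undocumented TSV (assumed to have a header row and no quoting) and guess each column's type from the values actually present. The missing-value markers "", "." and "NA" are assumptions; real files vary, which is rather the point.

```python
import csv
import io

MISSING = {"", ".", "NA"}  # assumed missing-value markers; check your data


def parses_as(cast, values):
    """True if every value can be converted with `cast`."""
    try:
        for v in values:
            cast(v)
    except ValueError:
        return False
    return True


def infer_schema(tsv_text):
    """Guess each column's type (int, float, str) from observed values."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)  # bio TSVs rarely quote
    rows = list(reader)
    header, body = rows[0], rows[1:]
    schema = {}
    for j, name in enumerate(header):
        seen = [row[j] for row in body if j < len(row) and row[j] not in MISSING]
        if not seen:
            schema[name] = "unknown"  # only missing values observed
        elif parses_as(int, seen):
            schema[name] = "int"
        elif parses_as(float, seen):
            schema[name] = "float"
        else:
            schema[name] = "str"
    return schema
```

A guess like this only describes the rows you happened to look at; the next file from the same pipeline may put "NA" or a string in a column that looked numeric.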

cratermoon 3y ago

Oh, and then eventually they'll get a committee together, and after a few years they'll produce a unified file format that somehow manages to cover all the cases in the various existing formats (or at least the ones used by well-funded PIs), yet is a hellscape of optional properties and required elements so poorly specified that it's impossible for any two implementations to communicate.