The development process is totally different when you write structured types first and then write your logic. 10/10 would recommend.
Usual caveat: this is what makes sense to me and my brain. Your experience may be different based on neurotype.
I’d argue that the more experience you get the more you write code for other people which involves adding lots of tooling, tests, etc. Even if the code works the first time, a more senior dev will make sure others have a “pit of success” they can fall into. This involves a lot more than just some “unit tests as an afterthought to keep the coverage up.”
I hated working with those coders because they weren't really very good and their code was always the worst to maintain. They are the equivalent of a carpenter who brags about how quickly they can bang nails but can't build a stable structure to save their life.
The best argument I've heard for doing type annotation is for documentation purposes to help future devs. But I don't completely buy this either. I touch new codebases all the time and I rarely spend much time thinking about what types will be passed. I can only assume it comes with experience.
Type annotation actually ends up taking a hell of a long time to do, and is of questionable benefit if some of the codebase is not annotated. People sometimes spend hours just trying to get the type checker to say OK for code that actually works just fine!
I'm currently using marshmallow in a project, specifically using the functionality that builds parsers from dataclasses.
I was curious what the differences were.
I haven't used pydantic's ORM integration, but I don't hesitate to use pydantic models everywhere as business logic classes unless I need ludicrous speed.
That's all opinion, but I'd definitely give pydantic a swing.
My cons are:
- Uses some DSL to define types
- Doesn't marshal to model objects by default, but from dict to dict

My pro is:
- Much more configurable and powerful
However, it does have a strongly opinionated approach to casting that can sometimes yield non-obvious results. This behavior is documented, and I would suggest that new adopters of the library explore this casting/coercion feature in the context of their product/app requirements.
For the most part, it's not a huge issue, but I've run into a few surprising cases. For example, sys.maxint, 0, '-7', 'inf', and float('-inf') are all valid datetime formats.
- https://pydantic-docs.helpmanual.io/usage/models/#data-conve... - https://gist.github.com/mpkocher/30569c53dc3552bc5ad73e09b48...
That's absolutely a valid and useful annotation. It tells me, and autocomplete, that "x" is probably a str, more likely than not, but I need to be aware that it might not be.
There's beautiful clarity in the articulation, and the essence is easy to grasp yet powerful. It reminds me a bit of Scott Wlaschin's Railway Oriented Programming (ROP) [0]. As a technique, ROP nicely complements "parse don't validate". As an explanation, it's similarly simple yet wonderfully effective.
I've a real admiration for people who can explain and present things so clearly. With ROP, for example, the reader learns the basics of monads without even realising it.
I feel that we don't put enough value, these days, on the ability to write clear, articulate exposition. Also, I believe that many people are not willing to read articles, books, or papers, of any meaningful length.
Everything needs to be boiled down to <10 min. read time, or <18 min. TED talks.
Anyways, I definitely agree.
The essential point of this blog post is to avoid "shotgun parsing", where parsing/validating is scattered procedurally through the code, so that it matters when exactly it happens. The paper "Out of the Tar Pit" asserts that this leads to "accidental complexity" (AKA "pain and anxiety"), which is something every programmer has experienced before, possibly many times.
I've become a fan of declarative schemas (json-schema/OpenAPI, clojure spec, etc.) to express this kind of thing. Usually this is used at the boundaries of an application (configuration, web requests, etc.), but there are many more applications for it within the flow of data transformations. If you apply the "parse, don't validate" principle, you turn schema-validated (sic!) data into a new thing. Whether that is a "ValidatedData" type or metadata, a pre-condition or a runtime check, says more about the environment you program in than about the principle under discussion. The benefit, however, is clear: your code asserts that it requires parsed/validated data where it is needed, instead of when it should happen.
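A minimal Python sketch of that boundary idea (the Config type and the hand-rolled schema check are made up for illustration; a real project would use json-schema or similar):

```python
from dataclasses import dataclass

# Hypothetical example: check the "schema" once at the boundary,
# then hand the rest of the program a parsed Config, not a raw dict.
@dataclass(frozen=True)
class Config:
    host: str
    port: int

def parse_config(raw: dict) -> Config:
    # validation and construction happen in one step
    if not isinstance(raw.get("host"), str):
        raise ValueError("host must be a string")
    if not isinstance(raw.get("port"), int):
        raise ValueError("port must be an int")
    return Config(host=raw["host"], port=raw["port"])
```

Downstream code that accepts a Config no longer needs to re-check anything; requiring the type is the assertion.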
There's a follow-up article by the same author (that I unfortunately can't find), in which she explains this point.
As an example, returning a NonZero newtype over Int is not as type safe as using an ADT that lacks a zero value altogether. Using a NonEmpty newtype over List is not as type safe as using the NonEmpty ADT that has an element as part of its structure.
Basically newtype still has use, but it is not as airtight as a well-designed ADT.
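A rough Python translation of the NonEmpty point (sketch only; NewType gives the checker a label to trust, while the dataclass makes emptiness structurally unrepresentable):

```python
from dataclasses import dataclass
from typing import NewType, Optional

# Newtype-style: the type checker trusts the name, but nothing
# structurally prevents someone from wrapping an empty list.
NonEmptyWrapped = NewType("NonEmptyWrapped", list)

# ADT-style: the first element is part of the structure, so an
# "empty" NonEmpty cannot even be constructed.
@dataclass
class NonEmpty:
    head: object
    tail: list

def parse_non_empty(xs: list) -> Optional[NonEmpty]:
    return NonEmpty(xs[0], xs[1:]) if xs else None
```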
I think this is it: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
Every time I read something like this my mind translates it to "after building an ad-hoc compiler you can do all the things a compiler can do. Just not as well, but you can do it." -- Same with "I don't need a compiler, my tests stop all this kind of bugs"
I’ve run across DSLs that have three or more layers of parsing and validation. Embedding different languages within each other (eg: JSON snippets within your own DSL) definitely leads to the issues the article talks about.
Also, growing your own parser without understanding standard lexer/parser basics seems far more common than it ought to be. I’m not talking brilliant design, rather the extremely naive one-character-at-a-time-in-a-really-complex-loop variety of design.
The better level of bad is, “I know what lexers/parsers are, now I’ll write something to basically implement a type-checking parser with the lexed+parsed tree as input.”
This article is basically stating, “Why not just get your parser to do it all for you in one swell foop?” When I have refactored code to follow this kind of design, I have never regretted the outcome.
import * as $ from '@appliedblockchain/assert-combinators'

const validateFooBar = (
  $.object({
    foo: $.string,
    bar: $.boolean
  })
)

// probably roughly equivalent to
/*
const validateFooBar = (x) => {
  console.assert(
    typeof x === 'object' &&
    typeof x.foo === 'string' &&
    typeof x.bar === 'boolean'
  )
  return x
}
*/

const test1 = { foo: "abc", bar: false }
const test2 = { foo: 0, quux: true }

const { foo, bar } = validateFooBar(test1) // ok
const oops = validateFooBar(test2)         // throws error
The source is pretty readable too, if you want to get an idea of how it works: https://github.com/appliedblockchain/assert-combinators/blob...
Mathematicians: Parsing is validation
This paper is an April Fools' joke. I didn't think people could take that one seriously. I guess it's a good April Fools' joke, then. :)
So is the joke on Computer Scientists or Mathematicians? You decide ;)
Beware of bugs in the above code; I have only proved it correct, not tried it --Donald Knuth
This paper.... uh.... what exactly is it good for?
I suppose it could be kind of nice as some kind of undergraduate paper writing project kind of thing but it looks too professional for that.... I am kind of at a loss why this was written. Maybe it is some strange kind of satire....
[1]: http://www.sigbovik.org/ [2]: http://www.sigbovik.org/2021/proceedings.pdf
Source: I attended SIGBOVIK a few times in grad school.
Code is data after all.
General case: Validating random data as input into some program.
Particular case: Validating random source code (data) as input into some compiler (program).
Do compilers parse or validate?
> "the converse of ‘parsing is validation’ is not true."
If that were the case then you should be able to give an example of a compiler validating random source code (data) but not parsing it.
What determines the validity of random input is precisely a compiler's ability to parse it.
If you see it differently you are implicitly assuming a non-formalist perspective on what "validation" means. Tell us about it.
Basically, it's just functions that take a value of one type and return another one.
I personally use myzod, as it's fast at parsing, has zero dependencies, and you can infer types from your schemas.
Maat was created before dataclasses existed. For validation Maat offers the same. But it also allows for some really neat features such as validation on encrypted data. https://github.com/Attumm/Maat/blob/main/tests/test_validati...
Since validation is written as dictionaries, it's possible to store the validations in a caching db such as Redis.
And since it's simple, it's easy to extend for anyone's use case. And there are no other dependencies.
Benchmarks against Pydantic have Maat at around twice Pydantic's speed.
And it's getting wide adoption; FastAPI, for instance, uses it for request validation.
Some points really elude me because Haskell uses many symbols and is very dense.
> IME, people in dynamic languages almost never program this way, though—they prefer to use validation and some form of shotgun parsing. My guess as to why? Writing that kind of code in dynamically-typed languages is often a lot more boilerplate than it is in statically-typed ones!
I feel that once you've got experience working in (usually functional) programming languages with strong static type checking, flaky dynamic code that relies on runtime checks and on just being careful to avoid runtime errors makes your skin crawl, and you'll intuitively gravitate towards designs that take advantage of strong static type checks.
When all you know is dynamic languages, the design guidance you get from strong static type checking is lost so there's more bad design paths you can go down. Patching up flakey code with ad-hoc runtime checks and debugging runtime errors becomes the norm because you just don't know any better and the type system isn't going to teach you.
More general advice would be "prefer strong static type checking over runtime checks" as it makes a lot of design and robustness problems go away.
Even if you can't use e.g. Haskell or OCaml in your daily work, a few weeks, or even just a few days, of trying to learn them will open your eyes and make you a better coder elsewhere. Map/filter/reduce, immutable data structures, non-nullable types, etc. had been in other languages for over 30 years before these ideas became mainstream best practices (I'm still waiting for pattern matching + algebraic data types).
It's weird how long it's taking for people to rediscover why strong static types were a good idea.
One trick I've found very useful is to realise that Maybe (AKA Option) can be thought of as "a list with at most one element". Dynamic languages usually have some notion of list/array, which we can use as if it were a Maybe/Option type; e.g. we can follow a 'parse, don't validate' approach by wrapping a "parsed" result in a list, and returning an empty list otherwise. This allows us to use their existing 'map', 'filter', etc. too ;)
(This is explored in more detail, including links to logic programming, in https://link.springer.com/chapter/10.1007%2F3-540-15975-4_33 )
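The trick looks something like this in Python (a small sketch with a hypothetical parse_int helper):

```python
# 'Parse, don't validate' with a list standing in for Maybe/Option:
# a successful parse returns [value], a failure returns [].
def parse_int(s):
    try:
        return [int(s)]
    except ValueError:
        return []

# The usual list machinery now works as map/filter over the "Maybe":
doubled = [x * 2 for x in parse_int("21")]   # [42]
nothing = [x * 2 for x in parse_int("oops")] # []
```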
If we want to keep track of useful error messages, I've found Scala's "Try" type to be useful ('Try[T]' is isomorphic to 'Either Throwable T'). Annoyingly, most dynamic languages have no built-in sum type; the closest thing is usually a tagged pair like '[true, myFoo]'/'[false, myException]', which is pretty naff.
I've found Scala or even LINQ to really hammer home this point, even to those who aren't into FP very much. Doing that map/flatmap makes it click for just about anyone.
Not being a big fan of method chaining, a null-savvy foreach would probably eliminate most of my null checks and my need for Optional.
Static types are awesome for local reasoning, but they are not that helpful in the context of the larger system (this already starts at the database, see idempotency mismatch).
Code with static types is sometimes larger and more complex than the problem it's trying to solve.
They tightly couple data to a type system, which (can) introduce incidental complexity.

> (I'm still waiting for pattern matching + algebraic data types)

This is a good example: if you pattern match on a specific structure (e.g. the position of fields in your algebraic data type), you tightly couple your program to that particular structure. If the structure changes, you may have to change all the code that pattern matches on it.
match event.get():
    case Click((x, y)):
        handle_click_at(x, y)
(Example from PEP 636[1].)

In both Python and statically typed languages you can avoid this by matching against field names rather than positions, or using some other interface to access data. This is an important design aspect to consider when writing code, but does not have anything to do with dynamic programming. The only difference static typing makes is that when you do change the type in a way that breaks existing patterns, you can know statically rather than needing failing tests or runtime errors.
The same is true for the rest of the things you've mentioned: none are specific to static typing! My experience with a lot of Haskell, Python, JavaScript and other languages is that Haskell code for the same task tends to be shorter and simpler, albeit by relying on a set of higher-level abstractions you have to learn. I don't think much of that would change for a hypothetical dynamically typed variant of Haskell either!
[1]: https://www.python.org/dev/peps/pep-0636/#matching-sequences
When using a data structure, I know what set of fields I expect it to have. In TypeScript, I can ask the compiler to check that my function's callers always provide data that meets my expectations. In JavaScript, I can check for these expectations at runtime or just let my function have undefined behavior.
Either way, if my function's assumptions about the data's shape don't turn out to be correct, it will break, whether or not I use a dynamic language.
It seems that most of the people who make this argument against static typing are actually arguing against violations of the Robustness Principle[0]: "be conservative in what you send, be liberal in what you accept".
A statically typed function that is as generous as possible should be no more brittle against outside change than an equally-generous dynamically typed function. The main difference is that the statically typed function is explicit about what inputs it has well-defined behavior for.
Recently, I spent 3 years on Scala then switched jobs and spent 3 years in Ruby.
It was a shock to go back to dynamic languages, but after 3 months, I honestly couldn't tell which felt more productive or led to more stable high quality product.
In Ruby, we had all the issues people point out about dynamic languages, but the product didn't lean heavily on complex data structures or algorithms. We embraced complexity and failure and built good processes, designs, and practices to deal with this.
In Scala, we had more rigour, but I also know I spent a lot of time on type design. Once things were sorted there was a lot of confidence in it, but generally, it took a lot longer to get there.
For certain systems that is absolutely worth it, for others (and in my case) it did feel like the evolution of the product meant this effort never really paid off.
I do believe that, for long lasting, larger projects, static typing tends to make the code easier to maintain as time goes on. But not every project is like that. In fact, not every project uses a single language. Some use statically typed languages for some parts, and dynamically typed for others (this is common in web dev).
Static typing really appeals to me on a personal level. I enjoy the process of analysis it requires. I love the notion of eliminating whole classes of bugs. It feels way more tidy. I took Odersky's Scala class for fun and loved it.
But in practice, they're just a bad match for projects where the defining characteristic is unstable ground. They force artificial clarity when the reality is murky. And they impose costs that pay off in the long run, which only matters if the project has a long run. If I'm building something where we don't know where we're going, I'll reach for something like Python or Ruby to start.
This has been brought home to me by doing interviews recently. I have a sample problem that we pair on for an hour or so; there are 4 user stories. It involves a back end and a web front end. People can use any tools they want. My goal isn't to get particular things done; it's to see them at their best.
After doing a couple dozen, I'm seeing a pattern: developers using static tooling (e.g., Java, TypeScript) get circa half as much done as people using dynamic tooling (Python, plain JS). In the time when people in static contexts are still defining interfaces and types, people using dynamic tools are putting useful things on the page. Making a change in the static code often requires multiple tweaks in situations where it's one change in the dynamic code. It makes the extra costs of static tooling really obvious.
That doesn't harm the static-language interviewees, I should underline. The goal is to see how they work. But it was interesting to see that it wasn't just me feeling the extra costs. And those costs are only worth paying when they create payoffs down the road.
This was more aimed at people who are new to the idea of parsing over validating. In a strong statically typed language, the type system would naturally guide you to use this approach so if this isn't natural to you then time in other languages would probably be worthwhile.
I don't think it's weird. Most of those languages were not popular in industry for various reasons, and the ones that were (especially in say, the 90s) did not have particularly capable static type systems. The boilerplate/benefit ratio was all off.
The way I describe this dichotomy personally is, I would rather use Ruby than Java 1.5. I would rather use Rust than Ruby. (Java 1.5 is the last version of Java I have significant development experience in, and they've made the type system much more capable since those days.)
I never heard about the former.
Now, even in the rare case where I write some Python, JS or PHP, I write it in a very statically typed style, immediately parsing input into well-thought-out domain classes. And for backend services, I almost always go with 3 layers of models:
1) Data Transfer Objects. Map directly to the wire format, e.g. JSON or Protobuf. Generally auto-generated from API specs, e.g. using Open API Generator or protoc. A good API spec + code gen handles most input validation well
2) Domain Objects. Hand written, purely internal to the backend service, faithfully represent the domain. The domain layer of my code works exclusively with these. Sometimes there’s a little more validation when transforming a DTO into a domain model
3) Data Access Objects. Basically a representation of DB tables. Generally auto-generated from DB schemas, e.g. using libs like Prisma for TS or SQLBoiler for Go
Can’t imagine going back to the “everything is a dictionary” style for any decent sized project, it becomes such a mess so quickly. This style is a little more work up front, when you first write the code, but WAYYYYYY easier to maintain over time, fewer bugs and easier to modify quickly and confidently, with no nasty coupling of your domain models to either DB or wire format concerns. And code gen for the DTO and DAO layers makes it barely more up-front work.
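A rough sketch of layers 1 and 2 in Python (the names and fields are made up; in practice the DTO would come from a code generator):

```python
from dataclasses import dataclass

@dataclass
class OrderDto:
    # wire format: mirrors the JSON, everything loosely typed
    id: str
    amount_cents: str  # some APIs ship numbers as strings

@dataclass
class Order:
    # domain object: parsed once at the boundary, trusted afterwards
    id: str
    amount_cents: int

def to_domain(dto: OrderDto) -> Order:
    amount = int(dto.amount_cents)  # a little extra validation lives here
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return Order(id=dto.id, amount_cents=amount)
```

The domain layer only ever sees Order, so wire-format concerns stay at the edge.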
It's weird how long it's taken for languages with static typing, and with type systems designed for correctness (and designed well for that end) rather than principally for convenience of compilation, to become generally usable (considering licensing model, features, ecosystem, etc.).
I wonder how many people the author met.
For example, one good reason why strong static types are a bad idea... they prevent you from implementing dynamic dispatch.
Routers. You can't have routers.
https://www.geeksforgeeks.org/dynamic-method-dispatch-runtim...
And using it doesn't give up any of Java's type safety guarantees. The arguments and return type of the method you call (which will be invoked with dynamic dispatch) are type checked.
Does my thing have a different name? Where can I read up on how to do that best?
I see they’ve raised a lot of money. Does anyone know what their revenue model is?
Here's validating a CSV in Python (which I'm using because it's a language that's, well, less excited about types than the author's choice of Haskell, to show that the principle still applies):
def validate_data(filename):
    errors = []
    reader = csv.DictReader(open(filename))
    for row in reader:
        try:
            date = datetime.datetime.fromisoformat(row["date"])
        except ValueError:
            errors.append(("Invalid date", row))
            continue
        if date < datetime.datetime(2021, 1, 1):
            errors.append(("Last year's data", row))
        # etc.
    return errors
def actually_work_with_data(filename):
    reader = csv.DictReader(open(filename))
    for row in reader:
        try:
            date = datetime.datetime.fromisoformat(row["date"])
        except ValueError:
            raise Exception("Wait, didn't you validate this already???")
        # etc.
Yes, it's a kind of silly example, but the validation routine is already doing the work of getting the data into the form you want, and now you have some DRY problems. What happens if you start accepting additional time formats in validate_data but you forget to teach actually_work_with_data to do the same thing?

The insight is that the work of reporting errors in the data is exactly the same as the work of getting non-erroneous data into a usable form. If a row of data doesn't have an error, that means it's usable; if you can't turn it into a directly usable format, that necessarily means it has some sort of error.
So what you want is a function that takes the data and does both of these at the same time, because it's actually just a single task.
In a language like Haskell or Rust, there's a built-in type for "either a result or an error", and the convention is to pass errors back as data. In a language like Python, there isn't a similar concept and the convention is to pass errors as exceptions. Since you want to accumulate all the errors, I'd probably just put them into a separate list:
@attr.s  # or @dataclasses.dataclass, whichever
class Order:
    name: str
    date: datetime.datetime
    ...

def parse(filename):
    data = []
    errors = []
    reader = csv.DictReader(open(filename))
    for row in reader:
        try:
            date = datetime.datetime.fromisoformat(row["date"])
        except ValueError:
            errors.append(("Invalid date", row))
            continue
        if date < datetime.datetime(2021, 1, 1):
            errors.append(("Last year's data", row))
            continue
        # etc.
        data.append(Order(name=row["name"], date=date, ...))
    return data, errors
And then all the logic of working with the data, whether to actually use it or to report errors, is in one place. Both your report of bad data and your actually_work_with_data function call the same routine. Your actual code doesn't have to parse fields in the CSV itself; that's already been done by what used to be the validation code. It gets a list of Order objects, and unlike a dictionary from DictReader, you know that an Order object is usable without further checks. (The author talks about "Use a data structure that makes illegal states unrepresentable" - this isn't quite doable in Python, where you can generally put whatever you want in an object, but if you follow the discipline that only the parse() function generates new Order objects, then it's effectively true in practice.)

And if your file format changes, you make the change in one spot; you've kept the code DRY.
The argument is that if you need to interact with or operate on some data you shouldn't be designing functions to validate the data but rather to render it into a useful output with well defined behaviour.
Say I have a string that’s supposed to represent an integer. To me, “Validate” means using a regex to ensure it contains only digits (raising an error if it doesn’t) but then continuing to work with it as a string. “Parse” means using “atoi” to obtain an integer value (but what if the string’s malformed?) and then working with that.
I first thought this article was recommending doing the latter instead of the former, but the actual recommendation (and I believe best practice) is to do both.
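In code, the two halves might look like this (a small sketch; the function names are made up):

```python
import re

# "Validate": check the shape, then keep working with the raw string.
def validate_int_string(s: str) -> str:
    if not re.fullmatch(r"-?\d+", s):
        raise ValueError(f"not an integer: {s!r}")
    return s

# "Parse": validate and convert in one step, then work with the int,
# so malformed input can no longer reach the rest of the program.
def parse_int_string(s: str) -> int:
    return int(validate_int_string(s))
```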
- don't drop the info gathered from checks while validating, but keep track of it
- if you do this, you'll effectively be parsing
- parsing is more powerful than validating
"Extra steps" would be keeping track of info gathered from checks.
At the end of parsing, you have a structure with a type. After validation, you may or may not have a structure with that type, depending on how you chose to validate.
But I think the big win is, parsers are usually much easier to compose (since they themselves are structured functions) and so if you start with the type first, you often get the "validation" behavior aspect of parsing for "free" (usually from a library). Maybe title should have been "Parse, don't manually validate."
But if your type doesn't catch all your invariants, yeah it does feel kinda just like validation.
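One way to see the composition point, sketched with plain functions (each "parser" returns its result or raises; the helpers are invented for illustration):

```python
def compose(*parsers):
    # Feed each parser the previous one's output; composing parsers
    # is just ordinary function composition.
    def parse(value):
        for p in parsers:
            value = p(value)
        return value
    return parse

def non_empty(s):
    if not s:
        raise ValueError("empty input")
    return s

def positive(n):
    if n <= 0:
        raise ValueError("not positive")
    return n

# str -> non-empty str -> int -> positive int
parse_positive = compose(non_empty, int, positive)
```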
I found myself replacing the configuration parsing code in a C++ project that was littered with exactly the validation issues described, and converted it to that which the author advocates. The result was a vastly more readable and maintainable codebase, and it was faster and less buggy to boot.
Another nice advantage is that the types are providing free/self- documentation, too.
Gotta go and program more Elixir...