The development process is totally different when you write structured types first and then write your logic. 10/10 would recommend.
Usual caveat: this is what makes sense to me and my brain. Your experience may be different based on neurotype.
Unless you were writing very small throwaway scripts, in what world were you writing your logic first and thinking about your data structures later?
I’d argue that the more experience you get the more you write code for other people which involves adding lots of tooling, tests, etc. Even if the code works the first time, a more senior dev will make sure others have a “pit of success” they can fall into. This involves a lot more than just some “unit tests as an afterthought to keep the coverage up.”
Keeping the code simple, finding the right abstractions, and untangling coupling get the most bang for the buck. See the “beyond pep8” talk for an enlightened perspective.
That said, lightweight testing and tools like pyflakes that prevent egregious errors help an experienced dev write very productively. Typing helps the most with large, venerable projects with numerous devs of differing experience levels.
It immediately tells me that they've never worked on large software projects, and if they have they haven't worked on ones that lasted more than a few months.
I apologize to folks reading this for my rather aggressive tone, but I've been writing software for a long time in numerous languages, and people with the unit-tests-as-an-afterthought attitude are typically rather arrogant and foolhardy.
The most recent incarnation I've encountered is the hotshot data scientist who did okay in a few Kaggle competitions using Jupyter notebooks, and thinks they can just write software the way they did for the competitions with no test of any kind.
I had one of these on my team recently and naturally I had to do 95% of the work to turn anything he produced into a remotely decent product. I couldn't even get the guy to use nbdev, which would have allowed him to use Jupyter to write tested, documented, maintainable code.
I hated working with those coders because they weren't really very good and their code was always the worst to maintain. They are the equivalent of a carpenter who brags about how quickly they can bang nails but can't build a stable structure to save their life.
The best argument I've heard for doing type annotation is for documentation purposes to help future devs. But I don't completely buy this either. I touch new codebases all the time and I rarely spend much time thinking about what types will be passed. I can only assume it comes with experience.
Type annotation actually ends up taking a hell of a long time to do and is of questionable benefit if some of the codebase is not annotated. People spend sometimes hours just trying to get the type checker to say OK for code that actually works just fine!
I'm currently using marshmallow in a project, specifically using the functionality that builds parsers from dataclasses.
I was curious what the differences were.
I haven't used pydantic's ORM integration, but I don't hesitate to use pydantic models everywhere as business logic classes unless I need ludicrous speed.
That's all opinion, but I'd definitely give pydantic a swing.
My cons are:
- Uses some DSL to define types
- Doesn’t marshal to model objects by default, but from dict to dict
My pro is:
- Much more configurable and powerful
However, it does have a strongly opinionated approach to casting that can sometimes yield non-obvious results. This behavior is documented, and I would suggest that potential adopters of the library explore this casting/coercion feature in the context of their product/app requirements.
For the most part, it's not a huge issue, but I've run into a few surprising cases. For example, sys.maxint, 0, '-7', 'inf', float('-inf') are all valid datetime formats.
- https://pydantic-docs.helpmanual.io/usage/models/#data-conve...
- https://gist.github.com/mpkocher/30569c53dc3552bc5ad73e09b48...
That's absolutely a valid and useful annotation. It tells me, and autocomplete, that "x" is probably a str, more likely than not, but I need to be aware that it might not be.
In our app, django essentially sits between a compute cluster and a front end. The pydantic objects are used to define the work to be done on the compute cluster.
There's beautiful clarity in the articulation, and the essence is easy to grasp yet powerful. It reminds me a bit of Scott Wlaschin's Railway Oriented Programming (ROP) [0]. As a technique, ROP nicely complements "parse don't validate". As an explanation, it's similarly simple yet wonderfully effective.
I've a real admiration for people who can explain and present things so clearly. With ROP, for example, the reader learns the basics of monads without even realising it.
I feel that we don't put enough value, these days, on the ability to write clear, articulate exposition. Also, I believe that many people are not willing to read articles, books, or papers, of any meaningful length.
Everything needs to be boiled down to <10 min. read time, or <18 min. TED talks.
Anyways, I definitely agree.
The essential point of this blog post is to avoid "shotgun parsing", where parsing/validating is done just from a procedural standpoint, where it matters when exactly it happens. In the paper "Out of the Tar Pit" it is asserted that this leads to "accidental complexity" (AKA "pain and anxiety"), which is something every programmer has experienced before, possibly many times.
I've become a fan of declarative schemas (json-schema/OpenAPI, clojure spec, etc.) to express this kind of thing. Usually this is used at the boundaries of an application (configuration, web requests, etc.), but there are many more applications for this within the flow of data transformations. If you apply the "parse don't validate" principle, you turn schema-validated (sic!) data into a new thing. Whether that is a "ValidatedData" type or metadata, a pre-condition or a runtime check says more about the environment you program in than about the principle in discussion. The benefit, however, is clear: your code asserts that it requires parsed/validated data where it is needed, instead of when it should happen.
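To make the "where it is needed, not when it happens" point concrete, here is a minimal Python sketch. The names `ServerConfig`, `parse_config` and `connect_url` are made up for illustration; the idea is only that the function signature demands the parsed type, so the check cannot be skipped or forgotten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    """Parsed configuration; holding one means the checks already passed."""
    host: str
    port: int


def parse_config(raw: dict) -> ServerConfig:
    """Turn an untrusted dict into a ServerConfig, or raise ValueError."""
    host = raw.get("host")
    if not isinstance(host, str) or not host:
        raise ValueError("host must be a non-empty string")
    try:
        port = int(raw.get("port"))
    except (TypeError, ValueError):
        raise ValueError("port must be an integer") from None
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return ServerConfig(host=host, port=port)


def connect_url(config: ServerConfig) -> str:
    # No re-checking here: the signature states that parsed data is required.
    return f"tcp://{config.host}:{config.port}"
```

Python will not enforce the annotation at runtime, so this relies on a type checker or on the discipline of only building `ServerConfig` through `parse_config`.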
There's a follow-up article by the same author (that I unfortunately can't find), in which she explains this point.
As an example, returning a NonZero newtype over Int is not as type safe as using an ADT that lacks a zero value altogether. Using a NonEmpty newtype over List is not as type safe as using the NonEmpty ADT that has an element as part of its structure.
Basically newtype still has use, but it is not as airtight as a well-designed ADT.
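A rough Python rendering of the NonEmpty idea, as a sketch rather than a faithful port of the Haskell type: the first element is a required field of the structure, so "empty" has nowhere to live. `NonEmpty`, `parse_non_empty` and `first` are illustrative names.

```python
from dataclasses import dataclass
from typing import Generic, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class NonEmpty(Generic[T]):
    """A sequence whose first element is a required field of the structure."""
    head: T
    tail: Tuple[T, ...] = ()

    def to_list(self) -> list:
        return [self.head, *self.tail]


def parse_non_empty(items: Sequence[T]) -> Optional[NonEmpty[T]]:
    """Return a NonEmpty, or None when there is nothing to put in `head`."""
    if len(items) == 0:
        return None
    return NonEmpty(head=items[0], tail=tuple(items[1:]))


def first(xs: NonEmpty[T]) -> T:
    # Always succeeds: the constructor requires a head argument.
    return xs.head
```

Compare that with a wrapper class around a plain list plus a length check: the wrapper is only as safe as every code path that constructs it.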
I think this is it: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
Every time I read something like this my mind translates it to "after building an ad-hoc compiler you can do all the things a compiler can do. Just not as well, but you can do it." -- Same with "I don't need a compiler, my tests stop all this kind of bugs"
I’ve run across DSLs that have three or more layers of parsing and validation. Embedding different languages within each other (eg: JSON snippets within your own DSL) definitely leads to the issues the article talks about.
Also, growing your own parser without understanding standard lexer/parser basics seems far more common than it ought to be. I’m not talking brilliant design, rather the extremely naive one-character-at-a-time-in-a-really-complex-loop variety of design.
The better level of bad is, “I know what lexers/parsers are, now I’ll write something to basically implement a type-checking parser with the lexed+parsed tree as input.”
This article is basically stating, “Why not just get your parser to do it all for you in one swell foop?” When I have refactored code to follow this kind of design, I have never regretted the outcome.
    import * as $ from '@appliedblockchain/assert-combinators'

    const validateFooBar = (
      $.object({
        foo: $.string,
        bar: $.boolean
      })
    )

    // probably roughly equivalent to
    /*
    const validateFooBar = (x) => {
      console.assert(
        typeof x === 'object' &&
        typeof x.foo === 'string' &&
        typeof x.bar === 'boolean'
      )
      return x
    }
    */

    const test1 = { foo: "abc", bar: false }
    const test2 = { foo: 0, quux: true }

    const { foo, bar } = validateFooBar(test1) // ok
    const oops = validateFooBar(test2) // throws error
The source is pretty readable too if you want to get an idea how it works: https://github.com/appliedblockchain/assert-combinators/blob...
https://github.com/appliedblockchain/assert-combinators/blob...
They don’t go into deep category theory, and you won’t find monads and friends; they are first-level, straightforward combinators that any TypeScript developer can pick up in minutes. This is by design. The library stops at combinators in TypeScript to solve a very specific problem and nothing more. Haskell in TS is not the goal of this npm package.
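For anyone who doesn't read TypeScript, the combinator idea can be sketched in a few lines of Python. This is my own analogue, not the library's implementation: `string`, `boolean` and `obj` are hypothetical names, and each check either returns its input unchanged or raises.

```python
from typing import Any, Callable, Dict

Check = Callable[[Any], Any]  # returns the value unchanged or raises TypeError


def string(x: Any) -> str:
    if not isinstance(x, str):
        raise TypeError(f"expected string, got {x!r}")
    return x


def boolean(x: Any) -> bool:
    if not isinstance(x, bool):
        raise TypeError(f"expected boolean, got {x!r}")
    return x


def obj(shape: Dict[str, Check]) -> Check:
    """Combine per-field checks into a check for a whole dict."""
    def check(x: Any) -> dict:
        if not isinstance(x, dict):
            raise TypeError(f"expected object, got {x!r}")
        for key, field_check in shape.items():
            if key not in x:
                raise TypeError(f"missing field {key!r}")
            field_check(x[key])
        return x
    return check


validate_foo_bar = obj({"foo": string, "bar": boolean})
```

Since `obj` returns the same kind of function it takes, shapes nest without any extra machinery.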
Mathematicians: Parsing is validation
This paper is an April Fools' joke. I didn't think people could take that one seriously. I guess it's a good April Fools' joke then. :)
So is the joke on Computer Scientists or Mathematicians? You decide ;)
Beware of bugs in the above code; I have only proved it correct, not tried it --Donald Knuth
This paper.... uh.... what exactly is it good for?
I suppose it could be kind of nice as some kind of undergraduate paper writing project kind of thing but it looks too professional for that.... I am kind of at a loss why this was written. Maybe it is some strange kind of satire....
[1]: http://www.sigbovik.org/ [2]: http://www.sigbovik.org/2021/proceedings.pdf
Source: I attended SIGBOVIK a few times in grad school.
Code is data after all.
General case: Validating random data as input into some program.
Particular case: Validating random source code (data) as input into some compiler (program).
Do compilers parse or validate?
> "the converse of ‘parsing is validation’ is not true."
If that were the case then you should be able to give an example of a compiler validating random source code (data) but not parsing it.
What determines the validity of random input is precisely a compiler's ability to parse it.
If you see it differently you are implicitly assuming a non-formalist perspective on what "validation" means. Tell us about it.
Are you talking about a bijective mapping or are you saying it's a synonym for identical?
Because the former doesn't make any sense here and the latter is not true.
Red is a color does not imply that all colors are red.
Basically it’s just functions that take a value of one type and return another one.
I personally use myzod, as it's fast at parsing, has zero dependencies, and lets you infer types from your schemas.
Maat was created before dataclasses existed. For validation Maat offers the same. But it also allows for some really neat features such as validation on encrypted data. https://github.com/Attumm/Maat/blob/main/tests/test_validati...
Since validation is written as dictionaries, it's possible to store the validations in a caching DB such as Redis.
And since it's simple, it's easy to extend for anyone's use case. And there are no other dependencies.
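To illustrate why rules-as-dictionaries are easy to store, here is a small sketch. The rule format below is invented for this example and is not Maat's actual schema; the point is only that plain data round-trips through JSON, which is the form a cache or key-value store would hold.

```python
import json

# Hypothetical rule format for illustration only; this is not Maat's schema.
RULES = {
    "name": {"type": "str", "min_length": 1},
    "age": {"type": "int", "min": 0, "max": 150},
}

TYPES = {"str": str, "int": int}


def validate(data: dict, rules: dict) -> dict:
    """Return a dict with only the declared fields, or raise ValueError."""
    result = {}
    for field, rule in rules.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPES[rule["type"]]):
            raise ValueError(f"{field}: expected {rule['type']}")
        if "min_length" in rule and len(value) < rule["min_length"]:
            raise ValueError(f"{field}: too short")
        if "min" in rule and value < rule["min"]:
            raise ValueError(f"{field}: below minimum")
        if "max" in rule and value > rule["max"]:
            raise ValueError(f"{field}: above maximum")
        result[field] = value
    return result


# Because the rules are plain data, they survive a round trip through JSON.
stored = json.dumps(RULES)
restored = json.loads(stored)
```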
The pydantic benchmarks have Maat at around twice the speed of pydantic.
And it's getting some wide adoption, for instance in FastAPI, which uses it for request validation.
The points you made are all very valid points.
At my employer we use both projects. If the data is very nested, or really large Maat is used.
Some points really elude me because Haskell uses many symbols and is very dense.
> IME, people in dynamic languages almost never program this way, though—they prefer to use validation and some form of shotgun parsing. My guess as to why? Writing that kind of code in dynamically-typed languages is often a lot more boilerplate than it is in statically-typed ones!
I feel that once you've got experience working in (usually functional) programming languages with strong static type checking, flakey dynamic code that relies on runtime checks and just being careful to avoid runtime errors makes your skin crawl, and you'll intuitively gravitate towards designs that take advantage of strong static type checks.
When all you know is dynamic languages, the design guidance you get from strong static type checking is lost so there's more bad design paths you can go down. Patching up flakey code with ad-hoc runtime checks and debugging runtime errors becomes the norm because you just don't know any better and the type system isn't going to teach you.
More general advice would be "prefer strong static type checking over runtime checks" as it makes a lot of design and robustness problems go away.
Even if you can't use e.g. Haskell or OCaml in your daily work, a few weeks or even just a few days of trying to learn them will open your eyes and make you a better coder elsewhere. Map/filter/reduce, immutable data structures, non-nullable types etc. had been in other languages for over 30 years before these ideas became more mainstream best practices, for example (I'm still waiting for pattern matching + algebraic data types).
It's weird how long it's taking for people to rediscover why strong static types were a good idea.
One trick I've found very useful is to realise that Maybe (AKA Option) can be thought of as "a list with at most one element". Dynamic languages usually have some notion of list/array, which we can use as if it were a Maybe/Option type; e.g. we can follow a 'parse, don't validate' approach by wrapping a "parsed" result in a list, and returning an empty list otherwise. This allows us to use their existing 'map', 'filter', etc. too ;)
(This is explored in more detail, including links to logic programming, in https://link.springer.com/chapter/10.1007%2F3-540-15975-4_33 )
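A quick Python sketch of the list-as-Maybe trick (function names are mine, picked for illustration): a successful parse is a one-element list, a failure is the empty list, and ordinary comprehensions do the mapping and chaining.

```python
from typing import List


def parse_int(text: str) -> List[int]:
    """Return [n] when text is an integer, [] otherwise."""
    try:
        return [int(text)]
    except ValueError:
        return []


def positive(n: int) -> List[int]:
    """A second step that can also fail."""
    return [n] if n > 0 else []


def parse_positive(text: str) -> List[int]:
    # Chaining two fallible steps is a flat-map over at-most-one-element lists.
    return [m for n in parse_int(text) for m in positive(n)]


doubled = [n * 2 for n in parse_int("21")]      # one element
nothing = [n * 2 for n in parse_int("twenty")]  # no elements
```

The obvious cost is that the reason for a failure is thrown away, which is where a Try/Either-style value comes in.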
If we want to keep track of useful error messages, I've found Scala's "Try" type to be useful ('Try[T]' is isomorphic to 'Either Throwable T'). Annoyingly, dynamic languages usually lack a built-in sum type; the closest thing is usually a tagged pair like '[true, myFoo]'/'[false, myException]', which is pretty naff.
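For what it's worth, the tagged-pair version of Try is only a few lines in Python. This is a sketch with made-up helper names (`attempt`, `map_try`), not a port of Scala's API.

```python
from typing import Any, Callable, Tuple

Try = Tuple[bool, Any]  # (True, value) or (False, exception)


def attempt(fn: Callable[..., Any], *args: Any) -> Try:
    """Run fn and capture either its result or the exception it raised."""
    try:
        return (True, fn(*args))
    except Exception as exc:
        return (False, exc)


def map_try(result: Try, fn: Callable[[Any], Any]) -> Try:
    """Apply fn to a success; pass a failure through untouched."""
    ok, payload = result
    return attempt(fn, payload) if ok else result
```

Nothing stops a caller from ignoring the tag and reading the payload anyway, which is the "pretty naff" part.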
I've found scala or even LINQ to really hammer down this point, even to those who aren't into FP very much. Doing that map/flatmap makes it click for just about anyone
Not being a big fan of method chaining, I think a null-savvy foreach would probably eliminate most of my null checks and my need for Optional.
Static types are awesome for local reasoning, but they are not that helpful in the context of the larger system (this already starts at the database, see idempotency mismatch).
Code with static types is sometimes larger and more complex than the problem it's trying to solve.
They tightly couple data to a type system, which (can) introduce incidental complexity.
> (I'm still waiting for pattern matching + algebraic data types)
This is a good example: if you pattern match to a specific structure (e.g. position of fields in your algebraic data type), you tightly couple your program to this particular structure. If the structure changes, you may have to change all the code which pattern matches this structure.
    match event.get():
        case Click((x, y)):
            handle_click_at(x, y)
(Example from PEP 636[1].)
In both Python and statically typed languages you can avoid this by matching against field names rather than positions, or by using some other interface to access data. This is an important design aspect to consider when writing code, but it does not have anything to do with dynamic typing. The only difference static typing makes is that when you do change the type in a way that breaks existing patterns, you can know statically rather than needing failing tests or runtime errors.
The same is true for the rest of the things you've mentioned: none are specific to static typing! My experience with a lot of Haskell, Python, JavaScript and other languages is that Haskell code for the same task tends to be shorter and simpler, albeit by relying on a set of higher-level abstractions you have to learn. I don't think much of that would change for a hypothetical dynamically typed variant of Haskell either!
[1]: https://www.python.org/dev/peps/pep-0636/#matching-sequences
> The same is true for the rest of the things you've mentioned: none are specific to static typing!
Sure, I could be wrong here. I frequently am. But could you point out why you think that?
When using a data structure, I know what set of fields I expect it to have. In TypeScript, I can ask the compiler to check that my function's callers always provide data that meets my expectations. In JavaScript, I can check for these expectations at runtime or just let my function have undefined behavior.
Either way, if my function's assumptions about the data's shape don't turn out to be correct, it will break, whether or not I use a dynamic language.
It seems that most of the people who make this argument against static typing are actually arguing against violations of the Robustness Principle[0]: "be conservative in what you send, be liberal in what you accept".
A statically typed function that is as generous as possible should be no more brittle against outside change than an equally-generous dynamically typed function. The main difference is that the statically typed function is explicit about what inputs it has well-defined behavior for.
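As a sketch of "as generous as possible but explicit", Python's `typing.Protocol` lets a function declare the single attribute it relies on. The class names are illustrative, and the static guarantee assumes you run a checker such as mypy; Python itself won't enforce it.

```python
from dataclasses import dataclass
from typing import Protocol


class HasEmail(Protocol):
    """The only thing notify() needs to know about its input."""
    email: str


def notify(recipient: HasEmail) -> str:
    return f"sending mail to {recipient.email}"


@dataclass
class User:
    name: str
    email: str


@dataclass
class Organisation:
    legal_name: str
    email: str
    employees: int
```

Both `User` and `Organisation` are accepted without inheriting from anything, and adding fields to either leaves `notify` untouched.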
Even with a lot of JSON or XML parsing, you throw it into a parser and take what you need; if an unrelated field isn't what you expected, you just move on rather than stopping everything because the library author forgot about extension or openness possibilities (i.e. an xs:any in a schema).
This attitude comes not from the assumption that types are unhelpful, but from the view that we are unlikely to have modelled every outcome correctly in our view of the world.
When you get to complex systems with state changes to data and strict, well-defined policies, rules engines, etc.? That's where dynamic languages often start adding all of that validation layer, to assert that right here the code should act more like a type system, because it's important: it probably carries financial, security, or other risks.
Recently, I spent 3 years on Scala then switched jobs and spent 3 years in Ruby.
It was a shock to go back to dynamic languages, but after 3 months, I honestly couldn't tell which felt more productive or led to more stable high quality product.
In Ruby, we had all the issues people point out about dynamic languages, but the product didn't lean heavily on complex data structures or algorithms. We embraced complexity and failure and got good processes, designs and practices to deal with this.
In Scala, we had more rigour, but I also know I spent a lot of time on type design. Once things were sorted there was a lot of confidence in it, but generally, it took a lot longer to get there.
For certain systems that is absolutely worth it, for others (and in my case) it did feel like the evolution of the product meant this effort never really paid off.
https://en.wikipedia.org/wiki/Object%E2%80%93relational_impe...
I do believe that, for long lasting, larger projects, static typing tends to make the code easier to maintain as time goes on. But not every project is like that. In fact, not every project uses a single language. Some use statically typed languages for some parts, and dynamically typed for others (this is common in web dev).
Static typing really appeals to me on a personal level. I enjoy the process of analysis it requires. I love the notion of eliminating whole classes of bugs. It feels way more tidy. I took Odersky's Scala class for fun and loved it.
But in practice, they're just a bad match for projects where the defining characteristic is unstable ground. They force artificial clarity when the reality is murky. And they impose costs that pay off in the long run, which only matters if the project has a long run. If I'm building something where we don't know where we're going, I'll reach for something like Python or Ruby to start.
This has been brought home to me by doing interviews recently. I have a sample problem that we pair on for an hour or so; there are 4 user stories. It involves a back end and a web front end. People can use any tools they want. My goal isn't to get particular things done; it's to see them at their best.
After doing a couple dozen, I'm seeing a pattern: developers using static tooling (e.g., Java, TypeScript) get circa half as much done as people using dynamic tooling (Python, plain JS). In the time when people in static contexts are still defining interfaces and types, people using dynamic tools are putting useful things on the page. Making a change in the static code often requires multiple tweaks in situations where it's one change in the dynamic code. It makes the extra costs of static tooling really obvious.
That doesn't harm the static-language interviewees, I should underline. The goal is to see how they work. But it was interesting to see that it wasn't just me feeling the extra costs. And those costs are only worth paying when they create payoffs down the road.
On the other hand, as the initial code grows and grows, the cost of moving it all to a saner language grows too... discouraging a rewrite. So we end up with very complex production software that started as dynamic and is still dynamic.
The rebuttal to this is provided by another post from Alexis King: https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-typ...
> This story sounds compelling, but it isn’t true. The flaw is in the premise: static types are not about “classifying the world” or pinning down the structure of every value in a system. The reality is that static type systems allow specifying exactly how much a component needs to know about the structure of its inputs, and conversely, how much it doesn’t.
Static typing helps you to safely model the parts of the system that you actually do know about, while allowing you to leave the unknown parts loosely defined. And when business requirements change, the compiler helps you to make those changes in a safe, guided way. This is why people often report the experience of doing large refactorings driven by compiler error messages, then running and finding that the change works correctly on the first try. A confidence few report in dynamically-typed languages.
> After doing a couple dozen, I'm seeing a pattern: developers using static tooling (e.g., Java, TypeScript) get circa half as much done as people using dynamic tooling (Python, plain JS).
I hope you understand and accept the fact that interview code challenges are not very representative of real-world software engineering. On-the-spot interview code is throwaway; real production code is not.
When the problem space is less well defined, the type-related boilerplate adds a lot of friction. It's not impossible to overcome that friction, but it slows down progress. When you're under a tight deadline, development velocity is often more valuable than absolute correctness or even overall runtime efficiency.
A delivered product that works is usually more valuable than an undelivered product that's more "correct" or efficient. A development project is just a cost (for various values of cost) until it ships.
This was more aimed at people who are new to the idea of parsing over validating. In a strong statically typed language, the type system would naturally guide you to use this approach so if this isn't natural to you then time in other languages would probably be worthwhile.
I don't think it's weird. Most of those languages were not popular in industry for various reasons, and the ones that were (especially in say, the 90s) did not have particularly capable static type systems. The boilerplate/benefit ratio was all off.
The way I describe this dichotomy personally is, I would rather use Ruby than Java 1.5. I would rather use Rust than Ruby. (Java 1.5 is the last version of Java I have significant development experience in, and they've made the type system much more capable since those days.)
I never heard about the former.
But even if these were the same thing, and if we want to be a bit pedantic (this is HN, after all): structural type systems often support some kind of subtyping or row polymorphism, but that's not a strict requirement for a type system to be "structural". You could have a structural type system that doesn't allow
    { a: int, b: int }
to be used where
    { a: int }
is expected. How practical such a type system would be... I don't know. The Flow type checker for JavaScript makes a distinction between "exact" types, i.e. the object must have exactly the properties listed and no more, and "inexact" types, where such subtyping is allowed.
Now, even in the rare case where I write some Python, JS or PHP, I write it in a very statically typed style, immediately parsing input into well-thought-out domain classes. And for backend services, I almost always go with 3 layers of models:
1) Data Transfer Objects. Map directly to the wire format, e.g. JSON or Protobuf. Generally auto-generated from API specs, e.g. using Open API Generator or protoc. A good API spec + code gen handles most input validation well
2) Domain Objects. Hand written, purely internal to the backend service, faithfully represent the domain. The domain layer of my code works exclusively with these. Sometimes there’s a little more validation when transforming a DTO into a domain model
3) Data Access Objects. Basically a representation of DB tables. Generally auto-generated from DB schemas, e.g. using libs like Prisma for TS or SQLBoiler for Go
Can’t imagine going back to the “everything is a dictionary” style for any decent sized project, it becomes such a mess so quickly. This style is a little more work up front, when you first write the code, but WAYYYYYY easier to maintain over time, fewer bugs and easier to modify quickly and confidently, with no nasty coupling of your domain models to either DB or wire format concerns. And code gen for the DTO and DAO layers makes it barely more up-front work.
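Roughly what the three layers look like in Python, as a hand-written sketch. In practice the DTO and DAO classes would be generated; the `orders` table and all field names here are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class OrderDTO:
    """Wire format: mirrors the JSON payload, so everything is a string."""
    customer_id: str
    amount: str
    placed_on: str


@dataclass(frozen=True)
class Order:
    """Domain object: the only shape the business logic works with."""
    customer_id: int
    amount: Decimal
    placed_on: date


@dataclass
class OrderRow:
    """Data access object: mirrors a hypothetical `orders` table."""
    customer_id: int
    amount_cents: int
    placed_on: str


def dto_to_domain(dto: OrderDTO) -> Order:
    # Parsing happens once, at the edge; int(), Decimal() and
    # date.fromisoformat() all raise on malformed input.
    return Order(
        customer_id=int(dto.customer_id),
        amount=Decimal(dto.amount),
        placed_on=date.fromisoformat(dto.placed_on),
    )


def domain_to_row(order: Order) -> OrderRow:
    return OrderRow(
        customer_id=order.customer_id,
        amount_cents=int(order.amount * 100),
        placed_on=order.placed_on.isoformat(),
    )
```

The two mapping functions are the only places that know about more than one layer, which is what keeps wire and DB concerns out of the domain code.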
It's weird how long it's taken for languages with static typing and type systems designed for correctness (and designed well for that end), rather than principally for convenience of compilation, to be available in a generally usable form (considering licensing model, features, ecosystem, etc.).
I wonder how many people the author met.
For example, one good reason why strong static types are a bad idea... they prevent you from implementing dynamic dispatch.
Routers. You can't have routers.
https://www.geeksforgeeks.org/dynamic-method-dispatch-runtim...
And using it doesn't give up any of Java's type safety guarantees. The arguments and return type of the method you call (which will be invoked with dynamic dispatch) are type checked.
When a router does lookups from a static table it's "static routing".
When Java does lookups from a static table it's "dynamic dispatch".
The same type of computation is being characterised as both "static" and "dynamic".
When a router does lookups from a dynamic table it's "dynamic routing" - there is no equivalent in Java because making the dispatch table reflexive/mutable is precisely what violates type-safety!
Compile time is not runtime.
The Java compiler is not the JVM.
The compiler does type checking. The JVM does the dynamic dispatching.
Neither does both.
Neither C++ nor Rust gives you static type safety AND dynamic dispatch, because all of the safety checks for C++ and Rust happen at compile time. Not runtime.
With the right amount of indirection/abstraction you can implement everything in Assembly.
But you don't. Because you like all the heavy lifting the language does for you.
First-class citizens are what we are actually interested in when we talk about programming language paradigm choices.
Does my thing have a different name? Where can I read up on how to do that best?
I see they’ve raised a lot of money. Does anyone know what their revenue model is?
Here's validating a CSV in Python (which I'm using because it's a language that's, well, less excited about types than the author's choice of Haskell, to show that the principle still applies):
    import csv
    import datetime

    def validate_data(filename):
        errors = []
        with open(filename, newline="") as f:
            for row in csv.DictReader(f):
                try:
                    date = datetime.datetime.fromisoformat(row["date"])
                except ValueError:
                    print("ERROR: Invalid date", row)
                    errors.append(row)
                    continue  # no usable date, so skip the remaining checks
                if date < datetime.datetime(2021, 1, 1):
                    print("ERROR: Last year's data", row)
                    errors.append(row)
                # etc.
        return errors

    def actually_work_with_data(filename):
        with open(filename, newline="") as f:
            for row in csv.DictReader(f):
                try:
                    date = datetime.datetime.fromisoformat(row["date"])
                except ValueError:
                    raise Exception("Wait, didn't you validate this already???")
                # etc.
Yes, it's a kind of silly example, but - the validation routine is already doing the work of getting the data into the form you want, and now you have some DRY problems. What happens if you start accepting additional time formats in validate_data but you forget to teach actually_work_with_data to do the same thing?
The insight is that the work of reporting errors in the data is exactly the same as the work of getting non-erroneous data into a usable form. If a row of data doesn't have an error, that means it's usable; if you can't turn it into a directly usable format, that necessarily means it has some sort of error.
So what you want is a function that takes the data and does both of these at the same time, because it's actually just a single task.
In a language like Haskell or Rust, there's a built-in type for "either a result or an error", and the convention is to pass errors back as data. In a language like Python, there isn't a similar concept and the convention is to pass errors as exceptions. Since you want to accumulate all the errors, I'd probably just put them into a separate list:
    import csv
    import dataclasses
    import datetime

    @dataclasses.dataclass  # or @attr.s(auto_attribs=True), whichever
    class Order:
        name: str
        date: datetime.datetime
        ...

    def parse(filename):
        data = []
        errors = []
        with open(filename, newline="") as f:
            for row in csv.DictReader(f):
                try:
                    date = datetime.datetime.fromisoformat(row["date"])
                except ValueError:
                    errors.append(("Invalid date", row))
                    continue
                if date < datetime.datetime(2021, 1, 1):
                    errors.append(("Last year's data", row))
                    continue
                # etc.
                data.append(Order(name=row["name"], date=date))  # plus any other fields
        return data, errors
And then all the logic of working with the data, whether to actually use it or to report errors, is in one place. Both your report of bad data and your actually_work_with_data function call the same routine. Your actual code doesn't have to parse fields in the CSV itself; that's already been done by what used to be the validation code. It gets a list of Order objects, and unlike a dictionary from DictReader, you know that an Order object is usable without further checks. (The author talks about "Use a data structure that makes illegal states unrepresentable" - this isn't quite doable in Python where you can generally put whatever you want in an object, but if you follow the discipline that only the parse() function generates new Order objects, then it's effectively true in practice.)
And if your file format changes, you make the change in one spot; you've kept the code DRY.
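If you'd rather mimic the "either a result or an error" convention than keep two lists, a per-row result type is one way to do it. This is only a sketch; `Ok`, `Err`, `parse_date` and `partition` are names I made up, not anything from the standard library.

```python
import datetime
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Ok:
    value: datetime.datetime


@dataclass(frozen=True)
class Err:
    reason: str
    row: Dict[str, str]


Result = Union[Ok, Err]


def parse_date(row: Dict[str, str]) -> Result:
    """Parse one row's date, returning the failure as data instead of raising."""
    try:
        return Ok(datetime.datetime.fromisoformat(row["date"]))
    except (KeyError, ValueError):
        return Err("Invalid date", row)


def partition(results: List[Result]) -> Tuple[List[datetime.datetime], List[Err]]:
    """Split results into usable values and accumulated errors."""
    good = [r.value for r in results if isinstance(r, Ok)]
    bad = [r for r in results if isinstance(r, Err)]
    return good, bad
```

Whether this beats the two-list version is mostly taste; it pays off once several parsing steps need to be chained per row.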
The argument is that if you need to interact with or operate on some data you shouldn't be designing functions to validate the data but rather to render it into a useful output with well defined behaviour.
Say I have a string that’s supposed to represent an integer. To me, “Validate” means using a regex to ensure it contains only digits (raising an error if it doesn’t) but then continuing to work with it as a string. “Parse” means using “atoi” to obtain an integer value (but what if the string’s malformed?) and then working with that.
I first thought this article was recommending doing the latter instead of the former, but the actual recommendation (and I believe best practice) is to do both.
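In Python terms, the difference between the two looks something like this (a sketch with illustrative function names; `int()` stands in for `atoi`):

```python
import re


def validate_digits(text: str) -> str:
    """Validation only: checks the format but hands back the same string."""
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"not an integer: {text!r}")
    return text


def parse_digits(text: str) -> int:
    """Check and convert in one step; callers receive an int, never a string."""
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"not an integer: {text!r}")
    return int(text)
```

After `validate_digits`, every later function still holds a `str` and has to trust that someone checked it; after `parse_digits`, the `int` itself is the evidence.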
- don't drop the info gathered from checks while validating, but keep track of it
- if you do this, you'll effectively be parsing
- parsing is more powerful than validating
"Extra steps" would be keeping track of info gathered from checks.
I also find this a confusing use of the word "parse", and it's not explained in the post, and I think "parse, don't validate" is a poor slogan as a result. The traditional slogan is "make illegal states unrepresentable", though that's a bit narrower of a concept.
Not to steal vvillena's thunder, but that's pretty much the dictionary definition of "parsing"
> analyze (a string or text) into logical syntactic components, typically in order to test conformability to a logical grammar.
Parsing is taking some collection of symbols, and emitting some other structure that obeys certain rules. Those symbols need not be text, they can be any abstract "thing". A symbol could be a full-blown data structure - you can parse a List into a NotEmptyList, where there's some associated grammar with the NEL that's a stricter version of the List grammar.
At the end of parsing, you have a structure with a type. After validation, you may or may not have a structure with that type, depending on how you chose to validate.
But I think the big win is that parsers are usually much easier to compose (since they themselves are structured functions), and so if you start with the type first, you often get the "validation" behavior aspect of parsing for "free" (usually from a library). Maybe the title should have been "Parse, don't manually validate."
But if your type doesn't catch all your invariants, yeah it does feel kinda just like validation.
I found myself replacing the configuration parsing code in a C++ project that was littered with exactly the validation issues described, and converted it to that which the author advocates. The result was a vastly more readable and maintainable codebase, and it was faster and less buggy to boot.
Another nice advantage is that the types provide free self-documentation, too.
Gotta go and program more Elixir...