No, I'm sorry, but type annotations are where the conversation happens. It's very important, in my view, that type inference be a pure (ideally easy to mentally model) function of the source code you're looking at. Type annotations let you negotiate with and probe that function, and once you understand what it's doing and have gotten it into a good place, you can remove all the ones that aren't needed.
Adding (hidden!) state to that process sounds like my worst nightmare as a programmer
Another point that I failed to make in the post (thanks for the pushback!) is that type inference algorithms leave you high & dry in the presence of a type error. Then you're left with lots of hidden state (the internal steps the algorithm took before arriving at the error) and the final conclusion, which, if you're lucky, points you to two places in the code where the inferred types are in disagreement.
With my proposed system, all types are annotated all the time (to be shown or hidden via editor flag, or on hover), and the annotations are updated in response to actions taken within the editor. The "algorithm" becomes extremely simple, with almost no intermediate steps.
Of course, the proof will be in the pudding: whether I can actually achieve a pleasant editing experience :)
So there are two possible reasons I can see for automatically "annotating" all the types:
- Caching that info to easily show in the editor (which is a good feature, but many editors already do this without having to modify the source; it happens entirely editor-side)
- Using the "current" inferred type as a jumping-off point for determining the "next" inferred type
The second was my original interpretation, and what sounded so distressing as a user
Consider this situation:
1. Your program and the inferred types are in one state
2. You modify some code which changes an inferred type
3. You change the (visible) code back to what it was previously, but now the inferred type is different because it was partially based on the previous inferred type
This is what sounds like a nightmare, assuming I understand correctly that it's possible on the described system. The inferred types are now a state machine, where it matters not just what code is on screen, but how it got there.
How much state gets hidden depends entirely on the quality of the implementation, doesn’t it? Nothing forbids an implementation from dumping that hidden state in whatever form it wants (as an example, compare the C++ diagnostics of different compilers, or of different versions of the same compiler).
> The "algorithm" becomes extremely simple, with almost no intermediate steps.
I don’t see how it can become simple merely by changing the way things are presented. If you write `f(foo, qbar, baz, quux)` in a language where overload resolution is complex and the compiler has 20 overloads to pick from, but can’t find a ‘best’ one, it still has to show you which overloads it considered, how it ranked them and why, and which ones were left over.
So, if you want type errors to be extremely simple, I think you need a simple language (type-wise)
My initial impression is that if I were to use a system like this, I would want any type annotation that differs from the default inference to be visible. The only choice left, then, is when to display redundant annotations, which is where a lot of the disagreement around type inference is, even without this scheme.
Lesson learned: Sometimes type inference fails, and always annotate your types at FFI boundaries!
I tend to write code for a bit, and then ask it one of two questions (1) what is the type of this thing or (2) is there any part of this that looks wrong?
In particular, (2) is obviously type checking (which is great to run in a fast background loop like `cargo watch -x check`), and (1) is some system for "please insert an annotation on this variable with your current inferred type". This happens near constantly, and a very common "move" I make while typing is to isolate some section and give it a temporary variable name just so I can receive that annotation.
The flow there could be better, to be sure.
The proposal here is kind of fascinating with regard to question (1), making it less interactive and more reactive. It reminds me of, say, working with Observable, where you make edits throughout your document and watch the dependents react.
To that end, being able to draw circles around fragments of code and have a persistent monitor that reactively outputs either "error: [reason]" or the type of that variable as I edit the code seems great.
Though I do feel a little uncomfortable with that being literally embedded in the code. Code being reactively edited by a program makes me itch with fear of file corruption.
Agda programmer here. Not sure what you mean. Although Agda does save a binary cache file (.agdai, "agda interface") with compressed type information to minimize type-checking latency, it does not have any kind of hidden state. This is very different from what the OP suggests.
Other than this, I agree with you. When I write code in a typed language I have a conversation with the compiler. I keep asking, is this ok, what do you think about this, what type is this... Which is why I find OP misguided. You can have a conversation with the compiler without hidden state.
I think it's turned off by default for most languages though - probably considered unnecessary, as you can always just hover over the variable. Also, unlike the JetBrains IDEs, I don't think you can Ctrl+Click on such a hint to navigate to its definition in VS Code (like you can in normal code).
Some inferred types in Rust can be pages long. Have fun with those merge conflicts!
Unison is the language that's gone farthest with this, as far as I know; their solution to diffs & merges is to handle everything from within their CLI, bypassing git entirely. I imagine I'll do something similar.
FWIW, I'm totally on board with the idea of focusing on the interactive experience, but history has shown it to be very difficult for language environments not based on files to cross the chasm into more common use. (If you don't care about lots of users, then ignore this entire comment! Plus, it's always _possible_ that something will cross over, but we've got decades of people trying...)
Anyhow, I'm really excited to see where you go with this!
So the language implementors have to write everything the developer needs: code editor, source control, code review, code browser, etc.
You're completely isolated from the wider ecosystem of development, and the benefits so far don't seem worth it. As described here, you save on the reference resolution phase of linking, which is nice, it should be faster and eliminate some classes of error.
The nice part about text as a common format is that it's common. If every language has to implement every tool then it's an N * M problem. When tools can be reused across languages they become N + M.
A user of a projectional language is giving up git, Github, their favorite code editor, and depending on language-specific tooling to replace all of that. And even when the lang-specific tooling is sufficient, it needs to be relearned and recustomized (think all the configuration just in a code editor: prefs, theme, extensions like VIM keybinds and Copilot, etc).
Is the benefit worth the cost?
So I'm assuming your language will have some sort of custom source code management system? If so, what do you do when a user wants to store a non-code file together with the source code, e.g. how web applications might have CSS files, fonts and image assets in the repo?
On the one hand, one of the main benefits of types, and the only one for which there is empirical evidence, is the (automatically checked) documentation effect. For this, the types have to be visible.
On the other hand, not having to type in the types all the time and having them inferred instead seems like a major convenience. For this, the types cannot be visible.
Compared to that, the inefficiency of having to do the work over and over seems like a relatively minor issue, though compile times are getting ever more problematic.
So I really like the idea of having types that can be inferred and persisted, though I'd prefer the language to make some room for "provisional" type annotations, so that the types can come from every available source: the user, inference, and the running program (dynamic languages can find out about actual types at runtime).
This is similar to what TFA proposes, and it doesn’t require storage of type information separately from the source code, or in some AST representation. The type information is simply “stored” in the plain-text source code itself, as type declarations/annotations.
I like this approach better than having the IDE merely render inferred type information alongside the source code, because (a) when the type information is part of the actual source code, it is always visible, e.g. in diffs and source control tools, and (b) as TFA suggests, it gives the opportunity to interactively resolve type conflicts and to make those decisions “sticky”. With the right IDE tooling, you still wouldn’t need to type the types yourself most of the time.
Sounds great! And very close to what I was thinking about.
What's your IDE?
> The type information is simply “stored” in the plain-text source code itself, as type declarations/annotations.
Yes, I think that's the right way to do it. I do believe that there needs to be some way to tell the difference between these three cases:
1. I put the type information there, that's really what I intended
2. The system put the type information there and I okayed it
3. This is what the system currently thinks, no human interaction
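One could imagine distinguishing the three cases with different markers in the annotation syntax itself; a purely hypothetical sketch (the `~:` and `?:` markers are invented here):

```
let port  : u16    = read_port()   // 1. human-written, intended
let cfg  ~: Config = load()        // 2. system-inferred, human-approved
let conn ?: Socket = open(cfg)     // 3. system's current guess only
```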
However, I believe this just from thinking about it, without having used such a system, so I may be completely wrong.
My background is also dynamically typed systems like Smalltalk, so the state of having no static type information available is also acceptable.