What I really wish existed was a built-in way to cast and validate, or normalize and validate. I never care whether something is a string. I care that if I wrap it in str(), or use it in an f-string, the result matches a regex. Or that if I run a handful of functions, one of them returns what I need.
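As a rough sketch of what I mean (everything here is invented, not a real API): try each normalizer in turn and accept the first result whose string form matches a pattern.

```python
import re

def coerce(value, pattern, normalizers=(str,)):
    """Try each normalizer in turn and return the first result whose
    str() form fully matches pattern; raise ValueError otherwise.
    (Hypothetical helper -- names invented for illustration.)"""
    rx = re.compile(pattern)
    for fn in normalizers:
        try:
            result = fn(value)
        except (TypeError, ValueError):
            continue  # this normalizer can't handle the value; try the next
        if rx.fullmatch(str(result)):
            return result
    raise ValueError(f"could not normalize {value!r} to match {pattern!r}")

# Accept anything whose str() looks like a port number:
coerce(8080, r"\d{1,5}")            # returns "8080" (str-normalized)
coerce("8080", r"\d{1,5}", (int,))  # returns 8080 (int-normalized)
```

The point being that the caller states the shape they need and the library worries about how the value gets into that shape.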
The only benefit I can see of type hints on their own is that they make it easier to change a callable's signature, but I think that's best avoided to begin with.
The problem with the DWIM approach to APIs is that when you go out of your way to "do something reasonable" with absolutely any kind of argument type, leaving the caller's intent implicit, you will sometimes run into combinations that "work" in unexpected—and often unwanted—ways.
For example, say you have a function that returns either a Person object or, in very rare cases, an error string. Suppose further that you fail to check for the error string and pass the result into another function that expects a Person object but will also take a name and look up the corresponding Person in a table. Now if the first function fails, you're left trying to look up an error string as a name, with no obvious sign (such as a type-mismatch error) that anything is amiss.
It's important to make the intent explicit, and not just let the function guess. One option compatible with both statically- and dynamically-typed languages is to provide two functions, one requiring a Person object and another taking a name string. This is still perfectly ergonomic for the user and mitigates most of the potential for confusion.
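In Python that might look something like this (Person and the lookup table are stand-ins, not from the original example):

```python
class Person:
    def __init__(self, name):
        self.name = name

PEOPLE = {"alice": Person("alice")}  # stand-in for the real lookup table

def greet_person(person):
    """Requires an actual Person object; anything else fails loudly."""
    if not isinstance(person, Person):
        raise TypeError(f"expected Person, got {type(person).__name__}")
    return f"Hello, {person.name}"

def greet_by_name(name):
    """Takes a name string; the lookup is explicit, not guessed."""
    return greet_person(PEOPLE[name])
```

An error string returned by some earlier step now blows up immediately in greet_person (or as a KeyError in the lookup) instead of silently "working".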
Well, I only ever return one type from a function; I'm not a total madman. Sometimes I'll return one type or None, if I'm trying to replicate the functionality of dict.get(). Any error string would be wrapped in an exception, so that wouldn't be an issue. But even in your example it would show a stack trace pointing at the function looking up the user, which would be much more valuable for troubleshooting than a type mismatch.
> One option compatible with both statically- and dynamically-typed languages is to provide two functions, one requiring a Person object and another taking a name string. This is still perfectly ergonomic for the user and mitigates most of the potential for confusion.
In practice that is usually what I end up doing, but with a 3rd function that takes either and returns a Person object. In this particular case I would probably make the function be a method on the Person object, and have a class method to look up the Person.
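Something like this, in other words: a class method that takes either and normalizes to a Person (table and names invented for illustration):

```python
class Person:
    _table = {}  # stand-in for the real lookup table

    def __init__(self, name):
        self.name = name
        Person._table[name] = self

    @classmethod
    def lookup(cls, person_or_name):
        """The 'takes either' entry point: pass a Person through
        unchanged, or look a name string up in the table."""
        if isinstance(person_or_name, cls):
            return person_or_name
        return cls._table[person_or_name]

    def greet(self):
        return f"Hello, {self.name}"
```

Everything else becomes a method, so Person.lookup(x).greet() works whether x was a Person or a name.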
Here is the scenario that annoyed me enough to turn me off static typing. I had a class that stored the IP address of a network device as an ipaddr.IPAddress object (now ipaddress in the standard library), and there were various subclasses for specific device types. One of the device types needed an SDK, and the __init__ for the SDK class looked something like this:
def __init__(self, host, port=1234, scheme='https'):
    if not isinstance(host, str):
        raise TypeError('invalid host')
    self.url = f"{scheme}://{host}:{port}"
If they didn't check the type, it would have worked fine, just like every other library we were using to connect to devices. So after a bit of frustration we changed our base class:
def original__init__(self, ip_address):
    self.ip_address = ipaddr.IPAddress(ip_address)

def new__init__(self, ip_address):
    ipaddr.IPAddress(ip_address)  # just to validate
    self.ip_address = ip_address
and all was well with the world, but there was a dumb mistake waiting for us. A year or two later, after upgrading to 2.7, we started passing around unicode objects instead of strings to get ready for 3.x, as was the style at the time. Again that SDK broke, and only that SDK, because it insisted on checking the type. Sure, it was our mistake this time for not making the original fix cast to str right before passing the value to the SDK, but it was annoying and should have been unnecessary. I understand that type hints are much better in this regard, because they would only show an error in your tooling. But that brings me to another point.
I write my packages/classes/modules mostly to be used in a web app, or as scripts that run on a schedule. However, I also need to be able to write one-offs very quickly. When that happens, code that was previously a library for different applications becomes an application itself. Using the REPL, a Jupyter notebook, or bpython, I need to get something done quickly. In these scenarios I don't want to waste time remembering how to normalize the data being given to me, especially if the code that provides such niceties is tucked away at a higher level for end users of the web app.
Like I said, I tend to just make a lookup function and then have everything else be methods on the object. But that doesn't really help when it's parameters to a function. I really don't know what would make it better. Perhaps some kind of mix between function overloading and interfaces from other languages, plus the magic *_validate() methods that Django uses. Maybe instead of type hints for return values we need value hints that give an idea of what actual objects might look like. Then tooling could take into account whether it would still work after validation and normalization. Of course, it could be that there is no elegant and reliable way to do what I really want, but I can dream.
I'm sure your APIs are sane (at least to you). It's all the other developers you have to watch out for.
> … even in your example it would show a stack trace to the function looking up the user, and would be much more valuable to troubleshoot than a type mismatch.
A type mismatch would be caught earlier (even in a dynamic language) and the runtime exception should report the specific objects involved, so you still get the string which caused the problem.
> Here is the scenario that annoyed me enough to turn me off static typing.
To begin with, this example has nothing to do with static typing. It involves a runtime type check. In this case I would agree that the type check is too strict. Some languages have an interface or protocol for "string-like" objects (e.g. the to_str method in Ruby), and it would be better to use that rather than checking specifically for an instance of str. Objects which shouldn't be treated as strings just don't implement the protocol. Python has the __str__ magic method, but unfortunately it's not very useful in this regard since all objects implement it, even ones that are nothing like strings. It's more like Ruby's to_s method, used for formatting and debugging rather than as an indication that you have an actual string. The best recommendation I've seen for checking for "string-like" objects in Python is something like `str(x) == x`, though the extra comparison adds some overhead.
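To illustrate the difference (HostName is a made-up str subclass; ipaddress is the modern stdlib module):

```python
import ipaddress

class HostName(str):
    """A made-up str subclass -- string-like by any definition."""
    pass

host = HostName("example.com")
addr = ipaddress.ip_address("192.0.2.1")

# The loose "string-like" check accepts the subclass...
assert str(host) == host

# ...but rejects address objects, which merely format like
# strings without comparing equal to one:
assert not (str(addr) == addr)
```

I believe something like collections.UserString also passes the loose check while failing an isinstance(x, str) check, which is exactly the kind of proxy object the strict check shuts out.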
Of course that doesn't really help you, since you were trying to pass an arbitrary non-string-like object (IPAddress) to a function expecting a string; the looser `str(x) == x` check would also have failed. The call might have "just worked" without the condition, or it might have failed spectacularly. By assuming it would work without the type check, you're depending on the implementation using string interpolation rather than, say, concatenating with the + operator, which requires actual strings and not IPAddress objects, since + doesn't do the implicit conversion that f-strings do. Static typing would have helped to limit these dependencies on unstable implementation details, letting you know that you need to fix the issue at the call site by passing `str(self.ip_address)` for the host parameter.