This may interest you. This is probably going to be the 'official' way to get what you're talking about.
mypy and mypyc are interesting but their compile-time checks and optimizations are still hampered by Python's dynamic language semantics.
In Smalltalk, for example, you can completely change the structure of a class by sending a become: message.
What I think is missing is a bit more PyPy love, and the Truffle and OpenJ9 Python support efforts.
I think the killer language will be TypeScript with access to both the Python and JavaScript ecosystems. We'll see what that looks like.
And of course if something changes the syntax, better anonymous functions will be the absolute first thing I would look for...
I have not used TypeScript, but looking at its documentation, the syntax for type annotations looks identical. Would you be willing to expand on why you think its approach is better / how it's different?
I think this is an extremely good idea. Python is horrible but forced on a huge number of developers because of its ecosystem ... I think a bridging layer from typescript to python could be built in a way similar to swift’s Python Interop — and I don’t think it would require any special language support ...
I think one could actually make a better / easier-to-use / more robust design than Swift's by requiring all interactions with the Python interpreter from Node to be async.
We build our employee database, and from there our IDM, from a single XML file in a really shitty format + three txt files in even worse formats (they are single-line output files from an old mainframe system predating SAP). We used to do it in a rather complicated Microsoft SSIS workflow with a lot of C# services. All in all it's a 30-minute nightly runtime. I recently replaced it with around 500 lines of Python and a 1-5 minute runtime (sometimes at the beginning of a school year we'll see changes to around 1000 positions).
Python eats the XML like it wasn't shit. It takes things like terrible date formats, we're talking output-of-a-SAP-free-text-box shitty, and ports them seamlessly into a SQL date field. This alone was a nightmare in C# and Python just does it.
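The kind of tolerant date munging described above can be sketched in a few lines of stdlib-only Python; the XML snippet and the format list here are hypothetical stand-ins for the real feeds:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, date

# Hypothetical stand-in for the mainframe/SAP export.
RAW = "<employees><emp id='1'><hired>03.09.2021</hired></emp></employees>"

# Known-bad date formats from the source system, tried in order.
FORMATS = ["%d.%m.%Y", "%Y-%m-%d", "%m/%d/%Y"]

def parse_messy_date(text: str) -> date:
    """Return a real date object from whatever the export spat out."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {text!r}")

root = ET.fromstring(RAW)
hired = parse_messy_date(root.find("./emp/hired").text)
# `hired` is now a proper date, ready for a SQL DATE column.
```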
Still, after two decades of strict types it feels dangerous.
The high-level variant is a dynamic language with optional typing, which is good for scripting, fast prototyping, fast time-to-market, etc.
The low-level variant is similar to the high-level variant (same syntax, mostly the same features, same documentation), but it has no garbage collector, typing is mandatory, and it runs fast like C/C++/Rust. Compiled packages written in the low-level variant can be used from the high-level variant without any additional effort. The tooling to achieve this comes with the language.
A language like this would be insane, IMHO.
At the same time, I’d love a stronger type system to avoid a bunch of the pitfalls that the dynamism of python has.
So count me in.
I don't know however if this approach could be extended to other domains - say making a web framework. Given that Python classes let you do so much tinkering, any attempt to port existing code will probably need a lot of rewriting?
> I've been tracking nim, and would agree it's the most promising so far! I feel though that it's trying to be too flexible in many ways. Examples of this include allowing multiple different garbage collectors and encouraging heavy ast manipulation. I'm also afraid it is different enough to keep it from attracting a significant amount of developers from the Python community. Nonetheless, it's something I plan on using and contributing to, since it's the best option so far.
Though, now that another commenter pointed out mypyc: https://github.com/mypyc/mypyc I believe I'll invest my limited free-time in that project instead, as it will allow me to stay within the Python community and eco-system that I love so much.
Gives some good insight into where Nim is going in the future too.
What takes 3 lines in Python takes 10-30 in Go.
- I hate its module system and package ecosystem story.
- I don't like its syntax.
- I don't like its error handling.
- I'd much prefer gradual typing.
- I want to maintain the ability to use interactive interpreters.
- I don't like the fact that instead of being community driven it is Google driven.
But, anecdotally, I see go being used as a second language to Python more than anything else and at an ever accelerating rate.
Yes, this!
That's what I hate Django and some Flask apps the most for: the fact that by importing a module, you're implicitly creating a database connection and a lot of other magic stuff, which means that now I can't import a constant defined in said module outside of `python manage.py`.
Also, as mentioned in the article, it suddenly becomes much harder to smoothly handle "the database is momentarily unavailable" (because someone put the line starting the database connection in the global scope of a module somewhere).
I much prefer frameworks/modules whose code is executed only once you invoke their "setup" function.
It does create an object that can (lazily) connect to the database, so it needs the required database drivers installed. It also needs the required information about _how_ to connect to the database, so it needs the settings loaded.
That's why you need to use `django.setup()` first, to tell it what settings to load. You should never be importing random Django models without this configured, simply because they cannot be used and will not work. We think an exception saying "don't do this, call django.setup()" at import time is less confusing than "Databases not configured" at runtime. Not that it would even reach that, because you might be using a field from a third-party application that needs to be initialized (i.e. INSTALLED_APPS configured) or that relies on configured settings (maybe an encrypted field that needs your SECRET_KEY available).
Stop making it hard, just write a management command. It's super easy.
Django _does_ have a "setup" function. You can't import and use Django database connections outside of a running application without it.
Flask also has a "run" method and does no i/o without it.
In practice, this means that any script that depends indirectly on Django code will incur a lengthy startup cost (from having to call setup()), and will fail to run if there's no database connection, even if the script itself doesn't need the db.
In our codebase, we have pretty strict developer-enforced rules about not doing I/O at the module level, usually through the use of simple "Lazy" wrappers for module-level objects. I'd be curious to know what other approaches people have taken with Python here.
I always treated this a bit like single underscore private functions/methods, i.e., follow a convention that produces code that's easy to reason about, even if it's not strictly enforced by the language/compiler. So in practice this equates to separating out modules that mutate global state, and placing the majority of logic in "strict" modules that only declare a bunch of "pure" classes/routines. So the "non strict" code is really just a thin layer of wiring gluing everything together. For instance my Celery task files tend to be very thin.
my_db_conn: Lazy[DbConn] = Lazy(lambda: make_db_conn(...))
and MyPy will tell you if you're doing something silly when you try to use it.
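A minimal version of such a Lazy wrapper might look like this (just a sketch; a production one would presumably also handle locking, resets, etc.):

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
_UNSET = object()  # sentinel, so a factory may legitimately return None

class Lazy(Generic[T]):
    """Defers construction of an expensive object until first use."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value = _UNSET

    def get(self) -> T:
        # The factory (e.g. opening a DB connection) runs on first
        # access only -- never at import time.
        if self._value is _UNSET:
            self._value = self._factory()
        return self._value
```

With an annotation like `my_db_conn: Lazy[DbConn]`, the type checker can then verify how the value returned by `my_db_conn.get()` is used.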
EDIT: After typing up this response and submitting I realize you were talking about their strict approach rather than ours. whoops :)
Someone made a change that took down production because of non-deterministic outcomes? How about breaking out whatever they were changing into its own service? With proper fallbacks, breaking that part shouldn't take down all of production again.
To be clear, I'm not saying microservices will solve all their problems or be less work. I'm just saying that with an equal level of effort, they would probably get more overall reliability by having multiple services, they'd be able to use multiple languages, whatever is suited to the task at hand, be able to deploy even more often with less risk, and be able to isolate these types of "change on import" behavior to a much smaller surface on any given deployment.
Yeah, now you'll have 10 interconnected services, 10x the complexity, and everything will have the ability to take down large parts of production, plus all the extra pain points of a distributed system...
You'll have added complexity with the network calls, which is why I said it wouldn't be any less work, just different work.
As you keep moving along, some things that depend on that first service will start calling the new service directly, and some will still call it in the monolith. But your tracking will tell you how often and who is doing that, so you can find out why.
In the meantime, nothing will break, because the monolith is still a pass through proxy to your service.
However, at their scale and with their engineering resources, I can only imagine an attitude of "we can make this work" (the monolith) is easier to justify. The same goes for the micro-services approach (except here you have to justify changing what has been working so far?)
I'd love to read more about the history behind this approach at Instagram.
It's hard to know anything about the stdlib as it can be monkey patched, e.g. [1]
That said, you could solve this with diagnostics; calculate signatures of stdlib functions and classes to find any known safe ones that were patched. Run that check in your test suite to find problematic imports.
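A rough sketch of that diagnostic, fingerprinting the bytecode of pure-Python stdlib functions (the module and function chosen here are just examples):

```python
import hashlib
import fnmatch  # an arbitrary pure-Python stdlib module to demonstrate with

def fingerprint(func) -> str:
    """Hash a pure-Python function's compiled bytecode."""
    return hashlib.sha256(func.__code__.co_code).hexdigest()

# Baselines would be recorded once, in a fresh interpreter with no
# patching libraries imported, then checked in the test suite.
BASELINES = {"fnmatch.fnmatch": fingerprint(fnmatch.fnmatch)}

def is_unpatched(name: str, func) -> bool:
    return fingerprint(func) == BASELINES[name]
```

A CI job could fail whenever `is_unpatched` returns False for a function on the known-safe list; note this only catches patches that actually replace the function object or its code, not e.g. C-level changes.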
> If the utils module is strict, then we’d rely on the analysis of that module to tell us in turn whether log_to_network is safe.
I like this. It seems far more usable than proposals like adding const decorators.[2]
[1]: https://github.com/gevent/gevent/blob/master/src/gevent/monk...
[1] https://www.tedinski.com/2018/03/20/wizarding-vs-engineering...
This is precisely what gradually typed languages — like TypeScript, Flow, and typed Pythons — solve!
I talked about this on Software Engineering Radio last week: https://www.se-radio.net/2019/10/episode-384-boris-cherny-on....
You have the madness of thousands of developers flinging code at the universe due to the ease of browsers, JS, and npm.
This results in great speed, but not great quality.
When your project/company now wants quality, you keep your code but transition to types. (In OSS space, Angular and Yarn projects have both done JS => TS migrations of some form.)
I’ve never worked on a program so small that readability didn’t matter. I consider it a crucial ingredient of expressiveness and development speed.
Though your perspective could explain a few of the more atrocious code bases I’ve seen.
There are too many people who have swallowed SOLID whole and can no longer see good engineering as a trade-off against other factors.
For example, being strict about having the smallest possible public API and making most methods private protects me from future breakage that might never be an issue (I might never upgrade) but forces me to copy/paste vast globs of your code into my own if I need access to something you didn't anticipate. (And that's assuming I have access to your source. Worst case is that I have to reimplement things that already exist in the code I'm interfacing with.)
Python got this right. Private methods are a weak or strong hint that you might want to think twice before calling them. But you're the boss at the end of the day.
I think this is why it's easy to point to a thousand things built in Python which people use every day (like Instagram), while in, say, Haskell, there are barely a handful (pandoc, Facebook's spam filter, etc.).
[0] https://martinfowler.com/bliki/DesignStaminaHypothesis.html
I've never heard of this before... I love it. Thanks for bringing this up.
I'm not familiar with this use of the term "expressiveness".
My understanding is that expressiveness (as per "On the expressive power of programming languages", Felleisen 1991 [0]) has to do with capabilities that a language has that separate it from another language. C is more expressive than Python in that it gives you direct access to memory management, whereas Python is more expressive than C in that it provides inheritance/OO. (These are just examples.)
Type safety, performance, and readability are all wholly separate from expressiveness, I think. A language's type system and performance benchmarks have nothing to do with the expressive power of a language outright, and "readability" is entirely subjective to begin with.
So: would you mind elaborating on what you mean, exactly, by "expressiveness of [a] language" here?
---
In fact, most of what you (and the linked article) are talking about has to do with the dynamic/static spectrum, not this "wizarding/engineering" spectrum you've coined (though I do kind of like the idea of that for discussing development methodologies).
The article is all about how the dynamically-typed nature of Python allowed for rapid iteration at the beginning of the Instagram project, but has since hindered further progress as they've grown larger. But now they feel they can't just rewrite it all in a statically-typed language because of the engineering overhead involved.
On this note, I want to go to your last point:
> I wish there was a language that let you move gradually from one end to the other, exactly when you need to.
With regard to the dynamic/static distinction, there are languages that allow you to move "gradually from one end to the other", and they are (aptly) called gradually-typed languages.
Gradual typing was introduced by Jeremy Siek and Walid Taha back in the mid-2000s [1]. In this discipline, you can have a statically-typed codebase with local dynamically-typed regions. You get all of the static guarantees everywhere they can be made, and dynamic regions impose runtime checks to ensure consistency. (This connects closely to contracts, which are primarily worked on by Robby Findler at Northwestern, I think.)
Unfortunately (to me), it seems like a lot of these languages are implemented in terms of existing dynamically-typed languages. For example, Sam Tobin-Hochstadt (Indiana) created Typed Racket, which is (of course) built upon Racket but provides a gradual typing discipline. Wherever possible, static types are checked, and everywhere else utilizes contracts to guarantee runtime consistency.
Anyway, all this is to say: the technology exists, technically, but is in its infancy. There's no doubt it'll be some time before it sees widespread use throughout industry. Sam wrote up a brief overview for the SIGPLAN Perspectives blog recently, if you're interested [2].
[0] https://www.sciencedirect.com/science/article/pii/0167642391...
[1] https://wphomes.soic.indiana.edu/jsiek/what-is-gradual-typin...
[2] https://blog.sigplan.org/2019/07/12/gradual-typing-theory-pr...
I find that Python's OOP + functional aspects, combined with a good understanding of the language, hit a sweet spot here. One that simply can't be reached in C/C++/Go/Java/Haskell, and which is much easier to reach than in JS/Rust/other langs where I think it is possible.
The wizarding/engineering spectrum was coined by the article I've linked to[2]. I think the post is exactly about that, first Instagram was wizarding and they had a suitable language for wizarding, now they're engineering, but their language is still only good for wizarding.
As I've said in a sister comment, it's not just about static typing, but metaprogramming/macros/side effects everywhere etc. There's more to the expressiveness/powerfulness than just types. While gradual typing is certainly an improvement, I think we need more research in this direction.
In what parallel universe is Java not immensely popular or not used for greenfield projects?
Too many companies need devs but have engineers, or they need engineers but only have devs :/
(Except those people who claim software engineers aren’t real engineers)
There are currently maybe two ways to tackle this “problem”, without a strict mode:
1. Don’t import at the global module scope; but that’s a bit tedious.
2. Import with rename, like `import os as _os`, and then leave it to the principle of "we're all consenting adults". I.e. if anybody imports and uses things that start with an underscore, it's clearly their fault, not mine.
I think the first step here is to get away from the assumption that importing a module will have "interesting" side effects. This is not only a problem with Python...
I tend to create mini "dependency injection" frameworks that create a pattern for loading module code at some point well after import. This patterns tends to reduce to wrapping whatever code you have in the module in a function/closure instead of just running whenever.
Again, I like the idea of enforcing constraints with code, but I don't think it's a substitute for educating developers to avoid certain patterns and giving them infrastructure that makes the alternative easy.
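One hypothetical shape for that pattern: modules register their side-effecting setup in a function, and nothing runs until the application entry point explicitly asks for it.

```python
from typing import Callable, Dict, List

_initializers: List[Callable[[], None]] = []

def on_setup(fn: Callable[[], None]) -> Callable[[], None]:
    """Register side-effecting startup code instead of running it at import."""
    _initializers.append(fn)
    return fn

def run_setup() -> None:
    """Called once by the application entry point, well after all imports."""
    for fn in _initializers:
        fn()

# --- in some module ---
state: Dict[str, str] = {}

@on_setup
def connect_db() -> None:
    state["conn"] = "connected"  # stand-in for real I/O
```

Importing the module only registers `connect_db`; the connection happens when (and only when) `run_setup()` is invoked.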
Millions of lines of code in a monolith. 20s start up time. Meta monkey patching. One unit test per process... Yikes!
Software architecture, anyone?
Maybe Instagram should get a copy of Michael Feathers' book...
I added these ideas here: https://github.com/perl11/cperl/issues/406
Well, if you ask me to write in language X, I would definitely make mistakes for the first couple of weeks/months/years; that is why you need code review, mentoring, and education plans for your hires.
> Here’s another thing we often find developers doing at import time: fetching configuration from a network configuration source.
MY_CONFIG = get_config_from_network_service()
I am pretty sure this is an anti-pattern; if this code passed code review, you should make your review process more strict.

    def myview(request):
        SomeClass.id = request.GET.get("id")

> Likely you've already spotted the problem

Well, yes, why would you do this? Why would this pass code review? Why do we have linters and other checks for dynamic languages?
> It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language.
It seems we are blaming Python here for the shortcomings of a monolith, instead of chunking out specific business modules into separate services/micro-services.
To be honest, the strict mode seems interesting, but I believe the problems they are facing can be solved by a couple of changes to their process and code:
- everyone gets a mentor if they are not experienced in python or django
- code review by at least two experienced Python developers (it does not count if you have coded Java for 20 years)
- teams should try to move their logic outside the monolith (it sounds like they have a monolith)
- write CI tests to measure how much time it takes to import a file; if it takes more than T (line count * LINE_PROCESSING_THRESHOLD), you have to fix your code
- prepare config and load it before running the actual server, no network call for getting config
All in all, Python is suitable for big companies too. The thing is, if you don't care about best practices, you'll have problems even as a small startup, but in a big company it will make it impossible to move forward. The trick is, independent of company size, to follow best practices and do code review.
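The import-time budget in the fourth suggestion could be sketched with something like this (the threshold value is made up):

```python
import importlib
import sys
import time

LINE_PROCESSING_THRESHOLD = 0.001  # hypothetical budget: seconds per source line

def import_within_budget(module_name: str, line_count: int) -> bool:
    """Freshly import a module and check it beats line_count * threshold."""
    sys.modules.pop(module_name, None)  # drop the cache so module code re-runs
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    return elapsed <= line_count * LINE_PROCESSING_THRESHOLD
```

Wired into CI as a test that fails whenever this returns False, it would catch a module that suddenly starts doing I/O at import time.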
Clearly, Instagram's solution saves them time. That means faster code reviews which incidentally makes them more accurate. Your post doesn't really make sense.
It's also important to... use pytest fixtures instead of arbitrarily patching around in tests.
> But if we moved the log_to_network call out into the outer log_calls function, [...] this would no longer compile as a strict module.
My current understanding is that the log_calls method would NOT get executed during module load time!?!
Why would having a side effect in this function violate the intention of __strict__ ?
That's incorrect. log_calls gets executed on import because it's a decorator, so it's equivalent to `hello_world = log_calls(hello_world)` at the top level (which does also get executed).
log_to_network inside the _wrapped() definition doesn't get executed until hello_world gets called; but anything outside the definition of _wrapped does get executed.
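A tiny sketch of that distinction, using a list of events instead of actual network I/O:

```python
events = []

def log_calls(fn):
    events.append("decorating")  # runs at import time, when @log_calls is applied
    def _wrapped(*args, **kwargs):
        events.append("calling")  # stand-in for log_to_network; runs at call time
        return fn(*args, **kwargs)
    return _wrapped

@log_calls  # equivalent to hello_world = log_calls(hello_world) at top level
def hello_world():
    return "hello world"
```

Merely importing this module appends "decorating"; "calling" only appears once `hello_world()` is actually invoked.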
Those optimizations won't mean much for CPython, since CPython doesn't try to run things fast, but for something like PyPy this could be a big deal.
The quote is probably wrong, but it is right in spirit.
That's bananas.
Nothing Instagram does requires that much code.
Also, that much Python code means you're doing it wrong.
Python is too expressive to require mega-LoC for that site.
You could implement an OS, relational DB, spreadsheet, and optimizing compiler all in less than that.
You are right in that it’s certainly a high LoC count for Python, but still...
(And for the record, Linux is ~37 million lines of actual code, Postgres ~2 million, and gcc ~8 million)
There's nothing absurd about one of the most visited websites on earth being a couple million LOC.
> So that's a third pain point for us. Mutable global state is not merely available in Python, it's underfoot everywhere you look: every module, every class, every list or dictionary or set attached to a module or class, every singleton object created at module level. It requires discipline and some Python expertise to avoid accidentally polluting global state at runtime of your program.
> One reasonable take might be that we’re stretching Python beyond what it was intended for. It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language.
> But we’re past the point of codebase size where a rewrite is even feasible. And more importantly, despite these pain points, there’s a lot more that we like about Python, and overall our developers enjoy working in Python. So it’s up to us to figure out how we can make Python work at this scale, and continue to work as we grow.
Those are literal quotes from the article. That is quite damning. How did they get to this point? By starting when Python was appropriate, and taking it day by day.
My guess (based on my experiences) is that companies wind up in this position from having inexperienced people building early versions of products instead of hiring experienced engineers (who are usually more expensive).
I would categorize it as a subset of dynamic typing, and that's what Wikipedia says too.
For me, it's not "status anxiety". It's simply not worth the effort.
The last couple static analysis tools I ran on my programs, I spent a while getting the tool to not-crash (because even though the authors obviously had a static analysis tool themselves, they either didn't bother to run it on their own code, or it wasn't good enough to find actual issues). These tools flagged only a couple issues, and almost all of them were places where it couldn't really cause any problems, but the type system was not strong enough for me to prove why it couldn't go bad. So I spent a while sorting through false-positives.
I'm not going to spend hours with a tool to find only a couple (real) bugs, which no user has ever reported seeing, and which I've gotten no automated crash reports about. I have much better uses for my time.