- researchers conduct experiments that may fail
- they often produce PoC instead of final products
- it's tempting to produce tonnes of low-quality code (since it's just an experiment/PoC!)
- often researchers are not software engineers so they don't really care about code quality / tests
How do you find a good trade-off between high coding standards and not getting in the way of research? Is it possible to move smoothly from PoC to a production solution without rewriting everything from scratch? And how do you share code between experiments/PoCs?
Addressing the specific points:
> - researchers conduct experiments that may fail
Very true! Often many, many, many experiments...
> - they often produce PoC instead of final products
This is going to vary a lot depending on where you are working. We were responsible for developing, deploying, and for some time supporting whatever we developed.
> - it's tempting to produce tonnes of low quality code (since it just an experiment/PoC!)
Yep, especially in early stages.
> - often researchers are not software engineers so they don't really care about code quality / tests
This really depends on what stage in the lifecycle of a research project we were in. We were responsible for deploying the final code, so at the end of the day it had to be of the same quality as something someone with the title of software engineer would generate.
Code quality is another problem entirely. I agree code quality can get out of control as soon as the PoC is promoted to something resembling "production."
My suggestions are:
- First, if you get frustrated at researchers for code quality, let them calmly know why you are upset. If they are being inefficient, many would love to hear tips to keep it from happening. Let them know when the things they are doing might affect large groups of people.
- Don't try to write tests for everything. This just slows you down, getting away from the good things above. Write tests for things that are frequently broken, and absolutely required to work, such as core functionality. If something gets broken 2 or 3 times, you should definitely have a test.
- Make your tests as high level as possible. Compute power is cheap, and despite what you might hear from the TDD/unit testing crowd, your tests don't need to run in 2 seconds to be useful. I like to have tests that emulate users, because as you change the logic of how you're doing things, you still have tests to back you up.
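To make the "emulate users" point concrete, here is a minimal sketch of a high-level test. `run_analysis` is a hypothetical stand-in for your whole pipeline entry point, not a real API; the idea is that the test drives it the way a user would, so it survives internal refactors:

```python
def run_analysis(values):
    """Toy stand-in for a whole analysis pipeline: clean, then summarize."""
    cleaned = [v for v in values if v is not None]
    return {"n": len(cleaned), "mean": sum(cleaned) / len(cleaned)}

def test_end_to_end():
    # Drive the pipeline like a user: raw (messy) input in, summary out.
    # No mocking of internals, so rewriting the guts won't break the test.
    result = run_analysis([1.0, None, 2.0, 3.0])
    assert result["n"] == 3
    assert abs(result["mean"] - 2.0) < 1e-9
```

Even if the whole pipeline later becomes a different algorithm, this test still expresses "given this input, the user sees roughly this output", which is the contract that matters.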
- Add lots of additional logging. This helps document the code (since the messages should be useful and say what is going on), and provides great info for debugging issues after they've already occurred. I've been saved by good logging more times than I can remember, especially on different OS/environments that aren't the test environment.
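In Python, the standard `logging` module is enough for this; a sketch of the style I mean, where the messages narrate what the run is doing (the `load_dataset` function and its contents are hypothetical):

```python
import logging

# Configure once at program start; INFO-level messages double as
# lightweight documentation of what the run actually did.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("experiment")

def load_dataset(path):
    # Hypothetical loader; the point is the narrating log lines, which
    # later let you reconstruct a failed run from its output alone.
    log.info("loading dataset from %s", path)
    rows = [[1, 2], [3, 4]]  # stand-in for real file I/O
    log.info("loaded %d rows", len(rows))
    return rows
```

When something breaks on a machine you can't reproduce locally, these lines are often the only evidence you get.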
- Don't worry too much about edge cases. Just print a log line or crash out if it's something ridiculous you've gotten yourself into, which is a lot more friendly than figuring out some horrendous bug mired in retry logic that has masked the original issue.
- Insist on version control, but not code reviews. Code reviews can really slow you down. Instead, fix problems after they come up. You haven't shipped, right?
- Run the build and tests in a simple CI loop that runs overnight. Don't worry about testing each commit, just know if it works or doesn't work. Fix the problems.
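A nightly loop like that doesn't need a CI product; a cron entry invoking a small runner script is enough. A sketch, assuming your tests run under a command like `pytest` (the command list is yours to substitute):

```python
import datetime
import subprocess

def nightly_build(commands=(("pytest", "-q"),)):
    """Run the build/test commands once and report pass/fail.

    Schedule this from cron (e.g. `0 2 * * *`) instead of wiring up
    per-commit CI; all you want to know in the morning is works/broken.
    """
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    results = {}
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = (proc.returncode == 0)
    ok = all(results.values())
    print(f"[{stamp}] nightly build {'PASSED' if ok else 'FAILED'}: {results}")
    return ok
```

If it fails, you bisect the day's commits by hand; that is usually cheap enough for a research-sized team.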
These last two are related:
- Feel free to just start over. Delete huge amounts of code, and try a different approach.
- If you have gone past the point of no return (you don't want to start over), then start production-izing the code. Again, aim at the problems to start, not some coverage metric. Look over all the code and reduce redundancy. It's a lot easier to review code once it's all there, rather than bit by bit.
- I've seen many cases where researchers refused to share their code because they knew it wasn't up to any reasonable standard. This is a red flag. If they are embarrassed by their code, I tend to discount their alleged results entirely.
- Even in research, people should be required by the organization to follow some kind of process. Use version control (git, or even svn; this basic step is still not universal), put in pull requests, get code reviewed by someone else.
- For that purpose, every research organization should have someone on staff who can do a competent review. They do not need to specialize in the researcher's field. They just need to know a code smell when they see it.
- Every researcher I have known will resist this strenuously. That is a sign of how much they need it.
- When publishing research results, code and data should always be required. Otherwise, the results cannot be judged. (A lot of people like it that way. They should not be accommodated).
I could go on but I'll be nice and stop here.
Your researcher got a result. Great. What is their objective evidence that the result is real rather than an artifact of a bug in their code? If the code is garbage, you can't trust the result, no matter how much of a breakthrough the result would be if true.
That doesn't mean that the code needs to be production-ready. It does mean that the code needs to be clean enough to be trustworthy. (Tests can be included in this evaluation.)
If the code's going to be product-ized... maybe ask the researcher which parts of the code they think are the most troublesome. Start by re-writing those pieces, from scratch, with production levels of rigor. Then, as other parts prove troublesome, rewrite those too. Don't band-aid them, rewrite them. Keep the interfaces, unless the interface itself is part of the problem.
The core environment is still the Jupyter notebook, so it should remain familiar to most data scientists.
Zero to JupyterHub with Kubernetes
Something small but meaningful that I believe in are tools like versioneer (https://github.com/warner/python-versioneer/), which bump the version of your code on every commit.
Then, embed this version string in all output. Figures, serialized data, whatever.
It is very powerful to be able to point at a figure and say "this graph was produced by precisely this code". If you're feeling particularly anal, include the hashes of the datasets that generated it too.
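versioneer generates this machinery for you at install time; for a bare-bones equivalent, you can read the git hash directly and hash the input data yourself. A sketch (the function names and the matplotlib stamping idea are illustrative, not part of versioneer):

```python
import hashlib
import subprocess

def code_version():
    """Short git commit hash of the current checkout, or 'unknown'.

    versioneer automates a richer version of this (tags, dirty flags);
    this is just the minimal idea.
    """
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"

def dataset_hash(path):
    """SHA-256 prefix of an input file, so outputs can name their data."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()[:12]

# Stamp every figure/serialized output with both, e.g. in matplotlib:
# fig.text(0.99, 0.01, f"code {code_version()} / data {dataset_hash(p)}",
#          ha="right", fontsize=6)
```

With that stamp in the corner of every figure, "which code produced this graph?" becomes a lookup instead of an argument.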
Are you a mathematician simulating a dynamical system? A theoretical computer scientist exploring the effects of parameters that are difficult to nail down analytically?
Are you a computer scientist working on a new sort of system? Is the point of that system to support a long-running research agenda, or to demonstrate the feasibility of a general notion/idea?
Or are you a software engineer supporting a natural scientist (e.g., in a large bio/neuro/chem/physics lab)?
Are you the PhD student, the research scientist, the supporting engineer, or the PI?
But in any case, the correct answer will start with interrogating the purpose/role of the software in your research project. And that answer could range from "hack out the MATLAB and sanity check" all the way to "lives are on the line; practice extreme rigor". And certainly not excluding "convince your funding agency/PI that it's time to hire a professional"!
We also made good use of tagging features in our project management toolset to make report writing easier at the end of the project.
- Scientists try their best to be good programmers, but are scientists first.
- Someone the scientist knows, or someone on the team with more programming knowledge, turns what the scientist produced into something maintainable at some point.
- If they're lucky, the grant will have resources for a script/software maintainer.
Scientists are scientists. I know a few who can do things with awk that probably should never be done, but they use the tools they know to get the data to look the way they need.
Other researchers care about the final code which is used to generate the results. So, in my book it's OK if there's a large gap between the code that led to the initial idea and the code that was used to show the idea in practice (i.e. the code used to generate all graphs/tables in a submitted publication).
I think you can have both. Unit tests and good coding practices should make you faster once you have more than one screen's worth of code and would otherwise be relying on human memory to navigate and maintain it.
I'm not a "unit test all the things" kind of person though.
- use version control
- write tests (high level, keep it simple)
- pull in data as if it were a dependency, versioned and stable
- use a CI server/service
- when publishing, code and data goes with the paper