The whole thing was built with read-only SQL scripts, Flask, and some jQuery.
Is your code open source or only for internal use?
I wrote a Python script that uses the openpyxl module to read his Excel docs and the reportlab module to generate barcodes in a PDF document with appropriate spacers, so that he can simply print it out and stick them on boxes.
He is happy, and so am I, since I could save him time. It only took me 20 minutes to write this script.
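For anyone wanting to try something similar, here is a minimal sketch of that kind of script, not the original one. The grid size, margins, Code 128 symbology, and the assumption that the codes sit in the first column under a header row are all made up for illustration; `label_positions`, `read_codes`, and `write_pdf` are hypothetical names.

```python
from typing import Iterator, List, Tuple

# Assumed layout: A4 page (in points), 3 x 8 grid of labels.
PAGE_W, PAGE_H = 595.0, 842.0
MARGIN = 36.0
COLS, ROWS = 3, 8


def label_positions(count: int) -> Iterator[Tuple[int, float, float]]:
    """Yield (page_index, x, y) of each label cell's bottom-left corner,
    filling the page left to right, top to bottom."""
    cell_w = (PAGE_W - 2 * MARGIN) / COLS
    cell_h = (PAGE_H - 2 * MARGIN) / ROWS
    per_page = COLS * ROWS
    for i in range(count):
        page, slot = divmod(i, per_page)
        row, col = divmod(slot, COLS)
        yield page, MARGIN + col * cell_w, PAGE_H - MARGIN - (row + 1) * cell_h


def read_codes(xlsx_path: str) -> List[str]:
    """Read codes from the first column of the active sheet, skipping a header row."""
    # Third-party import kept local so the layout helper works without it.
    from openpyxl import load_workbook

    sheet = load_workbook(xlsx_path, read_only=True).active
    rows = sheet.iter_rows(min_row=2, max_col=1, values_only=True)
    return [str(value) for (value,) in rows if value is not None]


def write_pdf(codes: List[str], pdf_path: str) -> None:
    """Draw one Code 128 barcode per grid cell, with the code printed below it."""
    from reportlab.graphics.barcode import code128
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(pdf_path, pagesize=(PAGE_W, PAGE_H))
    current_page = 0
    for code, (page, x, y) in zip(codes, label_positions(len(codes))):
        if page != current_page:
            pdf.showPage()  # start a new page once the grid is full
            current_page = page
        # The barcode sits above the text; the rest of the cell is spacing.
        code128.Code128(code, barHeight=40, barWidth=0.9).drawOn(pdf, x, y + 14)
        pdf.drawString(x + 6, y + 2, code)
    pdf.save()
```

Usage would be along the lines of `write_pdf(read_codes("boxes.xlsx"), "labels.pdf")`, where both file names are placeholders.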
The overall 'productionisation' of our industry has led us into a cookie-cutter style of work and away from genuine problem solving like that. Ironically, that sort of productivity-boosting work has been wrapped up in a nonsense 'process automation consultant' role that is inflated beyond sense and often dismissed by the receiving company as an unnecessary expense.
People can just source the file in from a shared location and often find that their scripts just start to work better. It's not perfect, nothing's perfect. It's not even that clever. But when builds and deploys start to work twice as well, even with the remaining failures, well, that's something. None of the 65000 employees using it will ever know, but it feels good to know we were dropping 2/3 of orders and now we're dropping 1/3.
I built a Slack app that would keep track of my team's pages and what people did to respond to them. As new pages were triggered, the bot would show the on-call person what others had previously done to resolve the same page.
Lots of the issues we used to see reported were either already fixed or turned out to be config issues. To (somewhat) quickly find existing fixes/comments for reported issues, I built a search tool (webapp) which scraped the bugs and the comments in those bugs, found any information relevant to your query, and listed the results in order of matching probability.
Was a pretty cool learning experience to build that out. I had deployed it on a personal remote VM that devs were granted, have no idea if people are still using it.
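How the original tool scored its matches isn't stated, so the following is only an illustrative sketch of one common way to rank scraped bug text against a query: a simple TF-IDF-style score (a relevance score, not a true probability). The `rank` and `tokenize` helpers and the bug IDs are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs, best match first; non-matching docs are omitted."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    total_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for c in counts.values() for term in c)
    query_terms = set(tokenize(query))

    results = []
    for doc_id, term_counts in counts.items():
        length = sum(term_counts.values()) or 1
        score = sum(
            (term_counts[term] / length) * math.log(1 + total_docs / doc_freq[term])
            for term in query_terms
            if term in term_counts
        )
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Made-up bug reports (title plus comments concatenated).
bugs = {
    "BUG-1": "login fails with timeout when proxy config is wrong",
    "BUG-2": "crash on startup fixed in build 42",
    "BUG-3": "timeout timeout in proxy, config issue",
}
for bug_id, score in rank("proxy timeout", bugs):
    print(bug_id, round(score, 3))
```

For anything beyond a few thousand bugs, a proper search index would likely be a better fit than rescoring every document per query.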
Used the same experimentation framework for automated JavaScript binary releases, so at some point I could release 5 times a week with no issues. I have since left the team; other people took it over and it keeps running like clockwork.
Showed them how to use powerdrill (data drilling, analysis tool), and taught them metrics. It is surprising how little people end up caring about what their work is really for, and bringing them a data-driven mindset gave an even bigger productivity boost.
Yes, it's really hacky and the whole thing is entirely silly and could have been solved by using more proper tools (i.e. not a defunct make software without wildcard support for input files or Excel for configuration), but I was VERY pleased when I got it working.
- No single aspect ratio
- Some photos had no one in them (a picture of a chair, etc.)
- Some photos had multiple people in the photo (!?)
- Some photos were of such poor quality that you couldn't make out the person.
It seemed some locations let the students provide their own photo. This is the first time we'd ever encountered data in this shape.
My company had two options: Print the data as-is (which would result in thousands of reprints) or hire some temp staff to sort through the photos.
I asked them to let me try to sort them over the weekend with a library I had just learned about (OpenCV). I wrote a custom OpenCV Python script a little over a hundred lines long and ran it over the weekend to crop and sort the photos into several categories (based on face detection), leaving only a few thousand that had to be manually reviewed! That had a real dollar impact and felt really good.
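A rough sketch of what such a sorting pass could look like, not the original script. It assumes OpenCV's bundled Haar cascade for face detection and uses the variance of the Laplacian as a crude sharpness check; the threshold, the category names, and the `categorize`, `inspect`, and `sort_photos` helpers are all assumptions, and cropping is left out.

```python
import shutil
from pathlib import Path
from typing import Tuple


def categorize(face_count: int, sharpness: float, min_sharpness: float = 100.0) -> str:
    """Map detection results to a folder name (threshold is an assumed value)."""
    if sharpness < min_sharpness:
        return "poor_quality"
    if face_count == 0:
        return "no_face"
    if face_count > 1:
        return "multiple_faces"
    return "ok"


def inspect(path: Path, detector) -> Tuple[int, float]:
    """Return (number of detected faces, sharpness) for one image file."""
    import cv2  # opencv-python; imported here so categorize() has no dependencies

    image = cv2.imread(str(path))
    if image is None:  # unreadable file: treat it as poor quality
        return 0, 0.0
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return len(faces), float(sharpness)


def sort_photos(src_dir: str, dst_dir: str) -> None:
    """Copy every .jpg in src_dir into dst_dir/<category>/."""
    import cv2

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    for path in sorted(Path(src_dir).glob("*.jpg")):
        bucket = Path(dst_dir) / categorize(*inspect(path, detector))
        bucket.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, bucket / path.name)
```

Only the photos that land outside the `ok` folder would then need a human to look at them.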
I've also written a trivial PHP parser which was designed to match up class definitions with the comments above them:
https://blog.steve.fi/parsing_php_for_fun_and_profit.html
Both of these tools were designed to be invoked by CI/CD systems, to flag potential problems before they became live.
Most of my work involves scripting, or tooling, around existing systems and solutions. For example, another developer-automation hack was to automatically add the `approved` label to pull requests which had received successful reviews from all selected reviewers, on a self-hosted GitHub Enterprise installation.
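The core decision in a hack like that might look roughly like the sketch below. It assumes review objects shaped like the ones the GitHub REST API returns (a `user.login` and a `state` such as `APPROVED` or `CHANGES_REQUESTED`), listed oldest first; `all_reviewers_approved` is a hypothetical helper, and how the list of selected reviewers is obtained is left out.

```python
from typing import Dict, Iterable, List


def all_reviewers_approved(selected: Iterable[str], reviews: List[Dict]) -> bool:
    """True if every selected reviewer's most recent decisive review is an approval.

    `reviews` is assumed to be in chronological order. Plain comments do not
    change a reviewer's standing; a dismissed approval no longer counts.
    """
    latest: Dict[str, str] = {}
    for review in reviews:
        state = review["state"]
        if state in ("APPROVED", "CHANGES_REQUESTED", "DISMISSED"):
            latest[review["user"]["login"]] = state

    wanted = set(selected)
    # An empty reviewer list should never count as "approved".
    return bool(wanted) and all(latest.get(login) == "APPROVED" for login in wanted)


# Made-up review history for one pull request.
history = [
    {"user": {"login": "alice"}, "state": "CHANGES_REQUESTED"},
    {"user": {"login": "bob"}, "state": "APPROVED"},
    {"user": {"login": "alice"}, "state": "COMMENTED"},
    {"user": {"login": "alice"}, "state": "APPROVED"},
]
print(all_reviewers_approved(["alice", "bob"], history))
```

When the check passes, the label itself can be added through the REST endpoint for adding labels to an issue (`POST /repos/{owner}/{repo}/issues/{issue_number}/labels`), since pull requests share the issue label API.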
Any good resources worth looking at?
Now, I have to generate code in different languages once the DSL is finalized. To achieve this I borrow the architecture of the Flask framework: there we have routes with HTML templates, and here each generator has its own templates.
https://github.com/percolate/charlatan
Ended up saving us a lot of time writing mocks for tests.
Blue - A dead-simple, event-based workflow execution framework.
I always find it easier to model systems from an event-driven perspective, especially when you have to move fast and evolve unpredictably. I wanted a framework anyone could learn to use within 5-10 minutes. At the same time, it should be able to solve all kinds of use cases that require event-based coordination between tasks in a distributed environment.
Works well for us for simple use cases (e.g. data processing workflows) and complex ones (e.g. our entire retail order fulfilment system).
Each shipping line offers a tracking service through one of these methods -- email, RSS or website form. Our container numbers are collected into a Google Spreadsheet via our freight forwarders. Our employees use an antiquated ERP with no API.
The script collects the relevant container numbers from the Google spreadsheet, scrapes the update from the shipping line, and then scrapes the ERP system to enter the update.
Many other specialized calculators and templates, which tend to be more foolproof than Excel.
I built a WebUSB Postal Scale and WebUSB Label Printer so our e-commerce company could print carrier shipping labels with just one click.
It took the process of fulfilling an order down to ~10 seconds per order.
I designed and developed a command line utility/daemon for performing one-off and regular backups of production data. The solution is able to:
- work with a 24/7 live Cassandra cluster, containing tens of nodes
- exert a tolerable and tunable performance/latency footprint on the nodes
- back up and restore from hundreds of GBs to multiple TBs of data as fast as possible, given the constraints of the legacy data model and concurrent load from online players; observed throughput is 5-25 MB/s, depending on the environment
- provide highly flexible declarative configuration of the subset of data to back up and restore (full table exports; raw CQL queries; programmatic extractors) with first-class support for foreign-key dependencies between extractors, compiled into a highly parallelizable execution graph
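
To illustrate the last point: one way to turn dependencies between extractors into stages that can run in parallel is a topological sort. This is only a sketch using Python's standard `graphlib` module (3.9+), not the utility's actual implementation; the extractor names and the `parallel_stages` helper are made up.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_stages(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group extractors into stages; everything within a stage can run concurrently.

    `dependencies` maps each extractor to the extractors it depends on
    (e.g. via a foreign key).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished, unlocking dependents
    return stages


# Hypothetical extractors and their foreign-key dependencies.
extractors = {
    "users": set(),
    "guilds": set(),
    "inventories": {"users"},
    "guild_members": {"users", "guilds"},
    "purchases": {"inventories"},
}
for number, stage in enumerate(parallel_stages(extractors), start=1):
    print(number, stage)
# 1 ['guilds', 'users']
# 2 ['guild_members', 'inventories']
# 3 ['purchases']
```

A real scheduler would more likely start each extractor as soon as its own dependencies finish rather than waiting for a whole stage, which the same `get_ready()`/`done()` calls also allow.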
There was an "a-ha!" moment when I realized that this utility could be used not only for backups of production data, but also for a whole range of day-to-day maintenance tasks, e.g.:
1) Restore a subset of production data onto development and test machines. This solves the issue of developers and QA engineers having to fiddle with the database when they need to test something, whether it be a new feature or a bugfix for production. They can just restore a small subset of real, meaningful and consistent data onto their environment with a bit of configuration and a simple command. Developers can do this manually when needed, and the QA environment can be restored to a clean state automatically by the CI server at night.
2) Perform arbitrary updates of graphs of database entities. It's a common approach to traverse Cassandra tables, possibly with a column filter, in order to process/update some of the attributes (e.g. iterate through all users and send a push notification to each of them). The more users there are, the longer this takes and the more it hurts the cluster's performance and latency for other concurrent operations. With a tool like the one I described, one may clone user data onto a separate machine beforehand (e.g. at night), and then run the maintenance operation at some point during the day, on data that is still reasonably up-to-date.
All in all, it was a fun devops experience, which I'm quite fond of. With just a little creativity and out-of-the-box thinking, there are lots of ways to improve the typical workflow of working with data.