Ask HN: How to release features seamlessly as a Software engineer?

2 points · AdityaSanthosh · 2y ago · 8 comments

I work at a medium-sized startup, where I own an internally used, self-serve analytics service with a reasonable load. I am the sole owner of this service, which has thousands of lines of code. Since I am the only one working on it, I make all the architecture decisions, handle DevOps (setting up Step Functions and API Gateways, granting access, setting up staging infrastructure), write feature code, raise PRs, review and deploy them myself, test on staging, and run sanity checks in production. However, I have often fumbled while releasing new features, most recently today, which caused downtime for many users throughout the day. Even though I come up with creative ideas and write good code, the optics of my work are getting worse because of these bad releases. The reasons for the bad releases are:

1. I did not do enough load testing.
2. Since this service is constantly being updated, I frequently fumble with git, for example by accidentally pushing testing code or hardcoded values to prod.
3. There are lots of flows in the service, so I miss testing one of them.
4. Other notable issues, like bad queries from the analytics team.

I have always been the kind of person who is creative and authentic and who believes in "move fast and break stuff." However, my company expects me to move fast without breaking anything. I was also told that I make small mistakes from time to time (most recently, I forgot to turn on a cron job in prod). I love my work, and I think the decisions I make and the code I write are of high quality, but these issues are hurting my optics. (I have trouble concentrating for long hours; I work in short bursts, if that's worth anything.) How do I overcome this situation and improve my optics and my confidence? Thanks.
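One way to catch a forgotten switch like a disabled cron job is to diff the settings that should match between staging and prod before calling a release done. A minimal Python sketch, assuming you can export each environment's relevant settings (schedules, feature flags, timeouts) as a flat dictionary; the setting names and values below are hypothetical:

```python
from typing import Any, Dict, List


def parity_issues(staging: Dict[str, Any], prod: Dict[str, Any],
                  allowed_diffs: frozenset = frozenset()) -> List[str]:
    """Return human-readable differences between two environments' settings.

    Keys listed in allowed_diffs (e.g. hostnames) are expected to differ and are skipped.
    """
    issues = []
    for key in sorted(set(staging) | set(prod)):
        if key in allowed_diffs:
            continue
        if key not in prod:
            issues.append(f"missing in prod: {key}")
        elif key not in staging:
            issues.append(f"missing in staging: {key}")
        elif staging[key] != prod[key]:
            issues.append(f"{key}: staging={staging[key]!r} prod={prod[key]!r}")
    return issues


if __name__ == "__main__":
    # Hypothetical exported settings; in practice these would come from your infra tooling.
    staging = {"cron.nightly_refresh.enabled": True, "db.host": "staging-db", "api.timeout_s": 30}
    prod = {"cron.nightly_refresh.enabled": False, "db.host": "prod-db", "api.timeout_s": 30}
    for issue in parity_issues(staging, prod, allowed_diffs=frozenset({"db.host"})):
        print(issue)
```

Running it as a post-deploy step turns "remember to flip the cron on" into a check that fails loudly instead of relying on memory.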


dumbo-octopus · 2y ago

> 1. I did not do enough load-testing

Load test constantly. My policy is to (almost) never develop using "sample data". Instead, I take a very large example of real-world data (say, the 95th percentile of what is actually used in the wild) and develop with that as my backing data. If operations are slow enough for me to be annoyed in development, clearly they will be too slow for the (many more) people who have to work with the project once it's complete.
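A minimal Python sketch of the 95th-percentile idea, assuming you can pull the sizes (here, row counts) of datasets users have actually worked with. It picks the 95th-percentile size with the nearest-rank method and times an operation on data of that size against a latency budget; the observed sizes and the 2-second budget are hypothetical:

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def p95(values: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # ceil(0.95 * n), computed with integers to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(fn: Callable[[T], object], payload: T, budget_s: float) -> Tuple[float, bool]:
    """Run fn(payload) once and return (elapsed seconds, whether it met the budget)."""
    start = time.perf_counter()
    fn(payload)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed <= budget_s


if __name__ == "__main__":
    # Hypothetical row counts of real datasets users have queried (e.g. pulled from logs).
    observed_row_counts = [1_200, 5_000, 40_000, 75_000, 120_000, 300_000, 500_000]
    n_rows = int(p95(observed_row_counts))
    dev_data = list(range(n_rows))  # stand-in for a realistic dataset of that size
    elapsed, ok = within_budget(sorted, dev_data, budget_s=2.0)
    print(f"{n_rows} rows: {elapsed:.3f}s, {'OK' if ok else 'TOO SLOW'}")
```

A single timed run is only a smoke test; a real load test would also repeat the operation and apply concurrent requests.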

> 2. Since this service is constantly updating, I frequently fumble with git. like accidentally pushing testing code/hardcoding onto prod.

Lock the `main` branch; only allow commits to it from PRs. Review your own PRs.
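Branch protection stops direct pushes, but a self-reviewed PR can still carry debug code. A complementary Python sketch that scans the lines a diff adds for obvious leftovers; the `DO NOT COMMIT` marker is a hypothetical team convention, and the patterns are examples to tune, not a complete list:

```python
import re
import subprocess
from typing import Iterable, List

# Hypothetical patterns; adjust them to your codebase's conventions.
FORBIDDEN = [
    (re.compile(r"DO NOT COMMIT", re.IGNORECASE), "temporary-code marker"),
    (re.compile(r"\b(localhost|127\.0\.0\.1)\b"), "hardcoded local host"),
    (re.compile(r"\bbreakpoint\(\)|\bpdb\.set_trace\(\)"), "debugger call"),
]


def find_forbidden(diff_lines: Iterable[str]) -> List[str]:
    """Scan the added lines of a unified diff and describe anything forbidden."""
    problems = []
    for line in diff_lines:
        # Only inspect lines the change adds; skip the '+++ b/file' header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for pattern, reason in FORBIDDEN:
            if pattern.search(line):
                problems.append(f"{reason}: {line[1:].strip()}")
    return problems


def staged_diff() -> str:
    """Return the staged changes as a zero-context unified diff (requires git)."""
    return subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
```

In a pre-commit hook you might exit non-zero whenever `find_forbidden(staged_diff().splitlines())` returns anything; in CI, feed it the PR's diff instead so the check runs even when the hook is skipped.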

> 3. There are lots of flows in the service, so missing out on testing one of them.

Does making a change in one flow tend to adversely affect seemingly unrelated others? That might be an engineering shortcoming you should address. Besides that, automated testing. Some stacks allow "recording" a flow, then automatically making sure that same flow can happen on every PR. See point 2.
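A lightweight version of "recording" flows, sketched in Python under the assumption that each flow can be called as a function of a JSON-serializable input: capture known-good outputs once, then replay every recorded case on each PR and report which flows changed. The flow names and functions are hypothetical:

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List

Flows = Dict[str, Callable[[Any], Any]]
# flow name -> list of {"input": ..., "output": ...} cases
Recording = Dict[str, List[Dict[str, Any]]]


def record(flows: Flows, inputs: Dict[str, List[Any]]) -> Recording:
    """Run each flow on its sample inputs and capture the outputs as the baseline.

    Outputs should be JSON-native (dicts, lists, numbers, strings) so they
    survive save/load unchanged.
    """
    return {
        name: [{"input": x, "output": flows[name](x)} for x in xs]
        for name, xs in inputs.items()
    }


def replay(flows: Flows, recording: Recording) -> List[str]:
    """Re-run every recorded case and describe any output that no longer matches."""
    failures = []
    for name, cases in recording.items():
        for case in cases:
            actual = flows[name](case["input"])
            if actual != case["output"]:
                failures.append(
                    f"{name}({case['input']!r}): expected {case['output']!r}, got {actual!r}")
    return failures


def save(recording: Recording, path: Path) -> None:
    path.write_text(json.dumps(recording, indent=2, sort_keys=True))


def load(path: Path) -> Recording:
    return json.loads(path.read_text())
```

The recording file lives in the repo, so an intentional behavior change shows up in the PR as an edited baseline rather than a silent difference.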

> 4. other notable issues like bad queries from analytics team

There are no bad queries, only insufficient validation, timeouts, and/or load balancing.
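To make "validation and timeouts" concrete, here is a sketch using Python's built-in `sqlite3` as a stand-in for whatever database backs the service (an assumption; most production databases offer their own statement timeouts and read-only roles). It puts the connection in read-only mode with SQLite's `query_only` pragma, aborts statements that run past a deadline via a progress handler, and caps the number of rows returned:

```python
import sqlite3
import time
from typing import Any, List, Sequence


def run_analytics_query(conn: sqlite3.Connection, sql: str, params: Sequence[Any] = (),
                        timeout_s: float = 5.0, max_rows: int = 10_000) -> List[tuple]:
    """Run a user-supplied query read-only, with a wall-clock timeout and a row cap."""
    deadline = time.monotonic() + timeout_s

    def abort_if_late() -> int:
        # SQLite calls this periodically; a non-zero return interrupts the statement.
        return 1 if time.monotonic() > deadline else 0

    conn.execute("PRAGMA query_only = ON")  # reject writes and schema changes
    conn.set_progress_handler(abort_if_late, 1000)  # check every 1000 VM instructions
    try:
        rows = conn.execute(sql, params).fetchmany(max_rows + 1)
    except sqlite3.OperationalError as exc:
        if "interrupt" in str(exc).lower():
            raise TimeoutError(f"query exceeded {timeout_s}s") from exc
        raise
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler
        conn.execute("PRAGMA query_only = OFF")
    if len(rows) > max_rows:
        raise ValueError(f"query returned more than {max_rows} rows; add filters or a LIMIT")
    return rows
```

The point is that a runaway query fails fast with a clear error for the person who wrote it, instead of degrading the service for everyone else.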

AdityaSanthosh (OP) · 2y ago

> I take a very large example of real-world data (say 95th percentile of what is actually used in the wild) and develop with that as my backing data. If operations are slow enough for me to be annoyed in development, clearly they will be too slow for the (many more) people who have to work with the project once complete.

Interesting point. Will try to incorporate that.

> Does making a change in one flow tend to adversely affect seemingly unrelated others?

It doesn't happen that much, but because there is a lot of overlap between those flows, they are somewhat interlinked (to reduce code duplication). But point noted: I will try to see if they can be separated.

> Lock the `main` branch; only allow commits to it from PRs. Review your own PRs.

Done.

> There are no bad queries, only insufficient validation and/or timeouts.

Validation is a huge issue. When you have hundreds of variables and one of them throws a DivisionByZero error or has an invalid data type, those are hard to catch.

Loved these suggestions, especially the first one. Any more ideas?

dumbo-octopus · 2y ago

> I will try to see if they can be separated.

Not so fast: if you have shared code that is breaking, that would be a perfect place to start introducing automated testing. In general, automated UI testing is more work and produces more false flags than it's worth, but heavily reused code is the exception. That said, if you have code that is technically reused but takes so many parameters that no two call sites use it the same way, and changing how one parameter is interpreted breaks another, then yes, that would be a good thing to fix.
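For heavily reused code, a table-driven test that lists each call site's actual parameter combination is a cheap way to notice when a change made for one caller breaks another. A sketch with Python's `unittest`; `summarize` and the call-site names are hypothetical stand-ins for the shared code:

```python
import unittest
from typing import Dict, List, Optional


def summarize(values: List[float], *, round_to: Optional[int] = None,
              skip_negative: bool = False) -> Dict[str, float]:
    """Hypothetical shared helper used by several flows with different options."""
    if skip_negative:
        values = [v for v in values if v >= 0]
    total = sum(values)
    mean = total / len(values) if values else 0.0
    if round_to is not None:
        total, mean = round(total, round_to), round(mean, round_to)
    return {"total": total, "mean": mean}


# One row per real call site: (call site, arguments it passes, output it relies on).
CALL_SITES = [
    ("revenue_dashboard", dict(values=[10.0, 20.0], round_to=0), {"total": 30.0, "mean": 15.0}),
    ("refund_report", dict(values=[5.0, -1.0, 3.0], skip_negative=True), {"total": 8.0, "mean": 4.0}),
    ("empty_segment", dict(values=[]), {"total": 0, "mean": 0.0}),
]


class SummarizeCallSites(unittest.TestCase):
    def test_each_call_site(self):
        for name, kwargs, expected in CALL_SITES:
            # subTest reports every failing call site instead of stopping at the first.
            with self.subTest(call_site=name):
                self.assertEqual(summarize(**kwargs), expected)
```

Run it with `python -m unittest` in CI on every PR. If the table keeps growing because every caller needs its own flag, that is the signal to split the helper.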

> When you have hundreds of variables and one of them throws DivisionByZero error or invalid data type, those are hard to catch

What makes those hard to catch?
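One common answer is to validate every input up front and report all problems at once, rather than letting the first bad value surface as a division-by-zero error (`ZeroDivisionError` in Python) deep inside a calculation. A sketch, assuming each metric's inputs can be described by an expected type and whether zero is allowed; the schema and field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class FieldSpec:
    expected_type: Tuple[type, ...]
    nonzero: bool = False  # set for anything used as a denominator


# Hypothetical schema for one metric's inputs.
SCHEMA: Dict[str, FieldSpec] = {
    "revenue": FieldSpec((int, float)),
    "sessions": FieldSpec((int,), nonzero=True),
    "region": FieldSpec((str,)),
}


def validate(row: Dict[str, Any], schema: Dict[str, FieldSpec] = SCHEMA) -> List[str]:
    """Return every problem in the row instead of stopping at the first one."""
    errors = []
    for name, spec in schema.items():
        if name not in row:
            errors.append(f"{name}: missing")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, spec.expected_type):
            expected = "/".join(t.__name__ for t in spec.expected_type)
            errors.append(f"{name}: expected {expected}, got {type(value).__name__}")
        elif spec.nonzero and value == 0:
            errors.append(f"{name}: must be non-zero (used as a denominator)")
    return errors


def revenue_per_session(row: Dict[str, Any]) -> float:
    problems = validate(row)
    if problems:
        raise ValueError("; ".join(problems))
    return row["revenue"] / row["sessions"]
```

With hundreds of variables, the schema becomes the single place that documents what each one must look like, and a failing row names every offending field at once.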


smokeydoe · 2y ago

This sounds a lot like my experience; I have had the same issues as the solo owner of tech projects. What helped me was setting up a complete staging environment where all new features are tested, either by your business users or, better, by a dedicated QA person. I would advise your company to hire a contract QA person if you can. Then set up your deployments to go to staging first, and once everything is tested, use CI to deploy exactly what is in staging to production. This is what I pitched to my clients, and it works much better. If issues arise after that, they are usually due to inadequate testing in staging. Be aware that there may be kinks to iron out in your deployment process at first, but once it's solid it should not require many changes.
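The key property in "deploy exactly what is in staging to production" is that production receives the same build artifact that was tested, not a rebuild. A minimal Python sketch of that gate, assuming the pipeline can read the artifact's bytes and records the digest of whatever passed QA on staging; `ReleaseGate` and its methods are hypothetical, and a real pipeline would persist approvals rather than keep them in memory:

```python
import hashlib
from typing import Dict


def digest(artifact: bytes) -> str:
    """Content hash that identifies a build artifact independently of its file name."""
    return hashlib.sha256(artifact).hexdigest()


class ReleaseGate:
    """Tracks which artifact digests passed staging and refuses to promote anything else."""

    def __init__(self) -> None:
        self._approved: Dict[str, str] = {}  # digest -> who approved it

    def approve_on_staging(self, artifact: bytes, approved_by: str) -> str:
        d = digest(artifact)
        self._approved[d] = approved_by
        return d

    def promote_to_prod(self, artifact: bytes) -> str:
        d = digest(artifact)
        if d not in self._approved:
            raise PermissionError(f"artifact {d[:12]} was never approved on staging")
        return d  # the caller would now deploy this exact artifact
```

A last-minute "quick fix" rebuilt after QA produces a different digest, so it has to go back through staging instead of slipping into prod.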

AdityaSanthosh (OP) · 2y ago

On an unrelated note, I admit I hated the idea of setting up processes, because I enjoyed the freedom my manager gave me to make architectural and code decisions on my own and to move fast rather than follow rigid practices. I am not sure that mindset is a good one.

smokeydoe · 2y ago

I agree. I still push things to production occasionally. But testing bigger changes with all edge cases can take a lot of time for me on some projects. Having QA I am left with more time to work on features. A decent QA will find bugs you wouldn’t have and make the product better.

AdityaSanthosh (OP) · 2y ago

I have already created a staging setup and the CI/CD pipeline, and I pitched to my engineering manager that we get a QA person. From now on I will push harder to smooth out deployments.
