> 1. I did not do enough load-testing
Load test constantly. My policy is to (almost) never develop using "sample data". Instead, I take a very large example of real-world data (say, the 95th percentile of what is actually used in the wild) and develop with that as my backing data. If operations are slow enough to annoy me in development, they will clearly be too slow for the (many more) people who have to work with the project once it is complete.
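
If it helps, here is a rough sketch of how I'd pick that 95th-percentile dataset. It assumes you can get a row count per account out of production; the `row_counts` mapping and the function name are made up for illustration.

```python
def pick_p95_account(row_counts):
    """Return the account whose data size sits at the 95th percentile.

    row_counts: hypothetical mapping of account name -> number of rows.
    Uses the nearest-rank method with integer math to avoid float rounding.
    """
    if not row_counts:
        raise ValueError("need at least one account")
    ordered = sorted(row_counts.items(), key=lambda kv: kv[1])
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1][0]
```

Export that account's data (anonymized if needed) and point your dev environment at it instead of a toy fixture.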
> 2. Since this service is constantly updating, I frequently fumble with git. like accidentally pushing testing code/hardcoding onto prod.
Lock the `main` branch and only allow commits to it through PRs. Review your own PRs.
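
The real lock belongs on the server (your Git host's branch protection settings). As a local safety net, a `pre-push` hook can refuse direct pushes too. This is a minimal sketch: Git feeds the hook one line per ref on stdin in the form `<local ref> <local sha> <remote ref> <remote sha>`, and the function names here are my own.

```python
import sys

# Remote refs that must only change via PR (adjust to your setup).
PROTECTED = {"refs/heads/main"}

def blocked_refs(lines, protected=PROTECTED):
    """Return the protected remote refs that a push is trying to update."""
    blocked = []
    for line in lines:
        parts = line.split()
        # Each line: <local ref> <local sha> <remote ref> <remote sha>
        if len(parts) == 4 and parts[2] in protected:
            blocked.append(parts[2])
    return blocked

def main(stdin_lines):
    """Return the hook's exit code: 1 rejects the push, 0 allows it."""
    hits = blocked_refs(stdin_lines)
    if hits:
        print("Direct push to %s refused; open a PR." % ", ".join(hits),
              file=sys.stderr)
        return 1
    return 0
```

To use it, save it as `.git/hooks/pre-push`, make it executable, and end the file with `sys.exit(main(sys.stdin))`. Anyone can bypass a local hook with `--no-verify`, so treat it as a seatbelt rather than the lock itself.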
> 3. There are lots of flows in the service, so missing out on testing one of them.
Does making a change in one flow tend to adversely affect seemingly unrelated ones? That might be an engineering shortcoming you should address. Beyond that, use automated testing. Some stacks allow "recording" a flow, then automatically checking that the same flow still works on every PR. See point 2.
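
If your stack has no recorder, the idea is simple enough to hand-roll. A toy sketch, assuming each flow can be stored as a list of (request, expected response) pairs and that `handler` is whatever callable fronts your service; both are placeholders, not any real tool's format.

```python
def replay_flows(flows, handler):
    """Replay recorded flows and return a list of mismatches.

    flows: dict of flow name -> list of (request, expected_response).
    handler: callable taking a request and returning a response.
    Each failure is (flow name, step index, expected, actual).
    """
    failures = []
    for name, steps in flows.items():
        for index, (request, expected) in enumerate(steps):
            actual = handler(request)
            if actual != expected:
                failures.append((name, index, expected, actual))
                break  # later steps in this flow depend on this one
    return failures
```

Run it in CI and fail the PR when the returned list is non-empty.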
> 4. other notable issues like bad queries from analytics team
There are no bad queries, only insufficient validation, timeouts, and/or load balancing.
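
To make that concrete, here is a small sketch of validation plus a timeout, using SQLite from the Python standard library as a stand-in for whatever database you actually run. `run_guarded` and its limits are made up; the `startswith` check is deliberately crude, and a real setup would also give analysts a read-only role and a replica.

```python
import sqlite3
import time

def run_guarded(conn, sql, timeout_s=1.0, max_rows=1000):
    """Run an analyst query with a read-only check, a deadline, and a row cap."""
    # Validation: crude allow-list, only plain SELECT statements.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    deadline = time.monotonic() + timeout_s
    # Timeout: SQLite calls this every 1000 VM instructions; a nonzero
    # return value aborts the query with sqlite3.OperationalError.
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 1000)
    try:
        return conn.execute(sql).fetchmany(max_rows)
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again
```

Most server databases have a built-in equivalent (a per-role or per-session statement timeout), which is the better place to enforce this.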