Every seasoned IT person, DevOps or otherwise, has accidentally made a catastrophic mistake. I ask about that in interviews :D
Mine was replacing a failed hard drive in an array.
shit.
I accidentally rm’ed /bin on a remote host located in another country, and had to wait for someone to get in and fix it.
Not IT but data analyst. Missed a 2% salary increase for our union members when projecting next year’s budget. $12 million mistake that was only caught once it was too late to fix.
I deleted all of our DNS records. As it turns out, you can’t make money when you can’t resolve DNS records :P
I once deleted the whole production Kubernetes environment while trying to fix an update to prod gone awry, at 11pm. My saving grace was that our systems are barely used between 10pm and 8am, and I managed to teach myself enough from docs and Stack Overflow comments to rebuild it and fix the initial mistake before 5am. I've never learned how to correctly use a piece of the stack that quickly before or since.
Nothing focuses the mind more than the panicked realisation that you have just hosed the production systems.
I pushed a $1 bln test trade through production instead of my test environment… that was a sweaty 30 minutes
Yep. Ran a config-as-code migration on prod instead of dev. We introduced new safeguards for running against prod after that. And changed the expectations for primary on-call to do dev work with downtime. Shifted to improving ops tooling or making pretty charts from all the metrics. Actually ended up reducing toil substantially over the next couple of quarters.
10/10 will absolutely still do something dumb again.
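For anyone wondering what a "safeguard for running against prod" might look like, here's a minimal sketch in Python. It is not the setup described above, just one common pattern: require the operator to retype the environment name before anything touches a protected target. The names `PROTECTED_ENVS` and `confirm_target` are made up for illustration.

```python
# Hypothetical guard: protected environments need a typed confirmation.
# PROTECTED_ENVS and confirm_target are illustrative names, not from any real tool.
PROTECTED_ENVS = {"prod", "production"}


def confirm_target(env: str, typed: str) -> bool:
    """Return True if it is OK to proceed against `env`.

    Non-protected environments always pass. Protected ones pass only
    when `typed` (what the operator entered) matches `env` exactly.
    """
    if env.lower() not in PROTECTED_ENVS:
        return True
    return typed.strip() == env


def run_migration(env: str, typed: str = "") -> str:
    """Pretend to run a migration, refusing if the guard fails."""
    if not confirm_target(env, typed):
        return f"aborted: confirmation for '{env}' did not match"
    return f"running migration against {env}"
```

In a real script, `typed` would probably come from an interactive prompt, so muscle memory from dev doesn't carry straight over to prod. It won't stop every mistake, but it adds one deliberate step between you and the sweaty 30 minutes.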