phudgins@lemmy.world · 2 days ago

Well, shit.

WhatsHerBucket@lemmy.world · 2 days ago

Every seasoned IT person, DevOps or otherwise, has accidentally made a catastrophic mistake. I ask about it in interviews :D

partial_accumen@lemmy.world · 2 days ago

Mine was replacing a failed hard drive in an array.

Check array health, see one failed member
Pop out the old hot-swappable drive, pop in the new one
Check array health to make sure the rebuild is underway
See the array now has TWO failed members, and realize I can feel the drive in my hand still spinning down

shit.

WhatsHerBucket@lemmy.world · 2 days ago

I accidentally rm’ed /bin on a remote host located in another country, and had to wait for someone to get in and fix it.

manny_stillwagon@mander.xyz · 2 days ago

Not IT but data analyst. I missed a 2% salary increase for our union members when projecting next year’s budget. It was a $12 million mistake that was only caught once it was too late to fix.

piefood@feddit.online · 1 day ago

I deleted all of our DNS records. As it turns out, you can’t make money when nobody can resolve your DNS records :P

pticrix@lemmy.ca · 2 days ago

I once deleted the whole production Kubernetes environment at 11pm while trying to fix a prod update gone awry. My saving grace was that our systems are barely used between 10pm and 8am, and I managed to teach myself enough from the docs and Stack Overflow comments to rebuild it and fix the initial mistake before 5am. I've never learned how to correctly use a piece of the stack that quickly, before or since.

martinb@lemmy.sdf.org · 1 day ago

Nothing focuses the mind more than the panicked realisation that you have just hosed the production systems.

LordOfLocksley@lemmy.world · 2 days ago

I pushed a $1 billion test trade through production instead of my test environment… that was a sweaty 30 minutes.

Botzo@lemmy.world · 2 days ago

Yep. Ran a config-as-code migration on prod instead of dev. We introduced new safeguards for running against prod after that. We also changed what the primary on-call is expected to do with downtime: instead of dev work, they shifted to improving ops tooling or making pretty charts from all the metrics. That actually ended up reducing toil substantially over the next couple of quarters.

10/10 will absolutely still do something dumb again.