A guy in our data center couldn’t figure out who owned a particular machine that he needed to work on. So his solution to figure it out was to let them come to him. He went and pulled out the network cable and waited. He was escorted out a little while later. The moral of the story is don’t go disabling production machines on purpose.
Honestly we do that when we ask and no one speaks up. Lovingly called the “scream test” as we wait to see who screams.
I guess it depends on where you work. This was a large data center for a very large health insurance company. They made it a point later that day to remind people that it was a fireable offense to mess with production machines like that on purpose. And evidently the service he disabled was critical enough that it didn’t take long for the hammer to come down. There were plenty of ways to find out who owned the machine; he just chose the easiest and got fired on the spot for it.
So it wasn’t accurate when you said he “couldn’t” figure it out.
Well, I am not him, so I can’t tell you whether or not he actually “could” have figured it out. The options to figure it out did exist, but he chose not to use them, giving the appearance that he “couldn’t”. Are you this much fun at parties?
He couldn’t figure it out; a competent person could have without unplugging it.
Scream tests are a last resort though.
Sounds like it was a last resort if he “couldn’t figure out” whose machine it was.
I don’t understand how that is even possible.
Are there no logs? No documentation? Does everyone share an admin user with full rights?
I mean, there has to be a way to find out who accessed the machine last time.
You’d be surprised what inheriting tech debt looks like. Quite often there’s no documentation, and the last person to log in to the system is an admin who quit 3 years ago. That doesn’t much matter anyway, because it only covers direct console logins, which normal users don’t do when accessing the application. With tribal knowledge gone and no documentation, only when you pull the network for a bit do you discover that there was this one random script running on it that was responsible for loading all the needed data into the current system, even though 9 times out of 10 scripts like that are no longer needed.
In a perfect world you’d have documentation, architecture and data flow diagrams for everything, but “ain’t nobody got time for that” and it doesn’t happen.
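For what it’s worth, a gentler first step than pulling the cable is to look at who is talking to the box right now. Below is a minimal sketch, assuming a Linux host, IPv4 only, and a little-endian CPU; it parses the text of `/proc/net/tcp` and counts established connections per remote address and local port. The function names `decode_ipv4` and `remote_peers` are made up for illustration. It only sees connections that are open at the moment you look, so a script that wakes up once a week will still slip past it.

```python
import socket
import struct
from collections import Counter

ESTABLISHED = "01"  # TCP state code as printed in /proc/net/tcp


def decode_ipv4(hex_addr):
    """Turn an entry like '0100007F:1F90' into ('127.0.0.1', 8080)."""
    ip_hex, port_hex = hex_addr.split(":")
    # The address is printed as a 32-bit word in host byte order;
    # this sketch assumes a little-endian machine (x86, most ARM).
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def remote_peers(proc_net_tcp_text):
    """Count established connections per (remote IP, local port)."""
    peers = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4 or fields[3] != ESTABLISHED:
            continue
        _, local_port = decode_ipv4(fields[1])
        remote_ip, _ = decode_ipv4(fields[2])
        peers[(remote_ip, local_port)] += 1
    return peers
```

On the machine itself you would pass it `open("/proc/net/tcp").read()` and print `remote_peers(...).most_common()`. Sampling that every few minutes for a week or two gives a rough list of who to ask before anyone touches a cable.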
Had that the other way around recently. A docker container failed to come back up after I had updated the host OS.
Was about ready to restore the snapshot, when I looked further back in the logs on a hunch.
Turns out that container hadn’t worked before the update either. The software’s developer is long gone, and no one could tell me what it was supposedly doing.
Company A gets bought by company B. Company B fires 50% of company A.
Even a scream test won’t get you answers, because nobody is left who could complain or who knows where the docs are.
You’d be surprised. I had some security devices that I was actively using get shut down simply because some paperwork didn’t get filled out properly and the data center team claimed they had no documentation on them.
I read that as “lazy to the point of unprofessionalism”. I’m super lazy too, but it just means I try to automate the absolute shit out of everything I do to the greatest degree possible.
Where I worked we had a very important, time-sensitive project. The server had to do a lot of calculations on a terrain dataset that covered the entire planet.
The server had a huge amount of RAM and each calculation block took about a week. It could not be saved until the end of the calculation and only that server had the RAM to do the work. So if it went down we could lose almost a week’s work.
Project was due in 6 months and calculation time was estimated to be about 5 1/2 months. So we couldn’t afford any interruptions.
We had bought a huge UPS meant for a whole server rack. For this one server. It could keep the server up for three days. That way even if we lost power over the weekend it would keep going and we would have time to buy a generator.
One Friday afternoon the building loses power and I go check on the server room. Sure enough, the big UPS with a sign saying “only for project xyz” has a bunch of other servers plugged into it.
I quickly unplug all but ours. I tell my boss and we go home at 5. Later that day the power comes back on.
On Monday there are a ton of departments bitching that they came in and their servers were unplugged. Lots of people wanted me fired. My boss backed me and nothing happened, but it was stressful.
I’d be super gluing those plastic toddler plug covers all over that thing.
fuck those other departments.
At a startup a long time ago, I was working on the weekend and brought my 3-year-old with me. We had a customer coming in the next week and this one machine was 5 days into a 7-day model build.
We had to go into that office to help someone with something unrelated. The little shit saw the blinking light and headed straight for the button.
On this computer (HP 710), it didn’t shut off until you released the button. He was actually just holding it down, but he got spooked when I tried to get to it and let go.
The next day our CEO told the guys that built that app that it had to be made so it could recover from crashes and restart from where it left off.
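“Recover from crashes and restart from where it left off” usually comes down to checkpointing. Here is a minimal sketch of the idea, not what that app actually did: the job writes its state to disk every few steps and reads it back on startup. The names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON file, and the `crash_at` parameter that stands in for the toddler are all made up for illustration, and the running sum is a placeholder for the real work.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes hit the disk before the rename
    os.replace(tmp, path)  # a crash leaves either the old or the new file, never half of one


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run(path, steps, every=10, crash_at=None):
    """Run the job from the last checkpoint, saving every `every` steps."""
    state = load_checkpoint(path)
    for step in range(state["next_step"], steps):
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated power button")
        state["total"] += step  # placeholder for one unit of real work
        state["next_step"] = step + 1
        if state["next_step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

With `every=10`, a crash costs at most ten steps of work instead of five days. The catch, as in the terrain story above, is that this only works if the intermediate state can be written out at all; a job that cannot be saved until the end needs to be restructured first.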
Yeah, I’ve done that before – after asking literally everyone in IT, plus our external consultants, and getting the go-ahead from my team lead and the head of IT.
If you fear reprisal for a scream test then you need to make it look like an accident.