Netflix can absolutely afford to hire people to work after hours.
So yes, they absolutely can and almost certainly do: their NOC-equivalent. What you're not going to get for overnights is a subject matter expert on the myriad particular Netflix bits from that one person. I can absolutely tell you that my team knows wtf is in the few million LOC in our particular widget, but has absolutely NFC what goes into the things that feed it, or the internals of the N other widgets in the infrastructure. I have APIs. I know what comes out, or is supposed to come out, and I know what we put in, or are supposed to put in. Thassit.
So, you hire someone to do after hours, who knows enough to handle minor stuff, knows the difference between major and minor, and knows who to poke if something major happens and shit is on fire. Regrettably, every few months I'm the one in the barrel for my team, but if you do this right/have a robust release process/good documentation/sane deployment practices, you basically never get paged. And if you do get paged, shit's on fire, yo, and you need an expert on the widget.
Pls note, this is not "nah your developers can just get bothered whenever an alert fires". That's what your NOC is for. They do nothing but look at dashboards, monitor alerts, handle shit if they can so nobody gets woken up, and if someone does, they do incident management. It's very easy for someone to cheap out and not hire people to do that, and push the load onto the devs for "free", and that's wrong. But you're never ever going to find individuals who are developer-level SMEs on all those bits without actually being the people working on those bits, and there's too much to hold in one head when a single client is probably 3-4mill LOC all up, and you have N clients, and now the services (also easy multi-million LOC) that feed them etc...
Over here we have a very robust and nice incident management procedure where each service team has their own on-call schedule. So if there is an incident and we know that we need someone from Service A to take a look, we know where to poke to quickly get a response. Then if it turns out that it wasn't Service A but that we should look instead into B, we can bring in that team's on-call and so on.
It's great, makes the teams feel responsible for the code they put into production, and shit gets fixed quickly because no one is trying to understand what service A does in the middle of the night while shit stopped working.
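For anyone who wants to picture the per-team routing, here is a minimal sketch. Everything in it is an assumption for illustration: the service names, the rosters, the weekly cadence, and the rotation start date are all made up, and a real shop would pull this from its paging tool rather than a dict.

```python
from datetime import date

# Hypothetical per-service rosters; names are made up for illustration.
ROTATIONS = {
    "service-a": ["alice", "bob", "carol"],
    "service-b": ["dave", "erin"],
}

# Assumption: rotations are weekly and counted from this Monday.
ROTATION_START = date(2024, 1, 1)


def on_call(service: str, today: date) -> str:
    """Return whoever holds the pager for `service` during the week of `today`."""
    roster = ROTATIONS[service]
    weeks_elapsed = (today - ROTATION_START).days // 7
    return roster[weeks_elapsed % len(roster)]


def escalate(suspects: list[str], today: date) -> list[tuple[str, str]]:
    """List who to page for each suspected service, in order (A first, then B)."""
    return [(service, on_call(service, today)) for service in suspects]


if __name__ == "__main__":
    for service, person in escalate(["service-a", "service-b"], date(2024, 1, 10)):
        print(f"page {person} for {service}")
```

The point of keeping one roster per service is that the person who gets woken up already knows the widget, which is the whole argument above.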
I mean yeah, I'm talking about "every few months" kind of thing, because it's split amongst the entire team working on a widget. I would assume as org and team scale up and down, this gets better and worse. I'm on a not huge, but decently sized development team (call it 8-10 pizzas worth of people all told, though not all in one room so to speak) so it works out fine. If I had 3 people on my team, I'd be less pleased by far.
We alternate week on week off and get paid a stipend when we're on, plus get back time in lieu if we have to go onsite. Time in lieu kinda sucks, but still, it's a nice little bump in pay. We're not huge, 500 person org and there are weeks without calls.
Still literally not my problem, hire developers for that team for after hours or actually pay those people to be on call.
The problem is always not wanting to pay, always.
bowen on
not a doctor, not a lawyer, examples I use may not be fully researched so don't take out of context plz, don't @ me
I mean, that's what I said, no? If you're not the NOC (aka "paid to sit there and stare at blinky lights all night") or the relevant system SME, you shouldn't be being paged for a thing. And if you are the SME on a system/on that team, it's rather reasonable that, should something happen requiring the input of an expert, you get the escalation if needful.
This requires your NOC exist, be competent enough to determine "what we need to page out for" vs "what we know how to handle/isn't urgent", and an org willing to spend on these sorts of systems, of course, but like... Yeah. That's how to do it properly. Not doing it properly will make it suck. Same with sharing the rotation; it needs to be spread amongst a team equally, none of this "every couple days" thing. A week every few months? Fine. Every other? That's a compensation discussion.
Sorry, there were far too many acronyms for me to parse it without busting out the acronym finder, so I assume it was an "ahhhhhhh I did it, it's not a big deal" thing I see 8 times out of 10 when I bring this up somewhere.
Nope, I hate on call. Did it for three years once, continually, and can't hear a phone ring without a twitch as a result. I'll just accept it as a necessity when done properly, and when done properly, it can be made to suck as minimally as possible. It still sucks, and I don't like my week in the barrel, but... I get why. And I won't throw a fit as long as they hold up their end, and respect my time.
Yeah same, like, I'm more than happy to be on call for those weeks but I want time and a half if I take calls and I want to be paid for the other 16 hours of my day I'm "on wait".
It's really not my fault you have a service that promises some level of 9s of uptime and you're undercharging for what that actually costs.
Oh yeah. I think if you don't have a follow the sun staffing plan and dedicated NOC then your management probably just straight up hasn't aligned uptime requirements and priorities. Either it's not actually that important or there is something really busted. I'm never going to do true overnight on call again. I still wake up in a panic when the phone rings from when I did it.
He knows the steps, as in the right place to go in the UI, but the conceptual understanding was lacking.
I'm sure if I had told him, "Map the Z: drive to \\server\path" he could have done it.
But the problem was that he encountered a situation where he needed to access a file that is on \\server\path, which he knows as Z:, on a computer without a Z: drive, and got blocked.
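The concept he was missing is that Z: is just a local alias for the UNC path. A minimal sketch of that translation, assuming a hypothetical mapping table (`DRIVE_MAP` and `to_unc` are names invented here; on a real machine the mappings come from the OS, not a dict):

```python
from pathlib import PureWindowsPath

# Hypothetical mapping table, for illustration only.
DRIVE_MAP = {"Z:": r"\\server\path"}


def to_unc(path: str, drive_map: dict[str, str] = DRIVE_MAP) -> str:
    """Rewrite a mapped-drive path as its UNC equivalent, if the drive is known."""
    p = PureWindowsPath(path)
    root = drive_map.get(p.drive.upper())
    if root is None:
        return path  # not a mapped drive we know about; leave it alone
    # p.parts[0] is the drive portion; everything after it is relative to the share.
    return str(PureWindowsPath(root, *p.parts[1:]))


if __name__ == "__main__":
    print(to_unc(r"Z:\reports\q3.xlsx"))
```

Once you see it that way, a machine without a Z: drive is not a blocker: you open the UNC path directly.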
every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.
I'm sorry I haven't touched Windows for about 11 years now but back when I did, browsing shares via \\SERVERNAME\Whatever was a pretty regular occurrence.
Is his next lesson going to be admin shares? Does he know they exist? If your policy is to remove them (I dunno what best practice is nowadays), does he know how to do that?
Oh man I remember when I first learned about admin shares.
I was really handy in the hospital I worked at since most of the "installed programs" were just desktop shortcuts to various webapps.
Made some of the old guard look like clowns by doing work for which they had booked an entire day in like 30 seconds with a powershell script.
life's a game that you're bound to lose / like using a hammer to pound in screws
fuck up once and you break your thumb / if you're happy at all then you're god damn dumb
that's right we're on a fucked up cruise / God is dead but at least we have booze
bad things happen, no one knows why / the sun burns out and everyone dies
Current best practice is to leave admin shares enabled and mitigate the risks via monitoring, Windows Firewall, and/or tight controls on the admin accounts.
Good cuz admin shares saved my life like 80 times back in the day.
Trying to sort out an issue with Outlook from a migration someone else did from Exchange 2013->2019. Outlook search gives the "something went wrong" message, but only when searching by Current Folder or Current Mailbox. If I choose Subfolders or All Outlook Items, the search works properly.
The fun stuff that happens when migrating servers!
Because if you're going to attempt to squeeze that big black monster into your slot you will need to be able to take at least 12 inches or else you're going to have a bad time...
The reason those games are popular is because they're extremely relaxing, and the reason they're relaxing is because they have people removed from the "job" part of the game.
Either they removed people from an IT simulator and made it relaxing, which is a travesty, I want others to feel my pain, or they put them in, which makes the game not fun to play.
Only they shouldn't just be failure conditions, they should be constant reminders
Like the "seven perpendicular lines" sketch, except there's no way to make the request go away, you just keep getting pestered for it
You should get two conflicting orders from different people, who complain when you haven't done what they asked but never talk between them. So whichever you end up not doing sets you up against your rival for the rest of the game.
They both end up being your rival because the one you chose to help ends up having a bad experience with someone else in IT and you're all the same person to them.
That_Guy - I don't wanna be that guy - Registered User regular
To be fair, these particular problems aren't unique to IT. My sister who works HR (and spent about 15 years at a big international software company) had to deal with very similar situations. Any careers that service other parts of their organization probably get stuff like that.
lwt1973King of ThievesSyndicationRegistered Userregular
We had a problem with our accounting software not being able to add a vehicle. A SQL query showed that a vehicle with the same name had been inactivated two years ago.
How are you supposed to find the vehicle as a normal user? Go to the vehicle menu, click on the show inactive vehicles indicator, and then stare in disbelief as there is no active/inactive column to differentiate them all so you'll have to click on each one to bring up the relevant information.
"He's sulking in his tent like Achilles! It's the Iliad?...from Homer?! READ A BOOK!!" -Handy
We have an app (that rhymes with CopenOclinica) where users are added to studies or sites (children of studies).
You have to go to the user after they are created and add a role. You choose the site / study from a drop down list. That is not sorted alphabetically either at the site or study level. And if you view a user you only see the site, not the study (unless you query the DB or view source and know parent study ids). So a user may have four of five sites for sister studies but who knows what study #5 is.
There are thousands of sites and studies. Oh and if someone is added at a study level you can only add them at a site level through the back end. But site access is restricted to certain permissions unless explicitly given even if you are an app admin.
Luckily even though I'm the SME and this is a high dollar validated 21 CFR 11 system...nobody seems to give two shits. And we are the hard ass do it right shop.
lwt1973King of ThievesSyndicationRegistered Userregular
That is not sorted alphabetically either at the site or study level.
Is it sorted by creation date? I've seen that stupidity in the past.
Nope, that would at least make some degree of stupid sense. The order is consistent, but nothing, not even something stupid like a key or GUID sorted alphabetically, seems to have anything to do with it. I've seen ordering in other applications where the numeric order goes:
1
10
11
2
...
But this just makes no sense, and it's funny because there are other places where it orders by study and then site in alphabetical order, so it's not like they are incapable of sorting it properly; it's just not implemented on the page where you add roles.
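The 1, 10, 11, 2 ordering is what you get when numbers are sorted as text. A quick sketch of the difference (the sample labels are made up):

```python
labels = ["2", "10", "1", "11"]

# Sorting as text compares character by character, so "10" lands before "2".
as_text = sorted(labels)

# Sorting by numeric value gives the order a person expects.
as_numbers = sorted(labels, key=int)

print(as_text)     # ['1', '10', '11', '2']
print(as_numbers)  # ['1', '2', '10', '11']
```

So that pattern at least has an explanation, which is more than can be said for the role page.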
That_Guy - I don't wanna be that guy - Registered User regular
I think there's an sql statement that lets you take any query results and ORDER BY whatever you want. Can't remember what it's called though...
If the app developers can't even present the data to users usefully, I'd give it 50/50 odds that would result in unindexed scans of a 10 million row table.
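For the record, a minimal sketch of the sorted dropdown query plus an index on the sort columns, using SQLite from Python. The `site` table, its columns, and the index name are hypothetical; nothing here reflects the real app's schema or database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema; the real app's tables are unknown.
conn.execute("CREATE TABLE site (id INTEGER PRIMARY KEY, study TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO site (study, name) VALUES (?, ?)",
    [("Study B", "Site 2"), ("Study A", "Site 9"), ("Study A", "Site 1")],
)

# An index on the sort columns can let the engine read rows already in order
# instead of scanning the whole table and sorting afterwards.
conn.execute("CREATE INDEX idx_site_study_name ON site (study, name)")

query = "SELECT study, name FROM site ORDER BY study, name"
rows = conn.execute(query).fetchall()

# Print the plan so you can see whether the index is actually used.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(rows)
print(plan)
```

Whether the planner uses the index depends on the engine and the data, so check the plan on the real table before assuming anything.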
Just remember that half the people you meet are below average intelligence.
XBL:Phenyhelm - 3DS:Phenyhelm
1x wasn't bad but honestly I am so done with it.
the "no true scotch man" fallacy.
wow
what
Uh...
Users, am I right?
Yep, that's likely it.