[sysadmin] on-call schedule - Always you

SeidkonaSeidkona Had an upgradeRegistered User regular
Welcome to a life of wrangling computers.

Often a lot of them.
Sometimes it's sleepless and thankless, yet it seems to call to some of us for some reason?

The old thread was really old and I won't lie and tell anyone I will re-make this with new resources.

Mostly just huntin' monsters.
XBL:Phenyhelm - 3DS:Phenyhelm
Seidkona on

Posts

  • Options
    SeidkonaSeidkona Had an upgrade Registered User regular
    Maybe I will work on the first post; I might have a bit of time between jobs, depending.

    Mostly just huntin' monsters.
    XBL:Phenyhelm - 3DS:Phenyhelm
  • Options
    dporowskidporowski Registered User regular
    bowen wrote: »
    Netflix can absolutely afford to hire people to work after hours.

    So yes, they absolutely can and almost certainly do; their NOC-equivalent. What you're not going to get for overnights is a subject matter expert on the myriad particular Netflix bits from that one person. I can absolutely tell you that my team knows wtf is in the few million LOC in our particular widget, but we have absolutely NFC what goes into the things that feed it, or the internals of the N other widgets in the infrastructure. I have APIs. I know what comes out, or is supposed to come out, and I know what we put in, or are supposed to put in. Thassit.

    So, you hire someone to do after hours, who knows enough to handle minor stuff, knows the difference between major and minor, and knows who to poke if something major happens and shit is on fire. Regrettably, every few months I'm the one in the barrel for my team, but if you do this right/have a robust release process/good documentation/sane deployment practices, you basically never get paged. And if you do get paged, shit's on fire, yo, and you need an expert on the widget.

    Pls note, this is not "nah your developers can just get bothered whenever an alert fires". That's what your NOC is for. They do nothing but look at dashboards, monitor alerts, handle shit if they can so nobody gets woken up, and if someone does, they do incident management. It's very easy for someone to cheap out and not hire people to do that, and push the load onto the devs for "free", and that's wrong. But you're never ever going to find individuals who are developer-level SMEs on all those bits without actually being the people working on those bits, and there's too much to hold in one head when a single client is probably 3-4mill LOC all up, and you have N clients, and now the services (also easy multi-million LOC) that feed them etc...

  • Options
    DrovekDrovek Registered User regular
    dporowski wrote: »
    bowen wrote: »
    Netflix can absolutely afford to hire people to work after hours.

    So yes, they absolutely can and almost certainly do; their NOC-equivalent. What you're not going to get for overnights is a subject matter expert on the myriad particular Netflix bits from that one person. I can absolutely tell you that my team knows wtf is in the few million LOC in our particular widget, but we have absolutely NFC what goes into the things that feed it, or the internals of the N other widgets in the infrastructure. I have APIs. I know what comes out, or is supposed to come out, and I know what we put in, or are supposed to put in. Thassit.

    So, you hire someone to do after hours, who knows enough to handle minor stuff, knows the difference between major and minor, and knows who to poke if something major happens and shit is on fire. Regrettably, every few months I'm the one in the barrel for my team, but if you do this right/have a robust release process/good documentation/sane deployment practices, you basically never get paged. And if you do get paged, shit's on fire, yo, and you need an expert on the widget.

    Pls note, this is not "nah your developers can just get bothered whenever an alert fires". That's what your NOC is for. They do nothing but look at dashboards, monitor alerts, handle shit if they can so nobody gets woken up, and if someone does, they do incident management. It's very easy for someone to cheap out and not hire people to do that, and push the load onto the devs for "free", and that's wrong. But you're never ever going to find individuals who are developer-level SMEs on all those bits without actually being the people working on those bits, and there's too much to hold in one head when a single client is probably 3-4mill LOC all up, and you have N clients, and now the services (also easy multi-million LOC) that feed them etc...

    Over here we have a very robust and nice incident management procedure where each service team has their own on-call schedule. So if there is an incident and we know that we need someone from Service A to take a look, we know where to poke to quickly get a response. Then if it turns out that it wasn't Service A but that we should look instead into B, we can bring in that team's on-call and so on.

    It's great, makes the teams feel responsible for the code they put into production, and shit gets fixed quickly because no one is trying to understand what service A does in the middle of the night while shit stopped working.

    steam_sig.png( < . . .
  • Options
    SeidkonaSeidkona Had an upgrade Registered User regular
    Last job I was on call 1x a month. Previous to that it was every other week.

    1x wasn't bad but honestly I am so done with it.

    Mostly just huntin' monsters.
    XBL:Phenyhelm - 3DS:Phenyhelm
  • Options
    dporowskidporowski Registered User regular
    I mean yeah, I'm talking about "every few months" kind of thing, because it's split amongst the entire team working on a widget. I would assume as org and team scale up and down, this gets better and worse. I'm on a not huge, but decently sized development team (call it 8-10 pizzas worth of people all told, though not all in one room so to speak) so it works out fine. If I had 3 people on my team, I'd be less pleased by far.

  • Options
    NosfNosf Registered User regular
    edited October 2021
    We alternate week on week off and get paid a stipend when we're on, plus get back time in lieu if we have to go onsite. Time in lieu kinda sucks, but still, it's a nice little bump in pay. We're not huge, 500 person org and there are weeks without calls.

    Nosf on
  • Options
    bowenbowen How you doin'? Registered User regular
    edited October 2021
    dporowski wrote: »
    bowen wrote: »
    Netflix can absolutely afford to hire people to work after hours.

    So yes, they absolutely can and almost certainly do; their NOC-equivalent. What you're not going to get for overnights is a subject matter expert on the myriad particular Netflix bits from that one person. I can absolutely tell you that my team knows wtf is in the few million LOC in our particular widget, but we have absolutely NFC what goes into the things that feed it, or the internals of the N other widgets in the infrastructure. I have APIs. I know what comes out, or is supposed to come out, and I know what we put in, or are supposed to put in. Thassit.

    So, you hire someone to do after hours, who knows enough to handle minor stuff, knows the difference between major and minor, and knows who to poke if something major happens and shit is on fire. Regrettably, every few months I'm the one in the barrel for my team, but if you do this right/have a robust release process/good documentation/sane deployment practices, you basically never get paged. And if you do get paged, shit's on fire, yo, and you need an expert on the widget.

    Pls note, this is not "nah your developers can just get bothered whenever an alert fires". That's what your NOC is for. They do nothing but look at dashboards, monitor alerts, handle shit if they can so nobody gets woken up, and if someone does, they do incident management. It's very easy for someone to cheap out and not hire people to do that, and push the load onto the devs for "free", and that's wrong. But you're never ever going to find individuals who are developer-level SMEs on all those bits without actually being the people working on those bits, and there's too much to hold in one head when a single client is probably 3-4mill LOC all up, and you have N clients, and now the services (also easy multi-million LOC) that feed them etc...

    Still literally not my problem, hire developers for that team for after hours or actually pay those people to be on call.

    The problem is always not wanting to pay, always.

    bowen on
    not a doctor, not a lawyer, examples I use may not be fully researched so don't take out of context plz, don't @ me
  • Options
    dporowskidporowski Registered User regular
    I mean, that's what I said, no? If you're not the NOC (aka "paid to sit there and stare at blinky lights all night") or the relevant system SME, you shouldn't be being paged for a thing. And if you are the SME on a system/on that team, it's rather reasonable that, should something happen requiring the input of an expert, you get the escalation if needful.

    This requires your NOC exist, be competent enough to determine "what we need to page out for" vs "what we know how to handle/isn't urgent", and an org willing to spend on these sorts of systems, of course, but like... Yeah. That's how to do it properly. Not doing it properly will make it suck. Same with sharing the rotation; it needs to be spread amongst a team equally, none of this "every couple days" thing. A week every few months? Fine. Every other? That's a compensation discussion.
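
To put rough numbers on "a week every few months" versus "every other": a minimal sketch of an equally shared, one-week round-robin rotation. The names, start date, and the 52-week year are assumptions for illustration, not anyone's actual schedule.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def build_rotation(engineers, start, weeks):
    """Assign one engineer per week, round-robin, beginning on `start`."""
    return [
        (start + timedelta(weeks=i), name)
        for i, name in enumerate(islice(cycle(engineers), weeks))
    ]


def weeks_on_call_per_year(team_size):
    """Average on-call weeks per person per year when shifts are shared equally."""
    return 52 / team_size


team = ["ana", "bo", "cy", "dee"]  # hypothetical team members
schedule = build_rotation(team, date(2021, 10, 4), 8)
for week_start, name in schedule:
    print(week_start.isoformat(), name)

for size in (2, 3, 12):
    print(size, "people:", round(weeks_on_call_per_year(size), 1), "weeks per person per year")
```

With two people that is every other week; with a dozen it is roughly one week a quarter, which is where the "compensation discussion" line gets drawn.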

  • Options
    bowenbowen How you doin'? Registered User regular
    dporowski wrote: »
    I mean, that's what I said, no? If you're not the NOC (aka "paid to sit there and stare at blinky lights all night") or the relevant system SME, you shouldn't be being paged for a thing. And if you are the SME on a system/on that team, it's rather reasonable that, should something happen requiring the input of an expert, you get the escalation if needful.

    This requires your NOC exist, be competent enough to determine "what we need to page out for" vs "what we know how to handle/isn't urgent", and an org willing to spend on these sorts of systems, of course, but like... Yeah. That's how to do it properly. Not doing it properly will make it suck. Same with sharing the rotation; it needs to be spread amongst a team equally, none of this "every couple days" thing. A week every few months? Fine. Every other? That's a compensation discussion.

    Sorry, there were far too many acronyms for me to parse it without busting out the acronym finder, so I assumed it was an "ahhhhhhh I did it it's not a big deal" thing I see 8 times out of 10 when I bring this up somewhere.

    not a doctor, not a lawyer, examples I use may not be fully researched so don't take out of context plz, don't @ me
  • Options
    dporowskidporowski Registered User regular
    Nope, I hate on call. Did it for three years once, continually, and can't hear a phone ring without a twitch as a result. I'll just accept it as a necessity when done properly, and when done properly, it can be made to suck as minimally as possible. It still sucks, and I don't like my week in the barrel, but... I get why. And I won't throw a fit as long as they hold up their end, and respect my time.

  • Options
    bowenbowen How you doin'? Registered User regular
    Yeah same, like, I'm more than happy to be on call for those weeks but I want time and a half if I take calls and I want to be paid for the other 16 hours of my day I'm "on wait".

    It's really not my fault you have a service that promises some level of 9s of uptime and you're undercharging for what that actually costs.
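
For a sense of what "some level of 9s" actually promises, here is a small sketch of the arithmetic, assuming availability is measured over a 365-day year and counting N nines as an unavailability of 10^-N.

```python
def allowed_downtime_minutes(nines, period_days=365):
    """Minutes of downtime permitted per period at `nines` nines of availability."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines (99.9%) -> 0.001
    return period_days * 24 * 60 * unavailability


for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes per year")
```

Three nines leaves under nine hours a year; five nines leaves a little over five minutes, which is hard to meet without someone being paid to be awake.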

    not a doctor, not a lawyer, examples I use may not be fully researched so don't take out of context plz, don't @ me
  • Options
    DarkewolfeDarkewolfe Registered User regular
    Oh yeah. I think if you don't have a follow the sun staffing plan and dedicated NOC then your management probably just straight up hasn't aligned uptime requirements and priorities. Either it's not actually that important or there is something really busted. I'm never going to do true overnight on call again. I still wake up in a panic when the phone rings from when I did it.

    What is this I don't even.
  • Options
    FeralFeral MEMETICHARIZARD interior crocodile alligator ⇔ ǝɹʇɐǝɥʇ ǝᴉʌoɯ ʇǝloɹʌǝɥɔ ɐ ǝʌᴉɹp ᴉRegistered User regular
    I just had to walk our new systems administrator through how to map a shared drive in Windows.

    every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.

    the "no true scotch man" fallacy.
  • Options
    FeralFeral MEMETICHARIZARD interior crocodile alligator ⇔ ǝɹʇɐǝɥʇ ǝᴉʌoɯ ʇǝloɹʌǝɥɔ ɐ ǝʌᴉɹp ᴉRegistered User regular
    He knows the steps, as in the right place to go in the UI, but the conceptual understanding was lacking.

    I'm sure if I had told him, "Map the Z: drive to \\server\path" he could have done it.

    But the problem was that he encountered a situation where he needed to access a file that is on \\server\path, which he knows as Z:, on a computer without a Z: drive, and got blocked.
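
The concept that was missing is that a drive letter is only an alias for a UNC path. A minimal sketch of that translation, using Python's `ntpath` so it runs on any OS; the `drive_map` contents and the file name are made up for illustration.

```python
import ntpath


def to_unc(path, drive_map):
    """Rewrite a drive-letter path as a UNC path using a letter -> share mapping."""
    drive, rest = ntpath.splitdrive(path)
    share = drive_map.get(drive.upper())
    if share is None:
        return path  # not a mapped drive letter; leave the path untouched
    return share.rstrip("\\") + rest


drive_map = {"Z:": r"\\server\path"}  # hypothetical mapping
print(to_unc(r"Z:\reports\q3.xlsx", drive_map))
```

On a machine with no Z: drive, the UNC form can be pasted straight into Explorer or passed to a script.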

    every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.

    the "no true scotch man" fallacy.
  • Options
    SeidkonaSeidkona Had an upgrade Registered User regular
    It hurts

    Mostly just huntin' monsters.
    XBL:Phenyhelm - 3DS:Phenyhelm
  • Options
    bowenbowen How you doin'? Registered User regular
    Feral wrote: »
    I just had to walk our new systems administrator through how to map a shared drive in Windows.

    wow

    what

    wow

    what

    not a doctor, not a lawyer, examples I use may not be fully researched so don't take out of context plz, don't @ me
  • Options
    ThawmusThawmus +Jackface Registered User regular
    I'm sorry I haven't touched Windows for about 11 years now but back when I did, browsing shares via \\SERVERNAME\Whatever was a pretty regular occurrence.

    Is his next lesson going to be admin shares? Does he know they exist? If your policy is to remove them (I dunno what best practice is nowadays), does he know how to do that?

    Twitch: Thawmus83
  • Options
    AiouaAioua Ora Occidens Ora OptimaRegistered User regular
    Oh man I remember when I first learned about admin shares.

    I was really handy in the hospital I worked at since most of the "installed programs" were just desktop shortcuts to various webapps.

    Made some of the old guard look like clowns by doing work for which they had booked an entire day in like 30 seconds with a powershell script.

    life's a game that you're bound to lose / like using a hammer to pound in screws
    fuck up once and you break your thumb / if you're happy at all then you're god damn dumb
    that's right we're on a fucked up cruise / God is dead but at least we have booze
    bad things happen, no one knows why / the sun burns out and everyone dies
  • Options
    LD50LD50 Registered User regular
    Feral wrote: »
    He knows the steps, as in the right place to go in the UI, but the conceptual understanding was lacking.

    I'm sure if I had told him, "Map the Z: drive to \\server\path" he could have done it.

    But the problem was that he encountered a situation where he needed to access a file that is on \\server\path, which he knows as Z:, on a computer without a Z: drive, and got blocked.

    Uh...

  • Options
    FeralFeral MEMETICHARIZARD interior crocodile alligator ⇔ ǝɹʇɐǝɥʇ ǝᴉʌoɯ ʇǝloɹʌǝɥɔ ɐ ǝʌᴉɹp ᴉRegistered User regular
    Thawmus wrote: »
    I'm sorry I haven't touched Windows for about 11 years now but back when I did, browsing shares via \\SERVERNAME\Whatever was a pretty regular occurrence.

    Is his next lesson going to be admin shares? Does he know they exist? If your policy is to remove them (I dunno what best practice is nowadays), does he know how to do that?

    Current best practice is to leave admin shares enabled and mitigate the risks via monitoring, Windows Firewall, and/or tight controls on the admin accounts.

    every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.

    the "no true scotch man" fallacy.
  • Options
    ThawmusThawmus +Jackface Registered User regular
    Feral wrote: »
    Thawmus wrote: »
    I'm sorry I haven't touched Windows for about 11 years now but back when I did, browsing shares via \\SERVERNAME\Whatever was a pretty regular occurrence.

    Is his next lesson going to be admin shares? Does he know they exist? If your policy is to remove them (I dunno what best practice is nowadays), does he know how to do that?

    Current best practice is to leave admin shares enabled and mitigate the risks via monitoring, Windows Firewall, and/or tight controls on the admin accounts.

    Good cuz admin shares saved my life like 80 times back in the day.

    Twitch: Thawmus83
  • Options
    MyiagrosMyiagros Registered User regular
    Trying to sort out an issue with Outlook from a migration someone else did from Exchange 2013->2019. Outlook search gives the "something went wrong" message, but only when searching by Current Folder or Current Mailbox. If I choose Subfolders or All Outlook Items, the search works properly.

    The fun stuff that happens when migrating servers!

    iRevert wrote: »
    Because if you're going to attempt to squeeze that big black monster into your slot you will need to be able to take at least 12 inches or else you're going to have a bad time...
    Steam: MyiagrosX27
  • Options
    DrovekDrovek Registered User regular
    If you can have powerwashing simulators and trucking simulators, why not IT Support simulators?

    steam_sig.png( < . . .
  • Options
    ThawmusThawmus +Jackface Registered User regular
    Okay, I'm crying foul here.

    The reason those games are popular is because they're extremely relaxing, and the reason they're relaxing is because they have people removed from the "job" part of the game.

    Either they removed people from an IT simulator and made it relaxing, which is a travesty, I want others to feel my pain, or they put them in, which makes the game not fun to play.

    Twitch: Thawmus83
  • Options
    FeralFeral MEMETICHARIZARD interior crocodile alligator ⇔ ǝɹʇɐǝɥʇ ǝᴉʌoɯ ʇǝloɹʌǝɥɔ ɐ ǝʌᴉɹp ᴉRegistered User regular
    If the IT Expansion doesn't make you feel like crawling under your desk and clutching a bottle of whiskey after your first game, it's unrealistic

    every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.

    the "no true scotch man" fallacy.
  • Options
    FeralFeral MEMETICHARIZARD interior crocodile alligator ⇔ ǝɹʇɐǝɥʇ ǝᴉʌoɯ ʇǝloɹʌǝɥɔ ɐ ǝʌᴉɹp ᴉRegistered User regular
    There should be Kobayashi Maru no-win scenarios

    Only they shouldn't just be failure conditions, they should be constant reminders

    Like the "seven perpendicular lines" sketch, except there's no way to make the request go away, you just keep getting pestered for it

    every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.

    the "no true scotch man" fallacy.
  • Options
    FeralFeral MEMETICHARIZARD interior crocodile alligator ⇔ ǝɹʇɐǝɥʇ ǝᴉʌoɯ ʇǝloɹʌǝɥɔ ɐ ǝʌᴉɹp ᴉRegistered User regular
    Semi-related, every time I think I understand dynamic routing protocols like EIGRP, something bizarre happens that proves to me that I do not.

    every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.

    the "no true scotch man" fallacy.
  • Options
    DrovekDrovek Registered User regular
    Feral wrote: »
    There should be Kobayashi Maru no-win scenarios

    Only they shouldn't just be failure conditions, they should be constant reminders

    Like the "seven perpendicular lines" sketch, except there's no way to make the request go away, you just keep getting pestered for it

    You should get two conflicting orders from different people, who complain when you haven't done what they asked but never talk between them. So whichever you end up not doing sets you up against your rival for the rest of the game.

    steam_sig.png( < . . .
  • Options
    LD50LD50 Registered User regular
    Drovek wrote: »
    Feral wrote: »
    There should be Kobayashi Maru no-win scenarios

    Only they shouldn't just be failure conditions, they should be constant reminders

    Like the "seven perpendicular lines" sketch, except there's no way to make the request go away, you just keep getting pestered for it

    You should get two conflicting orders from different people, who complain when you haven't done what they asked but never talk between them. So whichever you end up not doing sets you up against your rival for the rest of the game.

    They both end up being your rival because the one you chose to help ends up having a bad experience with someone else in IT and you're all the same person to them.

  • Options
    That_GuyThat_Guy I don't wanna be that guy Registered User regular
    edited October 2021
    Edit: Oops, wrong thread

    Users, am I right?

    That_Guy on
  • Options
    H3KnucklesH3Knuckles But we decide which is right and which is an illusion.Registered User regular
    edited October 2021
    Drovek wrote: »
    Feral wrote: »
    There should be Kobayashi Maru no-win scenarios

    Only they shouldn't just be failure conditions, they should be constant reminders

    Like the "seven perpendicular lines" sketch, except there's no way to make the request go away, you just keep getting pestered for it

    You should get two conflicting orders from different people, who complain when you haven't done what they asked but never talk between them. So whichever you end up not doing sets you up against your rival for the rest of the game.
    LD50 wrote: »
    Drovek wrote: »
    Feral wrote: »
    There should be Kobayashi Maru no-win scenarios

    Only they shouldn't just be failure conditions, they should be constant reminders

    Like the "seven perpendicular lines" sketch, except there's no way to make the request go away, you just keep getting pestered for it

    You should get two conflicting orders from different people, who complain when you haven't done what they asked but never talk between them. So whichever you end up not doing sets you up against your rival for the rest of the game.

    They both end up being your rival because the one you chose to help ends up having a bad experience with someone else in IT and you're all the same person to them.

    To be fair, these particular problems aren't unique to IT. My sister who works HR (and spent about 15 years at a big international software company) had to deal with very similar situations. Any careers that service other parts of their organization probably get stuff like that.

    H3Knuckles on
    If you're curious about my icon; it's an update of the early Lego Castle theme's "Black Falcons" faction.
    camo_sig2-400.png
  • Options
    lwt1973lwt1973 King of Thieves SyndicationRegistered User regular
    We had a problem with our accounting software not being able to add a vehicle. Through a SQL query, we found out that a vehicle with the same name had been inactivated two years ago.

    How are you supposed to find the vehicle as a normal user? Go to the vehicle menu, click on the show inactive vehicles indicator, and then stare in disbelief as there is no active/inactive column to differentiate them all so you'll have to click on each one to bring up the relevant information.

    "He's sulking in his tent like Achilles! It's the Iliad?...from Homer?! READ A BOOK!!" -Handy
  • Options
    zagdrobzagdrob Registered User regular
    We have an app (that rhymes with CopenOclinica) where users are added to studies or sites (children of studies).

    You have to go to the user after they are created and add a role. You choose the site / study from a drop down list. That is not sorted alphabetically either at the site or study level. And if you view a user you only see the site, not the study (unless you query the DB or view source and know parent study ids). So a user may have four or five sites for sister studies but who knows what study #5 is.

    There are thousands of sites and studies. Oh and if someone is added at a study level you can only add them at a site level through the back end. But site access is restricted to certain permissions unless explicitly given even if you are an app admin.

    Luckily even though I'm the SME and this is a high dollar validated 21 CFR 11 system...nobody seems to give two shits. And we are the hard ass do it right shop.

  • Options
    lwt1973lwt1973 King of Thieves SyndicationRegistered User regular
    zagdrob wrote: »
    That is not sorted alphabetically either at the site or study level.

    Is it sorted by creation date? I've seen that stupidity in the past.

    "He's sulking in his tent like Achilles! It's the Iliad?...from Homer?! READ A BOOK!!" -Handy
  • Options
    zagdrobzagdrob Registered User regular
    lwt1973 wrote: »
    zagdrob wrote: »
    That is not sorted alphabetically either at the site or study level.

    Is it sorted by creation date? I've seen that stupidity in the past.

    Nope, that would at least make some degree of stupid sense. The order is consistent, but nothing, not even something stupid like a key or GUID sorted alphabetically, seems to have anything to do with it. I've seen ordering in other applications where the numeric order goes:

    1
    10
    11
    2
    ...

    But this just makes no sense, and it's funny because there are other places where it orders by study and then site in alphabetical order, so it's not like they are incapable of sorting it properly; it's just not implemented on the page where you add roles.
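
The 1, 10, 11, 2 ordering is what falls out when numbers are compared as strings. A short sketch of the difference, with a hypothetical `natural_key` helper that compares digit runs as integers:

```python
import re


def natural_key(text):
    """Split into digit and non-digit runs so numeric parts compare as integers."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", text)
    ]


labels = ["1", "10", "11", "2"]
print(sorted(labels))                   # plain string comparison
print(sorted(labels, key=natural_key))  # numeric-aware comparison
```

The same key also handles mixed labels such as "Site 2" and "Site 10", because the text and digit runs always line up position by position.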

  • Options
    That_GuyThat_Guy I don't wanna be that guy Registered User regular
  • Options
    InfidelInfidel Heretic Registered User regular
    Could be table order (aka the order isn't based on anything in the actual data at all or visible, but literally how it is organized by the DBMS).
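
If that is what is happening, the fix is an explicit sort in the query, since SQL makes no promise about row order without one. A minimal sketch with SQLite; the `site` table and its columns are invented for illustration and are not the real application's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (id INTEGER PRIMARY KEY, study TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO site (study, name) VALUES (?, ?)",
    [("Beta", "Site 2"), ("Alpha", "Site 9"), ("Beta", "Site 1"), ("Alpha", "Site 3")],
)

# No ORDER BY: the engine may return rows in whatever order is convenient for it.
unordered = conn.execute("SELECT study, name FROM site").fetchall()

# Explicit ORDER BY: study first, then site within each study.
ordered = conn.execute(
    "SELECT study, name FROM site ORDER BY study, name"
).fetchall()
print(ordered)
```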

    OrokosPA.png
  • Options
    schussschuss Registered User regular
    Infidel wrote: »
    Could be table order (aka the order isn't based on anything in the actual data at all or visible, but literally how it is organized by the DBMS).

    Yep, that's likely it.

  • Options
    LD50LD50 Registered User regular
    edited November 2021
    I think there's an sql statement that lets you take any query results and ORDER BY whatever you want. Can't remember what it's called though...

    LD50 on
  • Options
    SiliconStewSiliconStew Registered User regular
    LD50 wrote: »
    I think there's an sql statement that lets you take any query results and ORDER BY whatever you want. Can't remember what it's called though...

    If the app developers can't even present the data to users usefully, I'd give it 50/50 odds that would result in unindexed scans of a 10 million row table.
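
Whether a sort needs a separate sorting pass depends on whether an index covers the sort column. A rough sketch using SQLite's `EXPLAIN QUERY PLAN`; the table and index names are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO site (name) VALUES (?)",
    [(f"Site {i}",) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT name FROM site ORDER BY name"
before = plan(query)  # no index on name: expect a temporary sort step
conn.execute("CREATE INDEX idx_site_name ON site (name)")
after = plan(query)   # index on name: rows can be read already sorted
print(before)
print(after)
```

Other database engines expose the same idea through their own EXPLAIN output, so the check carries over even though the wording does not.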

    Just remember that half the people you meet are below average intelligence.