Define some first-line and second-line support processes #1068

Closed
3 tasks done
choldgraf opened this issue Mar 8, 2022 · 6 comments
@choldgraf
Member

choldgraf commented Mar 8, 2022

Background and proposal

There are often cases where our support process is under-documented. For example, a few questions that people weren't sure how to answer:

  • How should we prioritize support requests?
  • What should we do if a request is not immediately "closeable"? Or if it requires ongoing follow-up work?
  • How can we communicate our inability to fulfill a request?
  • What kind of communication should we use throughout the support process?

We should document some rough guidelines for these common questions, and also provide references to documentation about how to carry out first-line support.

Implementation guide and constraints

Another way to approach this is to ask "what are some common support situations, and what should we do in each situation?" We can draw from our experiences thus far to agree on some team practices to follow.

Updates and ongoing work

Add items below as we learn more

Refs

@choldgraf
Member Author

cc @sgibson91 who I believe has some notes about support processes at other organizations that we could use for comparison.

@sgibson91 sgibson91 changed the title Define some first-tier and second-tier support processes Define some first-line and second-line support processes Mar 9, 2022
@sgibson91
Member

sgibson91 commented Mar 9, 2022

Notes on Managed Service Provider (MSP) first-line support protocols

I had a chat with my partner, who has done first-line support at a couple of companies, and these are my notes from that discussion.

Note: This will be from an industry perspective and will not be entirely applicable to us. We should pick and choose what we think would work for us.

Responding to tickets

  • We should set up Freshdesk to send an auto-response to all incoming tickets, confirming that we have received the ticket and issuing a reference number (if it does not already do this). (Sarah is not sure which emails get sent to clients when certain buttons are clicked, and this fuels some of her anxiety.)
  • The first response from the Support Steward should be within a given timeframe of the ticket coming in, corrected for active support hours (i.e., what timezone the person providing support is in vs. the support requester). An example that would be entirely unrealistic for us: the first response from the Support Steward should be within half an hour of the ticket coming in.
  • Escalating to third parties (e.g., GCP or Azure...)
    • Support Steward would provide a message to the client indicating they've escalated the ticket to a third party and are waiting on a response
    • Ticket state is "On Hold"

Prioritisation of tickets

A priority queue with all integers set to maximum is just a queue...

P5: Service request - This is an "I would like..."-style ticket
P4: "I have a problem and it affects only me"
P3: More than 5 people or a whole department can't do the same thing.* (Or request comes from a VIP)
P2: Issue that affects multiple departments but not the whole company. Perhaps more than one service is down.
P1: No one can work. Everything is on fire.

  • Note: If there is a temporary work-around for a P3 problem, you can apply this fix and downgrade the ticket to a P4 while working on the real fix. Often this is when tickets get pulled out into projects with proper project management as this can affect SLAs agreed with the client. This project work is then billed back to the client.
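
As a rough illustration only (nothing here was agreed in this thread), the priority levels above could be encoded as a small classification function. The name `classify`, its parameters, and the order in which the rules are checked are all assumptions made for this sketch; in particular, how a VIP service request should rank is a guess based on the P3 description.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Priority levels from the notes above; a lower number is more urgent."""
    P1 = 1  # No one can work
    P2 = 2  # Multiple departments affected
    P3 = 3  # More than 5 people, a whole department, or a VIP
    P4 = 4  # Problem affecting a single person
    P5 = 5  # Service request ("I would like...")


def classify(
    is_service_request: bool = False,
    people_affected: int = 1,
    departments_affected: int = 1,
    whole_department: bool = False,
    everyone_blocked: bool = False,
    from_vip: bool = False,
    has_workaround: bool = False,
) -> Priority:
    """Hypothetical mapping from ticket facts to a priority level."""
    if everyone_blocked:
        return Priority.P1
    if departments_affected > 1:
        return Priority.P2
    if people_affected > 5 or whole_department or from_vip:
        # A temporary work-around lets a P3 be downgraded to a P4
        # while the real fix is worked on.
        return Priority.P4 if has_workaround else Priority.P3
    if is_service_request:
        return Priority.P5
    return Priority.P4
```

For example, `classify(people_affected=10, has_workaround=True)` returns `Priority.P4`, matching the downgrade described in the note above.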

SLAs regarding timeframes for carrying out support-related tasks

These are the timeframes we agree with a client for resolving problems. They should be defined as a number of working hours within our active support hours bracket (I'm cheating in the example below).

This is another set of example timeframes that would be unrealistic for us to adopt:

P5: Resolve within 5 working days unless it needs to be a project
P4: Resolve within 3 working days
P3: Resolve within 1 working day
P2: Maximum 4 working hours to resolution
P1: ~1-2 working hours to resolution
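
To make "working hours within our active support hours bracket" concrete, here is a minimal sketch of how a resolution deadline could be computed from the example timeframes above. Everything specific in it is an assumption for illustration: a Monday to Friday, 09:00 to 17:00 support window, an 8-hour working day, naive (timezone-free) datetimes, and taking the upper bound of 2 hours for P1. The names `SLA_WORKING_HOURS`, `add_working_hours`, and `sla_deadline` are hypothetical.

```python
from datetime import datetime, time, timedelta

# Assumed active support window: Monday to Friday, 09:00 to 17:00.
WORK_START = time(9, 0)
WORK_END = time(17, 0)
HOURS_PER_DAY = 8

# Example targets from the list above, expressed in working hours.
SLA_WORKING_HOURS = {
    "P5": 5 * HOURS_PER_DAY,
    "P4": 3 * HOURS_PER_DAY,
    "P3": 1 * HOURS_PER_DAY,
    "P2": 4,
    "P1": 2,  # upper bound of "~1-2 working hours"
}


def add_working_hours(start: datetime, hours: float) -> datetime:
    """Advance `start` by `hours`, counting only time inside the support window."""
    remaining = timedelta(hours=hours)
    current = start
    while True:
        # Outside the window (weekend or after hours): jump to the next morning.
        if current.weekday() >= 5 or current.time() >= WORK_END:
            next_day = current.date() + timedelta(days=1)
            current = datetime.combine(next_day, WORK_START)
            continue
        # Before opening time on a working day: wait until the window opens.
        if current.time() < WORK_START:
            current = datetime.combine(current.date(), WORK_START)
        end_of_day = datetime.combine(current.date(), WORK_END)
        available = end_of_day - current
        if remaining <= available:
            return current + remaining
        remaining -= available
        current = end_of_day


def sla_deadline(opened_at: datetime, priority: str) -> datetime:
    """Resolution deadline for a ticket under the example timeframes."""
    return add_working_hours(opened_at, SLA_WORKING_HOURS[priority])
```

Under these assumptions, a P3 ticket opened on a Friday at 15:00 would be due the following Monday at 15:00, since only two of its eight working hours fit into Friday.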

Communications during P2/P1 events

P2: Provide updates every 2 hours
P1: Provide updates every hour

Have a comms template for these events. From what was described to me, this is very similar to the top half of our Hub Incident issue template. Includes a timeline of the event, the symptom reported, etc. These comms should still go out regularly, even if the update is still "we're investigating".

What channels do these comms go out on? Mailing lists, forum posts, Twitter?

After the event is resolved, compile and release an autopsy report. This is very similar to the second half of our Hub Incident template. It covers what went wrong, how it went wrong, and the steps needed to prevent it happening again (or to minimise disruption if prevention isn't possible).

Support Tiers

  • Pay as you Go
  • Tier 1
    • "break & fix" (something is broken, we fix it)
    • Charge £X per Y hours per month
  • Tier 2
    • break & fix
    • provide some recommendations for improvement
    • Charge £X+dX per Y+dY hours per month
  • Tier 3
    • Unlimited support hours per month
    • Extra services such as 24/7 monitoring, automatic upgrades

This is potentially something we could fit into our alpha service pricing matrix

Guidelines for what is in scope of our support model

We need to define these (maybe on a per client basis?) and be strict about it.

The "maximum capitalist company" thing to do might be:

  • If a problem is found to be a client's fault
    • We offer guidance to help them fix it
    • If they really want us to do the work, fine but we have a flat hourly rate of £X

I'm not 100% convinced that's the right thing for 2i2c to do, just presenting it here as one aspect I learned about.

One guideline I do think we should implement is that clients should give us at least two weeks' notice of when a requested (low-priority) change needs to be ready, e.g., if we are managing their image and they request a package. We should avoid the situation where a request for a package comes in and, 2 days later, the client says they really need it, while in the meantime we've been battling build/push timeouts or whatever. If a low-priority request comes in with less than two weeks' notice, then we make no promises to have it ready by the time the client needs it.
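
The two-week rule is simple enough to express as a check, sketched below. The names `MIN_NOTICE` and `can_commit_to_date` are hypothetical, and counting calendar days rather than working days is an assumption.

```python
from datetime import date, timedelta

# Assumed minimum notice for low-priority change requests.
MIN_NOTICE = timedelta(weeks=2)


def can_commit_to_date(requested_on: date, needed_by: date) -> bool:
    """True if a low-priority request leaves at least two weeks of notice."""
    return needed_by - requested_on >= MIN_NOTICE
```

A request made on 1 March for 10 March would return `False`, meaning we would make no promise about having it ready in time.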

Disadvantage

(And a pretty big one IMO) This style of MSP support requires time-tracking in order to bill the client for, e.g., overtime on support hours, projects generated from support tickets, etc.

@choldgraf
Member Author

Some info from a friend at a tech company

I also had a conversation about this with a friend at a tech company that runs internal and external software services (not mentioning the company's or friend's name for privacy purposes). Here's a short breakdown of their process:

Roles they use

  • An engineering team broadly understands the infrastructure behind the service. They do a combination of development and operations.
  • An operations role is held by a member of this team, who will attempt to resolve all operational issues first. This role rotates weekly through the team.
  • A support person is a dedicated role that is always held by the same (non-engineer) person. This person communicates with the customers and forwards information to the operations role on the engineering team for resolution.
  • A project manager largely oversees timelines and planning around development efforts, but may decide to change deadlines if enough support issues pop up that the team doesn't have time to meet them.
  • A team manager is more like a "line manager", they help with the team processes but focus more of their efforts on making sure team members are supported, on a good career path, etc.

When a support request comes in

Here's what happens:

  • The support person interacts with the person that made the request - they respond and try to identify what's going on.
  • If an action needs to be taken, they contact the operations member for that week, who tries to resolve the issue.
  • If they cannot resolve the issue, they speak to the Project Manager, who identifies another member of the team to help with the issue.
  • If they cannot find another person to assign to the issue, it goes into a list of open operations issues with no owner.
  • Each week, they have a triage meeting led by the support person. The outcome of that meeting is that every operations / support issue must have an owner.
  • In this meeting, the Project Manager may help estimate the capacity of team members and suggest a person to take on an issue if none of the engineers want to pick it up. Apparently there are often "awkward silences" where they wait for somebody to volunteer to do something 😆
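
The hand-offs described above can be sketched as a small routing function plus a weekly triage step. This is only a reading of the process as described to me, not that company's actual tooling; `Ticket`, `route`, `weekly_triage`, and all of their parameters are hypothetical names for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Ticket:
    summary: str
    needs_action: bool = True
    owner: Optional[str] = None


def route(
    ticket: Ticket,
    ops_member: str,
    ops_can_resolve: bool,
    helper: Optional[str],
    unowned: List[Ticket],
) -> str:
    """Walk one ticket through the escalation steps and say where it ended up."""
    if not ticket.needs_action:
        # The support person answered the request directly.
        return "handled by support person"
    if ops_can_resolve:
        ticket.owner = ops_member
        return "operations role"
    if helper is not None:
        # The Project Manager identified another team member.
        ticket.owner = helper
        return "assigned by project manager"
    # Nobody is available: park it until the weekly triage meeting.
    unowned.append(ticket)
    return "unowned backlog"


def weekly_triage(
    unowned: List[Ticket], pick_owner: Callable[[Ticket], str]
) -> List[Ticket]:
    """Give every unowned ticket an owner, then empty the backlog."""
    for ticket in unowned:
        ticket.owner = pick_owner(ticket)
    assigned = list(unowned)
    unowned.clear()
    return assigned
```

The point of `weekly_triage` is the invariant from the meeting: after it runs, no operations or support issue is left without an owner.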

How this interfaces with development

They do development in parallel with these operations tasks, and each team member has ongoing projects with deadlines associated with them.

Occasionally, there are enough operational tasks that they realize they won't hit their deadlines. When this happens, the Project and Team Managers discuss with one another and agree on a plan forward, potentially to move back the deadlines for their projects.

@damianavila
Contributor

Thank you both for sharing this information!
Everything described here pretty much aligns with my own previous experiences!
I think several of the pieces detailed above could be adopted, with adjustments according to our current state and our mission/vision.

@choldgraf
Member Author

Update

@yuvipanda had some great ideas in #1154 for steps in this direction, along with process notes shared from Sarah. I think we should make a quick push to document some of that, since the content is mostly there. We could also use this as an opportunity to update our SLA docs a little bit to make them more clear.

@choldgraf
Member Author

I'm going to close this one and say it was completed by merging the following PR:

We can continue to iterate on these team processes over time!

3 participants