Incident Management

When to use it

  • When the issue affects a production-scope service:
    • When anything is directly impacting users
    • Or when the issue is not visible yet but will impact users soon

What to do

Assess the problem

What is impacted, and is the issue serious? Spend a maximum of 15 minutes on this, then start communicating even if you do not yet fully understand what is going on.

Communication

  • Prepare and send the first Incident notification within the next 10 minutes if possible: (incident start template link)
    • It contains:
      • What service is impacted?
      • The scope of the issue: how is the service impacted? Complete downtime or degraded situation?
      • Who is impacted? → Add the impacted people to the email, when there is one
      • How long will it be down? Or at least an explanation of what makes the resolution time uncertain
      • What has been started?
      • What are the next debugging steps?
      • Advise that the next communication will be in: 1H
  • Note the time and schedule the next communication mark 1H from now
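
The contents of the first notification can be sketched as a small structure that renders the message and computes the next communication mark. This is only an illustration: the class, field, and function names are hypothetical and do not come from the incident start template, and the sample values are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentStart:
    """Fields of the first Incident notification (hypothetical names)."""
    service: str            # what service is impacted
    scope: str              # complete downtime or degraded situation
    impacted: str           # who is impacted
    expected_duration: str  # or why the resolution time is uncertain
    actions_started: str    # what has been started
    next_steps: str         # next debugging steps


def render(n: IncidentStart, sent_at: datetime) -> str:
    """Build the notification text, announcing the next update 1H later."""
    next_update = sent_at + timedelta(hours=1)
    return "\n".join([
        f"Service impacted: {n.service}",
        f"Scope: {n.scope}",
        f"Who is impacted: {n.impacted}",
        f"Expected duration: {n.expected_duration}",
        f"Started so far: {n.actions_started}",
        f"Next debugging steps: {n.next_steps}",
        f"Next communication: {next_update:%H:%M}",
    ])


# Made-up example values, for illustration only.
example = IncidentStart(
    service="webmail",
    scope="degraded situation",
    impacted="all internal users",
    expected_duration="unknown, root cause not identified yet",
    actions_started="checking server logs",
    next_steps="restart the mail queue if logs confirm it is stuck",
)
print(render(example, datetime(2024, 1, 1, 14, 5)))
```

Filling every field forces each question in the list above to be answered, even if the answer is "unknown yet".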

Debugging

  • Continue debugging.
    • Either you solve the issue:
      • Take 10 minutes to send the Incident Solved notification: (Incident Solved template link), preferably highlighting the last resolution steps in green
    • Or it is still not solved: 10 minutes before the scheduled mark, start appending the current state of the debugging to your last communication (on the Teams group, or in the last email, which you update with the new elements, preferably highlighted in orange)
  • Repeat the last steps of debugging and communication until the issue is solved, or until we reach 4H
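
The debug-and-communicate loop above implies a simple timetable. The sketch below computes it, assuming the 4H limit is counted from the first notification and that an update is still sent at the 4H mark itself; the page does not state either point, and the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(hours=1)   # next communication in 1H
PREPARE_LEAD = timedelta(minutes=10)   # start writing 10 minutes before
LOOP_LIMIT = timedelta(hours=4)        # after this, move to escalation


def communication_marks(first_sent: datetime):
    """Return (start_writing_at, send_at) pairs up to the 4H limit.

    Assumption: the limit is measured from the first notification.
    """
    marks = []
    send_at = first_sent + UPDATE_INTERVAL
    while send_at <= first_sent + LOOP_LIMIT:
        marks.append((send_at - PREPARE_LEAD, send_at))
        send_at += UPDATE_INTERVAL
    return marks


for start_writing, send in communication_marks(datetime(2024, 1, 1, 14, 5)):
    print(f"start writing at {start_writing:%H:%M}, send at {send:%H:%M}")
```

If the issue is solved before a mark, the remaining marks are dropped and the Incident Solved notification is sent instead.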

When the issue takes more than 4H

  • After 4H without resolution:
    • Notify anyone else? (management?)
    • Get some external help on the issue? (it is best to keep a ready list of contacts with their expertise, when this is available)
    • Decide to switch to a workaround? For example, restore from backups, or reinstall a server?
  • After a day without resolution:
    • Notify anyone else? (management?)
    • Switch to a complete workaround? (Reinstall a server from scratch, or create a new VPS in Infomaniak and install manually without backups)
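
The two thresholds above can be expressed as a lookup from elapsed time to the questions to ask. This is a minimal sketch: it assumes "a day" means 24 hours and that elapsed time is counted from the start of the incident, neither of which the page specifies, and the function name is hypothetical.

```python
from datetime import timedelta

AFTER_4H = [
    "Notify anyone else? (management?)",
    "Get some external help on the issue?",
    "Decide to switch to a workaround? (restore from backups, reinstall a server)",
]
AFTER_A_DAY = [
    "Notify anyone else? (management?)",
    "Switch to a complete workaround? (reinstall from scratch, new VPS without backups)",
]


def escalation_questions(elapsed: timedelta) -> list[str]:
    """Return the questions to consider for the time spent without resolution."""
    if elapsed >= timedelta(days=1):   # assumption: a day = 24 hours
        return AFTER_A_DAY
    if elapsed >= timedelta(hours=4):
        return AFTER_4H
    return []                          # still in the debug/communicate loop


for question in escalation_questions(timedelta(hours=5)):
    print(question)
```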