What is Problem Management?
Understanding the purpose, process, and benefits of problem management in IT service management.
To manage possible service interruptions, organizations use the incident management process, whose definition and best practices we’ve discussed elsewhere. However, incidents are often just symptoms indicating an underlying “disease.” If you suspect a shared cause behind several incidents, you might need a separate process to determine this cause. ITIL (Information Technology Infrastructure Library) uses the term problem management for the process of identifying and eliminating the underlying causes of incidents. Incident management’s main goal is to restore services quickly, whereas problem management aims to prevent recurring incidents. In addition, if a long-term fix isn’t ready, problem management recommends a temporary solution and supports its implementation. ITIL emphasizes the importance of treating problems and incidents as distinct activities.
Problem management is a process in which IT teams work to identify and solve the root causes of problems or possible problems that could affect their organization. By doing this, they can prevent similar issues from happening again or reduce their impact on the business, which helps to improve the quality and stability of the IT services provided.
Problem management looks at problems throughout their lifecycle. IT specialists identify, register, and classify the problem and collect evidence to determine the cause. They attempt to resolve the issue and escalate it only if necessary. If the changes are complex or expensive, a temporary workaround may be suggested. Workarounds do not fix the problem permanently but let users avoid its consequences. The team should create a knowledge base record for every workaround, so specialists can refer to it. Known issues or errors are problems with registered workarounds.
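As a step-by-step process, the problem management lifecycle runs through the following stages: identification, logging, classification and prioritization, investigation and diagnosis, resolution (either a permanent fix or a workaround recorded as a known error), and closure.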
Let’s look closer at each of these stages.
You can identify problems reactively or proactively. Establish a practice where IT specialists open a problem record under certain circumstances. Alternatively, use automated monitoring tools that notify you of critical changes in the system.
Identifying problems before they become urgent is often not the top priority in problem management, because proactive activities like reviewing records of past incidents, having regular meetings with team leaders, and interviewing users take time to analyze and manage. Even though it takes effort, anticipating potential issues before they impact users can result in fewer unplanned interruptions and more dependable service. Detecting significant problems early can be essential for a business.
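One way to picture a proactive review of past incidents is a simple recurrence count. The sketch below is only an illustration: the incident fields (`service`, `category`), the sample data, and the threshold of three are assumptions, not part of ITIL or of any specific tool.

```python
from collections import Counter

# Hypothetical incident records; field names and values are assumptions.
incidents = [
    {"id": "INC-1", "service": "email", "category": "login failure"},
    {"id": "INC-2", "service": "email", "category": "login failure"},
    {"id": "INC-3", "service": "vpn", "category": "timeout"},
    {"id": "INC-4", "service": "email", "category": "login failure"},
]


def problem_candidates(incidents, threshold=3):
    """Return (service, category) pairs seen in at least `threshold` incidents."""
    counts = Counter((i["service"], i["category"]) for i in incidents)
    return [key for key, n in counts.items() if n >= threshold]


candidates = problem_candidates(incidents)
print(candidates)
```

A pair that crosses the threshold is only a hint that a shared cause may exist; whether to open a problem record remains a decision for the team.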
As with incident logging, it is crucial to log the problem itself along with all the relevant details. Registering all incoming information, like the service type, classification, and prioritization data, helps to get a broader picture when investigating the problem later. Also, remember to collect links to the related incident records. Your problem management software might have a special referencing mechanism for this purpose.
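The logging details described above can be sketched as a small data structure. The class and field names here are hypothetical and chosen for illustration only; real problem management software will have its own schema and its own way of referencing incidents.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemRecord:
    """Hypothetical problem record; field names are illustrative assumptions."""
    problem_id: str
    summary: str
    service_type: str
    category: str
    priority: str
    related_incidents: list[str] = field(default_factory=list)

    def link_incident(self, incident_id: str) -> None:
        """Attach an incident reference once, ignoring duplicates."""
        if incident_id not in self.related_incidents:
            self.related_incidents.append(incident_id)


record = ProblemRecord(
    problem_id="PRB-1",
    summary="Repeated login failures on the email service",
    service_type="email",
    category="authentication",
    priority="high",
)
for incident_id in ["INC-1", "INC-2", "INC-1"]:
    record.link_incident(incident_id)
print(record.related_incidents)
```

Keeping the incident links on the record itself means an investigator can move from the problem to every symptom it produced without searching for them again.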
ITIL recommends using a shared approach to problem and incident classification because it’s easier to work with the unified records system. You can reuse prioritization guidelines from your incident management process as well. Mind how severe the problem is, how many incidents it caused, and how many people and assets it affected.
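As a rough sketch, the factors named above (severity, number of incidents, affected people and assets) could be combined into a single priority level. The weights and thresholds below are invented for illustration; they are not ITIL guidance, and in practice you would reuse whatever your incident management process already defines.

```python
def problem_priority(severity, incident_count, affected_users, affected_assets):
    """Map the four factors to 'low', 'medium', or 'high'.

    severity: 1 (minor) to 3 (critical). All thresholds are assumptions.
    """
    score = severity * 2
    score += 2 if incident_count >= 10 else 1 if incident_count >= 3 else 0
    score += 2 if affected_users >= 100 else 1 if affected_users >= 10 else 0
    score += 1 if affected_assets >= 10 else 0
    if score >= 8:
        return "high"
    if score >= 5:
        return "medium"
    return "low"


print(problem_priority(3, 12, 250, 4))
print(problem_priority(2, 4, 20, 0))
print(problem_priority(1, 1, 2, 1))
```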
ITIL suggests a few methods that have proved highly efficient in problem investigation, such as Ishikawa diagrams, the Kepner-Tregoe approach, and brainstorming. We share more on problem management best practices in another article.
Fundamentally, after investigating a problem, you may either find a resolution technique or realize that no permanent resolution exists. In each of these cases, recommended activities for the IT team are different. So let’s talk about each of these situations.
Ideally, a permanent solution can be developed to resolve a problem once and for all. The agent will initially attempt to resolve the problem on their own. If the fix requires changes that may impact different groups of people and need approval at various levels, it goes through change management, the process for implementing such changes in the organization. According to ITIL, problem management is responsible for detecting problems and for initiating the changes needed to resolve them.
For convenience, software providers integrate problem management and change management modules. In Alloy Navigator, for example, invoking changes is possible through a button in the problem record interface.
However, a permanent resolution is not always possible, and in that case teams try to find at least a workaround. This temporary solution lets users keep using the service even though something is not working correctly. After the problem management team finds a workaround for a problem, they carefully describe it and pass it to the incident resolution team.
The incident resolution team can handle the information on known errors in different ways. They may create a Known Errors Database (KEDB), either generally available or only available to agents. Agents can search the KEDB to find a resolution path for an incident. Tagging KEDB records with keywords helps to find relevant records quickly. Establishing a specific system object type for known errors is a best practice. The form used to store information on known errors should allow for quick recall when similar incidents occur.
A “side effect” of implementing a KEDB is that it helps onboard less experienced team members into the IT support process because they can help users quickly by referring them to a KEDB article. KEDB management is a part of the broader knowledge management process in the organization.
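To make the keyword tagging idea concrete, here is a minimal sketch of a KEDB lookup. The record layout, the sample entries, and the ranking by number of matching keywords are all assumptions for illustration; a real KEDB would typically rely on the search features of the service desk tool.

```python
# Hypothetical known error records; structure and content are assumptions.
kedb = [
    {
        "id": "KE-1",
        "title": "Email login fails after password change",
        "workaround": "Clear cached credentials and sign in again",
        "keywords": {"email", "login", "password"},
    },
    {
        "id": "KE-2",
        "title": "Printer queue stalls on large documents",
        "workaround": "Restart the print spooler service",
        "keywords": {"printer", "queue", "spooler"},
    },
]


def search_kedb(kedb, query):
    """Return records sharing at least one keyword with the query, best match first."""
    terms = set(query.lower().split())
    scored = [(len(terms & rec["keywords"]), rec) for rec in kedb]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in matches]


for rec in search_kedb(kedb, "Email login broken"):
    print(rec["id"], "-", rec["workaround"])
```

Exact keyword overlap is deliberately naive. It shows why consistent tagging matters: a record tagged "e-mail" would not be found by a search for "email".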
This stage involves several steps. Firstly, updating stakeholders on the resolution result. Next, entering all necessary data in the problem record. Finally, transferring the record to another status to remove it from the team’s agenda.
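The three closure steps can be sketched as one function. Everything here is hypothetical: the record is a plain dictionary, the status names are invented, and `notify` stands in for whatever messaging mechanism your organization uses. The check for a recorded root cause is an added assumption about what "all necessary data" might include.

```python
def close_problem(record, resolution_summary, notify):
    """Run the three closure steps on a hypothetical problem record (a dict)."""
    if not record.get("root_cause"):
        raise ValueError("root cause must be recorded before closure")
    # Step 1: update stakeholders on the resolution result.
    for person in record.get("stakeholders", []):
        notify(person, f"Problem {record['id']} resolved: {resolution_summary}")
    # Step 2: enter the remaining data in the problem record.
    record["resolution"] = resolution_summary
    # Step 3: move the record to a status that takes it off the team's agenda.
    record["status"] = "closed"
    return record


sent = []  # collects (recipient, message) pairs instead of sending real messages
problem = {
    "id": "PRB-1",
    "status": "resolved",
    "root_cause": "Expired SSO certificate",
    "stakeholders": ["service.owner@example.com"],
}
close_problem(problem, "Certificate renewed", lambda to, msg: sent.append((to, msg)))
print(problem["status"], len(sent))
```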
Knowledge management is a critical component of problem management. When an issue arises, it’s important to have a well-organized knowledge base to identify and resolve it quickly and effectively. The knowledge base may include information about known errors and their associated workarounds and a list of frequently asked questions and their solutions.
By leveraging knowledge management in problem management, teams can reduce the time and effort required to resolve issues. When a problem is identified, the team can search the knowledge base for similar incidents and see how they were resolved in the past. This helps identify patterns and common causes of problems, enabling the team to develop proactive measures to prevent similar issues.
Problem management and change management are two ITIL processes that are closely related and work together to improve the IT service management of an organization. The main objective of problem management is to find and fix the causes of problems that keep happening, while the purpose of change management is to make changes to IT systems in a way that causes the least amount of risk and disruption to the business.
The relationship between these two processes is essential for ensuring long-term success. When problem management identifies the root cause of recurring incidents, the fix often requires changes to IT systems or processes. This is where change management comes in, making the necessary changes to the system. By implementing a well-defined change management process, organizations can ensure that changes are properly assessed, planned, and executed, minimizing the risk of introducing new problems or disruptions to the system.
As we already established, the objective of problem management is to eliminate the impact of errors that cause multiple incidents. By doing so, problem management achieves the following business results:
Because the number of incidents decreases and you receive fewer requests on recurring issues, your IT team spends less time on them, which means you save money.
Thanks to problem management, problems threatening to interrupt a service are identified and prevented beforehand. As a result, services are more reliable and have a better reputation with the users.
Problem management helps to accumulate knowledge on the existing problems, the incidents reflecting them, and the workarounds. As soon as the specialist identifies a known error, they stop dwelling on the incident, and the resolution time is much shorter.
Contacting support to address the same issues again and again erodes users’ trust in the IT team. By contrast, when incident agents not only fix incoming issues but also track incident trends and try to prevent recurring issues, users start to think of the IT team as a partner.
Investigating underlying problems will let your IT team see that you value their time and want it to be well spent, not consumed by resolving the same issues repeatedly. Moreover, many will find analytical tasks, such as problem diagnosis, more exciting than routine incident registration and resolution. The culture of asking “why?” will eventually inspire your IT team to approach their tasks more creatively.