How to Prioritize Security Findings When Everything Is Flagged as Critical
Stop drowning in critical alerts. Learn how to prioritize security findings by adding context, asset criticality, and attack paths to your triage process.
Introduction
For the modern senior engineer or tech lead, the "security dashboard" has become a source of profound cognitive dissonance. You log in to find a sea of red: 450 "Critical" vulnerabilities, 1,200 "High" alerts, and a backlog that seems to grow faster than your team can ship code. In the world of Cloud Security Posture Management (CSPM) and Vulnerability Management (VM), we have reached a breaking point where the term "Critical" has lost its semantic meaning. When everything is a priority, nothing is.
The fundamental issue is that most security tooling relies on the Common Vulnerability Scoring System (CVSS) as its primary—and often only—metric for prioritization. While CVSS provides a standardized way to measure the theoretical severity of a bug, it is entirely context-blind. It doesn't know if a vulnerable library is sitting on an air-gapped development server or a public-facing production gateway holding customer PII. It doesn't know if the vulnerability is even reachable via the network or if the vulnerable function is actually loaded into memory.
As tech leads, our responsibility isn't just to "fix bugs"; it is to manage risk. Every hour an engineer spends patching a theoretical CVSS 9.8 vulnerability that is unreachable is an hour they aren't spending on a CVSS 7.0 vulnerability that is actively being exploited in the wild. This post explores how to move beyond raw severity scores and build a context-aware prioritization model that distinguishes between "noise" and "existential risk."
The Failure of Raw CVSS Scores
CVSS was designed to provide a universal language for the technical characteristics of a vulnerability. It answers the question: "How bad is this bug in a vacuum?" It does not answer: "How much risk does this bug pose to my specific organization?"
The limitations of relying solely on CVSS are manifest in several ways:
- Theoretical vs. Practical Exploitability: A vulnerability might have a high score because it allows for Remote Code Execution (RCE), but the exploit might require a highly specific, non-default configuration that your environment doesn't use.
- Lack of Environmental Context: CVSS does not account for existing security controls. An RCE on a container with a read-only filesystem and no outbound network access is significantly less dangerous than the same RCE on a privileged container with internet access.
- Temporal Irrelevance: The CVSS base scores that most tools report are static. They don't fluctuate based on whether a functional exploit script has just been released on GitHub or if a ransomware group is currently targeting that specific CVE.
- Volume Overload: Because the bar for a "Critical" CVSS score (9.0+) is strictly technical, a single misconfigured base image can result in thousands of "Critical" flags across a microservices architecture, even if the risk is redundant or mitigated.
This reliance on raw scores is a primary driver of alert fatigue. As noted in Rectify Cloud's analysis of alert fatigue, when engineers are bombarded with low-signal alerts, their responsiveness to genuine threats diminishes. They begin to view security findings as "to-do list clutter" rather than urgent signals, creating a dangerous gap in the organization's security posture.
The Three Pillars of Risk-Based Prioritization
To build a triage process that actually works, we must layer three additional dimensions of context on top of the base CVSS score: Asset Criticality, Exploitability, and Attack Path Analysis.
1. Asset Criticality (The "So What?" Factor)
The first step in prioritization is defining the value of the asset at risk. A "Critical" vulnerability on a marketing landing page that contains no data and has no backend access is fundamentally different from a "Medium" vulnerability on a database containing plaintext customer records.
Effective asset criticality models should categorize assets based on:
- Data Sensitivity: Does the asset handle PII, PCI, HIPAA, or trade secrets?
- Business Continuity: Is this asset in the critical path for revenue generation?
- Identity and Access: Does the service account associated with this asset have high-privilege IAM permissions (e.g., AdministratorAccess or iam:PassRole)?
- Environment: Is this Production, Staging, or Development?
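One way to turn the four factors above into a repeatable tier is a small scoring function. The sketch below is illustrative only: the `Asset` fields, the point weights, and the tier cut-offs are assumptions, not values prescribed by any tool.

```python
from dataclasses import dataclass

# Permission names that mark a role as high-privilege (taken from the list above).
HIGH_PRIVILEGE_MARKERS = frozenset({"AdministratorAccess", "iam:PassRole"})
SENSITIVE_DATA = frozenset({"PII", "PCI", "HIPAA", "TRADE_SECRET"})


@dataclass(frozen=True)
class Asset:
    name: str
    environment: str            # "production", "staging", or "development"
    data_classes: frozenset     # e.g. frozenset({"PII"})
    revenue_critical: bool      # in the critical path for revenue?
    iam_permissions: frozenset  # policies or actions attached to the asset's role


def asset_criticality(asset: Asset) -> str:
    """Return "HIGH", "MEDIUM", or "LOW". Weights and cut-offs are assumptions."""
    points = 0
    if asset.data_classes & SENSITIVE_DATA:
        points += 2
    if asset.revenue_critical:
        points += 2
    if asset.iam_permissions & HIGH_PRIVILEGE_MARKERS:
        points += 2
    if asset.environment == "production":
        points += 1
    if points >= 4:
        return "HIGH"
    if points >= 2:
        return "MEDIUM"
    return "LOW"


landing = Asset("marketing-landing", "production", frozenset(), False, frozenset())
customer_db = Asset("customer-db", "production", frozenset({"PII"}), True, frozenset())
print(asset_criticality(landing), asset_criticality(customer_db))
```

With these assumed weights, the data-free landing page lands in the lowest tier even though it runs in production, while the customer database is rated HIGH.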
2. Exploitability and Reachability
Not all vulnerabilities are created equal in the eyes of an attacker. We must use real-world intelligence to determine if a vulnerability is likely to be exploited.
- EPSS (Exploit Prediction Scoring System): Unlike CVSS, EPSS uses data from real-world attacks to estimate the probability that a vulnerability will be exploited in the next 30 days. A CVSS 9.8 with an EPSS of 0.01% is often less urgent than a CVSS 7.2 with an EPSS of 80%.
- Reachability Analysis: This is a technical check on two levels: whether the affected service can be reached from the internet, and whether the vulnerable code path is ever executed. If a CVE affects a library used by a service, but the service is hidden behind a VPN and a WAF, and the specific vulnerable function is never called, the risk is significantly mitigated.
- KEV (Known Exploited Vulnerabilities): CISA maintains a catalog of vulnerabilities known to be exploited in the wild. Any finding that appears on the KEV list should immediately jump to the front of the remediation queue, regardless of its CVSS score.
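The ordering these three signals imply can be sketched as a sort key: KEV membership first, then EPSS probability, with CVSS only as a tie-breaker. This is a minimal sketch, assuming the EPSS value and KEV flag have already been looked up for each finding; the `Finding` type and the placeholder CVE IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve_id: str
    cvss: float   # CVSS base score, 0.0-10.0
    epss: float   # EPSS probability, 0.0-1.0
    in_kev: bool  # True if the CVE appears in the CISA KEV catalog


def urgency_key(finding: Finding) -> tuple:
    """Sort key: KEV status, then EPSS probability, then CVSS as a tie-breaker."""
    return (finding.in_kev, finding.epss, finding.cvss)


def rank(findings: list) -> list:
    """Most urgent finding first."""
    return sorted(findings, key=urgency_key, reverse=True)


findings = [
    Finding("CVE-0000-0001", cvss=9.8, epss=0.0001, in_kev=False),
    Finding("CVE-0000-0002", cvss=7.2, epss=0.80, in_kev=False),
    Finding("CVE-0000-0003", cvss=6.5, epss=0.02, in_kev=True),
]
for f in rank(findings):
    print(f.cve_id, f.cvss, f.epss, f.in_kev)
```

Here the KEV-listed 6.5 comes first, the 7.2 with an 80% EPSS second, and the 9.8 with a 0.01% EPSS last, which matches the reasoning in the bullets above.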
3. Attack Path Analysis (Blast Radius)
Modern cloud security is a game of graphs, not lists. Attack path analysis looks at how an attacker could chain multiple minor issues together to reach a sensitive target.
For example, consider an S3 bucket with a "Medium" misconfiguration (publicly writable). On its own, it might hold only public assets. However, if that bucket contains a script that an EC2 instance downloads and runs, and that instance has an IAM role that can write to a production database, an attacker who tampers with the script can reach the database, and that "Medium" finding is actually a step in a critical attack path. Prioritization must account for the "Blast Radius": how much damage can be done if this specific finding is exploited?
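Treating the environment as a graph makes this check mechanical. The sketch below runs a breadth-first search from a finding's resource to any "crown jewel" node; the node names and edges are hypothetical and simply mirror the S3, EC2, and database example.

```python
from collections import deque

# Directed edge: an attacker who controls the source can influence the target.
EDGES = {
    "s3:public-assets": ["ec2:batch-worker"],    # instance runs a script from the bucket
    "ec2:batch-worker": ["rds:prod-customers"],  # instance role can write to the database
    "s3:archived-logos": [],                     # bucket with no downstream consumers
}
CROWN_JEWELS = {"rds:prod-customers"}


def shortest_attack_path(edges, start, targets):
    """Breadth-first search; returns the shortest path to any target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in targets:
            return path
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(shortest_attack_path(EDGES, "s3:public-assets", CROWN_JEWELS))
print(shortest_attack_path(EDGES, "s3:archived-logos", CROWN_JEWELS))
```

The first bucket reaches the database in two hops, so its finding deserves escalation; the second has no path to a sensitive target and can stay at its base severity.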
Building a Context-Aware Scoring Model
To automate this, you need a formula that translates these qualitative factors into a quantitative "True Risk Score." A simplified model might look like this:
True Risk = (Base CVSS * Exploitability Factor) * Asset Criticality * Reachability Multiplier
Where:
- Exploitability Factor: Based on EPSS or KEV status (e.g., 1.5x if in KEV, 0.5x if no known exploit).
- Asset Criticality: Based on business value (e.g., 2.0x for Prod, 0.1x for Sandbox).
- Reachability Multiplier: (e.g., 1.0x if internet-facing, 0.2x if internal-only).
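As a worked example, the formula can be written as a short function. The multipliers come from the examples above; the 1.0x values for "known exploit but not in KEV" and for staging are assumptions added to fill the gaps, and the rounding is only for readability.

```python
# Multipliers quoted in the text; the values marked "assumed" are fill-ins.
EXPLOITABILITY = {"kev": 1.5, "known_exploit": 1.0, "none": 0.5}   # 1.0 is assumed
ASSET_CRITICALITY = {"prod": 2.0, "staging": 1.0, "sandbox": 0.1}  # staging is assumed
REACHABILITY = {True: 1.0, False: 0.2}  # keyed by "is internet-facing"


def true_risk(cvss: float, exploit_status: str, environment: str,
              internet_facing: bool) -> float:
    """True Risk = (Base CVSS * Exploitability) * Asset Criticality * Reachability."""
    score = (cvss * EXPLOITABILITY[exploit_status]) \
        * ASSET_CRITICALITY[environment] * REACHABILITY[internet_facing]
    return round(score, 2)


# A CVSS 9.8 with no known exploit on an internal-only sandbox host:
print(true_risk(9.8, "none", "sandbox", False))  # 9.8 * 0.5 * 0.1 * 0.2
# A CVSS 7.0 listed in KEV on an internet-facing production host:
print(true_risk(7.0, "kev", "prod", True))       # 7.0 * 1.5 * 2.0 * 1.0
```

Under these multipliers the 9.8 scores roughly 0.1 and the 7.0 scores 21.0, so the lower-severity bug on the exposed production host ranks far higher.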
Technical Implementation: Automating Triage with Policy-as-Code
As a senior engineer, you don't want to do this manually. You can implement these rules using Policy-as-Code tools like Open Policy Agent (OPA) or by writing custom logic in your CI/CD pipeline that interacts with your CSPM’s API.
Below is a conceptual example of a JSON structure that represents a context-enriched security finding. This data format allows for programmatic triage.
```json
{
  "finding_id": "VULN-2023-4567",
  "resource_id": "arn:aws:ec2:us-east-1:123456789012:instance/i-0abcd1234efgh5678",
  "base_severity": {
    "cvss_score": 9.8,
    "label": "CRITICAL"
  },
  "context_factors": {
    "is_internet_facing": true,
    "has_known_exploit": true,
    "epss_percentile": 0.94,
    "asset_criticality": "HIGH",
    "contains_pii": true,
    "attack_path_depth": 2
  },
  "calculated_risk_score": 98.5,
  "priority_level": "P0",
  "routing": {
    "team": "platform-engineering",
    "slack_channel": "#alerts-platform-critical",
    "jira_project": "PLAT"
  }
}
```

By enriching the finding with context_factors, you can move away from a "fix everything" mentality to a "fix the P0s" mentality. In this example, the "P0" label is driven not by the CVSS score alone but by the fact that the asset is internet-facing, has a high EPSS percentile, and contains PII.
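To illustrate programmatic triage over a record like the one above, the sketch below derives a priority from the context factors. The field names match the example; the 0.90 EPSS threshold and the "count the signals" rule are assumptions, and a real policy would likely also weigh the base score.

```python
import json

FINDING_JSON = """
{
  "finding_id": "VULN-2023-4567",
  "base_severity": {"cvss_score": 9.8, "label": "CRITICAL"},
  "context_factors": {
    "is_internet_facing": true,
    "has_known_exploit": true,
    "epss_percentile": 0.94,
    "asset_criticality": "HIGH",
    "contains_pii": true,
    "attack_path_depth": 2
  }
}
"""


def assign_priority(finding: dict) -> str:
    """Map context_factors to P0-P3. The thresholds are illustrative assumptions."""
    ctx = finding["context_factors"]
    exposed = ctx["is_internet_facing"]
    likely_exploited = ctx["has_known_exploit"] or ctx["epss_percentile"] >= 0.90
    valuable = ctx["asset_criticality"] == "HIGH" or ctx["contains_pii"]

    # Each True counts as one risk signal.
    signals = sum([exposed, likely_exploited, valuable])
    if signals == 3:
        return "P0"
    if signals == 2:
        return "P1"
    if signals == 1:
        return "P2"
    return "P3"


finding = json.loads(FINDING_JSON)
print(assign_priority(finding))  # P0
```

The same finding on an internal-only host would drop to P1 under this rule, even though its CVSS score is unchanged.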
Designing the Triage Workflow
Once you have a scoring model, you need a process to route findings to the right people. One of the biggest mistakes organizations make is routing all security findings to a central "Security Team." This creates a bottleneck and separates the fix from the person who understands the code.
Step-by-Step Triage Process
- Ingestion & Enrichment: Pull findings from your CSPM/VM tools via API. Immediately cross-reference them with your CMDB (Configuration Management Database) for asset criticality and threat intel feeds for exploitability.
- Automated Filtering: Drop or "Snooze" findings that meet specific low-risk criteria (e.g., vulnerabilities in non-running kernels or development environments with no access to production data).
- Intelligent Routing: Use the tags on the cloud resource to identify the owner. If an EC2 instance is tagged Team: Payments, the finding should automatically generate a Jira ticket in the Payments team's backlog.
- SLA Enforcement based on True Risk:
  - P0 (Critical Risk): Immediate remediation (24-48 hours). Triggers an incident response if necessary.
  - P1 (High Risk): Remediation within the current sprint.
  - P2 (Moderate Risk): Remediation within 30-60 days.
  - P3 (Low Risk): Backlog; fix if the resource is being updated anyway.
- Feedback Loop: Allow engineering teams to "Dispute" a finding if the reachability analysis is wrong. This feedback should be used to tune the automated scoring model.
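The routing and SLA steps can be sketched as a single function that reads the owner from the resource's Team tag and computes a due date from the priority. The `resource_tags` field, the fallback queue name, and the 14-day sprint length are assumptions; the SLA windows use the upper bound of each range above.

```python
from datetime import datetime, timedelta, timezone

# SLA windows from the tiers above (upper bounds). P3 has no deadline.
SLA = {
    "P0": timedelta(hours=48),
    "P1": timedelta(days=14),  # assumed sprint length
    "P2": timedelta(days=60),
    "P3": None,
}
DEFAULT_OWNER = "security-triage"  # hypothetical fallback for untagged resources


def route(finding: dict, now: datetime) -> dict:
    """Pick an owner from the resource's Team tag and compute a due date."""
    owner = finding.get("resource_tags", {}).get("Team", DEFAULT_OWNER)
    window = SLA[finding["priority_level"]]
    return {
        "finding_id": finding["finding_id"],
        "owner": owner,
        "due": now + window if window is not None else None,
    }


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
ticket = route(
    {"finding_id": "VULN-2023-4567", "priority_level": "P0",
     "resource_tags": {"Team": "Payments"}},
    now,
)
print(ticket["owner"], ticket["due"].isoformat())
```

Untagged resources fall through to the fallback queue, which is a useful signal in itself: a growing fallback queue means the tagging coverage needs work.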
Overcoming the "Everything is Critical" Culture
Changing how you prioritize findings is as much a cultural challenge as it is a technical one. Compliance frameworks (like SOC 2 or PCI DSS) often mandate that "all critical vulnerabilities must be patched within X days." If your tool says a finding is critical, auditors might expect a patch, regardless of your reachability analysis.
To solve this, you must document your prioritization methodology. When an auditor asks why a CVSS 9.8 wasn't patched in 48 hours, you should be able to produce a report showing:
- The vulnerability was not reachable from the network.
- The asset was categorized as "Low Criticality/Non-Production."
- Compensating controls (e.g., an IDS/IPS or a WAF) were in place.
- The risk was formally accepted by the technical lead.
By shifting the conversation from "vulnerability counts" to "risk reduction," you gain credibility with both auditors and your engineering staff. You are no longer asking your team to chase ghosts; you are asking them to defend the business.
The Impact of Effective Prioritization
When you successfully implement context-aware prioritization, several things happen:
- Reduced Burnout: Engineers stop ignoring security alerts because they know that when they do get an alert, it actually matters.
- Faster MTTR (Mean Time To Remediate): By narrowing the focus to the top 5% of truly dangerous findings, those findings get fixed much faster.
- Measurable Security Improvement: You can demonstrate to leadership that while the number of vulnerabilities might remain steady, the reachability of those vulnerabilities and the overall attack surface has decreased significantly.
As highlighted by the Rectify Cloud blog on alert fatigue, the goal is to protect the "human element" of your security posture. A tired, overwhelmed engineer is a security liability. A focused engineer with clear priorities is your best defense.
Key Takeaways for Tech Leads
To summarize the transition from CVSS-based to Risk-based prioritization, keep these points in mind:
- Stop treating CVSS as a Bible: It is a technical baseline, not a business priority.
- Invest in Asset Tagging: Your prioritization is only as good as your knowledge of your environment. If you don't know which assets are "Crown Jewels," you cannot prioritize.
- Leverage EPSS and KEV: Use real-world exploit data to filter out theoretical risks.
- Automate the Triage: Use Policy-as-Code to enrich and route findings without manual intervention.
- Focus on Reachability: If the bug can't be reached by an attacker, it shouldn't be a P0.
- Empower Teams: Route findings directly to service owners and give them the context they need to fix the issue quickly.
Conclusion
The "Critical" label in modern security tools has become a victim of its own success. By flagging everything that could be dangerous, these tools have made it harder to see what is dangerous. For senior engineers and tech leads, the path forward requires a disciplined approach to context-aware prioritization.
We must move beyond the flat list of vulnerabilities and embrace a multidimensional view of risk—one that accounts for the business value of the asset, the likelihood of exploitation, and the potential for lateral movement within the cloud environment. By building a triage process that integrates these factors, we can eliminate the noise of alert fatigue and ensure that our engineering capacity is always directed toward the threats that pose the greatest risk to our organizations.
In the end, security is not about achieving a zero-vulnerability dashboard; it is about making it as difficult and expensive as possible for an attacker to achieve their goals. Context-aware prioritization is the most effective tool we have to achieve that mission while maintaining the velocity and morale of our engineering teams. When you stop treating everything as critical, you finally gain the clarity needed to secure what truly matters.
This content was generated by AI.