RectifyCloud

SOC 2 Gap Analysis for Cloud Infrastructure: What to Check Before Your Audit Starts

Walk through how to run a gap analysis on cloud infrastructure, the most common misconfigurations that surface, how to prioritize what to fix first, and what clean infrastructure looks like going into an audit.

March 11, 2026 · 14 min read

Introduction

Cloud infrastructure moves fast. Services spin up, configurations drift, engineers make permission changes to unblock a deployment, and by the time a SOC 2 audit arrives, the environment you thought you had and the environment that actually exists are often meaningfully different.

That gap — between assumed posture and actual posture — is what a cloud infrastructure gap analysis is designed to expose.

Unlike a gap analysis focused on policies or organizational controls, a cloud infrastructure gap analysis is technical. It looks at what is actually deployed, how it is configured, and whether the configuration matches what your SOC 2 controls say it should. It surfaces the misconfigurations, permission sprawl, and logging failures that will show up in an auditor's findings if you do not find them first.

This guide walks through how to run one systematically: what to check, what typically surfaces, how to prioritize the findings you uncover, and what clean cloud infrastructure actually looks like going into an audit. Whether you are running on AWS, GCP, Azure, or a multi-cloud combination, the framework applies.


Why Cloud Infrastructure Deserves Its Own Gap Analysis

Traditional SOC 2 gap analysis frameworks were designed with on-premises environments in mind. Shared servers, physical data centers, network perimeters. Cloud infrastructure does not work the same way, and the security failure modes are different.

In cloud environments, the most common security problems are not sophisticated attacks — they are configuration mistakes. An S3 bucket set to public. An IAM role with administrator permissions assigned to a Lambda function that needs read access to one table. A security group with port 22 open to 0.0.0.0/0 because someone was troubleshooting in production and forgot to close it. A CloudTrail log that stopped sending to the central account three months ago and nobody noticed.

These misconfigurations are not hypothetical. The majority of cloud security incidents in recent years have involved some form of misconfiguration rather than a novel exploit — a pattern consistent across all major cloud providers. And they are exactly what SOC 2 auditors look for, because they represent the gap between the security your organization says it operates and the security it actually operates.

A cloud-specific gap analysis gives you the chance to find those issues on your own terms, before an auditor, a security researcher, or a threat actor finds them for you.


Before You Start: Define the Scope

A gap analysis without a defined scope is just exploration. Useful exploration, perhaps, but not something you can build an audit remediation plan around.

Before running a single check, answer these questions.

Which cloud accounts and environments are in scope for the SOC 2 audit?

Most organizations separate production from development and staging environments. SOC 2 scope typically covers production — the environment that processes, stores, or transmits customer data. Confirm with your auditor which accounts are in scope. Shadow production environments, recently migrated services, and accounts connected to production via cross-account roles are common scope expansion sources that teams miss.

Which Trust Service Criteria are you being audited against?

The Security criterion is mandatory. If you have added Availability, you need to check infrastructure redundancy and failover configurations. If you have added Confidentiality, data classification and access to confidential data in cloud storage become relevant. Your gap analysis scope follows your audit scope.

Do you have an existing control set documented?

Your gap analysis is fundamentally a comparison between documented controls and actual configuration. If your controls say "all S3 buckets containing customer data will have Block Public Access enabled," your gap analysis checks whether that is true. Without documented controls, you are doing a security assessment rather than a compliance gap analysis — valuable, but a different exercise.


How to Run a Cloud Infrastructure Gap Analysis: The Systematic Approach

Layer 1: Identity and Access Management

IAM is where cloud security fails most visibly. It is also where the most SOC 2 control failures originate. Start here.

Inventory all principals with access to in-scope accounts. This includes human users, service accounts, roles, and cross-account trust relationships. Many organizations discover that the number of principals with production access is significantly larger than anyone expected — because access was granted for specific tasks and never revoked.

Check for root account usage. The AWS root account, GCP basic Owner role (formerly called a primitive role), and equivalent superuser principals should not be used for routine operations and should have no active access keys. Auditors check for this specifically. Verify that root account usage is logged and that there is a documented process restricting it to emergency use only.

Review IAM policies for least privilege violations. The most common finding in this category is overly broad permissions — policies using wildcards (*) on actions or resources when specific permissions would suffice. Automated tools including AWS IAM Access Analyzer, Google Cloud Policy Analyzer, and third-party CSPM platforms can surface these violations efficiently. Pay particular attention to policies that grant write or delete permissions on data stores to services that should only need read access.
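As a sketch of that review, the check below scans an IAM policy document for wildcard Allow statements. The JSON follows the standard AWS policy grammar, but the sample policy and function names here are illustrative, not a real account export:

```python
import json

# Illustrative policy in standard AWS policy-document shape (sample data only).
SAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["dynamodb:GetItem"],
         "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/orders"},
    ],
}

def wildcard_findings(policy):
    """Flag Allow statements whose Action or Resource uses a wildcard."""
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        for action in actions:
            # "s3:*" grants every action in a service; bare "*" grants everything.
            if action == "*" or action.endswith(":*"):
                findings.append(f"wildcard action: {action}")
        if "*" in resources:
            findings.append(f"wildcard resource for actions {actions}")
    return findings

print(wildcard_findings(SAMPLE_POLICY))
```

In practice you would feed this the output of `aws iam get-account-authorization-details` rather than an inline sample; dedicated analyzers go further by resolving effective permissions, not just policy text.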

Check service account key age and usage. Long-lived service account credentials that have not been rotated — or are no longer being used at all — are a consistent finding. Export your service account key inventory and flag any key older than 90 days that does not have an automated rotation mechanism.
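The key-age flagging step can be sketched as below. The inventory format is hypothetical (key ID plus ISO 8601 creation timestamp, as you might export from your provider's CLI); only the age comparison is the point:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical key inventory export: (key_id, creation_timestamp) pairs.
KEY_INVENTORY = [
    ("key-01", "2026-01-15T09:15:00+00:00"),
    ("key-02", "2023-04-02T14:00:00+00:00"),
]

def stale_keys(inventory, now, max_age_days=90):
    """Return IDs of keys older than the rotation window."""
    cutoff = now - timedelta(days=max_age_days)
    return [kid for kid, created in inventory
            if datetime.fromisoformat(created) < cutoff]

now = datetime(2026, 3, 1, tzinfo=timezone.utc)
print(stale_keys(KEY_INVENTORY, now))
```

Keys flagged here need either rotation or, if unused, deletion; an automated rotation mechanism removes the finding entirely.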

Review cross-account and federated access configurations. In multi-account AWS Organizations setups or GCP org hierarchies, cross-account roles and organization-level policies create access paths that are less visible than account-level IAM. Map these explicitly. Unexpected trust relationships between accounts are a significant finding.

Verify MFA enforcement. MFA should be enforced for all human users with console access to in-scope accounts. Check your identity provider configuration and your cloud provider's IAM settings to confirm MFA is required — not just available. In AWS, this is enforced via IAM policy conditions. In GCP and Azure, it is typically enforced at the identity provider level via conditional access policies.
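One way to spot-check this in AWS is to filter the IAM credential report for console users without an active MFA device. The sketch below assumes report rows in the credential report's CSV shape (the real report has many more columns; the values here are sample data):

```python
import csv
import io

# Sample rows shaped like the AWS IAM credential report (illustrative values).
REPORT = """user,password_enabled,mfa_active
alice,true,true
bob,true,false
deploy-bot,false,false
"""

def console_users_without_mfa(report_csv):
    """Flag users who can sign in to the console but have no active MFA device."""
    rows = csv.DictReader(io.StringIO(report_csv))
    return [r["user"] for r in rows
            if r["password_enabled"] == "true" and r["mfa_active"] == "false"]

print(console_users_without_mfa(REPORT))
```

Note that `deploy-bot` is not flagged: with no console password, MFA does not apply, which is why the filter keys on both columns.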

Layer 2: Network Configuration

Audit security groups and firewall rules. Export all security groups or VPC firewall rules for in-scope environments and look for rules that allow unrestricted inbound access (0.0.0.0/0 or ::/0) on sensitive ports: 22 (SSH), 3389 (RDP), database ports (3306, 5432, 1433, 27017), and any management interfaces. These rules are among the most commonly flagged findings in cloud infrastructure audits.
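The port-and-CIDR check above lends itself to a simple script. The rules below are flattened sample data in the spirit of an `aws ec2 describe-security-groups` export (the real output nests rules under `IpPermissions`); the group IDs are invented:

```python
SENSITIVE_PORTS = {22, 3389, 3306, 5432, 1433, 27017}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}

# Flattened sample rules (illustrative, not a real account export).
SAMPLE_RULES = [
    {"GroupId": "sg-aaa", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"GroupId": "sg-bbb", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"GroupId": "sg-ccc", "FromPort": 5432, "ToPort": 5432,
     "IpRanges": [{"CidrIp": "10.0.0.0/8"}]},
]

def flag_open_sensitive(rules):
    """Flag rules exposing a sensitive port range to the whole internet."""
    findings = []
    for rule in rules:
        exposed = any(r["CidrIp"] in OPEN_CIDRS for r in rule["IpRanges"])
        hits = SENSITIVE_PORTS & set(range(rule["FromPort"], rule["ToPort"] + 1))
        if exposed and hits:
            findings.append((rule["GroupId"], sorted(hits)))
    return findings

print(flag_open_sensitive(SAMPLE_RULES))
```

Only `sg-aaa` is flagged: port 443 open to the internet is expected for a public endpoint, and the Postgres rule is restricted to a private range.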

Check for public-facing resources that should be private. This includes databases with public endpoints enabled, admin interfaces accessible from the internet, and Kubernetes API servers without network-level access restrictions. Use your cloud provider's network topology tools — AWS VPC Reachability Analyzer, GCP Network Intelligence Center — to verify actual network paths rather than relying on policy review alone.

Verify network segmentation between environments. Production and non-production environments should have network-level separation, not just logical separation via naming conventions. Check that VPC peering, Transit Gateway, or equivalent connections between environments have appropriately restrictive route tables and security group rules.

Review load balancer and API gateway configurations. Public-facing load balancers and API gateways should have WAF policies attached. Check that TLS 1.2 or higher is enforced and that legacy protocol support (TLS 1.0, TLS 1.1, unencrypted HTTP) is disabled for all customer-facing endpoints.

Layer 3: Data Storage Configuration

Audit object storage access controls. For AWS S3, check Block Public Access settings at both the account level and the individual bucket level. Account-level Block Public Access overrides bucket policies that grant public access only when all of its settings are enabled — partial configurations leave gaps, so both layers need to be verified. For GCP Cloud Storage and Azure Blob Storage, check the equivalent public access settings and IAM bindings.

Verify encryption configuration. All in-scope storage — object storage, managed databases, disk volumes, message queues — should use encryption at rest. For managed databases (RDS, Cloud SQL, Azure SQL), verify that storage encryption is enabled and that the encryption key is managed appropriately. Customer-managed keys (CMK) versus provider-managed keys is a decision your security policy should specify, and the actual configuration should match.

Check database network exposure. Managed database instances should not have publicly accessible endpoints unless there is a documented and justified exception. Even with network-level controls, public endpoints expand the attack surface and are a consistent audit finding.

Review data lifecycle policies. Buckets and databases containing customer data should have lifecycle policies that enforce your documented retention schedule. Data that should be deleted after 90 days that is still retained at 18 months is a compliance gap, even if it is not a technical vulnerability.

Layer 4: Logging and Monitoring

Verify that audit logging is enabled and complete. In AWS, this means CloudTrail enabled in all regions with a multi-region trail, logging management events and S3 data events for buckets containing customer data. In GCP, Cloud Audit Logs — Admin Activity, Data Access, and System Event logs — should be enabled for all in-scope services. In Azure, Diagnostic Settings must be configured per resource to capture resource logs, and the subscription-level Activity Log should be exported as well.

Check that logs are being delivered to a centralized destination. Logs enabled at the resource level but not forwarded to a SIEM or central log management platform are effectively invisible. Verify that the delivery pipeline — CloudWatch to S3, Cloud Logging to BigQuery, Azure Monitor to Log Analytics — is active and that the destination is receiving logs.

Audit log retention configuration. Default retention in cloud-native logging services is often shorter than SOC 2 requires. CloudWatch log groups default to never expiring, but many organizations configure short retention windows for cost reasons and only extend them when an audit approaches. Check that every in-scope log group or logging sink has a retention period of at least 12 months.
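The retention check is straightforward to script against a log-group export. The records below follow the shape of `aws logs describe-log-groups` output, where `retentionInDays` is absent when a group is set to never expire; the group names are sample data:

```python
MIN_RETENTION_DAYS = 365

# Sample records shaped like `aws logs describe-log-groups` output.
LOG_GROUPS = [
    {"logGroupName": "/app/prod/api", "retentionInDays": 30},
    {"logGroupName": "/app/prod/audit"},               # no key = never expires
    {"logGroupName": "/app/prod/worker", "retentionInDays": 400},
]

def below_policy(groups, minimum=MIN_RETENTION_DAYS):
    """Return names of groups whose explicit retention is below the policy minimum."""
    return [g["logGroupName"] for g in groups
            if g.get("retentionInDays") is not None
            and g["retentionInDays"] < minimum]

print(below_policy(LOG_GROUPS))
```

Groups with no retention set are not flagged here because they never expire; depending on your cost and data-minimization policies, you may want to flag those separately.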

Test alert configurations. Verify that alerts are configured for the events your control documentation says you monitor: root account usage, IAM policy changes, security group modifications, failed authentication attempts, CloudTrail disablement. Run test events where possible to confirm alerts fire and reach their intended recipients.

Layer 5: Change Management and Infrastructure as Code

Review deployment pipeline configurations. Code that goes directly to production without passing through a defined deployment pipeline is a change management gap. Map your deployment paths and identify any that bypass your standard approval and testing process.

Check infrastructure as code for security misconfigurations. Terraform, CloudFormation, Pulumi, and similar IaC tools can encode misconfigurations that then get deployed consistently across environments. Static analysis tools — Checkov, tfsec, Terrascan — can scan your IaC repository and surface policy violations before they reach production.

Verify branch protection and approval requirements. For repositories containing IaC or application code that deploys to in-scope environments, check that branch protection rules require at least one reviewer approval for changes to main or production branches. Repositories where a single engineer can merge and deploy to production without review are a change management control gap.


The Most Common Misconfigurations That Surface

After running this analysis across an organization's cloud environment, certain findings appear with striking regularity. These are not theoretical risks — they are configurations that real audit programs discover in real environments.

Overly permissive IAM roles assigned to compute resources. EC2 instances, Lambda functions, and GKE workloads frequently carry IAM roles with far broader permissions than their workload requires. An application that writes logs to one S3 bucket should not have a role that grants write access to all S3 buckets in the account. This is the most common IAM finding and one of the most consequential.

CloudTrail or equivalent logging disabled in one or more regions. Global CloudTrail configurations cover all regions by default, but older configurations or manual changes can create gaps. A region where logging is disabled is a blind spot in your audit trail — and auditors specifically check for this.

Security groups with unrestricted outbound rules. Inbound restrictions are more commonly configured correctly than outbound. Unrestricted outbound from production compute resources makes data exfiltration easier and is a finding in most thorough assessments.

Secrets stored in environment variables or code repositories. Database credentials, API keys, and service account tokens stored as plaintext environment variables or committed to source code are a persistent finding. Secret management services — AWS Secrets Manager, GCP Secret Manager, Azure Key Vault — exist to solve this problem. Many environments have these services available but underused.
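A minimal sketch of how secret scanners detect this class of finding: pattern-match file contents against known credential shapes. The two patterns below are illustrative only — real scanners such as gitleaks and trufflehog ship far broader, continuously updated rule sets:

```python
import re

# Two illustrative detection patterns (real scanners use many more).
PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_password_assignment": re.compile(r"(?i)password\s*=\s*['\"][^'\"]+['\"]"),
}

def scan_text(text):
    """Return the names of patterns that match anywhere in the given contents."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))

sample = 'DB_HOST = "db.internal"\npassword = "hunter2"\n'
print(scan_text(sample))
```

Running a scanner like this in CI catches new secrets at commit time; it does not remove the need to rotate any credential that has already been committed.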

Publicly accessible snapshots. EBS snapshots, RDS snapshots, and equivalent database backups shared publicly or with all AWS accounts are a significant data exposure risk and a consistent audit finding.

Missing or misconfigured WAF. Public-facing applications and APIs frequently lack WAF protection, or have WAF attached but in detection-only mode rather than blocking mode. This is a particularly common finding for organizations that added WAF as a checkbox without tuning the rule sets.

Unencrypted data in non-primary storage. Primary databases are typically encrypted. Secondary storage — log archives, data exports, analytical copies in data lakes — often is not. Customer data that lives in an unencrypted S3 bucket because it was exported for a one-time analysis job and never cleaned up is a real finding with real compliance implications.


How to Prioritize What to Fix First

A thorough cloud infrastructure gap analysis will produce more findings than can be remediated simultaneously. Prioritization is not optional — it is the difference between a managed remediation effort and a chaotic one.

Critical — Remediate Immediately

These are findings that represent either an active data exposure risk or a fundamental control failure that the audit cannot proceed without addressing.

Examples: Publicly accessible storage containing customer data, database instances with public endpoints and no network-level access restrictions, disabled audit logging for in-scope accounts, root account access keys that are active and unused, and secrets stored in version control.

These findings should enter an emergency remediation track independent of your normal sprint cycle. They are not features competing for engineering time — they are active risks that need immediate resolution.

High — Remediate Before Observation Period Ends

These are findings that represent genuine control gaps that an auditor will flag but do not represent an immediate data exposure.

Examples: Overly permissive IAM roles on production compute, security groups with unrestricted inbound access on non-standard ports, MFA not enforced for specific user classes, log retention windows below your policy requirement, and missing WAF on public-facing endpoints.

These findings should be assigned to owners with defined remediation deadlines and tracked weekly until resolved.

Medium — Remediate During Observation Period with Documentation

These are findings that represent configuration drift, best practice gaps, or process-level issues rather than control failures.

Examples: Service account keys not rotated within policy window, IaC code with policy violations that have not yet been deployed, missing tags on resources that complicate data classification, and alerting rules that exist but have not been tested.

Document these in your gap register with remediation plans and owners. Demonstrate progress when the auditor requests it.

Low — Track and Schedule

These are informational findings and hardening recommendations that improve security posture but do not represent SOC 2 control failures.

Schedule these through your normal security backlog process. Document that you are aware of them.


What Clean Cloud Infrastructure Looks Like Going Into an Audit

Understanding what you are working toward is as important as knowing what to look for. Clean cloud infrastructure for a SOC 2 audit is not perfect infrastructure — it is infrastructure that consistently reflects the controls documented in your security program.

Specifically, it looks like this.

IAM policies are based on least privilege principles, with no wildcard permissions on production resources. All human users have MFA enforced. Service account key ages are within policy. Root accounts have no active keys and MFA is enforced.

Network configurations allow only documented traffic flows. No unrestricted inbound access on sensitive ports. Public-facing resources have WAF attached in blocking mode. Database instances have no public endpoints.

All data storage uses encryption at rest with key management matching your documented policy. No publicly accessible object storage buckets. Snapshots are private and covered by access controls consistent with the source data.

Audit logging is enabled in all in-scope regions and services, delivered to a central logging platform, and retained for at least 12 months. Alerts are configured, tested, and reaching their recipients. Log access itself is logged.

All production changes flow through a documented approval process. IaC repositories have branch protection enabled. Deployment pipelines enforce approval gates before production changes.

Secrets live in secret management services, not environment variables or code repositories.

That is the baseline. It is achievable, it is verifiable, and it is what a clean audit entry looks like.


Tooling That Helps

Running a cloud infrastructure gap analysis manually across a large environment is not realistic. These tools make the process systematic.

Cloud Security Posture Management (CSPM) platforms — Wiz, Orca Security, Lacework, and similar tools continuously scan cloud environments and map findings to compliance frameworks including SOC 2. They provide the fastest path to a comprehensive misconfiguration inventory.

Native cloud tools — AWS Security Hub with the CIS AWS Foundations Benchmark, Google Cloud Security Command Center, and Microsoft Defender for Cloud provide compliance-mapped findings without requiring third-party platforms. They are a reasonable starting point for organizations not yet using a dedicated CSPM.

IaC scanning tools — Checkov, tfsec, and Terrascan catch misconfigurations in Terraform, CloudFormation, and Kubernetes manifests before they are deployed.

IAM analysis tools — AWS IAM Access Analyzer, Google Cloud Policy Analyzer, and tools like Ermetic or Cloudsplaining surface effective permissions that are difficult to evaluate from policy documents alone.

No single tool covers everything. A combination of CSPM for runtime configuration, IaC scanning for pre-deployment review, and IAM analysis for permission evaluation covers the major categories.


Conclusion

A SOC 2 audit does not assess your intentions. It assesses your configuration. The infrastructure that exists on the day the auditor begins testing is the infrastructure your organization is judged on — regardless of what the security policy says it should look like.

A cloud infrastructure gap analysis run before that audit begins gives you the control over that assessment that you otherwise surrender to the audit process. You find the overly permissive roles before the auditor does. You close the logging gaps before they become evidence failures. You fix the public S3 buckets before they become a finding in a report that your customers will read.

The infrastructure that looks clean going into an audit is not infrastructure that was always perfect. It is infrastructure whose gaps were found early, prioritized deliberately, and fixed systematically. That is the outcome a thorough gap analysis makes possible.