SOC 2 Backup and Disaster Recovery Requirements: What Auditors Check Under A1.2 and A1.3

Master SOC 2 A1.2 and A1.3 requirements. Learn how to prepare backup evidence, DR plans, and restoration tests to pass your audit without a dedicated infra team.

April 27, 2026 | 13 min read

Introduction

For most senior engineers and tech leads, the phrase "SOC 2 audit" conjures images of endless spreadsheets, tedious evidence collection, and bureaucratic hurdles that seem divorced from the reality of shipping code. However, when we look at the Availability criteria of the Trust Services Criteria (TSC), specifically A1.2 and A1.3, the requirements shift from abstract policy to concrete systems engineering. These criteria focus on the bedrock of system reliability: backup management and disaster recovery (DR).

In a modern cloud-native environment, achieving SOC 2 compliance isn't just about passing an audit; it is about building a resilient architecture that can survive a regional outage, a ransomware attack, or a catastrophic human error. Sections A1.2 and A1.3 are the technical guardrails that ensure your organization doesn't just claim to be "high availability" but has the infrastructure, documentation, and proven testing to back it up.

Auditors today are increasingly technical. They are no longer satisfied with a PDF policy that says "we do backups." They want to see the cron jobs, the cross-region replication configurations, the Terraform scripts, and the restoration logs. They want to see that your Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) are not just arbitrary numbers in a document but are reflected in your monitoring dashboards.

This post will break down the technical nuances of A1.2 and A1.3, explore the specific evidence auditors look for, and provide a roadmap for building a compliant backup and DR program that scales without requiring a massive, dedicated infrastructure team. For a broader look at the overall compliance landscape, you can refer to Rectify Cloud’s guide on SOC 2 compliance.

Understanding A1.2: Backup and Recovery Infrastructure

A1.2 belongs to the additional Availability criteria, not the Common Criteria (CC1 through CC9). It covers the environmental protections, software, data backup processes, and recovery infrastructure that an entity designs, implements, operates, maintains, and monitors to meet its availability objectives. In simpler terms: Do you have the right stuff backed up, is it stored safely, and is it separated from your primary production environment?

Identifying Critical Assets

The first step in A1.2 compliance is data classification. An auditor will ask how you determined what needs to be backed up. From an engineering perspective, this means maintaining an up-to-date service catalog. You must include:

Production Databases: Relational (RDS, Cloud SQL) and NoSQL (DynamoDB, MongoDB).
Object Storage: S3 buckets or Azure Blob Storage containing customer-uploaded content.
Configuration and Code: While GitHub/GitLab often handle code, your CI/CD pipelines, secrets (AWS Secrets Manager/HashiCorp Vault), and Infrastructure as Code (IaC) state files must be part of the backup strategy.
System Images: If you are running legacy VMs or specific container registries, these images are critical for recovery.
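One way to keep that catalog honest is to treat it as data and check it for gaps. The sketch below assumes a hypothetical in-house catalog in which each entry records its category and the backup plan that covers it, if any. The `Asset` class and the category names are illustrative and are not part of any tool.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical categories mirroring the asset list above.
CATEGORIES = {"database", "object_storage", "config_and_code", "system_image"}


@dataclass(frozen=True)
class Asset:
    name: str
    category: str
    backup_plan: Optional[str]  # backup plan covering this asset, or None


def uncovered_assets(catalog: List[Asset]) -> List[str]:
    """Return names of catalog entries with no backup plan, failing loudly
    on entries whose category was never classified."""
    unknown = {a.category for a in catalog} - CATEGORIES
    if unknown:
        raise ValueError(f"unclassified categories: {sorted(unknown)}")
    return sorted(a.name for a in catalog if not a.backup_plan)


catalog = [
    Asset("orders-db", "database", "SOC2-Compliant-Backup-Plan"),
    Asset("uploads-bucket", "object_storage", None),
    Asset("terraform-state", "config_and_code", None),
]
print(uncovered_assets(catalog))  # ['terraform-state', 'uploads-bucket']
```

Running a check like this in CI also leaves a dated record of how you decided what gets backed up.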

Off-site and Remote Storage

The A1.2 points of focus call for backup data to be stored offsite, far enough from the primary location that a single security or environmental event is unlikely to affect both copies. In the cloud era, "remote" doesn't mean a different floor of the data center. It means a different geographic region, or at least a separate, isolated account. If your production environment is in us-east-1, your backups should ideally reside in us-west-2 or eu-central-1.

Many auditors also look favorably on immutable or logically air-gapped backups. In AWS, this might involve S3 Object Lock or AWS Backup Vault Lock. Immutability prevents a compromised administrative credential from deleting both the production data and the backups, a common tactic in modern ransomware attacks.
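The function below is a minimal sketch of the AWS Backup Vault Lock option. It wraps boto3's `put_backup_vault_lock_configuration` call and assumes the caller passes in a `boto3.client("backup")` for the region that holds the vault. The vault name and retention bounds in the usage comment are illustrative. Passing `ChangeableForDays` puts the lock in compliance mode, which cannot be undone after the grace period, so try it on a non-critical vault first.

```python
from typing import Any


def lock_backup_vault(backup_client: Any, vault_name: str, min_days: int,
                      max_days: int, changeable_for_days: int = 3) -> dict:
    """Apply an AWS Backup Vault Lock to an existing vault.

    backup_client is expected to be boto3.client("backup"). Because
    ChangeableForDays is always passed, this creates a compliance-mode lock:
    after that grace period the lock can no longer be changed or removed.
    """
    if not 0 < min_days <= max_days:
        raise ValueError("expected 0 < min_days <= max_days")
    if changeable_for_days < 3:
        # AWS rejects grace periods shorter than 3 days.
        raise ValueError("changeable_for_days must be at least 3")
    return backup_client.put_backup_vault_lock_configuration(
        BackupVaultName=vault_name,
        MinRetentionDays=min_days,
        MaxRetentionDays=max_days,
        ChangeableForDays=changeable_for_days,
    )


# Usage (requires AWS credentials):
#   import boto3
#   lock_backup_vault(boto3.client("backup", region_name="us-west-2"), "DR-Vault", 7, 365)
```

Keep each backup plan's lifecycle (for example, a 35-day retention) between the minimum and maximum you lock in. Otherwise, jobs that target the vault may be rejected.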

Automation vs. Manual Processes

One of the fastest ways to collect audit exceptions under A1.2 is to rely on manual backups. If your process involves a senior engineer running pg_dump every Friday and uploading the file to a folder, every missed week becomes a failed sample. A weekly cadence is also unlikely to support an RPO measured in hours. Severe or pervasive gaps can even lead to a qualified opinion. Auditors look for scheduled, automated processes with alerting for failures.
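The alerting half of that requirement is easy to sketch. The function below assumes job records shaped like the entries AWS Backup's `ListBackupJobs` returns (`BackupJobId`, `State`, `CompletionDate`). The 26-hour threshold is an assumption: a daily schedule plus a grace period. It flags explicit failures and also the quieter case where no backup completed at all.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Terminal job states that should page someone (AWS Backup state names).
ALERT_STATES = {"FAILED", "ABORTED", "EXPIRED", "PARTIAL"}


def backup_alerts(jobs: List[Dict], now: datetime, max_age_hours: int = 26) -> List[str]:
    """Return human-readable alert messages for a batch of backup job records."""
    alerts = [f"job {j['BackupJobId']} ended in {j['State']}"
              for j in jobs if j.get("State") in ALERT_STATES]
    completed = [j["CompletionDate"] for j in jobs
                 if j.get("State") == "COMPLETED" and j.get("CompletionDate")]
    # A job that never ran produces no failure record, so check freshness too.
    if not completed or now - max(completed) > timedelta(hours=max_age_hours):
        alerts.append(f"no completed backup in the last {max_age_hours}h")
    return alerts


now = datetime(2026, 4, 27, 6, 0, tzinfo=timezone.utc)
jobs = [
    {"BackupJobId": "a1", "State": "COMPLETED", "CompletionDate": now - timedelta(hours=1)},
    {"BackupJobId": "b2", "State": "FAILED"},
]
print(backup_alerts(jobs, now))  # ['job b2 ended in FAILED']
```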

Deep Dive into A1.3: Testing and Business Continuity

While A1.2 is about the existence of backups, A1.3 is about their utility. This criterion requires that the entity tests its recovery plans and backup data to ensure they are actually capable of meeting the organization's availability objectives.

The Business Continuity Plan (BCP)

A BCP is a high-level document that outlines how the business functions during a disaster. For tech leads, the more relevant subset is the Disaster Recovery Plan (DRP). The DRP must be a "living" document—not a static PDF that hasn't been updated since the company had three employees. It should include:

Roles and Responsibilities: Who is the incident commander? Who has the authority to declare a disaster?
Communication Channels: How do engineers communicate if Slack or Microsoft Teams is down? (e.g., a dedicated Signal group or an out-of-band Zoom link).
Step-by-Step Recovery Procedures: If the primary database is corrupted, what are the exact CLI commands or console steps to promote a replica or restore from a point-in-time snapshot?
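Writing the restore step as code keeps the DRP's "exact commands" from drifting out of date. The sketch below assumes Amazon RDS and a `boto3.client("rds")` supplied by the caller. `restore_db_instance_to_point_in_time` and the `db_instance_available` waiter are standard boto3 RDS calls; the instance identifiers are placeholders. RDS point-in-time restore always creates a new instance, so the function never touches the source.

```python
from datetime import datetime
from typing import Any, Optional


def restore_db_to_point_in_time(rds: Any, source_id: str, target_id: str,
                                restore_time: Optional[datetime] = None) -> str:
    """Runbook step: restore an RDS instance to a new instance at a point in time.

    rds is expected to be boto3.client("rds"). With restore_time=None the
    latest restorable time is used. The source instance is left untouched.
    """
    params = {"SourceDBInstanceIdentifier": source_id,
              "TargetDBInstanceIdentifier": target_id}
    if restore_time is None:
        params["UseLatestRestorableTime"] = True
    else:
        params["RestoreTime"] = restore_time  # a datetime, expressed in UTC
    rds.restore_db_instance_to_point_in_time(**params)
    # Block until the new instance is available so the next runbook step
    # (repointing the application) does not start too early.
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=target_id)
    return target_id
```

A production runbook would also pass networking settings such as `DBSubnetGroupName` and `VpcSecurityGroupIds`, and would record the elapsed time for comparison against the RTO.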

Restoration Testing

This is where most organizations struggle. The criteria do not prescribe a testing frequency, but auditors typically expect evidence of at least one successful restoration test during the audit period. A "successful" test means more than seeing a "Restore Complete" message. It means:

Restoring the data to a non-production environment.
Verifying the integrity of the data (e.g., running checksums or spot-checking records).
Ensuring the application can actually connect to and use the restored data.
Documenting the time it took to complete the restore to see if it aligns with your RTO.
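The four checks above can be rehearsed end to end using only the standard library. This sketch uses SQLite's online backup API (`sqlite3.Connection.backup`) as a stand-in for your real restore mechanism. A checksum is recorded at backup time, and the snapshot is later restored into a fresh in-memory database. The restore is then judged on three things: integrity, an application-style probe query, and elapsed time against an RTO. Table and column names are illustrative.

```python
import hashlib
import sqlite3
import time
from typing import Tuple


def table_digest(conn: sqlite3.Connection, table: str) -> Tuple[int, str]:
    """Row count plus a SHA-256 over every row in rowid order.
    `table` must be a trusted identifier; it is interpolated into SQL."""
    h, count = hashlib.sha256(), 0
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY rowid"):
        h.update(repr(row).encode())
        count += 1
    return count, h.hexdigest()


def take_backup(source: sqlite3.Connection, table: str):
    snapshot = sqlite3.connect(":memory:")
    source.backup(snapshot)
    return snapshot, table_digest(snapshot, table)  # store the digest with the backup


def restore_and_verify(snapshot, table, expected_digest, probe_sql, probe_row, rto_seconds):
    start = time.monotonic()
    restored = sqlite3.connect(":memory:")  # the non-production target
    snapshot.backup(restored)               # the restore itself
    elapsed = time.monotonic() - start
    result = {
        "integrity_ok": table_digest(restored, table) == expected_digest,
        "app_query_ok": restored.execute(probe_sql).fetchone() == probe_row,
        "elapsed_seconds": elapsed,
        "within_rto": elapsed <= rto_seconds,
    }
    restored.close()
    return result


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
src.executemany("INSERT INTO customers (email) VALUES (?)",
                [("a@example.com",), ("b@example.com",)])
src.commit()
snap, digest = take_backup(src, "customers")
print(restore_and_verify(snap, "customers", digest,
                         "SELECT email FROM customers WHERE id = 1",
                         ("a@example.com",), rto_seconds=4 * 3600))
```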

Regularity of Testing

Auditors expect a cadence. While a full-scale regional failover test might happen once a year, automated restoration testing of individual database snapshots should ideally happen more frequently. Modern tooling allows for "automated recovery testing," where a script restores a snapshot to a temporary instance, runs a validation query, and then tears the instance down, logging the result for the auditor.
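The restore, validate, and tear-down loop described above has one property worth getting right. The temporary instance must be torn down even when validation fails, and the result must be logged either way. Below is a minimal sketch that assumes hypothetical `restore`, `validate`, and `teardown` callables, which you would wire to your cloud SDK.

```python
import json
import time
from datetime import datetime, timezone
from typing import Callable, Optional


def run_restore_drill(snapshot_id: str,
                      restore: Callable[[str], str],
                      validate: Callable[[str], bool],
                      teardown: Callable[[str], None]) -> dict:
    """Restore a snapshot to a temporary instance, validate it, always tear it
    down, and return a record suitable for an append-only evidence log."""
    started = time.monotonic()
    record = {"snapshot_id": snapshot_id,
              "started_at": datetime.now(timezone.utc).isoformat(),
              "passed": False, "error": None}
    instance_id: Optional[str] = None
    try:
        instance_id = restore(snapshot_id)
        record["passed"] = bool(validate(instance_id))
    except Exception as exc:  # a failed drill is evidence too, so record it
        record["error"] = repr(exc)
    finally:
        if instance_id is not None:
            teardown(instance_id)  # avoid orphaned (and billed) test instances
    record["duration_seconds"] = round(time.monotonic() - started, 3)
    return record


# Usage with stand-in callables; in practice these would call your provider's SDK.
print(json.dumps(run_restore_drill("snap-0123", lambda s: "tmp-" + s,
                                   lambda i: True, lambda i: None)))
```

If `teardown` itself raises, the exception propagates instead of being folded into the record. That is deliberate: a leaked instance deserves its own alert.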

What Auditors Actually Check: The Evidence List

When the audit window opens, the auditor will request a specific set of evidence for A1.2 and A1.3. As a senior engineer, providing this in a clean, organized manner will significantly speed up the process.

1. The Backup Policy

This document defines the "what, when, and how" of your backups. It should specify:

Retention periods (e.g., daily backups kept for 30 days, monthly for 1 year).
Encryption requirements (backups must be encrypted at rest and in transit).
Ownership (who is responsible for the backup system).

2. Configuration Evidence (Screenshots or Code)

Auditors will want to see the settings in your cloud console. For example:

Screenshots of AWS Backup plans showing the schedule and retention.
Terraform or CloudFormation templates that define your backup infrastructure.
IAM policies showing that the backup storage has restricted access.
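For the IAM evidence, a quick static check can show which policies grant destructive backup actions before the auditor asks. This is a deliberately simplified sketch. It reads only `Allow` statements and their `Action` element, and it ignores `Deny`, `NotAction`, conditions, and resource scoping, so a hit means "needs review" rather than proof of access. The action list is a selection of real AWS Backup delete actions, not an exhaustive one.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Destructive AWS Backup actions that only a small break-glass group should hold.
DESTRUCTIVE = ["backup:DeleteRecoveryPoint", "backup:DeleteBackupVault",
               "backup:DeleteBackupPlan", "backup:DeleteBackupVaultLockConfiguration"]


def grants_destructive_backup_actions(policy: Dict) -> List[str]:
    """List destructive backup actions matched by Allow statements in a policy."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):      # a single statement may be an object
        statements = [statements]
    granted = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for pattern in actions:
            for action in DESTRUCTIVE:
                # IAM action names are case-insensitive and support * wildcards.
                if fnmatchcase(action.lower(), pattern.lower()):
                    granted.add(action)
    return sorted(granted)


operator_policy = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Allow", "Action": ["backup:StartBackupJob", "backup:List*"], "Resource": "*"}]}
print(grants_destructive_backup_actions(operator_policy))  # []
```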

3. Backup Logs and Success Rates

You need to provide a population of backup logs for the entire audit period (usually 6 or 12 months). The auditor will likely pick a random sample of days and ask for proof that the backups ran successfully on those specific dates. If you have "red" days in your logs, you must show the corresponding ticket where the failure was investigated and remediated.
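The auditor will choose their own sample, but you can rehearse the same exercise beforehand. This sketch draws a reproducible random sample of days from the audit period. It then labels each day as a success, a tracked failure (by ticket ID), or an unexplained gap. The inputs are assumed to come from your own backup logs and ticketing system, and the ticket ID format is hypothetical.

```python
import random
from datetime import date, timedelta
from typing import Dict, List, Set


def sample_days(start: date, end: date, n: int, seed: int) -> List[date]:
    """Reproducible random sample of n distinct days in [start, end]."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return sorted(random.Random(seed).sample(days, n))


def check_sample(sample: List[date], successes: Set[date],
                 tickets: Dict[date, str]) -> Dict[date, str]:
    """'ok' for a successful backup, the ticket ID for an investigated failure,
    or 'GAP' when neither exists (a likely audit exception)."""
    return {d: "ok" if d in successes else tickets.get(d, "GAP") for d in sample}


sample = sample_days(date(2025, 5, 1), date(2026, 4, 30), n=25, seed=2026)
successes = set(sample[:-2])        # pretend all but two sampled days succeeded
tickets = {sample[-2]: "OPS-1042"}  # one failure was investigated
print(check_sample(sample, successes, tickets))
```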

4. Restoration Test Documentation

This is a formal report. It doesn't need to be 50 pages, but it should include:

Date of the test.
The person who performed it.
The specific resource being restored (e.g., prod-db-cluster-2023-10-12).
The result (Pass/Fail).
The actual time taken vs. the target RTO.
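Keeping each test as a structured record makes the report easy to produce and hard to leave incomplete. The sketch below mirrors the five fields listed above. The class name and summary format are assumptions, and the tester name is a placeholder.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RestoreTestReport:
    test_date: date
    performed_by: str
    resource: str
    passed: bool            # data verified and usable by the application
    actual_minutes: float
    rto_minutes: float

    @property
    def met_rto(self) -> bool:
        return self.actual_minutes <= self.rto_minutes

    def summary(self) -> str:
        status = "Pass" if self.passed else "Fail"
        rto = "within RTO" if self.met_rto else "RTO EXCEEDED"
        return (f"{self.test_date.isoformat()} | {self.resource} | {self.performed_by} | "
                f"{status} | {self.actual_minutes:g} of {self.rto_minutes:g} min ({rto})")


report = RestoreTestReport(date(2026, 3, 14), "j.doe", "prod-db-cluster-2023-10-12",
                           passed=True, actual_minutes=95, rto_minutes=240)
print(report.summary())
# 2026-03-14 | prod-db-cluster-2023-10-12 | j.doe | Pass | 95 of 240 min (within RTO)
```

Pass/Fail and the RTO comparison are kept separate on purpose. A restore that produced correct data but took longer than the RTO is still a finding worth recording.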

Defining RTO and RPO: The Engineering Perspective

In the context of SOC 2, RTO and RPO are not just buzzwords; they are the metrics against which your availability is measured.

Recovery Time Objective (RTO): The maximum tolerable duration of downtime. If your RTO is 4 hours, your DR plan must be executable within that window.
Recovery Point Objective (RPO): The maximum tolerable amount of data loss, measured in time. If your RPO is 1 hour, you must take backups at least every hour (or use continuous replication).
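The RPO arithmetic is worth making concrete. Your worst-case data loss is the largest gap between consecutive successful backups, or between the last backup and now. It is not the nominal schedule. Here is a minimal sketch, assuming each backup captures the data as of its timestamp:

```python
from datetime import datetime, timedelta
from typing import List


def worst_case_data_loss(backup_times: List[datetime], now: datetime) -> timedelta:
    """Largest window in which a failure would have lost data."""
    if not backup_times:
        raise ValueError("no successful backups recorded")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


backups = [datetime(2026, 4, 27, 0, 0), datetime(2026, 4, 27, 1, 0),
           datetime(2026, 4, 27, 3, 0)]  # the 02:00 run failed
loss = worst_case_data_loss(backups, now=datetime(2026, 4, 27, 3, 30))
print(loss, "meets 1h RPO:", loss <= timedelta(hours=1))  # 2:00:00 meets 1h RPO: False
```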

For senior engineers, the challenge is aligning these business requirements with technical reality. A 15-minute RTO for a multi-terabyte database is unrealistic if you rely on restoring from backups stored in S3, because the restore alone can take far longer than 15 minutes. You would instead need a "Hot Standby" or "Pilot Light" architecture where a replica database is already running in a secondary region.

Example: Mapping RTO/RPO to Architecture

Tier	RPO Target	RTO Target	Technical Solution
Critical	< 1 minute	< 15 minutes	Multi-region Active/Active or Hot Standby
High	< 1 hour	< 4 hours	Cross-region snapshots with automated failover scripts
Standard	< 24 hours	< 24 hours	Daily backups with manual restoration procedures
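The table can also be read in reverse. Given a service's tolerable data loss and downtime, pick the least expensive tier that still fits. Below is a small sketch that treats the table's targets as upper bounds; the function name is illustrative.

```python
from datetime import timedelta

# (tier, RPO target, RTO target), least to most stringent, from the table above.
TIERS = [
    ("Standard", timedelta(hours=24), timedelta(hours=24)),
    ("High", timedelta(hours=1), timedelta(hours=4)),
    ("Critical", timedelta(minutes=1), timedelta(minutes=15)),
]


def cheapest_tier(max_data_loss: timedelta, max_downtime: timedelta) -> str:
    """Least stringent tier whose RPO and RTO targets fit the service's tolerances."""
    for name, rpo, rto in TIERS:
        if rpo <= max_data_loss and rto <= max_downtime:
            return name
    raise ValueError("tolerances are stricter than the Critical tier")


print(cheapest_tier(timedelta(hours=2), timedelta(hours=8)))  # High
```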

Implementing Automated Backups with IaC

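To satisfy auditors and ensure reliability, your backup infrastructure should be defined as code. This provides a clear audit trail of who changed the backup settings and when. Below is an example of an AWS Backup plan and resource selection, written in the JSON shape accepted by the AWS Backup API. The plan and the selection are separate API calls in practice (create-backup-plan and create-backup-selection) and are shown together here for readability. CloudFormation and Terraform expose the same settings under slightly different field names.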

{
  "BackupPlan": {
    "BackupPlanName": "SOC2-Compliant-Backup-Plan",
    "Rules": [
      {
        "RuleName": "DailyBackups",
        "TargetBackupVaultName": "ProductionVault",
        "ScheduleExpression": "cron(0 5 * * ? *)",
        "StartWindowMinutes": 60,
        "CompletionWindowMinutes": 180,
        "Lifecycle": {
          "DeleteAfterDays": 35
        },
        "CopyActions": [
          {
            "DestinationBackupVaultArn": "arn:aws:backup:us-west-2:123456789012:backup-vault:DR-Vault",
            "Lifecycle": {
              "DeleteAfterDays": 35
            }
          }
        ]
      }
    ]
  },
  "BackupSelection": {
    "SelectionName": "CriticalResources",
    "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
    "ListOfTags": [
      {
        "ConditionType": "STRINGEQUALS",
        "ConditionKey": "backup-policy",
        "ConditionValue": "soc2-critical"
      }
    ]
  }
}

This configuration achieves several SOC 2 goals simultaneously:

Automation: It runs on a cron schedule.
Retention: It explicitly defines a 35-day lifecycle.
Remote Storage: It uses CopyActions to replicate the backup to a different region (us-west-2).
Identification: It uses tag-based selection, ensuring that any new resource tagged with backup-policy: soc2-critical is automatically included in the backup plan, reducing the risk of human error when launching new services.
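These four properties can be checked mechanically, for example in CI before a plan change is applied. This sketch reads a plan in the same JSON shape as the example, loaded as a Python dict. It assumes you know the production region and checks only for a cross-region copy. Verifying a cross-account copy within the same region would require an extra account comparison.

```python
from typing import Dict, List


def check_plan(doc: Dict, source_region: str) -> List[str]:
    """Check a backup plan + selection against the four goals listed above.
    Returns a list of problems; an empty list means every goal is covered."""
    problems = []
    for rule in doc["BackupPlan"]["Rules"]:
        name = rule.get("RuleName", "?")
        if not rule.get("ScheduleExpression", "").startswith("cron("):
            problems.append(f"{name}: no cron schedule (automation)")
        if "DeleteAfterDays" not in rule.get("Lifecycle", {}):
            problems.append(f"{name}: no DeleteAfterDays (retention)")
        # Vault ARN layout: arn:aws:backup:<region>:<account>:backup-vault:<name>
        copy_regions = {a["DestinationBackupVaultArn"].split(":")[3]
                        for a in rule.get("CopyActions", [])}
        if not copy_regions - {source_region}:
            problems.append(f"{name}: no copy outside {source_region} (remote storage)")
    selection = doc.get("BackupSelection", {})
    if not (selection.get("ListOfTags") or selection.get("Resources")):
        problems.append("selection: no tags or resources (identification)")
    return problems


plan = {
    "BackupPlan": {"Rules": [{
        "RuleName": "DailyBackups",
        "ScheduleExpression": "cron(0 5 * * ? *)",
        "Lifecycle": {"DeleteAfterDays": 35},
        "CopyActions": [{"DestinationBackupVaultArn":
                         "arn:aws:backup:us-west-2:123456789012:backup-vault:DR-Vault"}],
    }]},
    "BackupSelection": {"ListOfTags": [{"ConditionType": "STRINGEQUALS",
                                        "ConditionKey": "backup-policy",
                                        "ConditionValue": "soc2-critical"}]},
}
print(check_plan(plan, source_region="us-east-1"))  # []
```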

Common Gaps and Pitfalls in A1.2 and A1.3

Even well-intentioned engineering teams often fall short during the audit. Here are the most common gaps:

1. The "Single Region" Trap

Many teams assume that because AWS or Azure is "the cloud," it is inherently indestructible. However, regional outages do happen. If your production data and your backups are in the same region, an auditor is likely to flag this as a significant risk. Be ready to demonstrate cross-region or cross-account redundancy.

2. Lack of "System Configuration" Backups

Engineers often focus solely on the database. But if the database survives while your Kubernetes manifests, load balancer configurations, and DNS settings are lost, your actual recovery time will overshoot your RTO. Auditors generally read A1.2's "recovery infrastructure" as including the configuration needed to rebuild the system, not just the data. Ensure your IaC repository is backed up and that you have a way to rebuild the network stack from scratch.
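Some configuration does not already live in Git, such as exported DNS zones, load balancer settings pulled from the provider API, or rendered Kubernetes manifests. For that material, a timestamped archive with a checksum sidecar is a simple starting point. This sketch uses only the Python standard library. Copying the archive to remote storage (for example, a bucket in your DR region) is left as an assumption.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def archive_config(src_dir: Path, out_dir: Path) -> Path:
    """Tar+gzip a directory of configuration files and write a SHA-256
    sidecar ('<hash>  <name>', the format sha256sum -c accepts)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = out_dir / f"{src_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src_dir, arcname=src_dir.name)  # recursive, relative paths
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    Path(f"{archive}.sha256").write_text(f"{digest}  {archive.name}\n")
    return archive
```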

3. Monitoring the Monitors

What happens if your backup job fails? If the failure notification goes to a "no-reply" email or a Slack channel that no one checks, you have a gap. Auditors look for evidence that backup failures are treated as incidents. Integrate your backup alerts with PagerDuty or Opsgenie to ensure they are addressed immediately.
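Routing failures into a paging tool usually takes little glue code. The sketch below builds an event for the PagerDuty Events API v2 (`https://events.pagerduty.com/v2/enqueue`). The routing key comes from a PagerDuty service integration. The job fields are assumed to come from your backup tool's failure notification.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def backup_failure_event(routing_key: str, job_id: str, resource: str, state: str) -> dict:
    """Build a PagerDuty Events API v2 'trigger' event for a failed backup job."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # One dedup_key per job keeps repeated notifications on a single incident.
        "dedup_key": f"backup-job-{job_id}",
        "payload": {
            "summary": f"Backup job {job_id} for {resource} ended in {state}",
            "source": resource,
            "severity": "critical" if state == "FAILED" else "error",
        },
    }


def send_event(event: dict, timeout: float = 10.0) -> int:
    """POST the event; urllib raises HTTPError for 4xx/5xx responses."""
    req = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status  # 202 Accepted when PagerDuty queues the event
```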

4. The "Success" Illusion

A backup job might report "Success" because it successfully created a file, but if that file is 0 bytes or corrupted, the backup is useless. Periodic integrity checks are vital. This is why A1.3 testing is so critical—it validates that the "Success" in the logs translates to "Restorable" in reality.
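A cheap first line of defense is to verify the artifact itself before trusting the log line. This sketch assumes gzip-compressed dump files and an optional SHA-256 recorded when the backup was written. It catches empty files, checksum mismatches, and truncated or corrupt streams. It complements a real restore test rather than replacing one.

```python
import gzip
import hashlib
import zlib
from pathlib import Path
from typing import List, Optional


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # stream in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def verify_backup_file(path: Path, expected_sha256: Optional[str] = None) -> List[str]:
    """Catch 'Success but useless' artifacts. Returns a list of problems."""
    if not path.exists() or path.stat().st_size == 0:
        return ["file missing or 0 bytes"]
    problems = []
    if expected_sha256 and sha256_file(path) != expected_sha256:
        problems.append("checksum mismatch")
    try:
        with gzip.open(path, "rb") as fh:
            # Reading to the end makes gzip verify its CRC-32 and length trailer.
            while fh.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        problems.append(f"unreadable gzip: {exc}")
    return problems
```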

Building a BCDR Program Without a Dedicated Team

For startups and mid-sized companies, you don't need a 10-person SRE team to pass SOC 2. You can leverage managed services to do the heavy lifting.

Use Managed Backup Services

Instead of writing custom scripts to snapshot EBS volumes or dump SQL databases, use native services like AWS Backup, Azure Backup, or GCP Backup and DR. These services provide centralized management, built-in encryption, and standardized logging—all of which auditors love.

Implement Tag-Based Backups

Avoid manual lists of resources to back up. Implement a "Backup by Default" policy where every resource is automatically backed up unless it is explicitly tagged otherwise (e.g., backup: none). This ensures that as your infrastructure grows, your compliance coverage grows with it.
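The opt-out logic is small enough to state precisely. In AWS Backup, a comparable rule can typically be written as a selection over all resources with a `StringNotEquals` condition on the tag. The sketch below models only the decision, so you can report coverage from an inventory export. The `backup: none` tag follows the example above, tag values are compared exactly, and the resource IDs are placeholders.

```python
from typing import Dict, List


def resources_to_back_up(inventory: Dict[str, Dict[str, str]],
                         opt_out_key: str = "backup",
                         opt_out_value: str = "none") -> List[str]:
    """'Backup by default': include every resource unless it carries the
    opt-out tag. inventory maps a resource ID or ARN to its tags."""
    return sorted(rid for rid, tags in inventory.items()
                  if tags.get(opt_out_key) != opt_out_value)


inventory = {
    "arn:db1": {"backup": "none"},   # explicitly excluded
    "arn:db2": {},                   # untagged, so included by default
    "arn:b2": {"backup": "daily"},
}
print(resources_to_back_up(inventory))  # ['arn:b2', 'arn:db2']
```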

Automate the Evidence Collection

Tools like Vanta, Drata, or Tugboat Logic can integrate directly with your cloud provider's API. They can automatically pull the backup logs and configuration settings, saving you from taking hundreds of screenshots. This allows the engineering team to focus on building features rather than gathering evidence.

Conduct "Tabletop" Exercises

You don't always need to perform a full, destructive failover test to satisfy A1.3. Auditors commonly accept a "Tabletop Exercise" as evidence that the DRP itself has been tested: the engineering team sits down and walks through a disaster scenario step-by-step using the plan. It complements, rather than replaces, the restoration tests described earlier. Document the minutes of this meeting, the gaps identified, and the follow-up actions. This shows the auditor that the team is prepared and the plan is being actively reviewed.

The Role of Encryption and Access Control

While A1.2 and A1.3 are under the Availability category, they overlap significantly with Security. Your backups must be as secure as your production data.

Encryption at Rest: Ensure that all backup vaults are encrypted, for example with KMS keys (AWS) or keys managed in Azure Key Vault (Azure). Auditors will check for this.
Least Privilege: The IAM role used for backups should only have the permissions necessary to create and manage backups. Conversely, very few users should have the permission to delete a backup.
MFA Delete: For S3 buckets that hold backup archives, enable versioning with MFA Delete to prevent accidental or malicious deletion of backup objects.

Conclusion

SOC 2 sections A1.2 and A1.3 represent the technical implementation of "trust." To a customer, your availability commitment is a promise that their data is safe and your service will be there when they need it. To an auditor, these sections are the evidence that you have considered the worst-case scenarios and built a system capable of weathering them.

By focusing on automation, cross-region redundancy, and rigorous testing, senior engineers can transform backup and disaster recovery from a compliance burden into a competitive advantage. A robust BCDR program reduces the "blast radius" of errors, provides a safety net for the engineering team, and ensures that a disaster remains a technical challenge to be solved rather than a business-ending event.

As you prepare for your audit, remember that the goal is not just to "pass" but to build a more resilient platform. Use the requirements of A1.2 and A1.3 as a blueprint for engineering excellence. When your backups are automated, your recovery is tested, and your documentation is clear, the audit becomes a simple validation of the high standards you've already set for your team. For more information on navigating the complexities of SOC 2 and other compliance frameworks, stay informed through resources like the Rectify Cloud blog.

This content was generated by AI.