Using Database Sharding to Enforce Data Residency
How to use database sharding strategies to keep personal data within required geographic boundaries.
What Is Geographic Database Sharding?
Database sharding is the practice of splitting a large database into smaller, independent pieces called shards. Geographic sharding takes this further by partitioning data based on the geographic region it belongs to, with each shard deployed in the corresponding jurisdiction.
This approach enforces data residency at the database level: EU customer data lives in an EU shard, Australian customer data lives in an Australian shard, and so on. The database architecture itself becomes a compliance control.
Why Sharding for Residency?
Advantages Over Application-Level Controls
| Approach | Enforcement Level | Risk of Violation |
|---|---|---|
| Application logic only | Application code | High (bugs, misconfigurations) |
| Network policies | Infrastructure | Medium (can be overridden) |
| Database sharding | Data storage | Low (data physically cannot leave the shard) |
| Combined approach | Multiple layers | Lowest |
When data residency is enforced at the database level, personal data physically resides in the correct jurisdiction. No application bug can accidentally serve German customer data from a US server because the data simply is not there.
Sharding Strategies for Residency
Strategy 1: Region-Based Sharding
Assign each record to a shard based on the data subject's region:
Shard key: Country code or region identifier of the data subject
Shard map:
- Shard EU-1 (Frankfurt): Records where region = DE, AT, CH
- Shard EU-2 (Dublin): Records where region = IE, GB, NL, BE
- Shard EU-3 (Paris): Records where region = FR, ES, IT, PT
- Shard US-1 (Virginia): Records where region = US
- Shard APAC-1 (Sydney): Records where region = AU, NZ
Best for: Organizations with clear geographic segmentation of customers or users.
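The shard map above is essentially a lookup table from country code to shard. A minimal resolver can be sketched as follows; the shard names and country groupings are illustrative, mirroring the example map. Note that it deliberately raises on an unknown country rather than falling back to a default, since silently defaulting could place personal data in the wrong jurisdiction.

```python
# Hypothetical shard map mirroring the example layout above.
SHARD_MAP = {
    "EU-1":   {"location": "Frankfurt", "countries": {"DE", "AT", "CH"}},
    "EU-2":   {"location": "Dublin",    "countries": {"IE", "GB", "NL", "BE"}},
    "EU-3":   {"location": "Paris",     "countries": {"FR", "ES", "IT", "PT"}},
    "US-1":   {"location": "Virginia",  "countries": {"US"}},
    "APAC-1": {"location": "Sydney",    "countries": {"AU", "NZ"}},
}

# Invert once into a country -> shard lookup table.
_COUNTRY_TO_SHARD = {
    country: shard
    for shard, info in SHARD_MAP.items()
    for country in info["countries"]
}

def resolve_shard(country_code: str) -> str:
    """Return the shard that must hold records for this country.

    Raising on an unknown code is deliberate: defaulting to some
    catch-all shard could place data in the wrong jurisdiction.
    """
    try:
        return _COUNTRY_TO_SHARD[country_code.upper()]
    except KeyError:
        raise ValueError(f"No residency shard configured for {country_code!r}")
```

Adding a new jurisdiction then means adding one entry to the shard map, which keeps the "plan for future jurisdictions" question (covered below) a configuration change rather than a code change.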
Strategy 2: Jurisdiction-Based Sharding
Shard based on the legal jurisdiction that applies, rather than physical location:
Shard key: Applicable jurisdiction
Shard map:
- Shard GDPR (eu-central-1): All data subject to GDPR regardless of specific country
- Shard LGPD (sa-east-1): All data subject to Brazil's LGPD
- Shard CCPA (us-west-2): All data subject to California's CCPA
- Shard DEFAULT (us-east-1): All other data
Best for: Organizations that need to comply with specific regulatory frameworks rather than country-level residency laws.
Strategy 3: Tenant-Based Sharding
For multi-tenant applications, shard by tenant with each tenant assigned to a region:
Shard key: Tenant ID
Tenant-to-shard mapping:
- Tenant A (German enterprise) -> Shard EU-Frankfurt
- Tenant B (US startup) -> Shard US-Virginia
- Tenant C (Australian company) -> Shard APAC-Sydney
Best for: B2B SaaS applications where each customer has specific residency requirements.
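Under tenant-based sharding, the residency decision is made once, at onboarding, and every subsequent query follows the pinned assignment. A sketch, assuming a simple in-memory registry (in practice this mapping would live in a control-plane database):

```python
# Illustrative shard inventory: shard name -> jurisdiction it serves.
AVAILABLE_SHARDS = {
    "EU-Frankfurt": "DE",
    "US-Virginia":  "US",
    "APAC-Sydney":  "AU",
}

# Tenant -> shard assignments, fixed at onboarding.
TENANT_SHARDS: dict[str, str] = {}

def assign_tenant(tenant_id: str, required_country: str) -> str:
    """Pin a tenant to the shard serving its contractual jurisdiction."""
    for shard, country in AVAILABLE_SHARDS.items():
        if country == required_country:
            TENANT_SHARDS[tenant_id] = shard
            return shard
    raise ValueError(f"No shard available for jurisdiction {required_country!r}")

def shard_for_tenant(tenant_id: str) -> str:
    """Every query for this tenant is routed via this lookup."""
    return TENANT_SHARDS[tenant_id]
```

Because the tenant ID is the shard key, all of a tenant's data stays together, which is usually what a B2B residency contract requires.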
Implementation Guide
Step 1: Define Your Shard Topology
Map your residency requirements to physical database deployments:
- List all jurisdictions where data must remain
- Identify available database regions for your chosen database technology
- Design the shard map linking jurisdictions to database instances
- Plan for future jurisdictions (how will you add a new shard?)
Step 2: Choose Your Sharding Technology
Different databases offer different sharding capabilities:
| Database | Native Sharding | Geographic Control |
|---|---|---|
| PostgreSQL (Citus) | Yes | Manual shard placement by node location |
| MongoDB | Yes (zone sharding) | Tag-based zone assignment to specific regions |
| CockroachDB | Yes | Partition-by-region with regional survival goals |
| YugabyteDB | Yes | Tablespace-level geographic placement |
| MySQL (Vitess) | Yes | Shard-to-region mapping via topology |
| Cloud Spanner | Yes | Instance-level regional configuration |
Step 3: Implement the Shard Router
Build a routing layer that directs queries to the correct shard:
Key components:
- Shard resolver: Determines which shard holds the requested data based on the shard key
- Connection pool: Maintains connections to all shard instances
- Query router: Sends queries to the appropriate shard
- Cross-shard query handler: Manages queries that span multiple shards (with careful attention to residency implications)
Important: Cross-shard queries that combine personal data from multiple jurisdictions may create compliance issues. Limit cross-shard operations to non-personal data or aggregated/anonymized results.
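The routing layer and the cross-shard guard can be combined in one component. This is a minimal sketch with connection handling stubbed out as plain DSN strings; a real router would hold a connection pool per shard. The refusal to fan out personal-data queries across shards implements the caveat above in code:

```python
class ShardRouter:
    """Routes queries to shards and guards cross-shard personal-data access."""

    def __init__(self, shard_dsns: dict[str, str]):
        # One connection target per shard instance (stubbed as DSN strings;
        # a real pool would live here instead).
        self._shards = shard_dsns

    def route(self, shard_key: str) -> str:
        """Resolve a shard key to its connection target."""
        if shard_key not in self._shards:
            raise LookupError(f"Unknown shard {shard_key!r}")
        return self._shards[shard_key]

    def route_many(self, shard_keys: set[str], personal_data: bool) -> list[str]:
        """Fan a query out to multiple shards.

        Refuses to combine personal data across shards, since that
        would mix jurisdictions in a single result set.
        """
        if personal_data and len(shard_keys) > 1:
            raise PermissionError(
                "Cross-shard query on personal data spans jurisdictions"
            )
        return [self.route(key) for key in shard_keys]
```

Making the guard a hard error (rather than a logged warning) means a reporting query that would violate residency fails loudly in testing instead of quietly shipping.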
Step 4: Handle Data Migration Between Shards
When a data subject's residency status changes (for example, a customer moves from Germany to Australia), you need a migration process:
- Identify all records belonging to the data subject in the source shard
- Transfer records to the destination shard
- Verify completeness of the transfer
- Delete records from the source shard
- Update the shard routing map
- Log the migration for audit purposes
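The six steps above can be sketched as a single migration routine. This version uses in-memory dicts to stand in for the source and destination shards and a list for the audit store; the important property it demonstrates is ordering, since records are verified in the destination before anything is deleted from the source:

```python
def migrate_subject(subject_id: str, source: dict, dest: dict,
                    dest_shard: str, routing: dict, audit_log: list) -> int:
    """Move all of one data subject's records between shards. Returns count."""
    # 1. Identify all records belonging to the data subject.
    records = {k: v for k, v in source.items() if v["subject"] == subject_id}
    # 2. Transfer them to the destination shard.
    dest.update(records)
    # 3. Verify completeness before destroying anything.
    if any(dest.get(k) != v for k, v in records.items()):
        raise RuntimeError("transfer verification failed; source left intact")
    # 4. Only now delete from the source shard.
    for key in records:
        del source[key]
    # 5. Update the routing map so new queries hit the new shard.
    routing[subject_id] = dest_shard
    # 6. Record the migration for audit purposes.
    audit_log.append({"subject": subject_id, "shard": dest_shard,
                      "records": len(records)})
    return len(records)
```

In production each step also needs to contend with writes arriving mid-migration, typically by pausing or dual-writing the subject's records during the window.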
Step 5: Manage Global Reference Data
Some data is needed across all regions but is not personal data:
- Product catalogs
- Configuration settings
- Currency and exchange rates
- Generic templates
Replicate this non-personal reference data to all shards without residency concerns. Keep a clear separation between personal data (shard-local) and reference data (globally replicated).
Step 6: Address Backup and Replication
Ensure that shard-level backups and replication respect residency:
- Configure backups for each shard within the same region
- If using read replicas, place them within the same jurisdiction as the primary shard
- Disable cross-region replication for personal data shards
- Test backup restoration to confirm data stays in the correct region
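One way to keep the backup rules from drifting is a guard that runs before any backup target is enabled. A minimal sketch, assuming the shard-to-region metadata below (in practice both sides of the comparison would come from your cloud provider's API):

```python
# Illustrative shard -> required region mapping.
SHARD_REGIONS = {
    "EU-1":   "eu-central-1",
    "US-1":   "us-east-1",
    "APAC-1": "ap-southeast-2",
}

def verify_backup_target(shard: str, backup_region: str) -> None:
    """Refuse a backup destination outside the shard's own region."""
    expected = SHARD_REGIONS[shard]
    if backup_region != expected:
        raise ValueError(
            f"Backup for {shard} must stay in {expected}, got {backup_region!r}"
        )
```

Running this check in CI against your infrastructure configuration catches the "central backup bucket" mistake listed under common mistakes below before it reaches production.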
Operational Considerations
Monitoring
Monitor each shard independently and as a fleet:
- Shard-level performance metrics (latency, throughput, storage)
- Data distribution across shards (balance)
- Cross-shard query frequency and performance
- Replication lag within each shard
Scaling
Geographic shards may have uneven load:
- The EU shard may handle 60% of total traffic
- The APAC shard may handle 5%
- Scale each shard independently based on its workload
- Do not balance load by routing data to less-busy shards in other regions
Schema Changes
Rolling out schema changes across geographic shards requires coordination:
- Plan for per-shard migrations that may execute at different times
- Ensure application compatibility during the migration window
- Test migrations against each shard's data characteristics
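A common pattern for coordinating this is to version the schema per shard and roll each shard forward independently. A sketch, with the DDL and version numbers purely illustrative; the point is that shards can sit at different versions during the window, so the application must tolerate both:

```python
# Illustrative migration registry: target version -> DDL to reach it.
MIGRATIONS = {
    2: "ALTER TABLE customers ADD COLUMN consent_ts TIMESTAMP",
}

def migrate_shard(shard_state: dict, target_version: int, apply_ddl) -> list:
    """Bring one shard up to target_version; returns the DDL applied.

    `apply_ddl` is a callable that executes a statement against this
    shard's own connection, so each shard migrates at its own time.
    """
    applied = []
    while shard_state["schema_version"] < target_version:
        next_version = shard_state["schema_version"] + 1
        ddl = MIGRATIONS[next_version]
        apply_ddl(ddl)
        shard_state["schema_version"] = next_version
        applied.append(ddl)
    return applied
```

Recording the version inside each shard (rather than centrally) keeps the migration state co-located with the data it describes, which matters when shards are administered by different regional teams.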
Common Sharding Mistakes
- Choosing the wrong shard key: A shard key that does not map cleanly to jurisdictions leaves records that cannot be confidently assigned to a single region
- Cross-shard joins on personal data: Joining data across shards to produce reports that combine personal data from multiple jurisdictions
- Ignoring the routing layer: A routing misconfiguration can send data to the wrong shard, violating residency
- Uneven shard distribution: All data ending up in one shard defeats the purpose and creates performance issues
- Neglecting shard-level backups: Backing up all shards to a central location undermines geographic isolation
When Sharding Is Not the Right Answer
Geographic sharding is powerful but adds significant complexity. It may not be the right approach if:
- Your data volumes are small and a single-region database suffices
- You only need to comply with one jurisdiction's residency requirements
- Your application does not require the performance benefits of sharding
- Your team lacks the operational expertise to manage a sharded database
For document hosting specifically, using a purpose-built platform like GlobalDataShield can be simpler than implementing geographic sharding yourself. GlobalDataShield handles the geographic data placement, replication controls, and residency enforcement at the infrastructure level, letting you focus on your application logic rather than database topology.
Ready to Solve Data Residency?
Get started with GlobalDataShield - compliant document hosting, ready when you are.