Transparent Aurora Postgres Global DB failover for Consumers

Manish Singh
Jun 11, 2021

Overview

In a previous post on AWS Aurora, I talked about how to prepare an application to handle DB failover from the writer to a reader instance within an AWS region. In this post, let's discuss how to handle failover of the DB to a secondary region while minimising the changes needed on the application side.

Global DB setup

Global DB is Aurora's disaster recovery mechanism for recovering from a region-wide failure of the DB service in AWS. A sample setup is shown below.

We have two DB clusters. The primary cluster is in region us-west-2 (usw2) and the secondary cluster is in us-east-2 (use2). We can see in the images below that the primary and secondary clusters have different URLs for connecting to them.

Challenges with directly using the cluster URL

The application needs to know the cluster URL so that it can connect to the DB. If we connect directly using the URL of the primary cluster, our application config will look like this:

spring.datasource.url = cluster1.cluster-cdgfetyeywok.us-west-2.rds.amazonaws.com

When we fail Aurora over to the secondary region, we need to update the DB cluster URL in the application so that it can connect to the new cluster:

spring.datasource.url = global-db-1-cluster-1.cluster-chbxjwv3veoe.us-east-2.rds.amazonaws.com

Every step we have to perform during a DR failover adds to the overall recovery time objective (RTO) and increases the chance of error. In many cases, a DB URL change also requires an application restart, which further increases the downtime.

Abstracting DB failover using CNAME

We can set up a CNAME record that points to the URL of the currently active cluster, as shown below.

The application datasource can then be set up as

spring.datasource.url = db.mycompany.com

Advantages of CNAME

  1. At the time of DB failover, only the CNAME value needs to be changed. Nothing changes on the application side. The application automatically connects to whichever DB cluster the CNAME currently points to.
  2. The application gets a nice, readable name like db.mycompany.com instead of an AWS-generated cluster URL.
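
The single failover step in item 1 can be sketched as follows. This is a minimal sketch that assumes the CNAME lives in Amazon Route 53, which this post does not require; any DNS provider would work. It only builds the change batch in the shape that Route 53's ChangeResourceRecordSets API accepts. The helper name build_cname_upsert and the TTL of 60 seconds are illustrative assumptions.

```python
import json


def build_cname_upsert(record_name: str, target: str, ttl: int = 60) -> dict:
    """Build a Route 53 change batch that points record_name at target.

    UPSERT creates the record if it is missing and overwrites it otherwise,
    so the same call serves for initial setup and for failover.
    """
    return {
        "Comment": f"Point {record_name} at {target}",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # assumed value, tune for your own RTO
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


# Cluster URLs taken from the examples earlier in this post.
PRIMARY = "cluster1.cluster-cdgfetyeywok.us-west-2.rds.amazonaws.com"
SECONDARY = "global-db-1-cluster-1.cluster-chbxjwv3veoe.us-east-2.rds.amazonaws.com"

# During DR failover, only the CNAME target changes.
failover_batch = build_cname_upsert("db.mycompany.com", SECONDARY)
print(json.dumps(failover_batch, indent=2))
```

The resulting document could be passed as the ChangeBatch argument of boto3's change_resource_record_sets call, or saved to a file and given to `aws route53 change-resource-record-sets --change-batch`, together with your own hosted zone ID.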

Disadvantages of CNAME

  1. Extra cost
  2. The TTL for the CNAME needs to be set carefully. A lower value increases cost because clients re-resolve the name more often, while a higher value means the application keeps using the stale DB cluster URL for longer after a failover.
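
To make the TTL trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes every resolver re-queries the name exactly once each time the TTL expires, which is an upper bound under constant traffic, and the resolver count of 10 is a made-up number for illustration. The function name ttl_tradeoff is also hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_tradeoff(ttl_seconds: int, resolvers: int) -> dict:
    """Estimate DNS lookups per day and worst-case DNS staleness for a TTL.

    Assumes each resolver looks the name up once per TTL window.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return {
        "ttl_seconds": ttl_seconds,
        # A resolver may keep serving the old target until its cached
        # answer expires, so staleness is bounded by the TTL.
        "max_staleness_seconds": ttl_seconds,
        "lookups_per_day": resolvers * SECONDS_PER_DAY // ttl_seconds,
    }


# Compare a few candidate TTLs for 10 hypothetical resolvers.
for ttl in (5, 60, 300):
    print(ttl_tradeoff(ttl, resolvers=10))
```

The staleness figure covers DNS caching only. A JVM application also caches lookups itself (controlled by the networkaddress.cache.ttl security property), and connections already open in a pool stay attached to the old cluster until they are closed, so the real switchover time may be longer.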

Conclusion

I find that managing Aurora Global DB failover with a CNAME is much easier operationally: compared with using the DB cluster URLs directly, it reduces the number of steps during a DR failover and reduces downtime.
