AWS RDS Aurora (PostgreSQL engine) upgrade from 13.12 to 14.17

I'm trying to upgrade my engine version from 13.12 to 14.17. Below is my Terraform code:
resource "aws_rds_cluster" "aurora" {
cluster_identifier = local.env_config.rds_db_identifier
engine = local.env_config.rds_engine
engine_mode = local.env_config.rds_engine_mode
engine_version = "14.17"
allow_major_version_upgrade = true
apply_immediately = local.env_config.rds_apply_immediately
database_name = local.env_config.auroradb_name
master_username = local.env_config.auroradb_user
master_password = random_password.aurora_passwd.result
vpc_security_group_ids = [aws_security_group.aurora.id]
db_subnet_group_name = aws_db_subnet_group.aurora_group.id
copy_tags_to_snapshot = true
deletion_protection = false
enable_http_endpoint = true
preferred_maintenance_window = local.env_config.preferred_maintenance_window
backup_retention_period = 14
preferred_backup_window = "01:00-02:00"
skip_final_snapshot = false
final_snapshot_identifier = "fhir-dev-platform-cluster-final-pgadmin"
snapshot_identifier = "fhir-dev-platform-cluster-final-pgadmin"
storage_encrypted = true
serverlessv2_scaling_configuration {
max_capacity = 4
min_capacity = 2
}
tags = local.default_tags
timeouts {
create = "120m"
}
}
resource "aws_rds_cluster_instance" "aurora_provisioned_instance" {
cluster_identifier = aws_rds_cluster.aurora.id
instance_class = "db.serverless"
engine = local.env_config.rds_engine
engine_version = local.env_config.rds_engine_version
publicly_accessible = false
db_subnet_group_name = aws_db_subnet_group.aurora_group.id
apply_immediately = true
}
resource "aws_db_subnet_group" "aurora_group" {
subnet_ids = module.base_inf.vpc_public_subnets
tags = local.default_tags
}
resource "aws_security_group" "aurora" {
name = "fhir-dev-platform-sg-aurora"
vpc_id = module.base_inf.vpc_id
ingress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = concat(module.base_inf.vpc_services_subnets_cidr_blocks, module.base_inf.vpc_public_subnets_cidr_blocks)
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = local.default_tags
}
The error I'm getting is "snapshot name not found". I'm assuming this happens because the final snapshot identifier isn't able to create the snapshot in time, so when the snapshot identifier then tries to restore the cluster, it fails. The problem is that I can't do multiple deployments of this code; the upgrade has to happen in a single apply.
Another way I tried was without using the snapshot identifier to restore the cluster, and it worked. Still, the DB password and the cluster password got out of sync, since the cluster password is only updated when the cluster is recreated (I generate it with a random_password resource).
Could you please tell me how I can handle the upgrade in one go?
Answer
"The error I'm getting is 'snapshot name not found'. I'm assuming this happens because the final snapshot identifier isn't able to create the snapshot in time, so when the snapshot identifier then tries to restore the cluster, it fails."
That's not the problem. Terraform shouldn't be trying to create a snapshot at all here, and it is certainly not creating a final snapshot: that only happens when you delete the cluster, not when you upgrade it. If the RDS service needs to take a snapshot as part of the version upgrade process, it will do so automatically behind the scenes, and it will not use your final_snapshot_identifier settings for that process.
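As an aside: if you ever need to reference the snapshot RDS takes automatically, the aws_db_cluster_snapshot data source can look it up. A minimal sketch, reusing the cluster identifier from your locals; it is purely illustrative and not part of the fix:

# Minimal sketch: fetch the most recent automated snapshot of the cluster.
data "aws_db_cluster_snapshot" "latest_automated" {
  db_cluster_identifier = local.env_config.rds_db_identifier
  snapshot_type         = "automated"
  most_recent           = true
}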
The problem is this line:
  snapshot_identifier = "fhir-dev-platform-cluster-final-pgadmin"
That line tells Terraform to create a new Aurora cluster from the snapshot with that name, and that snapshot doesn't exist. Unless you are deliberately creating a new Aurora cluster from an existing snapshot, you should not be setting the snapshot_identifier attribute at all.
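For an in-place major version upgrade in a single apply, the relevant part of the cluster resource looks like the sketch below (values taken from your config; everything not shown stays as you have it):

resource "aws_rds_cluster" "aurora" {
  cluster_identifier          = local.env_config.rds_db_identifier
  engine                      = local.env_config.rds_engine
  engine_mode                 = local.env_config.rds_engine_mode
  engine_version              = "14.17"
  allow_major_version_upgrade = true
  apply_immediately           = local.env_config.rds_apply_immediately

  # These only take effect when the cluster is destroyed, not during an upgrade.
  skip_final_snapshot       = false
  final_snapshot_identifier = "fhir-dev-platform-cluster-final-pgadmin"

  # snapshot_identifier removed: it is only for creating a brand-new cluster
  # from an existing snapshot, and setting it forces Terraform to plan a
  # replacement instead of an in-place upgrade.

  # ... remaining attributes unchanged ...
}

You may also want the instance to track the cluster's version instead of a separate local, e.g. engine_version = aws_rds_cluster.aurora.engine_version on aws_rds_cluster_instance, so the two can't drift apart during the upgrade.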