
Resize Amazon Redshift from DC2 to RA3 with minimal or no downtime

Amazon Redshift is a popular cloud data warehouse that allows you to process exabytes of data across your data warehouse, operational database, and data lake using standard SQL. Amazon Redshift offers different node types, such as DC2 (dense compute) and RA3, which you can use for your different workloads and use cases. For more information about the benefits of migrating to RA3, refer to Scale your cloud data warehouse and reduce costs with the new Amazon Redshift RA3 nodes with managed storage and Amazon Redshift Benchmarking: Comparison of RA3 vs. DS2 Instance Types.

Many customers use DC2 nodes for their compute-intensive workloads. It's natural to scale with your growing workload, specifically by separating compute from storage so that each is right-sized to your needs. RA3 nodes with managed storage enable you to optimize your data warehouse by scaling and paying for compute and managed storage independently. Amazon Redshift managed storage uses large, high-performance SSDs in each RA3 node for fast local storage and Amazon S3 for longer-term durable storage. If the data in a node grows beyond the size of the large local SSDs, Amazon Redshift managed storage automatically offloads that data to Amazon S3. RA3 nodes keep track of the frequency of access for each data block and cache the hottest blocks. If the blocks aren't cached, the large networking bandwidth and precise storage techniques return the data in sub-seconds. Also, if you're looking for features like cross-cluster data sharing and cross-Availability Zone cluster relocation, these are a few more reasons to migrate to RA3. Many customers on DC2 have benefited from migrating to RA3 to serve their growing performance requirements and business use cases.

As a first step of the migration, we always recommend finding the right load of your system and identifying the number of RA3 nodes that will meet your workload and give you the best cost-performance benefit. For this evaluation, you can use the Simple Replay tool to conduct a what-if analysis and evaluate how your workload performs in different scenarios. For example, you can use the tool to benchmark your actual workload on a new instance type like RA3, evaluate a new feature, or assess different cluster configurations. To choose the right cluster type, you can compare different node types for your workload and pick the right configuration of RA3 with the Simple Replay utility.

Once you know the cluster type and number of nodes, the next question is how to migrate your current workload to RA3 with minimal downtime or without disrupting your current workload. In this post, we describe an approach to do that with minimal downtime.

Resizing an Amazon Redshift cluster

There are three ways to resize or migrate an Amazon Redshift cluster from DC2 to RA3:

  • Elastic resize – If it's available as an option, use elastic resize to change the node type, number of nodes, or both. Note that when you only change the number of nodes, queries are temporarily paused and connections are held open. An elastic resize can take 10–15 minutes. During a resize operation, the cluster is read-only.
  • Classic resize – Use classic resize to change the node type, number of nodes, or both. Choose this option when you're resizing to a configuration that isn't available through elastic resize. A resize operation can take 2 hours or more, or last up to several days depending on your data size. During the resize operation, the source cluster is read-only.
  • Snapshot, restore, and resize – To keep your cluster available during a classic resize, make a copy of the existing cluster, then resize the new cluster. If data is written to the source cluster after a snapshot is taken, the data must be manually copied over after the migration is complete.

Checkpoints for resize

When a cluster is resized using elastic resize with the same node type, the operation doesn't create a new cluster. As a result, the operation completes quickly. In the case of a resize to a new node type, several challenges could delay the operation:

  • Data volumes – The time required to complete a classic resize or a snapshot and restore operation might vary, depending on factors like the workload on the source cluster, the number and volume of tables being transformed, how evenly data is distributed across the compute nodes and slices, and the node configuration in the source and target clusters.
  • Snapshots – Automated snapshots are automatically deleted when their retention period expires, when you disable automated snapshots, or when you delete a cluster. If you want to keep an automated snapshot, you can copy it to a manual snapshot. You can take a manual snapshot of the cluster before the migration, which is used for the resize operation, but it won't include live data ingested after the snapshot was captured.
  • Cluster unavailable during resize – It's critical to know roughly how long the resize will take. To estimate this, you can try creating a cluster from the snapshot in a test account. However, this only gives a ballpark figure, because resize times can vary, especially if you intend to query your cluster during the resize. If the cluster is live almost all the time with minimal or zero non-business hours, a resize can be a challenge because the cluster can't upsert live data and serve read requests on this data during this window.
  • Cluster endpoint retention – Elastic resize and classic resize let you change the node type, number of nodes, or both, while the endpoint is retained. With a snapshot, restore, and resize, a new cluster endpoint is created, which may require a change in your application to replace the endpoint.
  • Reconciliation – Validate the target cluster data against the source to confirm the migration completed without data loss and to ensure data quality. Reconciliation at the table level isn't sufficient; you also need to ensure records have been copied from the source. You can run a matching record count check followed by data validation using checksums for accuracy of data.

Solution overview

The steps to prepare for the migration are as follows:

  1. Take a snapshot of the existing production Amazon Redshift cluster running on DC2.
  2. Create another Amazon Simple Storage Service (Amazon S3) bucket, where AWS Glue writes the curated data in parallel.
  3. Use the snapshot to create an RA3 cluster.
  4. Configure AWS Database Migration Service (AWS DMS) to load data from the new S3 bucket to the RA3 cluster.
  5. After you confirm that the data is synced between the two clusters (DC2 and RA3) and all other downstream applications, stop the DC2 cluster and change the endpoint of your dependent downstream application to the newly created RA3 cluster.

The following is the current architecture, depicting a live workload.

In this solution, data comes from three source systems and is written into a raw S3 bucket:

  • Change data capture (CDC) from an RDS instance via AWS DMS (1 in the preceding diagram)
  • Events captured via an external API (2)
  • CSV files from an external source copied to the raw bucket (3)

These sources don't push new data on any fixed pattern or interval.

Every couple of minutes, the ingested data is picked up by an S3 event trigger that runs an AWS Glue workflow (4 in the preceding diagram). The workflow provides an orchestration layer to manage and run jobs and crawlers. It includes a crawler (5) that updates the metadata schema and partitions of the dataset in the AWS Glue Data Catalog. The crawler then triggers an AWS Glue job that writes the curated data to the curated S3 bucket. From there, another AWS Glue job uploads data into Amazon Redshift (6).

In this scenario, if your workload is critical and you can't afford a long downtime, you need to plan your migration accordingly.

Dual write and transient data curation pipeline

As a first step of the migration, you need a parallel data processing pipeline alongside the AWS Glue job that writes the data into the curated S3 bucket. Create another S3 bucket, name it migrated-curated-bucket, and modify the AWS Glue transform job. You can also replicate the transform job to write data to the new reserve S3 bucket in parallel.

In this scenario, live data ingestion occurs every 30 minutes. When an iteration of the extract, transform, and load (ETL) job is complete, it triggers a manual snapshot of the Amazon Redshift cluster. After the snapshot is captured, a new Amazon Redshift cluster is created using that snapshot. Cluster creation time can vary depending on the snapshot volume.

If snapshot creation takes more than 30 minutes, the ETL job should be stopped and resumed after the snapshot creation is complete. For example, if the ETL job is triggered at 8:00 AM and finishes at 8:10 AM, snapshot creation starts at 8:10 AM. If it finishes by 8:30 AM (the next ETL job runs at 8:30 AM per the 30-minute interval), the ETL process continues according to the schedule. Otherwise, the job stops and resumes after the snapshot completes.
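This scheduling rule is simple enough to capture in a small helper. The following sketch encodes it as a pure function on a 30-minute cadence; the timestamps mirror the example above.

```python
from datetime import datetime, timedelta

# The ETL cadence described above.
INTERVAL = timedelta(minutes=30)


def next_etl_start(slot_start: datetime, snapshot_done: datetime) -> datetime:
    """Return when the next ETL iteration may begin.

    The next slot runs on time only if the snapshot finished before the
    slot starts; otherwise the ETL is deferred to snapshot completion.
    """
    next_slot = slot_start + INTERVAL
    return next_slot if snapshot_done <= next_slot else snapshot_done


# Job triggered at 8:00, snapshot done at 8:20 -> next run at 8:30 as scheduled.
print(next_etl_start(datetime(2022, 7, 4, 8, 0),
                     datetime(2022, 7, 4, 8, 20)))  # 2022-07-04 08:30:00
```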

Now we use the snapshot to launch a new RA3 Redshift cluster. The process doesn't pause the existing ETL pipeline; rather, the pipeline starts writing curated data in parallel to the reserve S3 bucket. The following diagram illustrates this updated workflow.

At this point, the existing cluster is still live and continues to process the live workload. Even if creation of the Amazon Redshift cluster takes time (owing to a huge volume of data), you should still be covered. The curated data in the S3 bucket acts as a staging reserve, and this data should be loaded into the RA3 cluster after its launch.

Backfill the new RA3 cluster with missing data

After the RA3 cluster has been launched, you need to play back the captured live data from the reserve S3 bucket to the newly created cluster. The playback spans only the window from the snapshot capture to the current timestamp. With this process, you're bringing the RA3 cluster in sync with the existing live DC2 cluster.

You need to configure an AWS DMS migration task with the reserve S3 bucket as the source endpoint and the newly created RA3 cluster as the target endpoint.

AWS DMS captures ongoing changes to the target data store. This process is called ongoing replication or change data capture (CDC). AWS DMS uses this process when replicating ongoing changes from a source data store. It works by collecting changes from the database logs using the database engine's native API. The following diagram illustrates this workflow.

Reconciliation and cutover

Data reconciliation is the process of verifying data between the source and target. In this process, target data is compared with source data to ensure that the data was transferred completely and without alteration. To ensure reliability in the pipeline and the data processed, you should create an end-to-end reconciliation report. This report verifies the percentage of matching tables, columns, and data records. It also identifies missing records, missing values, incorrect values, badly formatted values, and duplicated records.

You can define the reconciliation process to check whether both clusters are running in sync. For that, you can create simple Python or shell scripts that query the source and target clusters, fetch the results, and compare them.

Cutover is the final step of the migration, and involves switching the existing cluster with the newly launched cluster. At this point, the clusters are running in parallel. Next, you validate that the downstream data consumption flows are up to date. Verify the reconciliation metrics from the DC2 and RA3 clusters to confirm that table updates are in sync.

You can keep the dual writes going while you switch over from the migration data pipeline. If you discover any issues after cutting over, you can switch back to the old data pipeline, which remains the source of truth until cutover. In this case, cutover involves updating the DC2 cluster endpoint to the new RA3 cluster endpoint in the application. Make sure to identify a relatively quiet window during the day to update the endpoint. To keep the same endpoint for your applications and users, you can rename the new RA3 cluster with the same name as the original DC2 cluster. To rename the cluster, modify it on the Amazon Redshift console or use the ModifyCluster API operation. For more information, see Renaming clusters or ModifyCluster API operation in the Amazon Redshift API Reference.

Up to this point, AWS DMS has continued to update RA3. After you cut over to RA3, the DC2 cluster is no longer live, and you can stop the AWS DMS replication job to RA3. Pause the last snapshot. Delete the reserve S3 bucket and the AWS DMS resources used for the RA3 load.


Conclusion

In this post, we presented an approach to migrate an existing Amazon Redshift cluster with minimal to no data loss, which also allows the cluster to serve both read and write operations during the resize window. Elastic resize is a quick way to resize your cluster while maintaining the same number of slices in the target cluster. Slice mapping reduces the time required to resize a cluster. If you choose a resize configuration that isn't available with elastic resize, you can choose classic resize or perform a snapshot, restore, and resize.

To learn more about what's new with RA3 instances, refer to Amazon Redshift RA3 instances with managed storage. Amazon Redshift delivers better price performance while helping you keep your costs predictable. Amazon Redshift Serverless automatically provisions and scales data warehouse capacity to deliver high performance for demanding and unpredictable workloads, and you pay only for the resources you use. This gives you the flexibility to choose either or both based on your requirements. After you've made your choice, try the hands-on labs on Amazon Redshift.

About the Authors

Soujanya Konka is a Solutions Architect and Analytics specialist at AWS, focused on helping customers build their ideas on the cloud, with expertise in the design and implementation of business information systems and data warehousing solutions. Before joining AWS, Soujanya worked at companies such as HSBC and Cognizant.

Dipayan Sarkar is a Specialist Solutions Architect for Analytics at AWS, where he helps customers modernize their data platform using AWS Analytics services. He works with customers to design and build analytics solutions that enable businesses to make data-driven decisions.


