
The Next Wave of Cloud Migrations Needs Data Streaming

COVID-19 steepened the cloud adoption curve this spring as Cloud Service Providers absorbed Super Bowl-like traffic spikes. Businesses went virtual, office workers became teleworkers and Zoom boomed.

In fact, cloud adoption is one curve that will not flatten anytime soon, because COVID-19 unleashed several growth drivers. More business leaders now recognize cloud infrastructure and applications as methods of survival that ensure business continuity if a resurgent COVID-19 disrupts future operations. Many also created virtual business models during the lockdown that expand their longer-term addressable markets. In addition, telework will persist as organizations recognize the benefits for employee safety, productivity and retention.

To support these changes, data teams will migrate an increasing portion of their on-premises operational and analytics workloads to the cloud. They can best meet budget and project requirements by using data streaming technologies such as change data capture (CDC), which replicates real-time updates between data source and target. While this might sound like boring plumbing, it creates transformational benefits for organizations as they move their businesses onto the cloud.

  • Uptime. Migrations typically entail the extraction, transformation and loading (ETL) of datasets such as database tables from an on-premises source to a cloud target. Operations often must continue on the source system while the data team transfers the tables. CDC technology captures and buffers incremental updates during this time, then applies them to the target once the transfer completes so that the target is fully synchronized with the source. At this point the data team re-points its applications to the new cloud target without interrupting operations.
  • Bandwidth reductions. Organizations often must synchronize applications across hybrid or multi-cloud environments. By streaming incremental updates across the Wide Area Network (WAN), data teams eliminate the need to repeatedly replicate and transfer full loads – including old and new data – that consume bandwidth. This reduces cost and latency.
  • Scalability. Data streaming improves processing efficiency by reducing the CPU cycles and memory required to process and replicate a given set of data. This enables data teams to handle higher data volumes with the same resources.
  • Real-time insights. Data teams use CDC and open-source Apache Kafka streaming systems to extract and analyze data on a real-time basis. For example, they might offload analytics queries from an on-premises mainframe system by publishing its real-time source updates to a cloud analytics platform.
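
To make the uptime point concrete, here is a minimal sketch of the buffer-and-replay pattern described above. It is an illustration rather than any vendor's implementation: plain Python dictionaries stand in for the source and target tables, and the names `ChangeEvent`, `apply_change` and `migrate_with_cdc` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Table = Dict[int, Dict[str, Any]]  # primary key -> row (stand-in for a database table)


@dataclass
class ChangeEvent:
    """One captured change; a hypothetical shape, not a real CDC wire format."""
    op: str                                # "insert", "update" or "delete"
    key: int                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # new row image; None for deletes


def apply_change(table: Table, event: ChangeEvent) -> None:
    """Apply a single change event to a table."""
    if event.op == "delete":
        table.pop(event.key, None)
    else:
        # Inserts and updates both write the full new row image.
        table[event.key] = dict(event.row or {})


def migrate_with_cdc(source: Table, changes_during_load: List[ChangeEvent]) -> Table:
    """Copy `source` to a new target while the source keeps accepting writes."""
    # 1. Take a point-in-time snapshot of the source for the bulk transfer.
    snapshot = {key: dict(row) for key, row in source.items()}

    # 2. While the snapshot is in flight, the source keeps changing.
    #    CDC captures each change and buffers it.
    buffered: List[ChangeEvent] = []
    for event in changes_during_load:
        apply_change(source, event)   # operations continue on the source
        buffered.append(event)        # the same change is captured for later

    # 3. The bulk load lands on the target; it reflects only the snapshot.
    target: Table = dict(snapshot)

    # 4. Replay the buffered changes in order so the target catches up.
    for event in buffered:
        apply_change(target, event)

    return target
```

After the replay in step 4, the target holds the same rows as the live source, which is the point at which applications could be re-pointed. A production tool would read changes from the database transaction log rather than from a list passed in by the caller.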

Let’s explore three primary use cases for cloud data streaming.

Zero-downtime data migration. As outlined earlier, streaming technologies such as CDC layer onto batch ETL transfers to enable zero-downtime migrations. In this case, data teams move databases, data warehouses or data lakes to their cloud-based counterparts, offered as Infrastructure as a Service (IaaS) by a Cloud Service Provider (CSP). They might change platform types during the move, for example moving from an Oracle database on-premises to Azure SQL DB in the cloud. This is typically a one-time event.

[Figure: Zero-downtime data migration]

Data synchronization. After migrating an analytics platform to the cloud, organizations frequently must synchronize the analytics dataset with an on-premises operational system such as a database. They use streaming, CDC in particular, to continuously update the cloud analytics platform with real-time source changes. Larger organizations might run distinct analytics workloads on distinct IaaS platforms, based on specialized workload requirements and the comparative advantages of a given CSP.

[Figure: Data synchronization]
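
The continuous synchronization described above can be sketched as a loop that applies only the changes it has not yet seen. This is a simplified illustration under stated assumptions: the change log is an in-memory list of `(sequence, operation, key, row)` tuples, the target is a dictionary, and `sync_once` and the checkpoint convention are hypothetical names, not part of any particular CDC or Kafka API.

```python
from typing import Any, Dict, List, Tuple

# (sequence number, operation, primary key, row image); a hypothetical log format.
LogEntry = Tuple[int, str, int, Dict[str, Any]]


def sync_once(log: List[LogEntry], target: Dict[int, Dict[str, Any]], checkpoint: int) -> int:
    """Apply every log entry newer than `checkpoint` to `target`.

    Returns the new checkpoint, which the caller would persist so that the
    next run resumes where this one stopped.
    """
    for seq, op, key, row in log:
        if seq <= checkpoint:
            continue  # already applied in an earlier run; nothing to re-send
        if op == "delete":
            target.pop(key, None)
        else:
            # Treat inserts and updates as upserts so replays are harmless.
            target[key] = dict(row)
        checkpoint = seq
    return checkpoint
```

Because each run touches only the entries past the checkpoint, the volume moved across the network tracks the rate of change rather than the size of the dataset, which is the bandwidth argument made earlier. Running `sync_once` again with an unchanged log and the returned checkpoint leaves the target as it was.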

Analytics workload migration. Organizations also frequently migrate analytics platforms such as data warehouses or data lakes from their data centers to cloud IaaS. Streaming plays much the same role here as in data migrations, helping maintain uptime by capturing incremental updates during the load transfer. As data teams learn and begin to specialize, they might relocate certain analytics workloads from one CSP to another.

As organizations emerge from lockdown, their business and data leaders must switch from crisis mode and start to plan the next 3-5 years. They need to peer through near-term uncertainties to identify the structural changes to their business models. In many industries, cloud offers strategic new benefits. To capitalize on this opportunity, data leaders and their teams need to embrace technologies such as data streaming and CDC that build real-time, agile and efficient data pipelines from the old world to the new.

To learn more about how to prepare for this next wave of cloud migrations, watch the webinar “The Why and How of Streaming-First Data Architectures” and download the accompanying white paper.

Kevin Petrie

Kevin is the VP of Research at BARC US, where he writes and speaks about the intersection of AI, analytics, and data management. For nearly three decades Kevin has deciphered...
