Background 

A Fortune 500 CRM (Customer Relationship Management) and cloud solutions provider faced rapidly growing integration, compute, and data warehousing costs, driven by accelerating data volumes across its data landscape. The organization needed to move its data and analytics from Snowflake to a custom-built Hive solution on Amazon Web Services (AWS) to reduce costs and improve performance.

Why Premier  

While the Client has significant in-house data analytics and cloud infrastructure skills, they needed data engineering expertise to complete this initiative in the desired timeframe. The Client had previously worked with Premier on data governance and integration projects, and through that experience, knew Premier had the necessary skills, accelerators, and methods to carry out this project efficiently and effectively. Premier's three decades of experience building complex data engineering solutions reinforced that confidence.

The Project: A Joint Effort Across Four Milestones  

Like many large transformative initiatives, this project simultaneously posed significant risk and value. To reduce project risk and maximize value along the way, the initiative was split into four distinct milestones. This modular approach enabled the Client to realize value throughout the initiative without disrupting current processes. It also enabled the Client and Premier to focus their energy on their respective strengths.  

As a pioneer in cloud applications and data modeling, the Client owned the design and development of the new analytics platform. The Client leveraged Premier's data engineering solutions to minimize development time and maximize data pipeline throughput. While Premier upgraded the data pipelines that fed the analytics platforms, the Client focused on data models and cloud architectures.

Milestone 1: Replace Jitterbit Integrations with AWS Glue  

The project's first milestone focused on replacing approximately 300 Jitterbit integrations that connected the Client's operational data to their primary analytics data stores in Snowflake and Redshift. To keep this milestone on track, Premier used its integration design frameworks and reference library to reverse engineer the poorly documented legacy Jitterbit integrations and replicate them in AWS Glue.
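To illustrate the reverse-engineering step, the sketch below shows one way a legacy integration's behavior might be captured as a declarative spec that then drives a replacement Glue job. This is a hypothetical illustration, not the actual framework: the spec fields, table names, and job-argument keys are all assumptions, and a real Glue job would consume these arguments inside the AWS Glue runtime.

```python
# Hypothetical sketch: recording one reverse-engineered Jitterbit
# integration as a declarative spec, then rendering it as the arguments
# a replacement AWS Glue job run would receive. All names (tables,
# fields, argument keys) are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class IntegrationSpec:
    """Minimal description of one legacy integration's behavior."""
    name: str
    source_table: str
    target_table: str
    field_map: dict = field(default_factory=dict)  # source col -> target col

def to_glue_job_args(spec: IntegrationSpec) -> dict:
    """Render the spec as the key/value arguments passed to a Glue job run."""
    return {
        "--JOB_NAME": f"jitterbit-replacement-{spec.name}",
        "--source_table": spec.source_table,
        "--target_table": spec.target_table,
        "--field_map": ",".join(f"{s}:{t}" for s, t in spec.field_map.items()),
    }

# Example: one of the ~300 integrations, captured as a spec.
spec = IntegrationSpec(
    name="accounts_daily",
    source_table="crm.accounts",
    target_table="analytics.dim_account",
    field_map={"acct_id": "account_id", "acct_nm": "account_name"},
)
args = to_glue_job_args(spec)
```

Keeping the reverse-engineered behavior in data rather than code is one way a reference library can make 300 near-identical migrations repeatable.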

Milestone 2: Design and Build the Pipelines for the Future State Data and Analytics Platform  

While the Client focused on designing and developing the Hive database on AWS infrastructure, Premier designed and built the future-state integration framework and process. Collaborating closely with the Client, Premier enhanced the integrations from Milestone 1 so they could easily be re-pointed to the new analytics warehouse during cut-over. Additionally, Premier and the Client found opportunities to rationalize and improve the performance of existing integrations. As part of these improvements, Premier increased pipeline efficiency by migrating the ETL jobs from AWS Glue to Apache Airflow.
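One common way to make integrations "easy to re-point," sketched below, is to have every pipeline write to a logical target name that a single configuration switch resolves to a concrete warehouse. This is an assumption about the pattern, not Premier's actual implementation; the connection strings and config shape are invented for illustration.

```python
# Hypothetical sketch of the re-pointing pattern: pipelines reference a
# logical target, and one cut-over switch decides whether it resolves to
# the legacy Snowflake warehouse or the new Hive warehouse. Connection
# strings and the config layout are illustrative assumptions.

TARGETS = {
    "legacy": {"analytics": "snowflake://prod-warehouse/analytics"},
    "future": {"analytics": "hive://emr-cluster/analytics"},
}

def resolve_target(logical_name: str, phase: str) -> str:
    """Resolve a pipeline's logical target to a concrete connection string."""
    return TARGETS[phase][logical_name]

# Before cut-over, every pipeline resolves against the legacy warehouse...
before = resolve_target("analytics", "legacy")
# ...and flipping one phase value re-points all pipelines at once.
after = resolve_target("analytics", "future")
```

In an Airflow deployment the same indirection is typically achieved with named connections, so the cut-over becomes a connection-level change rather than an edit to every DAG.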

Milestone 3: Migrate from Snowflake to Hive  

With the new analytics platform operational, it was time to migrate the data and shut down Snowflake. Premier built a Snowflake-to-Hive migration pipeline using PySpark orchestrated by Apache Airflow, an approach that maximized throughput and minimized development time. To reduce downtime during the cut-over, Premier and the Client collaborated on a tight cut-over plan. Execution exceeded expectations, resulting in zero downtime and an on-time go-live.
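A cut-over plan like this typically includes a reconciliation gate before reads are switched to the new warehouse. The sketch below shows one such safeguard, comparing per-table row counts between source and target; it is a stdlib-only stand-in (the real PySpark job would collect these counts from Snowflake and Hive on a cluster), and the table names and counts are invented.

```python
# Hypothetical cut-over safeguard: reconcile per-table row counts between
# the Snowflake source and the Hive target before switching reads over.
# The hard-coded counts stand in for what the actual PySpark migration
# job would collect from each system.

def reconcile(source_counts: dict, target_counts: dict) -> list:
    """Return the tables whose migrated row counts do not match the source."""
    return sorted(
        table
        for table, n in source_counts.items()
        if target_counts.get(table) != n
    )

source = {"orders": 1_204_311, "accounts": 88_023, "events": 9_450_112}
target = {"orders": 1_204_311, "accounts": 88_023, "events": 9_450_100}

mismatches = reconcile(source, target)  # tables that need a re-run
```

Gating the cut-over on an empty mismatch list is one simple way a plan can guarantee the new platform goes live only with verified data.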

Milestone 4: Consolidate Data Silos

After the new analytics platform went live, the last step was consolidating additional data silos into the new analytics platform and decommissioning them. Premier designed the pipelines and processes for this step so the Client could self-execute the plan when ready. When the Client began the migration, Premier provided as-needed backup support.

Impact: Improved Data Pipelines, Improved Data Analytics, Lower Costs  

This complex initiative gave the Client long-term, sustainable analytics capabilities and a pathway to more intelligent AI, sharper analytics, and data-driven decisions. The new data pipelines in Apache Airflow run at lower cost and with greater efficiency than the prior Jitterbit framework.