The most effective method to Migrate to Snowflake Like a Boss

Tell me if this rings a bell: your boss is convinced that Snowflake is that the way forward for data and informs your team that you got to migrate from your data warehouse to the present sparkling new solution quite anything you’ve ever needed in your lives. Isolating storage from computing will save your company a lot of money, and on top of that, your VP can generate fancy new dashboards for your CEO to trace.

Snowflake, a cloud data warehousing platform, makes it easy for data teams to store and use data. Unlike traditional storage solutions, Snowflake supports a plethora of knowledge types and business intelligence tools and makes it easy for internal and external teams to collaborate throughout the ETL pipeline. A database, Snowflake, also can support most structured and unstructured data types.

Like your VP, many of my customers are excited at the prospect of migrating to cloud storage and compute solution like Snowflake, but they don’t know where to start. Rightly so: I used to be ready to find several articles about migrating from Redshift to Snowflake, but little or no about making the polar plunge from other solutions.

Like a snowflake, no two data stacks are alike, each with its assets, complexities, and requirements. Snowflake makes it easy to manage and collaborate across a good array of databases and data types. Image courtesy of Aaron Burden on Unsplash
After lecturing several migrators within the field, I broke down some lesser-discussed considerations for teams moving to Snowflake, no matter where you’re starting from:

  1. Say goodbye to partitions and indexes.
    Unlike other data warehouses, Snowflake doesn’t support partitions or indexes. Instead, Snowflake automatically divides large tables into micro-partitions, which are wont to calculate statistics about the worth ranges each column contains. These insights then determine which parts of your data set you got to run your query.

For most practitioners, this paradigm shift from indexes to micro-partitions really shouldn’t be a problem (in fact, many of us prefer to migrate to Snowflake because this approach reduces query latency). Still, if you’ve got partitions and indexes in your current ecosystem and are migrating to “clustering” models, you would like a sound approach. A couple of tips for a secure migration:

Document current data schema and lineage. This may be important once you need to cross-reference your old data ecosystem together with your new one.
Analyze your current schema and lineage. Next, determine if this structure and its corresponding upstream sources and downstream consumers add up for a way you’ll be using the info once it’s migrated to Snowflake.
Select appropriate cluster keys. This may make sure the best query performance for your team’s access patterns.
Bidding farewell to partitions and indexes is nothing to lose. Stay over as long as you’ve got visibility into your data.

  1. Expect (and embrace) syntax issues.
    I spoke to several data teams repeatedly called out syntax issues as an inevitable component of any cloud warehouse migration, and migration to Snowflake is not any exception.

One data analyst specifically called out the problem of converting SSIS packages for handling ETL from her SQL Server to Snowflake, which admits that SSIS packages aren’t easily integrated with their solution. Such errors weren’t only frustrating but substantially bogged down her migration, resulting in unforeseen costs and resource constraints.

While modelling solutions like DBT help validate data sets, the formatting of functions like hashing, time stamps, and dates is often inconsistent between old and new versions.

Additionally, Snowflake is case sensitive, so you must check for comparison issues in queries. As a result of these issues, some companies can expect to examine and refactor ALL lines of SQL being migrated.

Syntax errors become a much bigger pain point for companies in traditional industries, like financial services or healthcare (ICD10 codes, I’m watching you) that have long relied on legacy solutions and manual, error-prone data input. Unfortunately, simply moving to the cloud won’t fix these issues. together data analyst at a public sector consulting company told me: “Even if you hire amazing people and put the simplest data dictionary ahead of them, they probably can’t tell you what it all means.”

The sooner you accept syntax errors as a neighbourhood of the method, the better it’s to spot trends and patterns in these inconsistencies, which will expedite their resolution.

  1. Monitor your data, constantly and sometimes.
    Similar to syntax errors, data issues can cause even the smoothest Snowflake migrations to fail, generating false or misleading analysis once you attach your business intelligence tools. Often these will end in silent errors, which will go unnoticed until a consumer downstream catches a problem during a report or dashboard. If you’re lucky, it’s an indoor user — and if you’re not, it’d just be that new important customer you onboarded only last week and try to impress.

Another analyst we spoke with at a digital marketing consultancy noted that it is often hard to make sure comprehensive data definition between your old and new data warehouses. After a couple of data errors popped up in her company’s new Snowflake warehouse, she decided to check the reliability of their data by evolving two parallel data analytics layers, one her legacy warehouse and one via Snowflake. Using Looker to get metrics for both stacks, they quickly determined that there have been, in fact, inconsistencies between the two warehouses, with each set of metrics presenting different data volumes.

When upgrading your data warehouse, confirm you’re also upgrading the way your team operates, from small things like syntax concurrency, all thanks to data quality and reliability.

You’ve invested such a lot during this migration (rightfully so!), it’d be silly to let it all attend waste if the info itself can’t be trusted.

Master your migration
If you progress from indexes and partitions, expect syntax issues, and prioritize data quality, you’ll achieve a more seamless Snowflake migration, facilitating easier collaboration and delivering true business value for your organization.

Moving to Snowflake means more flexibility and scalability for your team, also as quicker, more reliable insights for your customers — and if you are doing it right, it is often a force multiplier for your entire organization, too.

Learn More: https://www.jadeglobal.com/snowflake

Leave a comment