In the past, setting up a data warehouse meant having to purchase expensive hardware in order to run your own data centre. Snowflake is disrupting this old way of working by providing businesses with a data warehouse that is easier to use, faster, and more flexible.
What is the Snowflake Data Warehouse?
Snowflake is the name of a cloud-based data warehouse which is built on top of the Amazon Web Services (AWS) or Microsoft Azure cloud infrastructure. It allows businesses to store and analyse their data using various cloud-based tools.
It is also a fully relational ANSI (American National Standards Institute) SQL data warehouse which is offered as a Warehouse-as-a-Service (WaaS) as availability, data protection, infrastructure and optimisation are handled by the Snowflake Computing team.
There is no need to select, install, configure or manage any hardware or software if you opt for Snowflake – which makes it ideal for businesses that don’t have the resources to do this with in-house servers. They can also use an ETL solution such as Stitch to move data over to Snowflake.
What makes Snowflake unique?
Most traditional data warehouse models are based on package deals, so the user has little control over what they use and how they pay for it. Snowflake’s main unique selling point is that it separates the storage and computation functions, so you can scale both independently at a cost-effective price. This means as a business you only pay for the capacity and performance you use, with storage billed as terabytes stored per month and computation on a per-second basis, meaning you can scale up or down as needed.
What makes Snowflake architecture different?
Snowflake themselves explain their architecture as “a hybrid of shared-disk database architectures and shared-nothing database architectures.” They “use a central data repository for persisted data that is accessible from all compute nodes in the data warehouse.” Similar to shared-nothing architectures “Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally…offering the data management simplicity of a shared-disk architecture with the performance and scale-out benefits of a shared-nothing architecture.”
As the above illustration shows, the Snowflake Architecture is made up of three layers – Database storage, Query processing (compute) and Cloud services – each of which is scalable independently of the other layers.
Layer One: Database Storage Layer
This layer runs independently of query processing and holds all of the data loaded into Snowflake – both structured and unstructured. Snowflake will then automatically manage all aspects of how this data is stored, including organisation, file size, structure, compression, metadata and statistics.
Layer Two: Query Processing
The query processing layer is made up of a number of virtual warehouses which execute the data processing tasks which are associated with queries. Each virtual warehouse can access all of the data in the database storage layer but work independently of each other so that each warehouse does not compete for or share query processing resources. The beauty of this is that it enables query processing resources to be scaled while queries are running, without the need to rebalance or redistribute the data in the database storage layer.
Layer Three: Cloud Services
The cloud services layer uses ANSI SQL to coordinate the entire system, eliminating the need for manual data warehouse management and tuning. The services contained in this layer include:
- Access control
- Infrastructure management
- Metadata management
- Query parsing and optimization
The main benefits of Snowflake
Snowflake was built specifically for the cloud, and so was designed to address many of the inherent problems found in hardware-based data-warehouses – such as data transformation issues and limited scalability. The benefits that using Snowflake will bring to your business depends on the specific needs of your project, but could include:
- Availability and Security. As Snowflake is distributed across the availability zones of the platform which it runs on, whether it be AWS or Azure, it is designed to operate continuously. It can also tolerate network and component failures with minimal impact to customers. Security-wise it is SOC2 Type II certified, and additional levels of security are also available.
- Concurrency and Accessibility. Snowflake addresses the concurrency issues of too many queries competing for resources that were found in many traditional data warehouses, through the use of its unique multicluster architecture. This means that each virtual warehouse can scale up or down as required, and they will never affect the queries from each other. It also means that data analysts and data scientists can get what they need when they need it – without having to wait for other processing tasks to complete.
- Performance and Speed. Snowflake is cloud-based and so can draw on the elastic nature of the cloud to scale up your virtual warehouse, taking advantage of extra processing resources if you want to run a high volume of queries or load data faster. You can then scale down the virtual warehouse once the task is completed and only pay for the time that you used.
- Seamless Data Sharing. All of Snowflake’s users can share data between themselves easily. Snowflake’s set-up also allows users to share data with any data consumer – whether the consumer is a Snowflake user or not – through the creation of reader accounts which are created directly from the user interface.
- Storage and Support for Structured and Semi-Structured Data. Snowflake automatically optimises how data is stored and queried, meaning there is no need for conversion or transformation of structured and semi-structured into a fixed relational schema first.
If the world of data warehousing intrigues you and you would like to find out more about your next career move – or you want to strengthen your existing data warehousing team, please contact one of our expert consultants.