Thursday, August 25, 2022

Snowflake - Architecture

Snowflake

Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). There is no hardware (virtual or physical) to select, install, or configure, and no software to install; all ongoing maintenance and tuning are handled by Snowflake.

Database Storage

When data is loaded into Snowflake, Snowflake reorganizes it into multiple micro-partitions stored in an internal, optimized, compressed, columnar format, and keeps this optimized data in cloud storage. Because all of the data lives in this central storage, which every compute node can access (a shared-disk model), data management stays simple: users do not have to worry about distributing data across multiple nodes, as they would in a shared-nothing model. Snowflake manages all aspects of how this data is stored: the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake.
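To make the micro-partition idea more concrete, here is a minimal Python sketch. It is a toy model, not Snowflake's actual implementation: it assumes rows arrive as dictionaries, and the names `MicroPartition`, `load`, `prune` and the `rows_per_partition` parameter are hypothetical. Each fixed-size chunk of rows is pivoted into columns, and a per-column minimum and maximum is recorded as metadata; compression is not modeled. The min/max values illustrate how partition metadata can let a query skip partitions that cannot contain matching rows.

```python
from dataclasses import dataclass, field


@dataclass
class MicroPartition:
    """Hypothetical micro-partition: column-oriented values plus metadata."""
    # One list of values per column name (columnar layout).
    columns: dict[str, list]
    # Per-column (min, max), filled in when the partition is created.
    stats: dict[str, tuple] = field(default_factory=dict)

    def __post_init__(self):
        for name, values in self.columns.items():
            self.stats[name] = (min(values), max(values))


def load(rows: list[dict], rows_per_partition: int) -> list[MicroPartition]:
    """Split rows into fixed-size chunks and pivot each chunk into columns."""
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        partitions.append(MicroPartition(columns))
    return partitions


def prune(partitions: list[MicroPartition], column: str, low, high) -> list[MicroPartition]:
    """Keep only partitions whose [min, max] for `column` overlaps [low, high]."""
    kept = []
    for p in partitions:
        col_min, col_max = p.stats[column]
        if col_max >= low and col_min <= high:
            kept.append(p)
    return kept


rows = [{"id": i, "amount": i * 10} for i in range(10)]
parts = load(rows, rows_per_partition=4)
print(len(parts), len(prune(parts, "id", 5, 6)))  # 3 1
```

With ten rows split four per partition, a filter on `id` between 5 and 6 only needs to scan the second partition; the other two are ruled out by their metadata alone, without reading any of their values.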

Query Processing

Query execution is performed in the processing (compute) layer. Snowflake processes queries using “virtual warehouses”, and this query processing layer is kept separate from disk storage. Each virtual warehouse is a Massively Parallel Processing (MPP) compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider. Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, one virtual warehouse has no impact on the performance of the others.
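The separation of compute from storage can be sketched in a few lines of Python. This is a toy model under stated assumptions: `VirtualWarehouse`, `scan_and_sum` and `SHARED_STORAGE` are hypothetical names, threads stand in for compute nodes, and each partition is just a list of numbers. Two warehouses of different sizes read the same shared partitions; inside each warehouse, every node computes a partial sum over the partitions assigned to it, and the partial results are combined, which is the basic MPP pattern.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared storage: every warehouse reads these same partitions.
# Each partition is modeled as a plain list of numbers from one column.
SHARED_STORAGE = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]


class VirtualWarehouse:
    """Toy MPP cluster; threads stand in for the warehouse's compute nodes."""

    def __init__(self, name: str, nodes: int):
        self.name = name
        self.nodes = nodes

    def scan_and_sum(self, partitions: list[list[int]]) -> int:
        # Assign partitions to nodes round-robin (node i gets i, i+nodes, ...).
        assignments = [partitions[i::self.nodes] for i in range(self.nodes)]

        def node_work(assigned: list[list[int]]) -> int:
            # Partial aggregate computed by one node over its own partitions.
            return sum(sum(p) for p in assigned)

        # Each warehouse runs on its own worker pool; no pool is shared.
        with ThreadPoolExecutor(max_workers=self.nodes) as pool:
            partials = list(pool.map(node_work, assignments))
        # Combine the partial results into the final answer.
        return sum(partials)


etl_wh = VirtualWarehouse("etl_wh", nodes=2)
bi_wh = VirtualWarehouse("bi_wh", nodes=4)
print(etl_wh.scan_and_sum(SHARED_STORAGE), bi_wh.scan_and_sum(SHARED_STORAGE))  # 55 55
```

Both warehouses return the same answer because they see the same stored data, but each one runs on its own pool of workers, so resizing or loading one does not take workers away from the other.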

Cloud Services

The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.

Among the services in this layer:

  • Authentication

  • Infrastructure management

  • Metadata management

  • Query parsing and optimization

  • Access control
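The path "from login to query dispatch" can be illustrated with a toy coordinator in Python. Everything here is a hypothetical stand-in, not Snowflake's API: `USERS`, `GRANTS`, `METADATA` and `handle_request` are invented names, the parser only accepts `SELECT <projection> FROM <table>`, and infrastructure management is not modeled. Answering a bare `COUNT(*)` from a stored row count is an assumed simplification, included to show why the optimizer benefits from the metadata this layer maintains.

```python
# Hypothetical in-memory state standing in for the cloud services layer.
USERS = {"alice": "s3cret"}                  # authentication
GRANTS = {("alice", "orders")}               # access control: (user, table)
METADATA = {"orders": {"row_count": 1200}}   # metadata management


def handle_request(user: str, password: str, sql: str) -> tuple:
    """Walk one request from login to dispatch (toy model, not Snowflake's API)."""
    # 1. Authentication: reject unknown users or wrong passwords.
    if USERS.get(user) != password:
        raise PermissionError("authentication failed")

    # 2. Query parsing: only "SELECT <projection> FROM <table>" is understood.
    words = sql.strip().rstrip(";").split()
    if len(words) != 4 or words[0].upper() != "SELECT" or words[2].upper() != "FROM":
        raise ValueError(f"unsupported query: {sql!r}")
    projection, table = words[1], words[3].lower()

    # 3. Access control: the user needs a grant on the table.
    if (user, table) not in GRANTS:
        raise PermissionError(f"{user} has no privilege on {table}")

    # 4. Optimization: a bare COUNT(*) is answered from stored metadata.
    if projection.upper() == "COUNT(*)":
        return ("metadata", METADATA[table]["row_count"])

    # 5. Dispatch: anything else is handed to a virtual warehouse.
    return ("dispatch", table)


print(handle_request("alice", "s3cret", "SELECT COUNT(*) FROM orders"))  # ('metadata', 1200)
print(handle_request("alice", "s3cret", "SELECT * FROM orders;"))        # ('dispatch', 'orders')
```

In this sketch the count query never reaches a warehouse, while `SELECT * FROM orders` is passed on for execution; a wrong password or a missing grant stops the request before any compute is involved.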





Row-based vs Columnar-based storage organization
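The difference between the two layouts can be shown with a small Python sketch (the sample rows and function names are hypothetical). The same four-row table is stored once record by record and once column by column. Run-length encoding is used here only as a simple example of why grouping the values of one column together tends to compress well; it is not a claim about the compression scheme Snowflake uses.

```python
from itertools import groupby

# Hypothetical sample table: (id, country, amount)
COLUMNS = ("id", "country", "amount")
ROWS = [
    (1, "US", 30.0),
    (2, "US", 12.5),
    (3, "US", 7.5),
    (4, "DE", 20.0),
]


def row_layout(rows):
    """Row-based: all fields of one record are stored next to each other."""
    return [value for row in rows for value in row]


def column_layout(rows, columns):
    """Columnar: all values of one column are stored next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


flat = row_layout(ROWS)
cols = column_layout(ROWS, COLUMNS)
print(len(flat))                           # 12 values stored record by record
print(sum(cols["amount"]))                 # 70.0, reading only the amount column
print(run_length_encode(cols["country"]))  # [('US', 3), ('DE', 1)]
```

To total `amount`, a row-based scan has to walk past every field of every record, while the columnar layout reads only the four values of that one column; the repeated `country` values also collapse into two runs once they sit next to each other.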
