Basic Concept

Snowflake is a new model cloud-based enterprise-level data warehouse.

Architecture

Snowflake’s architecture is a hybrid of traditional shared-disk database architectures and shared-nothing databases architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire dataset locally.

This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.

Reference:

Editions

There are four different editions:

  • Standard Edition
  • Enterprise Edition
  • Business Critical Edition
  • Virtual Private Snowflake (VPS)

Snowflake doesn’t provide on-premise edition at this moment.

Session

A Snowflake session can only have one virtual Warehouse specified, though you can change the current virtual Warehouse for a session.

Virtual Warehouse

You can increase and decrease the size of a Snowflake Virtual Warehouse as your user base & query demand.

Snowflake will not start executing any SQL queries on a new virtual warehouse until all the servers are provisioned. If provisioning fails for any reason, Snowflake will attempt to fix the failed servers, and SQL will start executing once 50% or more virtual servers are provisioned.

Data Type

Snowflake support all the following semi-structured data types:

  • AVRO
  • Parquet
  • ORC
  • JSON
  • XML

Data Sharing

In Snowflake Data Sharing, no data is copied. It is just the metadata that enables the sharing of data. Since no data is copied, the target Snowflake account, aka consumer, is not charged for any storage.

Each Snowflake share can consist of:

  1. privileges that grant access to a database, a schema from which objects will be shared
  2. privileges that grant access to the object, table, secure views from which will be shared
  3. the consumer account with which the database, the objects are to be shared

In Data Sharing scenarios, the virtual warehouse will be executed in:

  • If the Data Sharing is to a Snowflake consumer, the user will use their own virtual warehouse to execute
  • If it is a non-Snowflake customer is consuming the share, the customer will use producer’s virtual warehouse resources to execute
    • better to have separate configuration

In Snowflake, sharing from/to Virtual Private Snowflake or Snowflake VPS editions is not supported.

Cloning

Cloning, also referred as zero-copy cloning, creates a copy of a database, schema or table. A snapshot of data present in the source object is taken when the clone is created and is made available to the clone object. The clone object is writeable, independent of the clone source. That is, changes made to either the source object or the clone object are not part of the other. Cloning a database will clone all the schemas and tables within that database Cloning a schema will clone all the tables within that schema.

ref at here

Table Cloning

Table Cloning will copy

  • table structure
  • data
  • certain attributes

Table Cloning will not copy load metadata. Therefore, files that have been processed into the source table can be loaded again into a clone table.

Query Performance Optimisation

Clustering Key

A table is requiring a clustering key is because of

  • The query performance has slowed down over time
  • The size of the table is in multi-terabytes

Service Maintenance

Snowflake use a staged release process for new releases, which Snowflake releases updates is first applied to early access account, then standard & premier accounts, and finally Enterprise upgrades.

Data Ingestion

Load Metadata

Snowflake maintains file load metadata, through which it tracks which files have already been processed.

Snowflake is using load metadata to maintain which files have already been loaded. The default expire date for load metadata is 64 days.