Basic Concept
Snowflake is a new model cloud-based enterprise-level data warehouse.
Architecture
Snowflake’s architecture is a hybrid of traditional
shared-disk database architectures
andshared-nothing databases architectures
. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire dataset locally.This approach offers the
data management simplicity
of a shared-disk architecture, but withthe performance and scale-out benefits
of a shared-nothing architecture.
Reference:
Editions
There are four different editions:
- Standard Edition
- Enterprise Edition
- Business Critical Edition
- Virtual Private Snowflake (VPS)
Snowflake doesn’t provide on-premise edition at this moment.
Session
A Snowflake session can only have one virtual Warehouse specified, though you can change the current virtual Warehouse for a session.
Virtual Warehouse
You can increase and decrease the size of a Snowflake Virtual Warehouse as your user base & query demand.
Snowflake will not start executing any SQL queries on a new virtual warehouse until all the servers are provisioned. If provisioning fails for any reason, Snowflake will attempt to fix the failed servers, and SQL will start executing once 50% or more virtual servers are provisioned.
Data Type
Snowflake support all the following semi-structured data types:
- AVRO
- Parquet
- ORC
- JSON
- XML
Data Sharing
In Snowflake Data Sharing, no data is copied. It is just the metadata that enables the sharing of data. Since no data is copied, the target Snowflake account, aka consumer, is not charged for any storage.
Each Snowflake share can consist of:
- privileges that grant access to a database, a schema from which objects will be shared
- privileges that grant access to the object, table, secure views from which will be shared
- the consumer account with which the database, the objects are to be shared
In Data Sharing scenarios, the virtual warehouse will be executed in:
- If the Data Sharing is to a Snowflake consumer, the user will use their own virtual warehouse to execute
- If it is a non-Snowflake customer is consuming the share, the customer will use producer’s virtual warehouse resources to execute
- better to have separate configuration
In Snowflake, sharing from/to Virtual Private Snowflake or Snowflake VPS editions is not supported.
Cloning
Cloning, also referred as
zero-copy cloning
, creates a copy of a database, schema or table. A snapshot of data present in the source object is taken when the clone is created and is made available to the clone object. The clone object is writeable, independent of the clone source. That is, changes made to either the source object or the clone object are not part of the other. Cloning a database will clone all the schemas and tables within that database Cloning a schema will clone all the tables within that schema.
ref at here
Table Cloning
Table Cloning will copy
- table structure
- data
- certain attributes
Table Cloning will not copy load metadata. Therefore, files that have been processed into the source table can be loaded again into a clone table.
Query Performance Optimisation
Clustering Key
A table is requiring a clustering key
is because of
- The query performance has slowed down over time
- The size of the table is in multi-terabytes
Service Maintenance
Snowflake use a staged release process for new releases, which Snowflake releases updates is first applied to early access account, then standard & premier accounts, and finally Enterprise upgrades.
Data Ingestion
Load Metadata
Snowflake maintains file load metadata, through which it tracks which files have already been processed.
Snowflake is using load metadata to maintain which files have already been loaded. The default expire date for load metadata is 64 days
.