DB With DB: ColumnStore index in SQL Server

ColumnStore index in SQL Server

The SQL Server in-memory columnstore index stores and manages data by using column-based data storage and column-based query processing. Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size.
Two types of columnstore indexes we have
1. Clustered column store index
2. Non Clustered Column store index
Clustered colulmnstore index updateable means it allows data updation on underlying table, which leads work load while insert, update, delete operation.

A columnstore index is a technology for storing, retrieving and managing data by using a columnar data format, called a columnstore. SQL Server supports both clustered and nonclustered columnstore indexes. Both use the same in-memory columnstore technology, but they do have differences in purpose and in features they support
Columnstore index works well for read only data with large data set, it helps to scan whole Index/table rather than index seek or to search particular value.
Benefits of columnstore index:
• Columns often have similar data, which results in high compression rates.
• High compression rates improve query performance by using a smaller in-memory footprint. In turn, query performance can improve because SQL Server can perform more query and data operations in-memory.
• A new query execution mechanism called batch-mode execution has been added to SQL Server that reduces CPU usage by a large amount. Batch-mode execution is closely integrated with, and optimized around, the columnstore storage format. Batch-mode execution is sometimes known as vector-based or vectorized execution.
• Queries often select only a few columns from a table, which reduces total I/O from the physical media.

• Can index a subset of columns in the clustered index or heap. For example, it can index the frequently used columns.
• Requires extra storage to store a copy of the columns in the index.
• Is updated by rebuilding the index or switching partitions in and out. It is not updateable by using the DML operations such as insert, update, and delete.
• Can be combined with other indexes on the table.
• Can be configured to use columnstore or columnstore archival compression.
• Does not physically store columns in a sorted order. Instead, it stores data to improve compression and performance. Pre-sorting the data before creating the columnstore index is not required, but can improve columnstore compression.

Characteristics:
Clustered columnstore index:
• Is available in Enterprise, Developer, and Evaluation editions.
• Is updateable.
• Is the primary storage method for the entire table.
• Has no key columns. All columns are included columns.
• Is the only index on the table. It cannot be combined with any other indexes.
• Can be configured to use columnstore or columnstore archival compression.
• Does not physically store columns in a sorted order. Instead, it stores data to improve compression and performance
Non Clustered columnstore index:
• Can index a subset of columns in the clustered index or heap. For example, it can index the frequently used columns.
• Requires extra storage to store a copy of the columns in the index.
• Is updated by rebuilding the index or switching partitions in and out. It is not updateable by using the DML operations such as insert, update, and delete.
• Can be combined with other indexes on the table.
• Can be configured to use columnstore or columnstore archival compression.
• Does not physically store columns in a sorted order. Instead, it stores data to improve compression and performance. Pre-sorting the data before creating the columnstore index is not required, but can improve columnstore compression.

Columnstore index works well for all data types except followings.
• ntext, text, and image
• varchar(max) and nvarchar(max)
• rowversion (and timestamp)
• sql_variant
• CLR types (hierarchyid and spatial types)
• Xml

Columnstore indexes cannot be combined with the following features:
• Page and row compression, and vardecimal storage format (A columnstore index is already compressed in a different format.)
• Replication
• Change tracking
• Change data capture
• Filestream
Syntex to create columnstore index
CREATE TABLE MyTable
(ProductId [int] NOT NULL,
RequestDateKey [int] NOT NULL,
DueDateKey [int] NOT NULL,
ShipDateKey [int] NOT NULL);
GO

CREATE CLUSTERED INDEX cl_myTable ON MyTable (ProductId);
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX csindx_MyTable
ON MyTable
(RequestDateKey, DueDateKey, ShipDateKey);
GO

Create columnstore index using options

CREATE NONCLUSTERED COLUMNSTORE INDEX csindx_MyTable
ON MyTable
(RequestDateKey, DueDateKey, ShipDateKey)
WITH (DROP_EXISTING = ON,
MAXDOP = 2)
ON "default"
GO

Change the data in a nonclustered columnstore index
Once you create a nonclustered columnstore index on a table, you cannot directly modify the data in that table. A query with INSERT, UPDATE, DELETE, or MERGE will fail and return an error message. To add or modify the data in the table, you can do one of the following:

Disable or drop the columnstore index. You can then update the data in the table. If you disable the columnstore index, you can rebuild the columnstore index when you finish updating the data.
Syntex to Disable and Rebuild the columnstore index
ALTER INDEX csindx_MyTable ON MyTable DISABLE;
ALTER INDEX csindx_MyTable on MyTable REBUILD

Conclusion:
After columnstore index implementation around 10 times performance(in terms of IO and Time) we get comparing to existing row based index storage approach

DB With DB

Thursday, February 27, 2014

ColumnStore index in SQL Server

No comments :

Post a Comment