SL SQL Server Community: July 2017

Data Warehouse best practices

This is general guidance for DW and some of concept are based on customer’s scenarios, this is a only guidance and customers technical team can get an idea from here, applicability of this is depend on customers scenarios, this is information only document.

· Identify the Dim and Fact

o Dim : provide the descriptive context – attributes with the who, what, when, why, or how. They should always include friendly names & descriptions.

o Dimension tables should *not* contain aggregatable numeric values (measures)

o Fact: Fact tables contain the numeric, quantitative data (aka measures). Typically one fact table per distinct business process

· Follow good naming convention to represent entity and attributes

· Benefits of a Star Schema

o Optimal for known reporting scenarios

o Denormalized structure, structured around business logic, is good for performance & consistency

o Usability for:

§ Stable, predictable environment

§ Less joins, easier navigation

§ Friendly, recognizable names

§ History retention

§ Integrate multiple systems Decoupled from source system

· Performance of DW

o Handling Larger Fact Tables

o Clustered Columnstore Index

§ Reducing data storage due to compression of redundant values

§ Improving query times for large datasets

§ Improving query times due to reduced I/O (ex: column elimination)

§ suitable for:Tables over 1 million rows

§ Data structured in a denormalized star schema format (DW not OLTP) ü Support for analytical query workload which scans a large number of rows, and retrieves few columns

§ Data which is not frequently updated (‘cold’ data not ‘hot’)

§ Can selectively be used on insert-oriented workloads (ex: IoT)

o Table Partitioning

§ Improving data load times due to partition switching

§ Flexibility for maintenance on larger tables

§ Improving query performance (possibly) due parallelism & partition elimination behavior

§ Speeding up ETL processes

§ Large datasets (50GB+) ü Small maintenance windows

§ Use of a sliding window ü Storage of partitions on separate drives (filegroups)

§ Older (cold) data on cheaper storage

§ Historical data on read-only filegroup ü

§ Speeding up queries (possibly) ,Partition elimination , Parallelism

· Track History in a DW

o Most common options for tracking history:

§ 1. Slowly changing dimension

§ 2. Fact snapshot tables

§ 3. Timestamp tracking fact New option in SQL Server 2016:

§ 4. Temporal data tables à Not a full replacement for slowly changing dimensions, but definitely useful for auditing

· Handling of Nulls

o Dimensions Rule of thumb is to avoid nulls in attribute columns.

o Best practice is to avoid nulls in foreign keys. (However, nulls are ok for a measure.) By using an ‘unknown member’ relationship to the dimension, you can:

§ Safely do inner joins

§ Allow the fact record to be inserted & meet referential integrity

§ Allow the fact record to be inserted which avoids understating measurement amounts

· Manually Maintained Data Maintain a DML script in a Lookup (LKP) table instead of hard-coding in the ETL

· User SQL Data tool ( SSDT ) to compare schema and data

· Extensibility in a DW Design

o change in mind. Ex: Create a lookup table with code/descriptions, or implement in a view, rather than hard-coding in ETL.

o Plan for a hybrid environment with multiple architectures. Introduce conformed dimensions first whenever possible.

o Try to avoid isolated “stovepipe” implementations unless the isolation is absolutely intended. Conduct active prototyping sessions with business users to flush out requirements. A data modeling tool like Power BI works well for this.

o Consider using an OLAP cube or in-memory model (like Analysis Services) for:

§ Summary data (as opposed to summary tables in your DW) • Year-to-Date type of calculations • Year-over-Year type of calculations • Aggregate level calculations (as opposed to row-by-row calculations)

Summary

· DW Design Principles

o Staging as a “kitchen” area

o Integrate data from multiple systems to increase its value

o Denormalize the data into a star schema

o A column exists in one and only one place in the star schema

o Avoid snowflake design most of the time

o Use surrogate keys which are independent from source systems

o Use conformed dimensions

o Know the grain of every table

o Have a strategy for handling changes, and for storage of history

o Store the lowest level of detail that you can

o Use an ‘unknown member’ to avoid understating facts

o Transform the data, but don’t “fix” it in the DW

o Structure your dimensional model around business processes

o Design facts around a single business event

o Always use friendly names & descriptions

o Use an explicit date dimension in a “role-playing” way

o Utilize bridge tables to handle many-to-many scenarios

o Plan for complexities such as:

§ Header/line data

§ Semi-additive facts

§ Multiple currencies

§ Multiple units of measure

§ Alternate hierarchies and calculations per business units

§ Allocation of measures in a snowflake design

§ Reporting of what didn’t occur (factless facts)

§ Dimensional only analysis

· Reference architectures

SL SQL Server Community

Monday, July 31, 2017

SQL DW Best pratices