Sunday, October 1, 2017

SQL Server 2017 GA



Microsoft announced general availability of SQL Server 2017, coming October 2! This is an incredible milestone: the first version of SQL Server to run on Windows Server, Linux, and Docker, and it has already been pulled on Docker more than 2 million times since November.

https://blogs.technet.microsoft.com/dataplatforminsider/2017/09/25/microsoft-for-the-modern-data-estate/

Monday, July 31, 2017

SQL DW Best Practices

Data Warehouse best practices
This is general guidance for data warehouses, and some of the concepts are based on customer scenarios. It is guidance only: customers' technical teams can use it as a starting point, but its applicability depends on each customer's scenario. This is an information-only document.

·        Identify the dimensions and facts
o   Dimensions provide the descriptive context: attributes covering the who, what, when, why, or how. They should always include friendly names & descriptions.
o   Dimension tables should *not* contain aggregatable numeric values (measures)
o   Facts: Fact tables contain the numeric, quantitative data (aka measures). There is typically one fact table per distinct business process
·        Follow a good naming convention to represent entities and attributes

·        Benefits of a Star Schema
o   Optimal for known reporting scenarios
o   A denormalized structure, organized around business logic, is good for performance & consistency
o   Usability for:
§  Stable, predictable environment
§  Fewer joins, easier navigation
§  Friendly, recognizable names
§  History retention
§  Integration of multiple systems
§  Decoupling from the source systems
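As a minimal sketch of the dimension/fact split described above (all table and column names here are illustrative, not from the original post), a star schema pairs descriptive dimension tables with a numeric fact table:

```sql
-- Hypothetical star schema: one fact table per business process (sales),
-- surrounded by descriptive dimension tables with friendly names.
CREATE TABLE dbo.DimCustomer (
    CustomerKey     INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    CustomerName    NVARCHAR(100) NOT NULL,
    City            NVARCHAR(50)  NOT NULL
);

CREATE TABLE dbo.DimDate (
    DateKey         INT  NOT NULL PRIMARY KEY,  -- e.g. 20170501
    FullDate        DATE NOT NULL,
    MonthName       NVARCHAR(20) NOT NULL,
    CalendarYear    SMALLINT NOT NULL
);

CREATE TABLE dbo.FactSales (
    DateKey         INT NOT NULL REFERENCES dbo.DimDate (DateKey),
    CustomerKey     INT NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
    SalesAmount     DECIMAL(18,2) NOT NULL,  -- measure: numeric, aggregatable
    OrderQuantity   INT NOT NULL             -- measure
);
```

Note that the measures live only in the fact table, while the dimensions carry only descriptive attributes.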

·        Performance of DW
o   Handling Larger Fact Tables
o   Clustered Columnstore Index
§  Reducing data storage due to compression of redundant values
§   Improving query times for large datasets
§  Improving query times due to reduced I/O (ex: column elimination)
§  Suitable for: tables over 1 million rows
§  Data structured in a denormalized star schema format (DW, not OLTP)
§  Analytical query workloads which scan a large number of rows and retrieve few columns
§  Data which is not frequently updated (‘cold’ data, not ‘hot’)
§  Can selectively be used on insert-oriented workloads (ex: IoT)
o   Table Partitioning
§  Improving data load times due to partition switching
§  Flexibility for maintenance on larger tables
§  Improving query performance (possibly) due parallelism & partition elimination behavior
§  Speeding up ETL processes
§  Large datasets (50GB+)
§  Small maintenance windows
§  Use of a sliding window
§  Storage of partitions on separate drives (filegroups)
§  Older (cold) data on cheaper storage
§  Historical data on a read-only filegroup
§  Speeding up queries (possibly) via partition elimination and parallelism
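A sketch of both techniques on a hypothetical monthly-partitioned FactSales table (all object names and boundary values here are illustrative):

```sql
-- Clustered columnstore index on a large fact table: compresses redundant
-- values and lets analytical scans read only the columns they need.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
    ON dbo.FactSales;

-- Table partitioning by month: define boundary points, map partitions to
-- filegroups, then build the table on the scheme (DateKey as partition key).
CREATE PARTITION FUNCTION pfMonthly (INT)
    AS RANGE RIGHT FOR VALUES (20170101, 20170201, 20170301);

CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly ALL TO ([PRIMARY]);

-- e.g. CREATE TABLE dbo.FactSales (...) ON psMonthly (DateKey);

-- Switching a fully loaded staging table into a partition is a
-- metadata-only operation, which is why it speeds up ETL loads:
-- ALTER TABLE dbo.FactSales_Stage SWITCH TO dbo.FactSales PARTITION 2;
```

In practice you would place the partitions on separate filegroups (cheaper storage for cold data) rather than ALL TO [PRIMARY].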


·        Track History in a DW
o   Most common options for tracking history:
§  1. Slowly changing dimensions
§  2. Fact snapshot tables
§  3. Timestamp tracking facts
o   New option in SQL Server 2016:
§  4. Temporal tables – not a full replacement for slowly changing dimensions, but definitely useful for auditing
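A minimal sketch of option 4, using the system-versioned temporal table syntax introduced in SQL Server 2016 (table and column names are illustrative):

```sql
-- System-versioned temporal table: the engine automatically keeps every
-- prior row version in the history table.
CREATE TABLE dbo.DimProduct (
    ProductKey   INT           NOT NULL PRIMARY KEY,
    ProductName  NVARCHAR(100) NOT NULL,
    ListPrice    DECIMAL(18,2) NOT NULL,
    ValidFrom    DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo      DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.DimProduct_History));

-- Query the dimension as it looked at a point in time (useful for audits):
SELECT ProductKey, ProductName, ListPrice
FROM dbo.DimProduct
FOR SYSTEM_TIME AS OF '2017-01-01';
```

This audits row changes automatically, but it does not give you the explicit current/expired flags of a type 2 slowly changing dimension, which is why it is a complement rather than a replacement.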

·        Handling of Nulls
o   Dimensions: the rule of thumb is to avoid nulls in attribute columns.
o   Best practice is to avoid nulls in foreign keys. (However, nulls are OK for a measure.) By using an ‘unknown member’ relationship to the dimension, you can:
§  Safely do inner joins
§  Allow the fact record to be inserted & meet referential integrity
§  Avoid understating measurement amounts
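A sketch of the ‘unknown member’ pattern, assuming a hypothetical DimCustomer dimension and a staging table (all names here are illustrative):

```sql
-- Seed an 'unknown member' row with a fixed, well-known surrogate key (-1).
INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName, City)
VALUES (-1, N'Unknown', N'Unknown');

-- During the fact load, map unmatched business keys to -1 instead of NULL:
-- the fact row still inserts, referential integrity holds, inner joins are
-- safe, and aggregated measures are not understated by dropped rows.
SELECT ISNULL(c.CustomerKey, -1) AS CustomerKey,
       s.SalesAmount
FROM staging.Sales AS s
LEFT JOIN dbo.DimCustomer AS c
    ON c.CustomerName = s.CustomerName;
```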

·        Manually maintained data: maintain a DML script in a lookup (LKP) table instead of hard-coding it in the ETL
·        Use SQL Server Data Tools (SSDT) to compare schema and data

·        Extensibility in a DW Design

o   Design with change in mind. Ex: Create a lookup table with codes/descriptions, or implement logic in a view, rather than hard-coding it in the ETL.
o   Plan for a hybrid environment with multiple architectures. Introduce conformed dimensions first whenever possible.
o   Try to avoid isolated “stovepipe” implementations unless the isolation is absolutely intended. Conduct active prototyping sessions with business users to flesh out requirements. A data modeling tool like Power BI works well for this.
o   Consider using an OLAP cube or in-memory model (like Analysis Services) for:
§  Summary data (as opposed to summary tables in your DW)
§  Year-to-Date type of calculations
§  Year-over-Year type of calculations
§  Aggregate-level calculations (as opposed to row-by-row calculations)
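For comparison, here is what a year-over-year style calculation looks like in plain T-SQL over a hypothetical fact table (illustrative names); in a cube or tabular model this logic is defined once as a reusable measure instead of per query:

```sql
-- Year-over-year comparison with a window function (SQL Server 2012+).
WITH YearlySales AS (
    SELECT d.CalendarYear,
           SUM(f.SalesAmount) AS TotalSales
    FROM dbo.FactSales AS f
    JOIN dbo.DimDate   AS d ON d.DateKey = f.DateKey
    GROUP BY d.CalendarYear
)
SELECT CalendarYear,
       TotalSales,
       LAG(TotalSales) OVER (ORDER BY CalendarYear) AS PriorYearSales,
       TotalSales
         - LAG(TotalSales) OVER (ORDER BY CalendarYear) AS YoYChange
FROM YearlySales;
```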




Summary

·        DW Design Principles
o   Staging as a “kitchen” area
o   Integrate data from multiple systems to increase its value
o   Denormalize the data into a star schema
o   A column exists in one and only one place in the star schema
o   Avoid snowflake design most of the time
o   Use surrogate keys which are independent from source systems
o   Use conformed dimensions
o   Know the grain of every table
o   Have a strategy for handling changes, and for storage of history
o   Store the lowest level of detail that you can
o   Use an ‘unknown member’ to avoid understating facts
o   Transform the data, but don’t “fix” it in the DW
o   Structure your dimensional model around business processes
o   Design facts around a single business event
o   Always use friendly names & descriptions
o   Use an explicit date dimension in a “role-playing” way
o   Utilize bridge tables to handle many-to-many scenarios
o   Plan for complexities such as:
§  Header/line data
§  Semi-additive facts
§  Multiple currencies
§  Multiple units of measure
§  Alternate hierarchies and calculations per business units
§  Allocation of measures in a snowflake design
§  Reporting of what didn’t occur (factless facts)
§  Dimensional only analysis
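As an illustrative sketch of the bridge-table point above (names and the weighting approach are hypothetical examples, such as joint bank accounts):

```sql
-- Bridge table resolving a many-to-many relationship between accounts
-- and customers. The weighting factor lets measures be allocated across
-- members without double-counting (e.g. 0.5 each for two joint holders).
CREATE TABLE dbo.BridgeAccountCustomer (
    AccountKey      INT NOT NULL,
    CustomerKey     INT NOT NULL,
    WeightingFactor DECIMAL(5,4) NOT NULL,
    PRIMARY KEY (AccountKey, CustomerKey)
);
```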







·        Reference architectures







Tuesday, June 6, 2017

Power BI - Create a dim date table easily

Create a blank query and paste the code below. :)


//Create Date Dimension
(StartDate as date, EndDate as date)=>
let
    //Get the number of dates required for the table (+1 so EndDate is included)
    GetDateCount = Duration.Days(EndDate - StartDate) + 1,
    //Take the count of dates and turn it into a list of dates
    GetDateList = List.Dates(StartDate, GetDateCount,
    #duration(1,0,0,0)),
    //Convert the list into a table
    DateListToTable = Table.FromList(GetDateList,
    Splitter.SplitByNothing(), {"Date"}, null, ExtraValues.Error),
    //Set the column type to date
    DateTable = Table.TransformColumnTypes(DateListToTable,
    {{"Date", type date}}),
//Create various date attributes from the date column
    //Add Year Column
    YearNumber = Table.AddColumn(DateTable, "Year",
    each Date.Year([Date])),
//Add Quarter Column
    QuarterNumber = Table.AddColumn(YearNumber, "Quarter",
    each "Q" & Number.ToText(Date.QuarterOfYear([Date]))),
//Add Week Number Column
    WeekNumber = Table.AddColumn(QuarterNumber, "Week Number",
    each Date.WeekOfYear([Date])),
//Add Month Number Column
    MonthNumber = Table.AddColumn(WeekNumber, "Month Number",
    each Date.Month([Date])),
//Add Month Name Column
    MonthName = Table.AddColumn(MonthNumber, "Month",
    each Date.ToText([Date],"MMMM")),
//Add Day of Week Column
    DayOfWeek = Table.AddColumn(MonthName, "Day of Week",
    each Date.ToText([Date],"dddd"))
in
    DayOfWeek

To use it, invoke the function with a start date and an end date; it returns one row per day in that range, inclusive. (Note: the StartDate/EndDate rebindings in the original snippet were removed, since the parameters are already dates and rebinding the same names inside the let causes a cyclic reference error in M.)

Monday, June 5, 2017

Power BI on-prem gateway troubleshoot

Configuring proxy settings for the On-premises Data Gateway


Change the On-Premises Data Gateway service account

  1. Change the Windows service account for the On-premises Data Gateway service.
The default account for this service is NT SERVICE\PBIEgwService. Change this to a domain user account within your Active Directory domain, or use a managed service account to avoid having to change the password.
    You will want to change the account on the Log On tab within the properties of the Windows service.
  2. Restart the On-premises Data Gateway service.
    From an admin command prompt, issue the following commands.
    net stop PBIEgwService
    
    net start PBIEgwService
    
  3. Start the On-premises Data Gateway configurator. You can select the Windows Start button and search for On-premises Data Gateway.
  4. Sign in to Power BI.
  5. Restore the gateway using your recovery key.
    This will allow the new service account to be able to decrypt stored credentials for data sources.

Monday, May 8, 2017

SQL Server 2017 on Linux




Part 1

Create a VM on Azure using the template below.
Red Hat Enterprise Linux 7.3

Connect to the Linux VM (install the Bitvise SSH client locally and use it to connect to the VM).

---- install
sudo su
curl https://packages.microsoft.com/config/rhel/7/mssql-server.repo > /etc/yum.repos.d/mssql-server.repo
exit

sudo yum install -y mssql-server

sudo /opt/mssql/bin/mssql-conf setup

systemctl status mssql-server

sudo firewall-cmd --zone=public --add-port=1433/tcp --permanent
sudo firewall-cmd --reload


Part 2

-------------- backup and restore
You can restore a backup taken on Windows Server to Linux as-is.

Take the backup on Windows and copy it to /var/tmp/backup.

sudo su
chown mssql:mssql AdventureWorksDW2016CTP3.bak

-- show logical file name info
restore filelistonly from disk = '/var/tmp/backup/AdventureWorksDW2016CTP3.bak'


-- Restore database backup

RESTORE DATABASE AdventureWorksDW2016Linux FROM DISK = '/var/tmp/backup/AdventureWorksDW2016CTP3.bak' WITH FILE = 1,
MOVE 'AdventureWorksDW2014_Data' TO '/var/opt/mssql/data/AdventureWorksDW2016CTP3.mdf',
MOVE 'AdventureWorksDW2014_Log' TO '/var/opt/mssql/data/AdventureWorksDW2016CTP3_log.ldf',
NOUNLOAD,
STATS = 5
GO

Thursday, May 4, 2017

Power BI Premium

Power BI Premium

Previously available were two tiers, Power BI Free and Power BI Pro ($10/user/month).  The problem with Power BI Pro is that for large organizations, this can add up.  In addition, their performance needs might not be met.  Power BI Premium, which is an add-on to Power BI Pro, addresses the concern about cost and scale.

For costs, it allows an unlimited number of users since it is priced by aggregate capacity (see Power BI Premium calculator).  Users who need to create content in Power BI will still require a $10/month Power BI Pro seat, but there is no per-seat charge for consumption.
For scale, it runs on dedicated hardware giving capacity exclusively allocated to an organization for increased performance (no noisy neighbors).  Organizations can choose to apply their dedicated capacity broadly, or allocate it to assigned workspaces based on the number of users, workload needs or other factors—and scale up or down as requirements change.

There will be changes to the Power BI’s free tier.  Users of the free tier will now be able to connect to all of the data sources that Pro users can connect to, including those available through the on-premises data gateway, and their storage quota will increase from 1GB to 10GB.

 The data refresh maximum increases from once daily to 8 per day (hourly-based schedule), and streaming data rates increase from ten thousand rows per hour to one million rows per hour.

For Power BI Premium, you get 100TB of storage, data refresh maximum of 48 per day (minute-based schedule), and soon-to-be-available is that the dataset size cached limit is removed (it is 1GB in Power BI Pro), so you will be able to build models as large as the Power BI Premium dedicated capacity memory can hold (currently 50GB).

Upcoming features for Power BI Premium include the ability to incrementally refresh the data so that only the newest data from the last day (or hour) is loaded into Power BI, pinning datasets to memory, dedicated data refresh nodes, read-only replicas, and geographic distribution (see Microsoft Power BI Premium Whitepaper for more info).
Users of the free tier will no longer be able to share their reports and dashboards with other users.  Peer-to-peer dashboard sharing, group workspaces (now called app workspaces), export to PowerPoint, export to CSV/Excel, and Analyze in Excel with Power BI apps are capabilities limited to Power BI Pro.  The rationale for this is that if the scope of a user’s needs is limited to personal use, then no fees should apply, but if the user wishes to share or collaborate with others, those are capabilities that need to be paid for.  For existing users of the free service who have been active within the past year, Microsoft is offering a free, 12-month extended trial of Power BI Pro.

If you are sharing dashboards/reports with free users, beginning June 1st they will need to take advantage of the extended Pro trial to continue accessing the content. After the extended trial expires, users will need a Pro license to maintain access.




Thursday, February 23, 2017

Performance tuning tips

1) Format disks with a 64K allocation unit size

2) Put tempdb on a separate disk and add multiple tempdb files (SQL Server 2016 onwards)

3) For data warehouses, use a columnstore index; this gives a performance improvement and a great compression rate

4) For OLTP workloads, use In-Memory OLTP

5) SQL Server defaults to a maximum degree of parallelism (MAXDOP) of 0, which dictates that SQL Server will dynamically allocate work up to the total number of CPU cores seen by the SQL service. With Hyper-Threading turned on for a system with 80 physical cores, SQL Server will see a total of 160 cores, which can, for many workload mixes, lead to a sub-optimal allocation of SQL threads. To address this, change the “Max Degree of Parallelism” parameter in SQL Server Advanced properties to less than or equal to the number of physical CPU cores, as shown in the following figure.

6) In order to reserve some memory for system processes, user processes, programs, etc., change the SQL Server “Maximum server memory” setting in SQL Server properties. Typically, a value of total system memory minus 8GB is sufficient; SQL Server will then allocate all available memory in the system except for 8GB.

7) Every table should have a clustered index (rowstore or columnstore)
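Tips 5 and 6 can be applied with sp_configure; the values below are illustrative examples for a hypothetical server with 8 physical cores and 128GB of RAM, not recommendations for every system:

```sql
-- Expose the advanced server options.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Tip 5: cap parallelism at the number of physical cores (example: 8).
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;

-- Tip 6: leave ~8GB for the OS on a 128GB server (128GB - 8GB = 122880 MB).
EXEC sp_configure 'max server memory (MB)', 122880;
RECONFIGURE;
```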