2012 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
Welcome
November 2012Vorstellung Parallel Data Warehouse
1
Vorstellung Parallel Data Warehouse
November 2012Meinrad Weiss
2012 © Trivadis
Vorstellung Parallel Data Warehouse2
Data Warehouse – Products Positioning
1
2
3
Minimal HW tune-up/optimization; supports mixed workloads
Balanced solution for mostly scan-centric workloads.
Max HW tune-up for most DW scenarios.
4 Most flexible architecture for handling all DW scenarios.
ScaleComplexityHA by defaultSW-HW integration
SQL Server 2008 R2Fast Track
SQL Server 2008 R2Enterprise
PDWSQL Server 2008 R2Data Center
1
2
3
4
PDW with Distributed
Data Architecture
November 2012
2012 © Trivadis
Vorstellung Parallel Data Warehouse3
Microsoft Data Warehousing Solutions
Tier 1 Offerings
Scalable and reliable platform for data
warehousing on any hardware
Reference Architectures offering best price
performance for data warehousing
Scalable and reliable platform for data
warehousing on any hardware
Appliance for high-end data warehousing requiring highest
scalability, performance, or complexity
Ideal for data marts or small to mid-sized EDWs
Ideal for data marts or small to mid-sized DWs
with scan- centric workloads
Ideal for large data marts or mid-sized EDWs
Offers flexibility in hardware and architecture
Software only Reference Architectures (software and hardware)
Software onlyDW appliance
(fully integrated software and hardware)
Scale-up DW Scale-up DW Scale-up DW Scale-out DW with MPP
10s of TB 2 – 80 TB 10s of TB 10s - 100s of TB
November 2012
2012 © Trivadis
Vorstellung Parallel Data Warehouse4
Data Warehouse – Products Positioning
100% SQL Server 2008 R2 Compatibility
ScaleComplexityHA by defaultSW-HW integration
SQL Server 2008 R2with Fast Track
Reference Architecture
SQL Server 2008 R2Enterprise
PDWSQL Server 2008 R2Data Center
PDW with Distributed Data
Architecture
November 2012
2012 © Trivadis
Vorstellung Parallel Data Warehouse
MPP vs. SMP
November 2012
5
MPP - Massively Parallel Processing Uses many separate CPUs running in parallel to execute a single
program
Each CPU has its own memory and disks
High-speed communications between nodes
Applications must be segmented
SMP MPP SMP - Symmetric Multiprocessing
Multiple CPUs used to complete individual processes simultaneously
All CPUs share the same memory, disks, and network controllers
All SQL Server implementations up until now have been SMP
2012 © Trivadis
Two hardware vendors: HP and Dell
November 2012Vorstellung Parallel Data Warehouse
6
Microsoft+DellParallel Data Warehouse Appliance
Microsoft+HP Enterprise Data
Warehouse Appliance
2012 © Trivadis
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
Control Node
Management Node
Landing Zone
Backup Node
Control Rack Data Rack(s)
November 2012
7Vorstellung Parallel Data Warehouse
2012 © Trivadis
SQL
Control Node
Management Node
Landing Zone
Backup Node
Control Rack Data Rack(s)
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
Client connections always go through the control node
Windows Failover Cluster for Availability
Contains no persistent user data Processes SQL requests
Prepares execution plan
Orchestrates distributed execution
Local SQL Server processes final query plan and aggregates results
November 2012
8Vorstellung Parallel Data Warehouse
2012 © Trivadis
SQL
Control Node
Management Node
Landing Zone
Backup Node
Control Rack Data Rack(s)
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
Provides Support and Patching for the Appliance
Holds image for re-deployment of compute node
Holds Active Directory
November 2012
9Vorstellung Parallel Data Warehouse
2012 © Trivadis
SQL
Control Node
Management Node
Landing Zone
Backup Node
Control Rack Data Rack(s)
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
Provides high-capacity storage for data files from ETL processes
Is available as a sandbox for other applications and scripts that run on the internal network
Provides SQL Server Integration Services
SourceLanding
Zone Files
Data Loader
Compute Nodes
DWLoader or SQL Server Integration
Services
November 2012
10Vorstellung Parallel Data Warehouse
2012 © Trivadis
SQL
Control Node
Management Node
Landing Zone
Backup Node
Control Rack Data Rack(s)
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL Provides Integrated Backup Solution
Integrates with 3rd party backup products
Orderable in different sizesNovember 2012
11Vorstellung Parallel Data Warehouse
2012 © Trivadis
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
Control Node
Management Node
Landing Zone
Backup Node
Control Rack Data Rack(s)
SQL
Data Rack Servers 5/10 active + 1 passive per Rack
InfiniBand, FC and Ethernet switching
Expansion Grow from 1/2–4 data racks, storage options, test/dev system
Consists of COMPUTE NODES and STORAGE NODES
Shared Nothing
Spare Node provides failover in case of node failure
November 2012
12Vorstellung Parallel Data Warehouse
2012 © Trivadis
Vorstellung Parallel Data Warehouse
Connectivity and Tools
Nexus Query Chameleon
DWSQL
November 2012
13
2012 © Trivadis
Vorstellung Parallel Data Warehouse
Creating a Database
CREATE DATABASE PDW WITH (AUTOGROW = ON, REPLICATED_SIZE = 1024 GB, -- (per Node) DISTRIBUTED_SIZE = 16384 GB, -- (whole System) LOG_SIZE = 1024 GB);
November 2012
14
2012 © Trivadis
Vorstellung Parallel Data Warehouse
Distribution and Replication of Data: Replicate
November 2012
15
Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar Day
Store Dim
Store Dim IDStore NameStore MgrStore Size
Product Dim
Prod Dim IDProd CategoryProd Sub CatProd Desc
MktgCampaign Dim
Mktg Camp IDCamp NameCamp MgrCamp StartCamp End
SQL
SQL
SQL
SQL
TD
PD
SD
MD
TD
PD
SD
MD
TD
PD
SD
MD
Smaller (<5GB ) Dimension Tables are Replicated on Every
Compute Node
TD
PD
SD
MD
Sales Facts
Date Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars Sold
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-2
SF-3
SF-4
Result: Fact -Dimension Joins can be performed
locally
2012 © Trivadis
Vorstellung Parallel Data Warehouse
Create Replicated Table
November 2012
16
CREATE TABLE DimProduct(ProductId BIGINT NOT NULL,Description VARCHAR(50),CategoryId INT NOT NULL,ListPrice DECIMAL(12,2))
WITH (DISTRIBUTION = REPLICATE);
Creates tables on each of the individual compute nodes and
assigns them to the REPLICATED file group.
Data Compression is automatically turned on
2012 © Trivadis
Vorstellung Parallel Data Warehouse
Distribution and Replication of Data: Distribute
November 2012
17
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
Sales Facts
Date Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars Sold
SQL
SQL
SQL
SQL
Larger (> 10 GB) Fact Table is Hash
Distributed Across All Compute Nodes
SF-1
SF-2
SF-3
SF-4
Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar Day
Store Dim
Store Dim IDStore NameStore MgrStore Size
Product Dim
Prod Dim IDProd CategoryProd Sub CatProd Desc
MktgCampaign Dim
Mktg Camp IDCamp NameCamp MgrCamp StartCamp End
2012 © Trivadis
Vorstellung Parallel Data WarehouseNovember 2012
Distribution on a PDW
PDW Node 1Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h
8 Tables per Node
PDW Node 2Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h
PDW Node 10Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h
PDW Node …
Final Result:80 individual tables across a 10 node (1 data rack) appliance
CREATE TABLE myTable (column Defs)WITH (DISTRIBUTION = HASH (id));
18
2012 © Trivadis
Reference Case: Today’s process flow / Building blocks
DB_GSAPOP
DB_MasterTables
DB_ReportTables
FinanceCube
Baseline : Once data extracted from SAP:Time taken to create end-end Reports and Cubes insights 13+ hours (In production typical 20+ hours with multiple companies)
DW_FinanceTransactions
MasterFinance table population
6 hours 21min
6 hours 1 hour
Suspicious words
Reports
3hr21min
SAP
2012 © Trivadis
Reference Case: Audit Process with PDW
DB_GSAPOP
DB_MasterTables
DB_ReportTables
FinanceCube
Once data is extracted from SAP:Creating 5 CM Reports & FSCP Finance Cube;Time taken: 30 Minutes
DW_FinanceTransactions
MasterFinance table population 8m50sec
load from FlatFile23min
10m10sec 11 min
All 5 Reportswithin 6min
(80) (80) (80)
(80)
SAP
2012 © Trivadis
Vorstellung Parallel Data Warehouse
Appliance Update AU3
November 2012
21
Performance – up to 10x improvement Data Movement Services New cost based Query Optimizer New Data Movement Service
1/2 rack appliances from HP and Dell
System Center 2012 Integration (SCOM pack)
And YES … Support for Stored Procedures (subset)
Collations: Full support for international data
Native SQL Server drivers
2012 © Trivadis
Vorstellung Parallel Data Warehouse
Landing Zone
ETL Tools
Hub and SpokeDepartmental
Reporting
RegionalReporting
High-Performance ReportingCentral EDW Hub
RegionalReporting with
Business Decision
Appliance
Third-Party
RDBMS
Third-PartyData
Integration
Mobile Applicatio
ns
November 2012
22
2012 © Trivadis4 8 12 16 20 24 28 32 36 40
SQL Server Compute Nodes
Sys
tem
Thr
ough
put
Regular SQL Server
( 1 Node)
Seamless Scalability
Half Rack PDW
( 5 Nodes)
Full Rack PDW
( 10 Nodes)
2 Rack PDW( 20 Nodes)
3 Rack PDW( 30 Nodes)
4 Rack PDW( 30 Nodes)
November 2012
25Vorstellung Parallel Data Warehouse
2012 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
Let‘s go.
November 2012Vorstellung Parallel Data Warehouse
26
Wettbewerb