What is Big Data? - doag.org

DOAG Business Intelligence Community

Information on BI and Big Data

2017

What is Big Data?

Interaction: please "unload" your prejudices now

BI / DWH
• Defined methodology, structured
• Long proven
• Everything can be done via SQL
• Data quality
• Too inflexible
• Everything takes a long time
• Far too expensive

Big Data
• ?
• Unstructured data
• Complex, laborious
• Too much data

What is Business Intelligence?

Methods and processes for the systematic analysis (collection, evaluation and presentation) of data in electronic form.

Source: Wikipedia

What is a Data Warehouse?

A central database optimized for analysis purposes that merges and condenses data from several […] heterogeneous sources.

Source: Wikipedia
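As a minimal sketch of this definition, the following Python snippet merges two differently shaped sources into one central SQLite table and then condenses the result with an aggregate query. The source names (`crm_rows`, `shop_csv`), the table name `sales` and all values are hypothetical and serve only as illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: rows from a CRM system (customer, amount in euros).
crm_rows = [("alice", 120.0), ("bob", 80.0)]

# Hypothetical source 2: a CSV export from a web shop (other layout, cents).
shop_csv = "customer;amount_cents\nalice;3000\ncarol;4550\n"


def load_warehouse(conn):
    """Merge both heterogeneous sources into one central table."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, source TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, 'crm')", crm_rows)
    reader = csv.DictReader(io.StringIO(shop_csv), delimiter=";")
    for row in reader:
        # Harmonize units (cents -> euros) while loading.
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, 'shop')",
            (row["customer"], int(row["amount_cents"]) / 100),
        )


def revenue_per_customer(conn):
    """Condense the merged data: one total per customer."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_warehouse(conn)
totals = revenue_per_customer(conn)
```

The `source` column keeps track of where each row came from, which is one simple way to retain lineage after merging.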

A Data Warehouse (DWH)

If BI already exists, why Big Data?

Why Big Data? A paradigm shift

Old model: few producers, many consumers

New model: many producers = many consumers

Where does the data come from?

• Social media and networks (we all generate data)
• Scientific instruments (all kinds of data)
• Mobile devices (e.g. tracking)
• Sensor technology and sensor networks (all kinds of data)

Challenges

• Data extraction and collection
• Administering, analysing, aggregating and visualising the collected data, and deriving knowledge from it promptly and in a scalable way

And how is Big Data defined?

Volume: Data at rest
• Processing tera-, peta-, exa-, zetta- and yottabytes
• Sensor and social data
• New storage systems

Variety: Data in many forms
• Structured and unstructured data
• Text, numbers, multimedia
• A wide range of data sources

Velocity: Data in motion
• Streaming data
• (Milli)seconds to minutes for detection, response or analysis

Veracity: Data in doubt
• Uncertainty due to data inconsistency, incompleteness, ambiguity, latency, deception, estimation

Adapted from IBM (2014)

How is Big Data defined? Another attempt

• McKinsey
Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

• Gartner
Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.

• BARC
Big Data designates methods and technologies for the highly scalable acquisition, storage, and analysis of polystructured data.

Unconventional methods and tools for less limited data processing

Figure: data sources (Internet, CRM, Event, ERP, …)

So: Big Data = BI/DWH, only bigger and faster?

Let's ask: Inmon & Kimball on Big Data & DWH

Bill Inmon on Big Data & BI/DWH

“Data warehouse is an architecture and Big Data is a technology. They are not the same thing at all … There simply is not the carefully constructed and carefully maintained infrastructure surrounding Big Data that there is for the data warehouse. Any executive that would use Big Data for Sarbanes-Oxley reporting or Basel II reporting isn’t long for his/her job.”

http://www.forestrimtech.com/big-data-vs-data-warehouse

Ralph Kimball on Big Data & DWH

“It’s a renaissance that is happening here … a Data Warehouse needs to encompass Big Data and I hope that all folks working with those (Big Data) topics realize that they are part of the larger Data Warehouse team”

“We want to use SQL and SQL like languages but we don’t want the RDBMS storage constraints. The disruptive solution: Hadoop”

http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/building-a-hadoop-data-warehouse-video.html

IT view: who is affected by Big Data?

Figure: Big Data in relation to
• Business Intelligence
• Application Development
• Advanced Visualization
• Advanced Analytics

All right, let's approach it via the technology

Putting Big Data "in order": step 1

The architect's canvas

Figure: a reference architecture canvas, framed by Velocity, Volume and Variety, with the following areas:

• Data Sources: Un-/Semi-structured Data, Structured Data, Master & Reference Data, Machine Data, Content
• Data Acquisition: Services (Push), Connectors (Pull); Stream, Batch/Bulk; Incremental, Full
• Data Management: Raw Data at Rest, Standardized Data at Rest, Optimized Data at Rest; Raw Data in Motion, Standardized Data in Motion, Optimized Data in Motion; Data Lab (Sandbox); Data Refinery/Factory
• Information Provisioning: Virtualization, Query, Service / API, Search
• Consumer: Information Services, Data Science Tools, Dashboard, Prebuilt & Ad Hoc BI Assets, Advanced Analysis Tools
• Governance: Legal Compliance, Quality & Accountability, Security, Metadata Management, Master Data Management
• Organisation: IT Operations, Business Stakeholders, BI Competence Center


A DWH!

Figure: the same canvas, with the components of a data warehouse placed in the Data Management area: Staging Area, ETL, Core DWH and Data Marts. The other labels on this slide (Data Sources, Data Acquisition, Raw Data at Rest, Raw Data in Motion, Data Lab (Sandbox), Virtualization, Query, Service / API, Search and the consumer tools) are the same as on the generic canvas.


Big Data!

Figure: the same canvas, now with Big Data components: Event Hub, Stream Analytics, Hadoop Raw Data, Processed Files, NoSQL DB and SQL Engine. A Merge Layer appears where the DWH variant shows Virtualization. The other labels on this slide (Data Sources, Data Acquisition, Raw Data at Rest, Raw Data in Motion, Data Lab (Sandbox), Query, Service / API, Search and the consumer tools) are the same as on the generic canvas.

More tools? More complexity?

• Best-fit approach

• Use the appropriate tool for a problem

• Data Lab approach

• Build a solution that fits a problem

• More extensive know-how

• More programming languages, more databases …

… and what about comprehensive solutions? (Such as a DWH)

• Methods? Architecture? Infrastructure? Models?

Big Data Analytics – architecture example

Figure: layered architecture with the following elements (QFD = Query Focused Data):

• Sensor layer: Source System, API, …
• Distribution layer: Messaging (Kafka)
• Batch layer (e.g. Hadoop & Spark): All Data (HDFS), Batch (re)compute, Precomputed Views (MapReduce)
• Speed layer (e.g. Spark): Process Stream, Realtime increment, Incremented Views
• Serving layer: Batch views (e.g. Impala) with QFD 1, QFD 2, … QFD n; Realtime views (e.g. Cassandra) with QFD 1, QFD 2, … QFD n
• Query & Merge: Java App, REST
• Consumer layer: Client, Web App, …
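The layout in this diagram resembles what is commonly called the Lambda architecture. How its layers interact can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not how Hadoop, Spark, Kafka, Impala or Cassandra are actually programmed: the event format, the function names (`batch_recompute`, `realtime_increment`, `query`) and the per-sensor event count used as "query focused data" are all hypothetical.

```python
from collections import Counter

# Hypothetical events as they might arrive from the messaging layer.
all_data = [  # master dataset ("All Data"), append-only
    {"sensor": "s1", "value": 3},
    {"sensor": "s2", "value": 5},
    {"sensor": "s1", "value": 4},
]


def batch_recompute(events):
    """Batch layer: recompute the view from scratch over all stored data."""
    return Counter(e["sensor"] for e in events)


def realtime_increment(view, event):
    """Speed layer: update the realtime view incrementally per event."""
    view[event["sensor"]] += 1
    return view


def query(batch_view, realtime_view, sensor):
    """Query & merge: combine batch and realtime views for one answer."""
    return batch_view.get(sensor, 0) + realtime_view.get(sensor, 0)


batch_view = batch_recompute(all_data)  # precomputed, possibly stale
realtime_view = Counter()               # covers events since the last batch run

# Two new events arrive after the last batch run.
for event in [{"sensor": "s1", "value": 7}, {"sensor": "s3", "value": 1}]:
    all_data.append(event)              # stored for the next batch run
    realtime_increment(realtime_view, event)

events_s1 = query(batch_view, realtime_view, "s1")
events_s3 = query(batch_view, realtime_view, "s3")
```

After the next batch run, `batch_recompute(all_data)` already includes the new events, so the realtime view can be reset. The batch view stays simple and recomputable, while the speed layer only has to bridge the gap between two batch runs.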

NoSQL databases

NoSQL Databases (Not only SQL): to manage a mix of structured, semi-structured and unstructured data

• Graph Databases: Property Graphs (Neo4j, Datastax Cassandra, Oracle); RDF Triple Stores (Oracle Spatial and Graph, AllegroGraph, Virtuoso, Blazegraph, MarkLogic, Enzo)
• Key Value Stores: Oracle NoSQL, Redis, Riak KV …
• Wide Column Stores: Cassandra, HBase …
• Document Store Databases: MarkLogic, MongoDB …

RDBMS: domain of traditional databases

NoSQL – areas of use

Figure: data models positioned between "Big & Fast" and "Complex & Rich". Axis labels: Scalability; Model-Standardization, Tools, Complexity. Models shown: Key-value, Wide Column (Column Families / Extensible Records), Document, Property Graph, Relational (SQL Comfort-Zone), Multi Dimensional, Semantic Graph.
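To make the NoSQL categories above more tangible, the following sketch represents one and the same hypothetical customer record in four data models, using plain Python structures. It does not use any of the listed products; the keys, field names and values are assumptions chosen for illustration.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"customer:42": json.dumps({"name": "Alice", "city": "Berlin"})}

# Document: a nested, self-describing structure that can be queried by field.
document = {
    "_id": 42,
    "name": "Alice",
    "address": {"city": "Berlin"},
    "orders": [{"item": "book", "qty": 2}],
}

# Wide column: row key -> column family -> columns (rows may differ in columns).
wide_column = {
    "42": {
        "profile": {"name": "Alice", "city": "Berlin"},
        "orders": {"2017-01-05": "book x2"},
    }
}

# Property graph: nodes and edges, both carrying properties.
nodes = {
    "c42": {"label": "Customer", "name": "Alice"},
    "p1": {"label": "Product", "name": "book"},
}
edges = [("c42", "BOUGHT", "p1", {"qty": 2})]


def bought_products(customer_id):
    """Traverse BOUGHT edges from a customer node to product names."""
    return [
        nodes[target]["name"]
        for source, rel, target, _props in edges
        if source == customer_id and rel == "BOUGHT"
    ]


city_from_kv = json.loads(key_value["customer:42"])["city"]
city_from_doc = document["address"]["city"]
city_from_wide = wide_column["42"]["profile"]["city"]
products = bought_products("c42")
```

The same fact (Alice lives in Berlin) is reachable in every model, but the access path differs: a key lookup plus decoding, a field path, a row key plus column family, or a graph traversal.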

Important terms (1)

• Apache Hadoop

An open source framework with a distributed file system (HDFS), a tool-supported programming framework (Map/Reduce) and a resource management service (YARN) for large clusters of inexpensive shared-nothing servers.

• Apache Hive

SQL access to files in HDFS by generating Map/Reduce programs

• Apache Pig

A simple scripting language (Pig Latin) as an abstraction layer for Map/Reduce
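The Map/Reduce programming model mentioned above can be illustrated without a cluster. The following is a single-process Python sketch of the three phases (map, shuffle, reduce) for a word count. It does not use the Hadoop API, and the function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce one after the other."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data in motion"])
```

On a Hadoop cluster the map and reduce calls would run in parallel on many nodes, with the framework handling the shuffle. Tools such as Hive and Pig generate jobs of this general shape so that users do not have to write them by hand.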

Important terms (2)

• Apache Spark

An open source cluster computing framework for data analytics. One of the hottest topics in Big Data for the past two years. For batch and stream processing, data mining and more. Can run with Hadoop, but does not have to.

• NoSQL databases

A class of DBMS that do not follow the relational model. NoSQL = not only SQL.

• Elasticsearch & Solr

Open source text search engines based on Apache Lucene.
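Lucene-based engines such as Elasticsearch and Solr rely on an inverted index, which maps each term to the documents containing it. The following Python sketch shows that idea in its simplest form. It is not the Lucene API, it omits analysis steps such as stemming as well as relevance scoring, and the documents and function names are hypothetical.

```python
from collections import defaultdict

docs = {
    1: "Big Data is not a data warehouse",
    2: "A data warehouse is an architecture",
    3: "Hadoop stores raw data",
}


def build_index(documents):
    """Map every lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain all query terms (AND search)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


index = build_index(docs)
hits = search(index, "data warehouse")
```

Because the index is keyed by term, a query only touches the entries for its own terms instead of scanning every document.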

Deutsche ORACLE Anwender Gruppe (1)

http://www.oracle.com/ocom/groups/public/@otn/documents/webcontent/2297765.pdf

Describes and defines Big Data, organisational and technical requirements as well as "tooling" … and decision criteria


Figure: the architect's canvas populated with Oracle products. The areas (Data Sources, Data Acquisition, Data Management, Information Provisioning, Consumer, Governance, Organisation) are the same as on the generic canvas. Products shown:

• Golden Gate, Data Integrator, SQL*Loader
• Database: InMemory, OLAP, Semantic Graph, XDB, JSON, DM, DB M/R, Enterprise "R"
• NoSQL Key Value
• Stream Analytics, Enterprise "R", Edge Analytics
• Big Data Discovery, Cloudera Hadoop (HDFS + Tools)
• ESSBASE
• Big Data SQL, Connectors
• BIEE, BI-Publisher, Hyperion
• Visual Analyzer, BD Discovery, Endeca
• Enterprise Metadata Manager


Oracle Big Data Statement of Direction

http://www.oracle.com/technetwork/database/bigdata-appliance/overview/sod-bdms-2015-04-final-2516729.pdf

"Big Data Management Systems consist of" …

• Data Warehouse (Database)
• Data Reservoir: Big Data "ecosystem" with Hadoop & NoSQL (Big Data Appliance)
• "Franchised Query Engine": federation tool (Big Data SQL)
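The idea of a federating query engine, one question answered across both the warehouse and the reservoir, can be sketched as follows. This is a conceptual Python illustration only: it does not use Oracle Big Data SQL, SQLite merely stands in for the relational warehouse, a list of JSON lines stands in for files in the reservoir, and all table, field and function names are hypothetical.

```python
import json
import sqlite3

# Stand-in for the data warehouse: curated, relational master data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
)

# Stand-in for the data reservoir: raw click events as JSON lines.
reservoir_lines = [
    '{"customer_id": 1, "page": "/home"}',
    '{"customer_id": 1, "page": "/cart"}',
    '{"customer_id": 2, "page": "/home"}',
]


def clicks_per_customer():
    """Answer one question from both stores and merge the results."""
    clicks = {}
    for line in reservoir_lines:  # scan the raw data in the reservoir
        event = json.loads(line)
        cid = event["customer_id"]
        clicks[cid] = clicks.get(cid, 0) + 1
    rows = warehouse.execute("SELECT id, name FROM customers ORDER BY id")
    return [(name, clicks.get(cid, 0)) for cid, name in rows]


result = clicks_per_customer()
```

The caller sees a single result set and does not need to know which part of the answer came from which store. A real federation layer would additionally push filters and aggregations down to the respective system.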


“A favorite hobby of new entrants to the database market is to paint Oracle, the market-leading database, as inflexible and promote their product on the basis that Oracle will never be able to provide the same type of functionality as their new platform. Such vendors pursue this positioning at their peril: object-oriented databases, massively-parallel databases, columnar databases, data warehouse appliances and other trends have been touted as replacements for Oracle Database only to later see their core benefits subsumed by the Oracle platform.”

Resistance is futile!

Discussion

… and once more on our own behalf:

Who is interested in actively participating in the DOAG Business Intelligence Community?

Question: How much effort will this mean for me?

Answer: As much as you like, but it will probably be at least 6 person-days per year.

Thanks for participating!

Date post:	18-Apr-2018