+ All Categories
Home > Technology > Architektur von Big Data Lösungen

Architektur von Big Data Lösungen

Date post: 13-Apr-2017
Category:
Upload: guido-schmutz
View: 61 times
Download: 0 times
Share this document with a friend
39
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH Architektur von Big Data Lösungen Guido Schmutz ([email protected] ) @ gschmutz
Transcript
Page 1: Architektur von Big Data Lösungen

BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH

Architektur von Big Data LösungenGuido Schmutz ([email protected])

@gschmutz

Page 2: Architektur von Big Data Lösungen

Guido Schmutz

Working for Trivadis for more than 20 yearsOracle ACE Director for Fusion Middleware and SOACo-Author of different booksConsultant, Trainer, Software Architect for Java, SOA & Big Data / Fast DataMember of Trivadis Architecture BoardTechnology Manager @ Trivadis

More than 30 years of software development experience

Contact: [email protected]: http://guidoschmutz.wordpress.comSlideshare: http://www.slideshare.net/gschmutzTwitter: gschmutz

2 Architektur von Big Data Lösungen

Page 3: Architektur von Big Data Lösungen

Agenda

1. Introduction2. Big Data Reference Architectures

• Traditional Big Data• Event / Stream-Processing• Lambda Architecture• Kappa Architecture• Unified Architecture• Microservices Architecture

3. Big Data Ecosystem – many choices sorted!

3 Architektur von Big Data Lösungen

Page 4: Architektur von Big Data Lösungen

Introduction

4 Architektur von Big Data Lösungen

Page 5: Architektur von Big Data Lösungen

Big Data Definition (4 Vs)

+Timetoaction?– BigData+Real-Time=StreamProcessing

CharacteristicsofBigData:ItsVolume,VelocityandVarietyincombination

Reliable Data Ingestion in Big Data/IoT

Page 6: Architektur von Big Data Lösungen

How to do Big Data? Why is a structuring / architecture important?

6 Architektur von Big Data Lösungen

Page 7: Architektur von Big Data Lösungen

Why talk about Big Data Architectures?

Choosing the right architecture is key for any (big data) project

Big Data is still quite a rather young field and therefore a “moving target”

no standard architectures available which have been used for years

In the past years, some architectures and best practices have evolved

Know your use cases before choosing your architecture / technologies

To have a reference architecture in place helps in choosing the right/matching technologies

7 Architektur von Big Data Lösungen

Page 8: Architektur von Big Data Lösungen

Important Properties for choosing (Big) Data Architecture

Latency

Keep raw and un-interpreted data “forever” ?

Volume, Velocity, Variety, Veracity

Ad-Hoc Query Capabilities needed ?

Robustness & Fault Tolerance

Scalability

8 Architektur von Big Data Lösungen

Page 9: Architektur von Big Data Lösungen

Big Data Reference Architectures -Traditional Big Data

9 Architektur von Big Data Lösungen

Page 10: Architektur von Big Data Lösungen

“Traditional Architecture” for Big Data

DataIngestion (Analytical)DataProcessing

DataSources

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

RDBMS

Social

ERP

Logfiles

Sensor

Machine

Batchcompute

PushingIngestion ResultStore

QueryEngine

ComputedInformation

RawData(Reservoir)

=DatainMotion =DataatRest

PullingIngestion

Channel

10 Architektur von Big Data Lösungen

Page 11: Architektur von Big Data Lösungen

“Traditional Architecture” for Big Data – Hadoop Technology Mapping

DataIngestion (Analytical)DataProcessing

DataSources

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

RDBMS

Social

ERP

Logfiles

Sensor

Machine

Batchcompute

PushingIngestion ResultStore

QueryEngine

ComputedInformation

RawData(Reservoir)

=DatainMotion =DataatRest

PullingIngestion

Channel

11 Architektur von Big Data Lösungen

Page 12: Architektur von Big Data Lösungen

“Traditional Architecture” for Big Data – Spark Technology Mapping

DataIngestion (Analytical)DataProcessing

DataSources

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

RDBMS

Social

ERP

Logfiles

Sensor

Machine

Batchcompute

PushingIngestion ResultStore

QueryEngine

ComputedInformation

RawData(Reservoir)

=DatainMotion =DataatRest

PullingIngestion

Channel

12 Architektur von Big Data Lösungen

Page 13: Architektur von Big Data Lösungen

“Traditional Architecture” for Big Data – Feeding in High-Volume Event Streams

DataIngestion (Analytical)DataProcessing

DataSources

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

RDBMS

Social

ERP

Logfiles

Sensor

Machine

Batchcompute

PushingIngestion ResultStore

QueryEngine

ComputedInformation

RawData(Reservoir)

=DatainMotion =DataatRest

PullingIngestion

Channel

?

?

13 Architektur von Big Data Lösungen

Page 14: Architektur von Big Data Lösungen

Traditional Architecture for Big Data

• Batch Processing - “Data at Rest”

• Not for low latency use cases• Responses are delivered “after the fact”• Maximum value of the identified situation is lost• Decision are made on old and stale data

• Spark Core is a faster alternative to Hadoop Map Reduce, but still Batch Processing

• Spark Ecosystems offers a lot of additional advanced analytic capabilities (machine learning, graph processing, …)

14 Architektur von Big Data Lösungen

Page 15: Architektur von Big Data Lösungen

Big Data Reference Architectures –Event/Stream Processing

15 Architektur von Big Data Lösungen

Page 16: Architektur von Big Data Lösungen

Event / Stream Processing – “Data in Motion”

“Data in motion”

Events are analyzed and processed in real-time as the arrive

Decisions are timely, contextual and based on fresh data

Decision latency is eliminated

16 Architektur von Big Data Lösungen

Page 17: Architektur von Big Data Lösungen

Event / Stream Processing Architecture

DataIngestion

Batchcompute

DataSources

Channel

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

Logfiles

Social

RDBMS

ERP

Sensor

Machine

(Analytical)Real-TimeDataProcessing

Stream/EventProcessing

Messaging

ResultStore

=DatainMotion =DataatRest17 Architektur von Big Data Lösungen

Page 18: Architektur von Big Data Lösungen

Challenges for Ingesting Data

Multitude of sensors

Real-Time Streaming

Multiple Firmware versions

Bad Data from damaged sensors

Regulatory Constraints

Data Quality

18 Architektur von Big Data Lösungen

Page 19: Architektur von Big Data Lösungen

Continuous Data Ingestion

DBSource

BigData

Log

StreamProcessing

IoT Sensor

EventHub

Topic

Topic

REST

Topic

IoT GW

CDCGW

Conn

ectCDC

DBSource

Log CDC Native

IoT Sensor

IoT Sensor

19

DataflowGW

Topic

Topic

Queue

MessageGW

Topic

DataflowGW

Dataflow

TopicRE

ST19FileSourceLog

Log

Log

Social

Native

Topic

Topic

19 Architektur von Big Data Lösungen

Page 20: Architektur von Big Data Lösungen

Continuous Data Ingestion

DBSource

BigData

Log

StreamProcessing

IoT Sensor

EventHub

Topic

Topic

REST

Topic

IoT GW

CDCGW

Conn

ectCDC

DBSource

Log CDC Native

IoT Sensor

IoT Sensor

20

DataflowGW

Topic

Topic

Queue

MessageGW

Topic

DataflowGW

Dataflow

TopicRE

ST20FileSourceLog

Log

Log

Social

Native

Topic

Topic

20 Architektur von Big Data Lösungen

Page 21: Architektur von Big Data Lösungen

DataIngestion (Analytical)Real-TimeDataProcessing

Event / Stream Processing Architecture – Open Source Technology Mapping

Batchcompute

DataSources

Channel

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

Logfiles

Social

RDBMS

ERP

Sensor

Machine

Stream/EventProcessing

Messaging

ResultStore

=DatainMotion =DataatRest22 Architektur von Big Data Lösungen

Page 22: Architektur von Big Data Lösungen

DataIngestion (Analytical)Real-TimeDataProcessing

Event / Stream Processing Architecture – Oracle Technology Mapping

Batchcompute

DataSources

Channel

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

Logfiles

Social

RDBMS

ERP

Sensor

Machine

Stream/EventProcessing

Messaging

ResultStore

=DatainMotion =DataatRest23 Architektur von Big Data Lösungen

Page 23: Architektur von Big Data Lösungen

Event / Stream Processing Architecture

The solution for low latency use cases

Process each event separately => low latency

Process events in micro-batches => increases latency but offers better reliability

Previously known as “Complex Event Processing”

Keep the data moving / Data in Motion instead of Data at Rest => raw events were not stored

24 Architektur von Big Data Lösungen

Page 24: Architektur von Big Data Lösungen

Event / Stream Processing Architecture - Keep raw event data

DataIngestion

Batchcompute

DataSources

Channel

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

Logfiles

Social

RDBMS

ERP

Sensor

Machine

(Analytical)Real-TimeDataProcessing

Stream/EventProcessingMessaging

ResultStore

(Analytical)BatchDataProcessing

RawData(Reservoir)

=DatainMotion =DataatRest25 Architektur von Big Data Lösungen

Page 25: Architektur von Big Data Lösungen

Big Data Reference Architectures -Lambda Architecture for Big Data

26 Architektur von Big Data Lösungen

Page 26: Architektur von Big Data Lösungen

“Lambda Architecture” for Big Data

DataIngestion

(Analytical)BatchDataProcessing

Batchcompute

DataSources

Channel

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

RDBMS

Social

ERP

Logfiles

Sensor

Machine

(Analytical)Real-TimeDataProcessing

Stream/EventProcessing

Batchcompute

Messaging

ResultStore

QueryEngine

ResultStore

ComputedInformation

RawData(Reservoir)

=DatainMotion =DataatRest

PullingIngestion

27 Architektur von Big Data Lösungen

Page 27: Architektur von Big Data Lösungen

Lambda Architecture for Big Data

Combines (Big) Data at Rest with (Fast) Data in Motion

Closes the gap from high-latency batch processing

Keeps the raw information forever

Makes it possible to rerun analytics operations on whole data set if necessary => because the old run had an error or => because we have found a better algorithm we want to apply

Have to implement functionality twice• Once for batch• Once for real-time streaming

29 Architektur von Big Data Lösungen

Page 28: Architektur von Big Data Lösungen

Big Data Reference Architectures -„Kappa“ Architecture

30 Architektur von Big Data Lösungen

Page 29: Architektur von Big Data Lösungen

“Kappa Architecture” for Big Data

DataIngestion

“RawDataReservoir”

Batchcompute

DataSources

Channel

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

RDBMS

Social

ERP

Logfiles

Sensor

Machine

(Analytical)Real-TimeDataProcessing

Stream/EventProcessing

Messaging

ResultStore

RawData(Reservoir)

ComputedInformation

=DatainMotion =DataatRest31 Architektur von Big Data Lösungen

Queryable State

Page 30: Architektur von Big Data Lösungen

Organizing NoSQL Data Stores – Different Types KeyValueStore

Wide-columnstore

Documentstore

Graphstore

Key ValueK1 V1K2 V2K3 V3

Document{k1:v1,k2:v2,k3:[v1,v2,v3]}

RowkeyCK1

RK1V1

CK2V2

CK3V3

CK4V4

……

CK1RK2V1

CK4V4

CK6V6

……

…………

CK3V3

32 Architektur von Big Data Lösungen

Page 31: Architektur von Big Data Lösungen

Organizing NoSQL Data Stores – and the Products KeyValueStore

Wide-columnstore

Documentstore

Graphstore

33 Architektur von Big Data Lösungen

Page 32: Architektur von Big Data Lösungen

Big Data Reference Architectures -„Unified“ Architecture

34 Architektur von Big Data Lösungen

Page 33: Architektur von Big Data Lösungen

“Unified Architecture” for Big Data

DataIngestion

(Analytical)BatchDataProcessing(CalculateModelsofincomingdata)

Batchcompute

DataSources

Channel

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

RDBMS

Social

ERP

Logfiles

Sensor

Machine

(Analytical)Real-TimeDataProcessing

Stream/EventProcessing

Batchcompute

Messaging

ResultStore

ResultStore

ComputedInformation

RawData(Reservoir)

=DatainMotion =DataatRest

PredictionModels

35 Architektur von Big Data Lösungen

Queryable State

Page 34: Architektur von Big Data Lösungen

Event Driven (Micro-) Service Architectures

36 Architektur von Big Data Lösungen

Page 35: Architektur von Big Data Lösungen

MicroserviceMicroservice

MicroserviceMicroservice

Event-Driven (Micro-) Services Architecture

DataIngestion

“RawDataReservoir”

Batchcompute

DataSources

Channel

DataConsumer

Reports

Service

AnalyticTools

AlertingTools

Content

RDBMS

Social

ERP

Logfiles

Sensor

MachineMicroservice 2

Service

RawData(Reservoir)

ComputedInformation

=DatainMotion =DataatRest37 Architektur von Big Data Lösungen

State

Batchcompute

Microservice 1

Service State

API

ResultStore

Page 36: Architektur von Big Data Lösungen

Big Data Ecosystem – many choices sorted!

38 Architektur von Big Data Lösungen

Page 37: Architektur von Big Data Lösungen

Building Blocks for (Big) Data ProcessingData

Acquisition

FormatFile System

Stream Processing

Batch SQL

Graph DBMS

Document DBMS

Relational DBMS

Visualization

IoT

Messaging

Analytics

OLAP DBMS

Query Federation

Table-Style DBMS

Key Value DBMS

Batch Processing

In-Memory

39 Architektur von Big Data Lösungen

Page 38: Architektur von Big Data Lösungen

Big Data Ecosystem – many choices sorted!

40 Architektur von Big Data Lösungen

Page 39: Architektur von Big Data Lösungen

Guido SchmutzTechnology Manager

[email protected]

41 Architektur von Big Data Lösungen


Recommended