+ All Categories
Home > Technology > Big Data and Fast Data - Lambda Architecture in Action

Big Data and Fast Data - Lambda Architecture in Action

Date post: 14-Jul-2015
Category:
Upload: guido-schmutz
View: 1,362 times
Download: 2 times
Share this document with a friend
49
BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN 2014 © Trivadis Big Data und Fast Data - Lambda Architektur und deren Umsetzung 19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung 1 Guido Schmutz DOAG Konferenz 2014 19.11.2014 – 16:00 Raum Oslo
Transcript
Page 1: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

2014 © Trivadis

Big Data und Fast Data - Lambda Architektur und deren Umsetzung

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

1

Guido Schmutz

DOAG Konferenz 2014

19.11.2014 – 16:00 Raum Oslo

Page 2: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Guido Schmutz

•  Working for Trivadis for more than 17 years

•  Oracle ACE Director for Fusion Middleware and SOA •  Co-Author of different books •  Consultant, Trainer Software Architect for Java, Oracle, SOA and

Big Data / Fast Data •  Member of Trivadis Architecture Board •  Technology Manager @ Trivadis

•  More than 25 years of software development experience

•  Contact: [email protected] •  Blog: http://guidoschmutz.wordpress.com •  Twitter: gschmutz

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

2

Page 3: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany and Austria.

We offer our services in the following strategic business fields: Trivadis Services takes over the interacting operation of your IT systems.

Our company

O P E R A T I O N

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

3

Page 4: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

AGENDA

1.  Big Data and Fast Data, what is it?

2.  Architecting (Big) Data Systems

3.  The Lambda Architecture

4.  Use Case and the Implementation

5.  Summary and Outlook

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

4

Page 5: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Big Data Definition (4 Vs)

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

+ Time to action ? – Big Data + Event Processing = Fast Data

Characteristics of Big Data: Its Volume, Velocity and Variety in combination

5

Page 6: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

The world is changing …

The model of Generating/Consuming Data has changed ….

Old Model: few companies are generating data, all others are consuming data

New Model: all of us are generating data, and all of us are consuming data

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

6

Page 7: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

7

Page 8: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Internet Of Things – Sensors are/will be everywhere

There are more devices tapping into the internet than people on earth

How do we prepare our systems/architecture for the future?

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Source: Cisco Source: The Economist 8

Page 9: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

The world is changing … new data stores

Problem of traditional (R)DBMS approach: §  Complex object graph §  Schema evolution §  Semi-structured data §  Scaling

Polyglot persistence §  Using multiple data storage technologies (RDMBS + NoSQL + NewSQL + In-

Memory)

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

9

ORDER

ADDRESS

CUSTOMER

ORDER_LINES

OrderID: 1001Order Date: 15.9.2012

Line Items

Customer

First Name: PeterLast Name: Sample

Billing AddressStreet: Somestreet 10City: SomewherePostal Code: 55901

Name

Ipod Touch

Monster Beat

Apple Mouse

Quantity

1

2

1

Price

220.95

190.00

69.90

Page 10: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

The world is changing … New platforms evolving (i.e. Hadoop Ecosystem)

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

10

Page 11: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Data as an Asset – Store everything?

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Data is just too valuable to delete! We must store anything!

Nonsense! Just store the data

you know you need today!

It depends … Big Data technologies allow to store the raw information from new and existing data sources so that you can later use it to create new data-driven products, which you haven’t thought about today!

11

Page 12: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

AGENDA

1.  Big Data and Fast Data, what is it?

2.  Architecting (Big) Data Systems

3.  The Lambda Architecture

4.  Use Case and the Implementation

5.  Summary and Outlook

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

12

Page 13: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

What is a data system?

•  A (data) system that manages the storage and querying of data with a lifetime measured in years encompassing every version of the application to ever exist, every hardware failure and every human mistake ever made.

•  A data system answers questions based on information that was acquired in the past

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

13

Page 14: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

How do we build (data) systems today – Today’s Architectures

Source of Truth is mutable!

•  CRUD pattern

What is the problem with this?

•  Lack of Human Fault Tolerance

•  Potential loss of information/data

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Mutable Database

Application (Query)

RDBMS NoSQL

NewSQL

Mobile Web RIA

Rich Client

Source of Truth

Source of Truth

14

Page 15: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Lack of Human Fault Tolerance

Bugs will be deployed to production over the lifetime of a data system

Operational mistakes will be made

Humans are part of the overall system •  Just like hard disks, CPUs, memory, software •  design for human error like you design for any other fault

Examples of human error •  Deploy a bug that increments counters by two instead of by one •  Accidentally delete data from database •  Accidental DOS on important internal service

Worst two consequences: data loss or data corruption

As long as an error doesn‘t lose or corrupt good data, you can fix what went wrong

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

15

Page 16: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Lack of Human Fault Tolerance – Immutability vs. Mutability

The U and D in CRUD

A mutable system updates the current state of the world

Mutable systems inherently lack human fault-tolerance

Easy to corrupt or lose data

An immutable system captures historical records of events

Each event happens at a particular time and is always true

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Immutability restricts the range of errors causing data loss/data corruption

Vastly more human fault-tolerant

Conclusion: Your source of truth should always be immutable

16

Page 17: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

A different kind of architecture with immutable source of truth

Instead of using our traditional approach … why not building data systems like this

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

HDFS NoSQL

NewSQL RDBMS

View on Data

Mobile Web RIA

Rich Client

Source of Truth

Immutable data

View on Data

Application (Query)

Source of Truth

17

Page 18: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

How to create the views on the Immutable data?

On the fly ?

Materialized, i.e. Pre-computed ?

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Immutable data View

Immutable data

Pre- Computed

Views

Query

Query

18

Page 19: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

(Big) Data Processing

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Immutable data

Pre-Computed

Views Query ? ? Incoming

Data

How to compute the materialized views ?

How to compute queries from the views ?

19

Page 20: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Today Big Data Processing means Batch Processing …

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

HDFS

Data Store optimized for appending large

results

Queries

Stream 1

Stream 2

Event

Hadoop cluster (Map/Reduce)

Hadoop Distributed File System

20

Page 21: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Big Data Processing - Batch

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

1.2.13 Add iPAD 64GB 10.3.13 Add Sony RX-100 11..3.13 Add Canon GX-10 11.3.13 Remove Sony RX-100 12.3.13 Add Nikon S-100 14.4.13 Add BoseQC-15 15.4.13 Add MacBook Pro 15 20.4.13 Remove Canon GX10

iPAD 64GB Nikon S-100 BoseQC-15 MacBook Pro 15

4 derive derive

Favorite Product List Changes Current Favorite

Product List Current Product Count

Raw information => data Information => derived

21

Page 22: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Big Data Processing – Batch

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

§  Using only batch processing, leaves you always with a portion of non-processed data.

Fully processed data Last full batch period

Time forbatch job

time

now non-processed data

time

now

batch-processed data

But we are not done yet …

22

Page 23: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Big Data Processing - Adding Real-Time

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Immutable data

Batch Views

Query

? Data Stream

Realtime Views

Incoming Data

How to compute queries from the views ? How to compute real-time views

23

Page 24: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Big Data Processing - Adding Real-Time

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

1.2.13 Add iPAD 64GB 10.3.13 Add Sony RX-100 11..3.13 Add Canon GX-10 11.3.13 Remove Sony RX-100 12.3.13 Add Nikon S-100 14.4.13 Add BoseQC-15 15.4.13 Add MacBook Pro 15 20.4.13 Remove Canon GX10 Now Add Canon Scanner

iPAD 64GB Nikon S-100 BoseQC-15 MacBook Pro 15

5

compute

Favorite Product List Changes Current Favorite Product List

Current Product Count

Now Canon Scanner compute Add Canon Scanner

Stream of Favorite Product List Changes

Immutable data

Views

Data Stream

Query

incoming

24

Page 25: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Big Data Processing - Batch & Real Time

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

time

Fully processed data Last full batch period

now

Time forbatch job

batch processingworked fine here

(e.g. Hadoop)

real time processingworks here

blended view for end user

Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20

25

Page 26: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

AGENDA

1.  Big Data and Fast Data, what is it?

2.  Architecting (Big) Data Systems

3.  The Lambda Architecture

4.  The Use Case and the Implementation

5.  Summary and Outlook

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

26

Page 27: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Lambda Architecture

Lambda => Query = function(all data)

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

27

Immutable data

Batch View

Query

Data Stream

Realtime View

Incoming Data

Serving Layer

Speed Layer

Batch Layer

A

B C D

E F

G

Page 28: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Lambda Architecture

A.  All data is sent to both the batch and speed layer

B.  Master data set is an immutable, append-only set of data

C.  Batch layer pre-computes query functions from scratch, result is called Batch Views. Batch layer constantly re-computes the batch views.

D.  Batch views are indexed and stored in a scalable database to get particular values very quickly. Swaps in new batch views when they are available

E.  Speed layer compensates for the high latency of updates to the Batch Views

F.  Uses fast incremental algorithms and read/write databases to produce real-time views

G.  Queries are resolved by getting results from both batch and real-time views

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

28

Page 29: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Lambda Architecture

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Stores the immutable constantly growing dataset Computes arbitrary views from this dataset using BigData technologies (can take hours) Can be always recreated Computes the views from the constant stream of data it receives Needed to compensate for the high latency of the batch layer Incremental model and views are transient Responsible for indexing and exposing the pre-computed batch views so that they can be queried Exposes the incremented real-time views Merges the batch and the real-time views into a consistent result

Serving Layer

Batch Layer

Speed Layer

29

Page 30: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Lambda Architecture

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning.

30

Distribution Layer

Speed Layer

Precompute Views

Vis

ualiz

atio

n

Batch Layer

Precomputed information All data

Incremented information Process stream

Batch recompute

Realtime increment

Serving Layer

batch view

batch view

real time view

real time view

Dat

a Se

rvic

e (M

erge

)

Sensor Layer

Incoming Data

social

mobile

IoT

Page 31: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

AGENDA

1.  Big Data and Fast Data, what is it?

2.  Architecting (Big) Data Systems

3.  The Lambda Architecture

4.  Use Case and the Implementation

5.  Summary and Outlook

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

31

Page 32: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Project Definition

•  Build a platform for analyzing Twitter communications in retrospective and in real-time

•  Scalability and ability for future data fusion with other information is a must

•  Provide a Web-based access to the analytical information

•  Invest into new, innovative and not widely-proven technology •  PoC environment, a pre-invest for future systems

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

32

Page 33: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

"profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/15032594\/1371570460", "profile_link_color":"2FC2EF", "profile_sidebar_border_color":"FFFFFF", "profile_sidebar_fill_color":"252429", "profile_text_color":"666666", "profile_use_background_image":true, "default_profile":false, "default_profile_image":false, "following":null, "follow_request_sent":null, "notifications":null}, "geo":{ "type":"Point","coordinates":[43.28261499,-2.96464655]}, "coordinates":{"type":"Point","coordinates":[-2.96464655,43.28261499]}, "place":{"id":"cd43ea85d651af92", "url":"https:\/\/api.twitter.com\/1.1\/geo\/id\/cd43ea85d651af92.json", "place_type":"city", "name":"Bilbao", "full_name":"Bilbao, Vizcaya", "country_code":"ES", "country":"Espa\u00f1a", "bounding_box":{"type":"Polygon","coordinates":[[[-2.9860102,43.2136542], [-2.9860102,43.2901452],[-2.8803248,43.2901452],[-2.8803248,43.2136542]]]}, "attributes":{}}, "contributors": null, "retweet_count":0, "favorite_count":0, "entities":{"hashtags":[{"text":"quelosepash","indices":[58,70]}], "symbols":[], "urls":[], "user_mentions":[]}, "favorited":false, "retweeted":false, "filter_level":"medium", "lang":"es“ }

Anatomy of a tweet

33

{ "created_at":"Sun Aug 18 14:29:11 +0000 2013", "id":369103686938546176, "id_str":"369103686938546176", "text":"Baloncesto preparaci\u00f3n Eslovenia, Rajoy derrota a Merkel. #quelosepash", "source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\” \u003eTwitter for iPhone\u003c\/a\u003e", "truncated":false, "in_reply_to_status_id":null, "in_reply_to_status_id_str":null, "in_reply_to_user_id":null, "in_reply_to_user_id_str":null, "in_reply_to_screen_name":null, "user":{ "id":15032594, "id_str":"15032594", "name":"Juan Carlos Romo\u2122", "screen_name":"jcsromo", "location":"Sopuerta, Vizcaya", "url":null, "description":"Portugalujo, saturado de todo, de baloncesto no. Twitter personal.", "protected":false, "followers_count":1331, "friends_count":1326, "listed_count":31, "created_at":"Fri Jun 06 21:21:22 +0000 2008", "favourites_count":255, "utc_offset":7200, "time_zone":"Madrid", "geo_enabled":true, "verified":false, "statuses_count":22787, "lang":"es", "contributors_enabled":false, "is_translator":false, … "profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/2649762203\ be4973d9eb457a45077897879c47c8b7_normal.jpeg",

Time Space Content Social Technic

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Page 34: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Views on Tweets in four dimensions

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

34

when ⇐ where+what+who • Time series • Timelines

where ⇐ when+what+who • Geo maps • Density plots

what ⇐ when+where+who • Word clouds • Topic trends

who ⇐ when+where+what • Social network graphs • Activity graphs

Time

Space

Social

Content

Time

Space

Social

Content

Time

Space

Social

Content

Time

Space Social

Content

Page 35: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Accessing Twitter

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

35

Quelle Limitierungen Zugang Twitter’s Search API 3200 / user

5000 / keyword 180 Anfragen / 15 Minuten

gratis

Twitter’s Streaming API 1%-40% des Volumens gratis

DataSift keine 0.15 -0.20$ /

unit Gnip keine Auf Anfrage

Page 36: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Lambda Architecture

Open Source Frameworks for implementing a Lambda Architecture

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

36

Distribution Layer

Speed Layer

Precompute Views

Vis

ualiz

atio

n

Batch Layer

Precomputed information All data

Incremented information Process stream

Batch recompute

Realtime increment

Serving Layer

batch view

batch view

real time view

real time view

Dat

a Se

rvic

e (M

erge

)

Sensor Layer

Incoming Data

social

mobile

IoT

Page 37: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Lambda Architecture in Action

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

37

Cloudera Distribution

•  Distribution of Apache Hadoop: HDFS, MapReduce, Hive, Flume, Pig, Impala

Cloudera Impala

•  distributed query execution engine that runs against data stored in HDFS and HBase

Apache Zookeeper

•  Distributed, highly available coordination service. Provides primitives such as distributed locks

Apache Storm & Trident

•  distributed, fault-tolerant realtime computation system

Apache Cassandra

•  distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure

Twitter Horsebird Client (hbc)

•  Twitter Java API over Streaming API

Spring Framework

•  Popular Java Framework used to modularize part of the logic (sensor and serving layer)

Apache Kafka

•  Simple messaging framework based on file system to distribute information to both batch and speed layer

Apache Avro

•  Serialization system for efficient cross-language RPC and persistent data storage

JSON

•  open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs.

Page 38: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Facts & Figures

Currently in total

•  2.7 TB Raw Data

•  1.1 TB Pre-Processed data in Impala

•  1 TB Solr indices for full text search

Cloudera 4.7.0 with Hadoop, Pig, Hive, Impala and Solr

Kafka 0.7, Storm 0.9, DataStax Enterprise Edition

14 active twitter feeds

•  ~ 14 million tweets/day ( > 5 billion tweets/year)

•  ~ 8 GB/day raw data, compressed (2 DVDs)

•  66 GB storage capacity / day (replication & views/results included)

Cluster of 10 nodes

•  ~100 processors

•  ~40 TB HD capacity in total; 46% used

•  >500 GB RAM

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

38

Page 39: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Lambda Architecture with Oracle Product Stack

Possible implementation with Oracle Product stack

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

39

Distribution Layer

Speed Layer

Precompute Views

Vis

ualiz

atio

n

Batch Layer

Precomputed information All data

Incremented information Process stream

Batch recompute

Realtime increment

Serving Layer

batch view

batch view

real time view

real time view

Dat

a Se

rvic

e (M

erge

)

Sensor Layer

Incoming Data

social

mobile

IoT

Oracle NoSQL

Oracle RDBMS Oracle Coherence

Oracle BigData Appliance

Oracle NoSQL Oracle Coherence

Oracle Event Processing Oracle GoldenGate

Oracle Data Integrator

Oracle GoldenGate

Oracle Event Processing For Embedded Oracle Service Bus

Ora

cle

Web

Log

ic S

erve

r

OBI

EE

Ora

cle

Ende

ca

Ora

cle

Big

Dat

a C

onne

ctor

s

Oracle Coherence

WebLogic JMS

Ora

cle

BAM

Page 40: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

AGENDA

1.  Big Data and Fast Data, what is it?

2.  Architecting (Big) Data Systems

3.  The Lambda Architecture

4.  Use Case and the Implementation

5.  Summary and Outlook

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

40

Page 41: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Summary – The lambda architecture

•  Can discard batch views and real-time views and recreate everything from scratch

•  Mistakes corrected via re-computation

•  Scalability through platform and distribution

•  Data storage layer optimized independently from query resolution layer

•  Still in a early stage …. But a very interesting idea! •  Today a zoo of technologies are needed => Infrastructure group might not like

it •  Better with so-called Hadoop distributions and Hadoop V2 (YARN)

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

41

Page 42: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Alternative Approaches – Motivation

Data Sharing in Map Reduce …

23/06/14 Obsidian

42

iter. 1 iter. 2 . . .

Input

HDFS"read

HDFS"write

HDFS"read

HDFS"write

Input

query 1

query 2

query 3

result 1

result 2

result 3

. . .

HDFS"read

Page 43: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

iter. 1 iter. 2 . . .

Input

Alternative Approaches – Motivation

What we would like …

23/06/14 Obsidian

43

Distributed"memory

Input

query 1

query 2

query 3

. . .

one-time"processing

Page 44: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Alternatives – Apache Spark

23/06/14 Obsidian

44

Spark

Spark Streaming"

real-time

Spark SQL structured

GraphX graph

MLlib machine learning

YARN HDFS

HDFS Cassandra

Page 45: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Alternative Technologies – Apache Spark

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

45

Distribution Layer

Speed Layer

Precompute Views

Vis

ualiz

atio

n

Batch Layer

Precomputed information All data

Incremented information Process stream

Batch recompute

Realtime increment

Serving Layer

batch view

batch view

real time view

real time view

Dat

a Se

rvic

e (M

erge

)

Sensor Layer

Incoming Data

social

mobile

IoT

Page 46: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

“Kappa Architecture”

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning.

46

Distribution Layer

Speed Layer

Vis

ualiz

atio

n

Batch Layer

All data

Incremented information Process stream

Realtime increment

Serving Layer

real time view

real time view Dat

a Se

rvic

e

Sensor Layer

Incoming Data

social

mobile

IoT

Precomputed analytics analytic view

Dat

a Se

rvic

e

Batch Analytical analysis

Replay

Page 47: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Unified Log Processing Architecture

Stream processing allows for computing feeds off of other feeds

Derived feeds are no different than original feeds they are computed off

Single deployment of “Unified Log” but logically different feeds

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

47

Meter Readings Collector

Enrich / Transform

Aggregate by Minute

Raw MeterReadings

Meter with Customer

Meter by Customer by Minute

Customer Aggregate by Minute

Meter by Minute

Persist

Meter by Minute

Persist

Raw Meter Readings

Page 48: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

Weitere Informationen...

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

48

INFOBOX – Lesen und Löschen •  Folie wenn auf weitere Informationen

verwiesen werden soll, also z.B. Bücher, Websiten, etc.

Page 49: Big Data and Fast Data - Lambda Architecture in Action

2014 © Trivadis

BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

Fragen und Antworten...

2013 © Trivadis

Guido Schmutz

Technology Manager

[email protected]

19.11.2014 DOAG 2014 | Big Data und Fast Data - Lambda Architektur und deren Umsetzung

INFOBOX – Lesen und Löschen •  Die Schlussfolie steht in zwei Varianten

zur Verfügung, einmal für die Kontaktdaten eines Referenten, einmal in der Variante für zwei oder mehr Referenten

•  Name, Titel und Location jeweils untereinander in eine Zeile (Shift+Return)

•  Die Idee ist das diese Folie als letzte Folie (auch für Fragen und Antworten) am Ende der Präsentation lange stehen bleibt, somit haben die Zuhörer die Möglichkeit die Kontaktdaten aufzuschreiben J


Recommended