+ All Categories
Home > Documents > Hans-Peter Kriegel, Martin Pfeifle , Marco Pötke, Thomas Seidl

Hans-Peter Kriegel, Martin Pfeifle , Marco Pötke, Thomas Seidl

Date post: 21-Mar-2016
Category:
Upload: bond
View: 53 times
Download: 2 times
Share this document with a friend
Description:
Database Group. A Cost Model for Interval Intersection Queries on RI-Trees. SSDBM 2002 , Edinburgh. Hans-Peter Kriegel, Martin Pfeifle , Marco Pötke, Thomas Seidl. Institute for Computer Science University of Munich, Germany. Outline of the Talk. 1. Introduction 2. RI-Tree 3. Cost Model - PowerPoint PPT Presentation
29
ns-Peter Kriegel, Martin Pfeifle , Marco Pötke, Thomas Seidl A Cost Model for Interval Intersection Queries on RI-Trees Institute for Computer Science University of Munich, Germany Database Group SSDBM 2002, Edinburgh
Transcript
Page 1: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, Thomas Seidl

A Cost Model for Interval Intersection Queries on RI-Trees

Institute for Computer ScienceUniversity of Munich, Germany

Database Group

SSDBM 2002, Edinburgh

Page 2: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Outline of the Talk

1. Introduction

2. RI-Tree

3. Cost Model

4. Evaluation

5. Conclusions and Future Work

Page 3: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Extended Objects in Databases

t

1D Objects:• Temporal data• Approximate values• Interval constraints• …

2D Objects:• Geographic data• VLSI design• Bitemporal data• …

3D Objects:• CAD documents• Digital mockup• Haptic rendering• …

t

Interval query

Box query

Window query

Page 4: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Integration of Access Methods

Declarative EmbeddingObject-relational DML and DDL

Extensible Indexing Framework

query processing

index_open()index_fetch()index_close()

maintenance

index_create()index_drop()index_insert()index_delete()index_update()

Page 5: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Integration of Access Methods

Extensible Indexing Framework

Declarative EmbeddingObject-relational DML and DDL

Physical ImplementationBlock-Manager, Caches, Locking, Logging, …

User-defined Index Structure

Extensible Indexing FrameworkObject-relational interface for indexmaintenance and querying functions.

Relational ImplementationMapping to built-in indexes (B+-trees);SQL-based query processing

Page 6: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Integration of Access Methods

User-defined Index Structure

Extensible Indexing FrameworkObject-relational interface for indexmaintenance and querying functions.

Relational ImplementationMapping to built-in indexes (B+-trees);SQL-based query processing

Physical ImplementationBlock-Manager, Caches, Locking, Logging, …

Declarative EmbeddingObject-relational DML and DDL

Extensible Optimization Framework

optimization

stats_collect()stats_delete()predicate_sel()index_io_cost()

Page 7: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Integration of Access Methods

User-defined Index Structure

Extensible Indexing FrameworkObject-relational interface for indexmaintenance and querying functions.

Relational ImplementationMapping to built-in indexes (B+-trees);SQL-based query processing

Physical ImplementationBlock-Manager, Caches, Locking, Logging, …

User-defined Cost Model

Extensible Optimization FrameworkObject-relational interface for selectivity estimation and cost prediction functions.

Relational ImplementationMapping to built-in statistics facilities;SQL-based evaluation of cost model

Declarative EmbeddingObject-relational DML and DDL

Page 8: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Outline of the Talk

1. Introduction2. RI-Tree

3. Cost Model

4. Evaluation

5. Conclusions and Future Work

Page 9: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

3a 15a12c

5c 15a

Relational Interval Tree (RI-Tree)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

15

8

1 3 5 7 13119

2 6 10 14

4 12

alicebob

chrisdave

7b1b

13d 13d

Foundation: Interval Tree [Edelsbrunner 1980]

• primary structure: binary search tree on possible endpoints• secondary structure: sorted lists of stored endpoints

each interval is registered at exactly one node

[Kriegel, Pötke, Seidl VLDB 2000]

Page 10: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

15

8

1 3 5 7 13119

2 6 10 14

4 12

RI-Tree: Virtual Primary Structure

151 3 5 7 13119

2 6 10 14

4 12

8

root = 2h–1

1 2h – 1

• no materialization of the binary tree• storage cost O(1): parameter root• fixed data space: root = 2h–1 covers [1..2h – 1]

first step: virtualize the primary structure

Page 11: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

RI-Tree: Relational Secondary Structure

8

151 3 5 7 13119

2 6 10 14

4 123a 15a12c

5c 15a7b1b

13d 13d

second step: manage secondary structure by two B+-trees

• storage of n intervals: O(n/b) disk blocks of size b• insert and delete: O(logbn) disk block accesses in the indexes

node lower

488

13

135

13

id

bacd

node upper

488

13

7121513

id

bcad

lowerIndex(node,lower,id)

upperIndex(node,upper,id)

Page 12: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

upper = 2522 = lower

RI-Tree: Interval Intersection Query

16 = root

24 = fork

25

2826

20

22

1

23

4

h = 5

23

Page 13: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

upper = 2522 = lower

RI-Tree: Interval Intersection Query

16 = root

20

1

23

4

h = 5

select id from upperIndex i, leftNodes leftwhere i.node = left.node and i.upper >= .lower

Page 14: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

upper = 2522 = lower

RI-Tree: Interval Intersection Query

24 = fork

2522

1

23

4

h = 5

select id from upperIndex i, leftNodes leftwhere i.node = left.node and i.upper >= .lowerunion allselect id from upperIndex iwhere i.node between .lower and .upper

23

Page 15: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

upper = 2522 = lower

RI-Tree: Interval Intersection Query

2826

1

23

4

h = 5

select id from upperIndex i, leftNodes leftwhere i.node = left.node and i.upper >= .lowerunion allselect id from upperIndex iwhere i.node between .lower and .upperunion allselect id from lowerIndex i, rightNodes rightwhere i.node = right.node and i.lower <= .upper

Page 16: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

upper = 2522 = lower

RI-Tree: Interval Intersection Query

16 = root

24 = fork

25

2826

20

22

1

23

4

h = 5

I/O complexity:

O(h·logbn + r/b)

select id from upperIndex i, leftNodes leftwhere i.node = left.node and i.upper >= .lowerunion allselect id from upperIndex iwhere i.node between .lower and .upperunion allselect id from lowerIndex i, rightNodes rightwhere i.node = right.node and i.lower <= .upper

23

Page 17: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Outline of the Talk

1. Introduction2. RI-Tree

3. Cost Model

4. Evaluation

5. Conclusions and Future Work

Page 18: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

joinI/O(T,) = min ,( )( )

1 gg Gaps

Gapsleft

right

Gapsleft()

Gapsright()

root

root

I/O Cost Model for Interval Intersections

1234

h = 5T

upperIndex(node, upper, id)

lowerIndex(node, lower, id)

B

root

outputI/O(T,) = (T,)·BO( h·logbn + r/b )

Page 19: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Selectivity Estimation

• Histogram-based:(equi-width histogram)

( , )( , )

,Toverlap bucket

n nii

ii

i

1 1

1

01

– replication of intervals intersection multiple buckets– statistics management requires user-defined code

• Quantile-based:(equi-count histogram)

where ,10n

TrTrTrT rightinnerleft ,

),(),(),(),(

r T

overlap l l q q overlap l T l q q

q qn

left

i i upper i i

i ii

( , )min , , , , ( ), , ,

;

1 1

11

+ better adaption to the data distribution+ exploits built-in statistics of the ORDBMS

r T

overlap q qq q

ninner

i i

i ii

( , ), ,

;

1

11

r Tright ( , ) analogously to rleft

Page 20: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Outline of the Talk

1. Introduction2. RI-Tree

3. Cost Model

4. Evaluation

5. Conclusions and Future Work

Page 21: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Experimental EvaluationDatasets

0,0

0,6

1,2

1,8

0 200 401 602 802bucket (low er bound)

[x 1.000]

num

ber o

f int

erva

ls

[x 1

.000

]

database = UNIdb size = 100,000 intervalsresolution = 100 buckets

0

2

4

6

8

10

62 232 402 572 741bucket (low er bound)

[x 1.000.000]

num

ber o

f int

erva

ls [x

1.0

00]

database = REALdb size = 115,000 intervalsresolution = 100 buckets

UNI REAL

Page 22: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Experimental Evaluation Computation of Statistics

0

10

20

30

40

100.000 200.000 300.000 400.000database s ize [#intervals ]

real

tim

e [s

ec]

histograms

quantiles

database = UNIresolution = 100 buckets/quantiles

Page 23: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Experimental EvaluationSelectivity Estimation

0

2

4

6

8

0 25 50 75 100query s ize [%]

rela

tive

erro

r [%

]

histograms

quantiles

database = UNIdb size = 100,000 intervalsresolution = 100 buckets/quantiles

0

1

2

3

4

0 25 50 75 100query s ize [%]

rela

tive

erro

r [%

]

quantiles

histograms

database = REALdb size = 115,000 intervalsresolution = 100 buckets/quantiles

UNI REAL

Page 24: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Experimental EvaluationSelectivity Estimation

0

5

10

15

0 100 200 300 400resolution [buckets /quantiles ]

rela

tive

erro

r [%

]

quantiles

histograms

database = REALquery size = 10%

Page 25: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Experimental EvaluationCost Estimation

0

5

10

15

0 25 50 75 100query s ize [%]

rela

tive

erro

r [%

]

database = UNIdb size = 100,000 intervalsresolution = 100 quantiles

0

5

10

15

20

25

0 25 50 75 100query s ize [%]

rela

tive

erro

r [%

]

database = REALdb size = 115,000 intervalsresolution = 100 quantiles

UNI REAL

Page 26: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Experimental EvaluationCost Estimation

0

1

2

3

4

5

0 10 20 30 40 50 60 70 80 90 100

actual se lectivity [%]

phys

ical

read

s

[x 1

00 b

lock

s]

est. output I/O est. join I/O est. total I/O real join I/O real total I/O

database = 1D-UNIdb size = 100,000 intervalsresolution = 100 quantiles

0

1

2

3

4

5

0 10 20 30 40 50 60 70 80 90 100

actual se lectivity [%]

phys

ical

read

s

[x 1

00 b

lock

s]

est. output I/O est. join I/O est. total I/O real join I/O real total I/O

database = REALdb size = 115,000 intervalsresolution = 100 quantiles

UNI REAL

Page 27: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Outline of the Talk

1. Introduction2. RI-Tree

3. Cost Model

4. Evaluation

5. Conclusions and Future Work

Page 28: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Conclusions and Future Work

• Relational access methods:– employ an ORDBMS as virtual machine– extensible indexing and optimizing framework

• Indexing extended objects:– Relational Interval Tree

• Development of cost models:– estimation of selectivity and I/O cost

Conclusions:

Future Work: • Cost models:– general interval relationships– interval sequences

Page 29: Hans-Peter Kriegel,  Martin Pfeifle , Marco Pötke, Thomas Seidl

07/25/02 Martin Pfeifle

Any questions?

? ?

?

??

?

?


Recommended