+ All Categories
Home > Documents > Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group...

Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group...

Date post: 29-Jul-2020
Category:
Upload: others
View: 3 times
Download: 0 times
Share this document with a friend
20
Björn Ommer Computer Vision Group University of Heidelberg I Im ma ag ge e2 2C Co on nt te en nt t: : V Vi is su ue el ll le e O Ob bj je ek kt te er rk ke en nn nu un ng g z zu um m D Da at ta am mi in ni in ng g i in n d de en n G Ge ei is st te es sw wi is ss se en ns sc ch ha af ft te en n u un nd d d di ie e B Be ed de eu ut tu un ng g v vo on n C Cr ro ow wd ds so ou ur rc ci in ng g
Transcript
Page 1: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer

Computer Vision Group

University of Heidelberg

IImmaaggee22CCoonntteenntt:: VViissuueellllee OObbjjeekktteerrkkeennnnuunngg zzuumm DDaattaammiinniinngg iinn ddeenn GGeeiisstteesswwiisssseennsscchhaafftteenn uunndd ddiiee BBeeddeeuuttuunngg vvoonn CCrroowwddssoouurrcciinngg

Page 2: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

OOuurr WWoorrlldd iiss VViissuuaall

“When I took office, only high energy physicists had ever heard of what is called the World Wide Web... Now even my cat has it's own page.” - Bill Clinton

>>110000bbnn images on facebook, ++66bbnn images/month

72 hours of video uploaded to YouTube every minute

The Silver Institute

All photos

Analog photos

Number of Photos Taken Each Year

1st digital consumer SLR

Page 3: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

~1.7M images ~1M images ~108M images of artworks

FFiinnddiinngg RReelleevvaanntt CCoonntteenntt

§  Large-scale digitization in the Arts & Humanities:

§  „We are drowning in information and starved for knowledge“, John Naisbitt: Megatrends

Page 4: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

AAnnaallyyssiiss ooff LLaarrggee PPrree--MMooddeerrnn IImmaaggee DDaattaasseettss

Large scale digitization in the Humanities: Pibliotheca Palatina: §  ~270 000 pages §  ~7 000 miniatures

Crown

Swelling Bows Hat

Symbols of power: >2.5K images related to crowns in Cod. Pal. Germ. ð concentration game with > 2.5K cards!

Textual annotations provided, but… NO labeling w.r.t. §  object locations §  relations betw. objects

within images & betw. Images

§  hierarchical nature of categories

Page 5: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

IImmaaggee RReettrriieevvaall vvss.. TTeexxtt--BBaasseedd SSeeaarrcchh §  Object retrieval in images: search through images

NOT through textual annotations

[Yarlagadda et al., ACCV‘10 e-Heritage]

Page 6: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

CCoommppuutteerr VViissiioonn aanndd tthhee HHuummaanniittiieess

images digitization, signal processing, image databases, manual annotation, …

objects, actions

scene understanding

object & action recognition, object statistics & analysis

Efforts so far: Semantic level:

interactions, context

Humanities

objects, actions object & action recognition, object statistics & analysis

scene understanding

interactions, context

Page 7: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

OOuurr ((VViissuuaall)) WWoorrlldd iiss HHiigghhllyy SSttrruuccttuurreedd sscc

aallee

Patterns, Patterns, Everywhere ð Machine Learning

Page 8: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

RReeccooggnniittiioonn >> OObbsseerrvviinngg PPiixxeellss:: TThhee SSeemmaannttiicc GGaapp

"I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have "327"? No. I have sky, house, and trees." --Max Wertheimer

Page 9: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

RReepprreesseennttiinngg SSttrruuccttuurree

§  Structure, esp. shape, is an emergent property ð cannot be observed locally

§  How can we represent what cannot be measured (directly)?

ðPerceptual Grouping §  Bottom-up grouping using relationships

between perceptual entities

  Top-down grouping using prior know-ledge

ðCompositional hierarchy bridges seman-tic gap btw. parts & whole object

Max Wertheimer

Page 10: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

OOuurr AApppprrooaacchh:: CCoommppoossiittiioonnaalliittyy

§  Simple, widely reusable parts & relations between them ð Compositions

[Ommer et al., PAMI’10 ]

Page 11: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

OOuurr OOnnggooiinngg PPrroojjeeccttss iinn tthhee HHuummaanniittiieess 1.  Object detection [Cod. Pal. germ.]

2.  Analysis of object category variability [CPG]

3.  Architectural analysis

4.  Registration of reproductions [Cod. Manesse]

5.  Gesture recognition [Sachsenspiegel]

6.  Iconographic analysis [Chinese Revolution Comics]

7.  Analysis of ancient script [Cuneiform inscriptions]

Page 12: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

GGeessttuurree RReeccooggnniittiioonn -- SSaacchhsseennssppiieeggeell

pointing

swearing

speaking

groundtruth

detected gesture

[Schlecht, Carque, Ommer, ICIP’11]

Page 13: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

IInnttrraa--CCaatteeggoorryy VVaarriiaabbiilliittyy ooff CCrroowwnnss Swabian workshop of Ludwig HHeennfffflliinn

Hagenau workshop of Diebold Lauber.

[Yarlagadda et al., ACCV’10 (eHeritage)]

Alsatian workshop of 1418

Workshop HHeennfffflliinn Lauber Alsatian

HHeennfffflliinn 98.4% 1.6% 0%

Lauber 3.7% 96.3% 0%

Alsatian 0.8% 0.8% 98.3%

pred: corr.:

Page 14: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

IImmaaggee RReeggiissttrraattiioonn –– CCooddeexx MMaanneessssee

[Monroy, Carque, Ommer, ICIP’11]

Page 15: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

RReeccoonnssttrruuccttiinngg tthhee MMeeddiieevvaall DDrraawwiinngg PPrroocceessss

[Monroy et al., ICIP‘11]

Page 16: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

CCoommppoossiittiioonnaall OObbjjeecctt RReeccooggnniittiioonn –– LLeeaarrnniinngg OObbjjeecctt SSttrruuccttuurree ffrroomm SSaammpplleess Our  (visual)  world  is    

highly  structured:   Training  images   Composi�onal  grouping  &  learning   Object  Model  

Recogni�on  algorithm   Detec�on,  classifica�on  Query  image  

Learning  Phase  

Recogni�on  Phase  

[Ommer & Buhmann, PAMI’10]

Page 17: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

DDaattaasseett AAnnnnoottaattiioonn vvss.. MMooddeell LLeeaarrnniinngg

§  Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime.

➪ Annotate subset of data to train recognition alg. and have computer label additional data

§  Benefit of training recognition algorithm: §  Automatic generalization of training labels to whole dataset

➪efficiency

§  Learned object models yield an abstraction that can be

verified on novel data ➪label consistency

§  Generalizes to non-categorial annotations (relational data)

§  …

Page 18: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

LLeeaarrnniinngg OObbjjeecctt MMooddeellss

§  Level of supervision

§  Number of training samples

Page 19: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

CCrroowwddssoouurrcciinngg

§  Many independent annotators: §  Effective means for obtaining large amounts of training data

§  Inherent check for consistency

Num

ber

of im

ages

0 20 40 60 100

Percentage of pixels labeled

0

2000

4000

6000

8000

10000

12000

10 30 50 70 80 90

1 2 3 4 5 6 7 8 9 10 11 12 13 14 >150

2000

4000

6000

8000

10000

12000

14000

16000

( )

Nu

mb

er

of

ima

ge

s

Number of objects per image

Page 20: Image2Content: Visuelle Objekterkennung zum Datamining in ... · Björn Ommer Computer Vision Group University of Heidelberg Image2Content: Visuelle Objekterkennung zum Datamining

Björn Ommer | [email protected]

UNIVERSITÄT HEIDELBERG

Björn Ommer | [email protected]

CCoommbbiinniinngg CCrroowwddssoouurrcciinngg wwiitthh OObbjjeecctt RReeccooggnniittiioonn §  Recognition alg. supports annotators:

§  Suggests related images, object localization, labels, ...

§  Generalizes annotation to novel images

§  Alleviate simple tasks ➪ automation

§  Continuous annotation supports recognition alg.: §  Previously learned models are verified in consecutive rounds

§  Provides large amounts of training data

§  Combination of man/machine enables novel games: §  Have users deal with large numbers of images simultaneously,

e.g. relations btw. Images / artistic reproductions


Recommended