Post on 20-Sep-2020
Sergej Zerr, Stefan Siersdorfer, Jonathon Hare, Elena Demidova
35th Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, Portland, USA
• People today produce large amounts of visual content
• Photo sharing is one of the most popular activities in social applications
35th SIGIR Conference Portland, USA 12/08/12
Such images can be highly sensitive, disclosing many details of the
users' private sphere, for example photos showing weddings, family
holidays, and private parties.
• Privacy-directed search and diversification
• Support for sharing decisions
Technical challenge: automatic privacy-directed image detection and search
[Figure: example photos labeled Private and Public, with tags such as Work, Sea, Winter, Water]
Outline
• INTRODUCTION: Related Work
• DATA: Selection & Annotation
• FEATURES: Textual & Visual
• EVALUATION: Classification Model
• PRIVACY EXPLORER: Detection & Search
• FUTURE WORK: Ideas & Directions
Overview: Sensitive Information on the Web
• Colleges keep track of students' online activities; posting personal information
can have consequences for students [1, 2]
• Only a minimal percentage of users change the highly permeable default
privacy preferences (study of 4,000 students) [3]
~90% of profiles contain an image, birthday, and real name; 40% contain a phone
number
• Even people who have not published any compromising information
can leave discoverable footprints (e.g., mark-a-friend tagging on Facebook)
1. V. Schleswig-Holstein. Statistische Erfassung zum Internetverhalten Jugendlicher und Heranwachsender [Statistical survey of the Internet behavior of adolescents and young adults]. A study by the
consumer organization of Schleswig-Holstein, Germany, March 2010.
2. S. B. Barnes. A privacy paradox: Social networking in the United States. First Monday, 11(9), Sept. 2006.
3. R. Gross and A. Acquisti. Information revelation and privacy in online social networks. In WPES '05.
Overview: State of the Art
• Privacy prediction: based on tags and manually defined user privacy profiles
(Vyas et al. 2009; Ahern et al. 2007)
• Access control policies: access to parts of the social graph, use of tags and FOAF relations
(Felt et al. 2008; Au Yeung et al. 2009)
• Image analysis: textual features in Web 2.0
(Figueiredo et al. 2009; San Pedro et al. 2009)
and visual features for photo quality
(Yeh et al. 2010)
DATA
• Goal: capture the average community notion of privacy
• We crawled “most recently uploaded” Flickr photos (over 2 months)
• We started a social annotation game (run over 2 weeks)
• 81 users (colleagues, social network and forum users) in 6 teams
“Private are photos which have to do with the private
sphere (like self portraits, family, friends, your home) or
contain objects that you would not share with the entire
world (like a private email). The rest is public. In case no
decision can be made, the picture should be marked as
undecidable.”
DATA: Inter-Rater Agreement
• 37,535 images were judged, each by at least two persons
• 70% were labeled public or undecidable by all annotators
• 13% were labeled private by all annotators, 28% by at least one person
• 4,701 private, 27,405 public labels were assigned.
• Inter-rater agreement on 100 photos judged by 36 users: Fleiss' kappa = 0.6
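Fleiss' kappa compares the mean agreement among raters with the agreement expected by chance from the overall category frequencies. A minimal Python sketch of the standard formula follows; it assumes every photo was rated by the same number of annotators, and the function name and the toy counts (columns: private, public, undecidable) are hypothetical, not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a ratings table.

    counts[i][j] = number of raters who put item i into category j.
    Every item must have been rated by the same number of raters n >= 2.
    """
    N = len(counts)
    k = len(counts[0])
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Share of all ratings that fell into each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P) / N               # mean observed agreement
    P_e = sum(pj * pj for pj in p)   # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical toy table: 4 photos, 3 raters each, columns = (private, public, undecidable)
print(fleiss_kappa([[3, 0, 0], [0, 3, 0], [2, 1, 0], [0, 2, 1]]))
```

A value of 1 means perfect agreement and 0 means agreement at chance level, so 0.6 indicates substantial but imperfect consensus on what counts as private.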
Features
• Frontal face detection: faces are associated with higher privacy
• Edges: long coherent edges correspond to artificial (man-made) environments
• Colors: a few dominant colors correspond to professional photos
• SIFT (Scale-Invariant Feature Transform): object/region detection
• Text: tags and image title
• Brightness, sharpness, and profile-face features did not show strong discriminative
properties
Features: Colors
We determined the most discriminative colors for each
class using mutual information.
[Figure, Public vs. Private: example of a public photo with a few dominant colors and a private photo.]
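The mutual-information color selection can be sketched as follows. The slides do not give the color quantization, so this assumes 4 levels per RGB channel (64 color bins) and treats a bin as present when it covers at least 5% of an image's pixels; both settings and all function names are hypothetical. Each bin is then scored by the mutual information between its presence and the private/public label.

```python
import numpy as np


def quantize_colors(img, levels=4):
    """Map each RGB pixel (values 0-255) to one of levels**3 color bins."""
    q = (np.asarray(img, dtype=np.int64) * levels) // 256  # per-channel level 0..levels-1
    return q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]


def color_presence(img, levels=4, min_frac=0.05):
    """Binary vector: 1 if a color bin covers at least min_frac of the pixels (assumed threshold)."""
    bins = quantize_colors(img, levels).ravel()
    hist = np.bincount(bins, minlength=levels ** 3) / bins.size
    return (hist >= min_frac).astype(int)


def mutual_information(x, y):
    """Mutual information (in bits) between two binary arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return float(mi)


def rank_colors(images, labels, levels=4):
    """Score every color bin by MI with the label; return bin ids (best first) and scores."""
    X = np.array([color_presence(im, levels) for im in images])
    scores = np.array([mutual_information(X[:, j], labels) for j in range(X.shape[1])])
    return np.argsort(-scores, kind="stable"), scores


# Toy usage: two all-red "public" images (label 0) and two all-blue "private" images (label 1).
red = np.zeros((8, 8, 3), dtype=np.uint8); red[..., 0] = 255
blue = np.zeros((8, 8, 3), dtype=np.uint8); blue[..., 2] = 255
order, scores = rank_colors([red, red, blue, blue], [0, 0, 1, 1])
print(order[:2], scores[order[:2]])
```

Presence bits keep the score symmetric: a bin can be discriminative either because it shows up mostly in private photos or mostly in public ones.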
Features: Edges
[Figure: example of a public photo dominated by incoherent edges and a private photo of a
workplace with a mix of coherent and incoherent edges.]
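An edge-direction coherence feature in the spirit of this slide (long coherent edges vs. short incoherent ones) can be sketched in simplified form: quantize the edge direction of strong-gradient pixels into bins, and count a pixel as coherent when it belongs to a large 8-connected group of edge pixels sharing the same direction bin. The gradient operator, the thresholds (`mag_frac`, `min_size`), the 8 direction bins, and the function name are assumptions, not the authors' settings.

```python
from collections import deque

import numpy as np


def edge_direction_coherence(gray, n_bins=8, mag_frac=0.1, min_size=10):
    """Simplified edge-direction coherence; returns (coherent, incoherent) histograms."""
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)  # derivatives along rows and along columns
    mag = np.hypot(gx, gy)
    if mag.max() == 0:
        return np.zeros(n_bins, int), np.zeros(n_bins, int)
    edges = mag > mag_frac * mag.max()  # assumed relative threshold for "edge pixel"
    # Edge orientation is perpendicular to the gradient, taken modulo 180 degrees.
    theta = (np.arctan2(gy, gx) + np.pi / 2) % np.pi
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)

    coherent = np.zeros(n_bins, int)
    incoherent = np.zeros(n_bins, int)
    seen = np.zeros(edges.shape, bool)
    h, w = edges.shape
    for y0 in range(h):
        for x0 in range(w):
            if not edges[y0, x0] or seen[y0, x0]:
                continue
            b = bins[y0, x0]
            queue, size = deque([(y0, x0)]), 0
            seen[y0, x0] = True
            while queue:  # flood-fill one group of same-direction edge pixels
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and edges[ny, nx]
                                and not seen[ny, nx] and bins[ny, nx] == b):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            if size >= min_size:
                coherent[b] += size
            else:
                incoherent[b] += size
    return coherent, incoherent


img = np.zeros((32, 32)); img[:, 16:] = 1.0  # one long vertical step edge
coh, inc = edge_direction_coherence(img)
print(coh, inc)
```

The two histograms can be concatenated into one feature vector, so a photo full of long straight edges (buildings, offices) differs from one dominated by short, scattered edges (foliage, water).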
Features: SIFT
Features: Text
[Figure: characteristic tag topics: Family, Emotions, Sentiment; Nature, Inanimate]
Classification
• We used the SVM classifier from the SVMlight library
• We converted the edge and color histograms into feature vectors
• For SIFT and text features, each object or term is a dimension
• We normalized the values in each dimension into the range [0, 1] using
Platt’s sigmoid method
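Platt's method fits a sigmoid P(y = 1 | f) = 1 / (1 + exp(A*f + B)) to labeled data by minimizing cross-entropy against slightly smoothed targets. The slides do not detail how it was applied; the sketch below reads the bullet as fitting one sigmoid per feature dimension on the training labels and using its output as the normalized value in [0, 1]. That reading, the Newton iteration with backtracking line search, and the function names are assumptions.

```python
import numpy as np


def fit_platt(f, y, max_iter=100, tol=1e-9):
    """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) with Platt's smoothed targets.

    f: 1-D array of values for one dimension; y: labels in {+1, -1}.
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y)
    n_pos = int(np.sum(y == 1))
    n_neg = len(y) - n_pos
    # Platt's smoothed targets instead of hard 0/1 labels.
    t = np.where(y == 1, (n_pos + 1) / (n_pos + 2), 1 / (n_neg + 2))
    A, B = 0.0, float(np.log((n_neg + 1) / (n_pos + 1)))

    def loss(A, B):
        # Cross-entropy written stably: sum(log(1 + e^z) - (1 - t) * z), z = A*f + B
        z = A * f + B
        return float(np.sum(np.logaddexp(0, z) - (1 - t) * z))

    cur = loss(A, B)
    for _ in range(max_iter):
        p = np.exp(-np.logaddexp(0, A * f + B))  # p = 1 / (1 + e^z)
        d = t - p
        g = np.array([d @ f, d.sum()])           # gradient w.r.t. (A, B)
        if np.max(np.abs(g)) < tol:
            break
        w = p * (1 - p)
        H = np.array([[w @ (f * f), w @ f], [w @ f, w.sum()]]) + 1e-12 * np.eye(2)
        step = np.linalg.solve(H, g)             # Newton direction is -step
        s = 1.0
        while s >= 1e-10:                        # backtracking (Armijo) line search
            nA, nB = A - s * step[0], B - s * step[1]
            new = loss(nA, nB)
            if new <= cur - 1e-4 * s * (g @ step):
                break
            s /= 2
        else:
            break                                # no sufficient decrease found: stop
        A, B, cur = nA, nB, new
    return A, B


def platt_normalize(x, A, B):
    """Map raw values into [0, 1] with the fitted sigmoid."""
    return np.exp(-np.logaddexp(0, A * np.asarray(x, dtype=float) + B))


f = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([-1, -1, 1, -1, 1, 1])
A, B = fit_platt(f, y)
p = platt_normalize(f, A, B)
print(A, B, p)
```

Unlike min-max scaling, the fitted sigmoid makes each normalized value interpretable as an estimated probability of the private class, and it is insensitive to single extreme values.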
Classification
• Labeled images: 4,701 private, 27,405 public
• Balanced set of 4,701 private and 4,701 randomly selected public images
• We used 60% as training data and 40% as test data
• We used precision-recall (P/R) curves and break-even points (BEP) as quality
measures
• We tested visual features, textual features, and their combinations
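The balanced sampling and 60/40 split can be sketched as follows, assuming public images are undersampled uniformly at random and the pooled set is shuffled before splitting (the slides do not say whether the split was stratified). Image ids, the seed, and the function name are placeholders.

```python
import random


def balanced_split(private_ids, public_ids, train_frac=0.6, seed=0):
    """Undersample public images to the private count, then split into train/test.

    Returns two lists of (image_id, label) pairs; label 1 = private, 0 = public.
    """
    rng = random.Random(seed)
    public_sample = rng.sample(list(public_ids), len(private_ids))
    data = [(i, 1) for i in private_ids] + [(i, 0) for i in public_sample]
    rng.shuffle(data)
    cut = round(train_frac * len(data))
    return data[:cut], data[cut:]


# Set sizes from the slides, with placeholder ids.
train, test = balanced_split(range(4701), range(100000, 100000 + 27405))
print(len(train), len(test))  # prints 5641 3761
```

Balancing the classes means a trivial "always public" classifier no longer looks good, so the reported BEP values reflect the features rather than the class skew of the raw data.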
Textual Features P/R Curve
The pictures used in the classification experiments contained good-quality textual metadata (e.g., titles and at
least three English tags). The text features could therefore provide a short but informative summary of the image
content, resulting in a BEP of 0.78.
Visual Features P/R Curves
• The occurrence of faces in photos is an intuitive indicator of privacy, reflected by a
BEP of 0.63 for the face feature
• The edge-direction coherence feature achieves a BEP of 0.65
• SIFT features outperform all of the other visual features (BEP = 0.70)
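A break-even point is the point on the P/R curve where precision equals recall. If R is the number of private images in the test set, then after the top R ranked images both precision and recall equal (hits in top R) / R, so the BEP can be read off at that cutoff. A sketch, assuming higher classifier scores mean "more likely private"; score ties keep their input order, and the function names are hypothetical.

```python
def pr_curve(scores, labels):
    """(precision, recall) after each rank cutoff k = 1..n, highest scores first."""
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    total_pos = sum(labels)
    points, hits = [], 0
    for k, (_, label) in enumerate(ranked, start=1):
        hits += label
        points.append((hits / k, hits / total_pos))
    return points


def break_even_point(scores, labels):
    """BEP: at cutoff k = number of positives, precision and recall coincide."""
    total_pos = sum(labels)
    precision, _recall = pr_curve(scores, labels)[total_pos - 1]
    return precision


s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]  # hypothetical SVM scores
l = [1, 0, 1, 1, 0, 0]              # 1 = private
print(break_even_point(s, l))
```

Because BEP summarizes the whole ranking in one number, it allows the face (0.63), edge (0.65), and SIFT (0.70) features to be compared directly.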
Feature Combinations P/R Curves
The combination of visual and textual features leads to a BEP of 0.80, showing that
textual and visual features can complement each other in the privacy classification task.
However, classification with visual features alone also produces promising results, and
can be useful when textual annotations are missing or insufficient, as is the case for many
photos on the Web.
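The slides do not say how visual and textual features were combined. A common option, assumed here, is early fusion: concatenate the per-feature vectors into one sparse vector with disjoint index ranges and write it in SVMlight's plain-text input format (`<target> <feature>:<value> ...`, feature ids starting at 1 in increasing order). The function name and the example vectors are hypothetical.

```python
def to_svmlight_line(label, blocks):
    """Concatenate feature groups into one SVMlight-format line (early fusion).

    label: 1 for private, anything else for public (written as +1 / -1).
    blocks: list of (dim, {index: value}) pairs; indices are 0-based within a group.
    Feature ids are shifted by the sizes of earlier groups and made 1-based;
    zero values are omitted, as the sparse format allows.
    """
    parts = ["+1" if label == 1 else "-1"]
    offset = 0
    for dim, feats in blocks:
        for idx in sorted(feats):
            if not 0 <= idx < dim:
                raise ValueError(f"index {idx} outside group of size {dim}")
            if feats[idx] != 0:
                parts.append(f"{offset + idx + 1}:{feats[idx]:g}")
        offset += dim
    return " ".join(parts)


# Hypothetical example: a 3-bin color histogram followed by a 2-term text vector.
print(to_svmlight_line(1, [(3, {0: 0.5, 2: 1.0}), (2, {1: 0.25})]))
# prints: +1 1:0.5 3:1 5:0.25
```

Keeping each feature group in its own index range lets the same code produce visual-only, text-only, or combined training files by choosing which blocks to pass.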
Privacy Directed Search
PicAlert!
Conclusion and Future Work
• We applied classification using various visual and textual features
• Classification models were trained on a large-scale dataset with privacy
assignments obtained through a social annotation game
• Using only visual features yields usable results, so the approach can be
applied in scenarios where no textual annotation is available (e.g., personal
photo collections or mobile phone pictures)
Future Work:
• Using collaborative filtering for personalization
• Using other features such as Color-SIFT, and using context (e.g., mobile sensors)
• Larger user studies and annotation games; studies of temporal developments
• Integration into popular Web 2.0 applications
PicAlert: http://l3s.de/picalert/
Sergej Zerr, Stefan Siersdorfer, Jonathon Hare
zerr@L3S.de
Data | Features | Search & Diversification | Evaluation
Thank you!
Special thanks to ACM SIGIR for providing the travel grant!