
Dirk Pieper/Friedrich Summann Bielefeld UL

Date post: 11-Jan-2016
Description:
Bielefeld Academic Search Engine (BASE): an End-user Oriented Institutional Repository Search Service. Dirk Pieper/Friedrich Summann, Bielefeld UL. Part 1: Institutional Repository Servers; BASE: concept and content; Creating a special view on institutional repository server collections.
Transcript
Page 1: Dirk Pieper/Friedrich Summann Bielefeld UL

BASE: Institutional Repositories

Bielefeld Academic Search Engine (BASE):

an End-user Oriented Institutional Repository Search Service

Dirk Pieper/Friedrich Summann

Bielefeld UL

Page 2: Dirk Pieper/Friedrich Summann Bielefeld UL

Overview:

Part 1:
Institutional Repository Servers
BASE: concept and content
Creating a special view on institutional repository server collections
Demo: BASE user-interface and further visions

Part 2:
OAI dataflow, BASE dataflow
Repository information in registries
OAI harvesting problems
Further developments of BASE

Page 3: Dirk Pieper/Friedrich Summann Bielefeld UL

Institutional Repository Servers:

Definition: “A digital collection capturing and preserving the intellectual output of a single or multi-university community.” (Raym Crow, http://www.arl.org.sparc/IR/ir.html)
IR servers also exist outside the university community, of course.
IR servers appear as simple web sites, database systems with an OAI interface, …

Page 4: Dirk Pieper/Friedrich Summann Bielefeld UL

BASE: concept and content

BASE uses Fast Data Search.
BASE contains intellectually selected resources, with a focus on OAI servers but also web-crawled content.
BASE displays result lists as bibliographic data and full-text hits.
The BASE frontend is written in PHP and uses the search API of Fast Data Search.
BASE offers sorting, search refinement and search history.

Page 5: Dirk Pieper/Friedrich Summann Bielefeld UL

BASE: concept and content

[Figure: architecture diagram. Labels: SEARCH API; QUERY & RESULT PROCESSING (pipeline); DOCUMENT PROCESSING (pipeline); CONNECTORS; FILETRAVERSER; WEBCRAWLER; FILTER; SEARCH; INDEXFILES; TUNING, ADMINISTRATION and DEBUGGING]

Page 6: Dirk Pieper/Friedrich Summann Bielefeld UL

BASE: concept and content

At present: 2.7 million documents in 189 collections, 15 of them web-crawled.

Page 7: Dirk Pieper/Friedrich Summann Bielefeld UL

BASE: concept and content

Projekt Gutenberg-DE
Internet Library of Early Journals Oxford
Various Institutional Repositories
Springer Link Metadata
Cornell HistMath Fulltext Crawl
University Michigan Historical Math
CiteSeer
Zentralblatt Mathematik
Bielefeld Univ: Math. Preprints
ArXiv
OPAC UL Bielefeld
Ifo Institute Munich
Zeitschriften der Aufklärung (Bielefeld UL)

Page 8: Dirk Pieper/Friedrich Summann Bielefeld UL

Special view on IR server collections

Collections are listed in a configuration file:

[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."

Collections can be clustered for the user interface, e.g. “Institutional Repositories Europe” consists of [ftubarcelona], [ftubath], [ftubristol], [ftuhelsinki], …

Parametric search is possible.

The frontend is ready for multiple views (independent views with their own configuration and layout on the same backend).
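
The collection entries shown on this slide have an INI-like layout, so reading them and resolving a cluster can be sketched in a few lines. This is only a Python illustration, not the BASE frontend code (which, according to the slides, is written in PHP). The names `load_collections`, `cluster_members` and `CLUSTERS` are hypothetical, and listing ftubirmingham in the Europe cluster is an assumption made so that the sample returns a result.

```python
import configparser

# Sample entry copied from the slide.
SAMPLE = """
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."
"""

# Hypothetical cluster table: cluster label -> collection section names.
# The first four names come from the slide; ftubirmingham is an assumption.
CLUSTERS = {
    "Institutional Repositories Europe": [
        "ftubarcelona", "ftubath", "ftubristol", "ftuhelsinki", "ftubirmingham",
    ],
}


def load_collections(text):
    """Parse the INI-style text into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    collections = {}
    for name in parser.sections():
        # Values are quoted in the slide's file, so strip the quotes.
        collections[name] = {k: v.strip('"') for k, v in parser[name].items()}
    return collections


def cluster_members(cluster, collections):
    """Return the configured collections that belong to the given cluster."""
    return [c for c in CLUSTERS.get(cluster, []) if c in collections]


collections = load_collections(SAMPLE)
print(collections["ftubirmingham"]["descdd_en"])
print(cluster_members("Institutional Repositories Europe", collections))
```

Keeping the cluster table separate from the collection entries means a second view could define its own clusters over the same collection list, which is one possible reading of the "multi view" remark.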

Page 9: Dirk Pieper/Friedrich Summann Bielefeld UL

Vision: search in Google Scholar

Try your search on Google Scholar ...

Page 10: Dirk Pieper/Friedrich Summann Bielefeld UL

Vision: check citations in Google Scholar

Check citations (citing articles) in Google Scholar ...

Page 11: Dirk Pieper/Friedrich Summann Bielefeld UL

OAI dataflow at Bielefeld UL

[Figure: dataflow diagram. Labels: OAI data; Harvesting; BASE internal index (FAST); OPAC; Article database]
Dissertations, monographs (fulltext)
Articles (fulltext)
PubMed, Euclid, ArXiv, CiteSeer, Citebase, DOAJ articles
All resources (texts, images, video, references ....)

Page 12: Dirk Pieper/Friedrich Summann Bielefeld UL

BASE dataflow

Sources: OAI data, web pages, database records
Harvesting, pre-processing
Processing
Internal index (FAST)
User interface (PHP)

Page 13: Dirk Pieper/Friedrich Summann Bielefeld UL

Repository information in registries

Eprints Registry (607)
Openarchives.org (383)
DSpace Registry (28)
Directory of Open Archive Repositories (324)
Univ. of Illinois Registry (1000)

Page 14: Dirk Pieper/Friedrich Summann Bielefeld UL

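OAI-compliant univ. repositories in BASE

[Figure: map with repository counts per country; the counts printed on the map cannot be assigned to countries in this transcript.]

USA 76
Canada 13
South America 2
Africa 2
India 3
Australia 11
New Zealand 1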

Page 15: Dirk Pieper/Friedrich Summann Bielefeld UL

Tools for the Harvesting Environment

OAI Registry Watcher (Bielefeld UL, Perl)
Open Source Harvester (FS Consulting, Perl, with modifications)
XML Validator and Repairer (Bielefeld UL, based on Perl XML modules)
OAI Harvest Watcher (Bielefeld UL, Perl)
OAI Resource Updater (Bielefeld UL, Perl)

Page 16: Dirk Pieper/Friedrich Summann Bielefeld UL

OAI harvesting challenges

Repositories do not respond or return error messages
Data contain only references, without any fulltext
Links to the document do not work
Access to the fulltext is restricted
XML file is not well-formed
Field content varies
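
The "not well-formed" case is the easiest of these to detect automatically. The sketch below is a Python illustration using the standard library parser, not the Perl-based XML Validator and Repairer mentioned in the slides. The function name `check_well_formed` and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Hypothetical harvested fragments: the second has a missing closing tag.
good = "<record><title>ok</title></record>"
bad = "<record><title>broken</record>"

print(check_well_formed(good))
print(check_well_formed(bad)[0])
```

A check like this only reports the problem. Repairing a record, as the slide's tool name suggests, needs source-specific rules and is not attempted here.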

Page 17: Dirk Pieper/Friedrich Summann Bielefeld UL

OAI Harvesting: Problems in Practice 1

<source>http://xxx.xxx.uni-xxxxx.de/publications/ELibD905_diplom_allnoch.pdf</source>

<dc:creator>Barry Wellman,Jeffrey Boase,Kakuko Miyata</dc:creator>
<dc:subject>Barry Wellman,Jeffrey Boase,Kakuko Miyata The Mobile-izing ....</dc:subject>

<dc:title>Talk P. Bruzzone</dc:title>
<dc:creator>Bruzzone </dc:creator>
<dc:creator>Pierluigi</dc:creator>

Reproductive Biology and Endocrinology 2004, 2:52 doi:10.1186/1477-7827-2-52
<dc:date>2004-07-05</dc:date>
<dc:type>Review </dc:type>
<dc:identifier>http://www.rbej.com/content/2/1/52</dc:identifier>

Page 18: Dirk Pieper/Friedrich Summann Bielefeld UL

OAI Harvesting: Problems in Practice 2 - Variations of <dc:language>

EN: 9910
ENG: 771
En: 566
Eng: 1
English: 24084
English (United States): 63
English and Greek: 1
English and Russian: 1
English/Japanese: 1
English; Russian: 1
English=en: 1
Translation into English: 2

en: 1279115
en-CA: 865
en-US: 3
en-es: 5
en-us: 8
en;: 2
en_UK: 618
en_US: 18456
eng: 186787
eng : 92
eng + dut: 2
eng;: 17
eng; fre; ger;: 141 ....
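
Variation of this kind is usually handled by a normalisation step before indexing. The following Python sketch shows one possible approach. It is an assumption for illustration and not the method used in BASE. The alias table is deliberately tiny, the function name `normalize_language` is hypothetical, and region suffixes such as `-US` or `_UK` are simply dropped.

```python
import re

# Small illustrative alias table (assumption); a real harvester would need
# a complete ISO 639 mapping.
ALIASES = {
    "en": "en", "eng": "en", "english": "en",
    "fre": "fr", "ger": "de", "dut": "nl",
    "greek": "el", "russian": "ru", "japanese": "ja",
}

# Separators seen in the slide's examples: ';', '/', '+', '=' and the word 'and'.
SEPARATORS = re.compile(r"\s*(?:;|/|\+|=|\band\b)\s*")


def normalize_language(raw):
    """Return the list of normalised codes found in a raw dc:language value."""
    codes = []
    for part in SEPARATORS.split(raw.strip().lower()):
        part = part.strip()
        if not part:
            continue
        # Keep only the leading token, dropping region or qualifier suffixes
        # such as 'en-us', 'en_uk' or 'english (united states)'.
        base = re.split(r"[-_(\s]", part, maxsplit=1)[0]
        code = ALIASES.get(base)
        if code and code not in codes:
            codes.append(code)
    return codes


for value in ["EN", "eng ", "en_US", "English (United States)",
              "eng; fre; ger;", "English and Greek", "Translation into English"]:
    print(repr(value), "->", normalize_language(value))
```

Free-text values such as "Translation into English" are left unmapped by this sketch. Whether to guess a language from such strings or leave the field empty is a policy decision for the aggregator.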

Page 19: Dirk Pieper/Friedrich Summann Bielefeld UL

Some Rules from Harvesting Practice

Standard repository software is great - for OAI harvesting as well
Small collections – small problems
Getting the related fulltext is complicated
Libraries produce better metadata
Data aggregation may produce problems
Writing e-mails helps - sometimes

Page 20: Dirk Pieper/Friedrich Summann Bielefeld UL

Further Developments: BASE Interfaces

Search form (working)
HTTP calls (working)
Web Service (in development)
Federated Search (Vascoda) (in discussion)

Page 21: Dirk Pieper/Friedrich Summann Bielefeld UL

Local Integration: Search Form

<form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
  <input maxlength="512" name="q" type="text" size="50" />
  <input value="Search!" type="submit" />
  <input value="all" name="s" type="hidden" />
</form>
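
The same request that this form submits can be prepared from a script, which is roughly what the "HTTP calls" interface amounts to. The Python sketch below only builds the request and does not send it. The URL and the field names `q` and `s` are taken from the form, while the function name `build_search_request` is hypothetical. Whether the endpoint still accepts these parameters today is not something the slides can confirm.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Action URL copied from the slide's form.
BASE_URL = "http://www.base-search.net/index.php"


def build_search_request(query, scope="all"):
    """Build (but do not send) the POST request the search form would submit."""
    # The form limits the query field to 512 characters and posts UTF-8.
    fields = {"q": query[:512], "s": scope}
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    return Request(BASE_URL, data=body, headers=headers, method="POST")


request = build_search_request("Aufklärung Zeitschriften")
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the request to `urllib.request.urlopen` would perform the actual call. That step is left out so the example has no network dependency.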

Page 22: Dirk Pieper/Friedrich Summann Bielefeld UL

Thank you!

