Research Center for Advanced Studies from the National Polytechnic Institute
Information Technology Laboratory
Technical Report:
Ontology learning from text: Method for learning axioms
Ana Rios-Alvarado and Ivan Lopez-Arevalo
CINVESTAV UNIDAD TAMAULIPAS. Parque Científico y Tecnológico TECNOTAM – Km. 5.5
carretera Cd. Victoria-Soto La Marina. C.P. 87130 Cd. Victoria, Tamps.
LTI-TR-2012-07
Cd. Victoria, Tamaulipas, Mexico. December, 2012
Abstract:
Ontologies provide a structured organization of knowledge and support the exchange and sharing of information. Moreover, one of the main benefits of using ontologies is the ability to infer new knowledge, which allows the development of more realistic applications. The need to overcome the knowledge acquisition bottleneck caused by the manual construction of ontologies has motivated studies on semi-automatic and automatic methods to build ontologies. One of the main sources of knowledge created by humans is text. The analysis and extraction of the elements of an ontology from text is a very hard task. In this report we present a method for learning axioms from text based on named entity recognition. In our proposal we exploit corpora with a high occurrence of named entities, which give information about the individuals in the specific domain expressed by the corpus. Given the set of identified named entities, axiomatic relations such as subClassOf, disjointWith, and equivalentClass were identified. For this purpose a named entity recognition tool was used and the linguistic context where classes co-occur was extracted. This report of activities corresponds to the third year of doctoral studies.
KEYWORDS: Ontology learning, axioms, named entity recognition
Corresponding authors: Ana Rios-Alvarado <[email protected]> and Ivan Lopez-Arevalo <[email protected]>
© Copyright by CINVESTAV-Tamaulipas. All rights reserved
Date of submission: December 5th, 2012
Ana B. Rios-Alvarado and Ivan Lopez-Arevalo. Ontology learning from text: Method for learning axioms. CINVESTAV Tamaulipas. 2012 Dec. 20 pp. Technical Report No. LTI-TR-2012-07
Place and date of publication: Ciudad Victoria, Tamaulipas, MEXICO. December 5th, 2012
Contents
1 Introduction
1.1 Background
1.2 Goals
1.3 Overview of this document
2 Related work
3 The method - Axiom learning
4 Experiments and Results
4.1 Named Entity Recognition
4.2 Identification of instances
4.3 SameIndividualAs/differentFrom relation
4.4 SubClassOf relation
4.5 DisjointWith relation
4.6 EquivalentClass relation
5 Conclusions
Bibliography
1 Introduction
Access to new technologies on the Web has allowed the creation of a huge amount of unstructured
information. This information currently comes in different formats, such as news, e-mails, blogs, and tweets,
which represent a significant source of collective expertise (know-how). In order to store, retrieve, or
infer knowledge from this information, it is necessary to represent it using a conceptual structure. This
can be achieved by means of ontologies.
1.1 Background
According to Studer et al. [9], an ontology is a formal, explicit specification of a shared conceptualization.
Conceptualization refers to an abstract model of some phenomenon in the world. Explicit means that the
type of concepts used and the constraints on their use are explicitly defined. Formal refers to the fact that the
ontology should be machine-readable. Shared reflects that an ontology captures consensual knowledge, that
is, knowledge accepted by a group of experts in the domain. Neches et al. [4] describe an ontology as an element
that defines the basic terms and relations comprising the vocabulary of a topic area, as well as the
rules for combining terms and relations to define extensions of a conceptualization. Staab et al. [8] define
an ontology as a formal description of concepts and relationships that can exist for a community of human
and/or machine agents.
The notion of ontologies is crucial for enabling knowledge sharing and reuse. WordNet¹
(a lexical database for English) gives the following definition: “an ontology (in Computer Science) is a rigorous
and exhaustive organization of some knowledge domain that is usually hierarchical and contains all the
relevant entities and their relations”. Thus, an ontology should: 1) capture a shared understanding and 2)
enable logical inference on facts through axioms.
¹ http://wordnet.princeton.edu/
Figure 1: Classification of ontology learning approaches
An ontology can be built manually by knowledge engineers and domain experts, but this results
in long and tedious development stages and becomes a knowledge acquisition bottleneck [3]. For this
reason, ontology learning is nowadays an important research area. Ontology learning is the set of
methods and techniques used for building an ontology from scratch, or for enriching or adapting an existing
ontology, in a semi-automatic fashion using several knowledge and information sources [5]. Shamsfard and
Barforoush [7] give an overview of the classification of ontology learning approaches from different points of
view (see Figure 1). Maedche and Staab [3] consider the life cycle of ontology building and
identify four parts in the ontology learning process: extract, prune, refine, and import or reuse. Weng et al.
[15] emphasize the extraction methods and consider four categories: dictionary-based, text clustering,
association rules, and knowledge base. Cimiano [1] describes the process of building an ontology based on
the so-called cake model. The cake model views the building of an ontology as a stack of layers, where each layer
corresponds to a task that yields one component of the ontology. From bottom to top, the layers are
organized as: terms, synonyms, concepts, concept hierarchy, relations, relation hierarchy, axiom schemata,
and general axioms. The methods that can perform these tasks are classified into four groups based on [14]:
lexical-syntactic patterns, information extraction, machine learning, and co-occurrence analysis. Natural
language processing techniques are typically used for recognizing relevant terms and their relationships.
The text requires a preprocessing phase in which the following tasks are applied: 1) extraction of plain text,
2) splitting of text into sentences, 3) elimination of stopwords, 4) tagging of sentences, and 5) parsing of the sentences.
In this context, text mining plays an important role in ontology learning. Text mining is the process
of discovering patterns and new knowledge in collections of text. In particular, text clustering
techniques are a good option for ontology learning because they can find relations between words that
appear in different places in the text. For the taxonomy extraction task, some techniques based
on the distribution of information on the Web are a good option because they exploit all the information
contained in the Web. Even though a large number of methods for ontology learning have been proposed
and there are many lightweight ontologies, one of the main challenges in ontology learning is
building more expressive ontologies which contain elements such as axioms.
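Two of the preprocessing tasks listed above, sentence splitting and stopword elimination, can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the regular expressions, the tiny stopword list, and the function names are assumptions made only for illustration, and tagging and parsing are left out because they require a dedicated NLP tool.

```python
import re

# Tiny illustrative stopword list (an assumption; real systems use a larger one).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to"}

def split_sentences(text):
    """Split plain text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def remove_stopwords(sentence):
    """Lowercase a sentence, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]

text = ("In Wexford the November Opera Festival is an international event. "
        "The Elephanta Festival is a classical dance and music event.")
sentences = split_sentences(text)
tokens = [remove_stopwords(s) for s in sentences]
```

The naive splitter breaks on abbreviations such as "Dr.", so a real pipeline would rely on a trained sentence tokenizer.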
Axioms are assertions in logical form that together comprise the overall theory that the ontology
describes in its domain of application. The acquisition of axiom schemata and general axioms are the tasks
that give a high level of expressiveness to an ontology, and these elements make a big difference between an
ontology and other models for representing knowledge. Some approaches in ontology learning
that address the axiom extraction task (in an automatic or semi-automatic way) have used techniques such as
pattern-based [2, 6], transformation-rule-based [11, 12, 13], or heuristic-based [10] methods. These proposals have
shown that there is an association between lexical relations and axioms.
In this research work, ontology learning techniques are proposed and developed to automatically discover
the vocabulary, relations, and axioms from textual resources.
1.2 Goals
The main objective of this research work is to obtain an approach for ontology learning that acquires the vocabulary,
taxonomic relationships, and key axioms from textual resources.
The particular goals of our research are:
• Obtain a clustering algorithm to get the vocabulary from text.
• Propose a method to extract the taxonomic relationships between concepts.
• Propose a method to extract axioms from text, that is, to learn the axiomatic relations
between classes and instances.
In particular, this report presents an approach for identifying class expression axioms based on
named entity recognition from unstructured text.
For learning axiomatic relationships such as sameIndividualAs, differentFrom, subClassOf,
disjointWith, and equivalentClass, we propose to take into account the evidence of named entities in
domain-specific text. To this end, several Named Entity Recognition (NER) tools were tested and the tool with the best
precision was selected. Axiomatic relations can then be established based on the named entities identified
by the best NER tool.
Figure 2: Example of a manually built ontology. The instance level in an ontology characterises a class³
Following a bottom-up approach, the idea is first to identify individuals, which are instances of some
class. Such classes belong to a taxonomic structure, which is at the core of the ontology. Figure 2 shows
that the instance level corresponds to the leaves in a taxonomic tree structure and the class level to the
branches. One class differs from another when its set of leaves is different, and it can therefore
be characterised as a separate (disjoint) class; if instead the sets of leaves are very similar, the two classes
can be characterised as equivalent. For example, at the instance level, the set of leaves for the country
class includes the instances France, Ireland, and Brazil, whereas the set of leaves for the city class contains Brussels,
Iraklio, and Belfast. The country and city classes are therefore disjoint (disjointWith(country, city)).
Thus, the collection of named entities provides the instances of a specific class and defines the class in an
extensional manner. The proposed approach was implemented and tested with an input corpus from the Tourism
domain.
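The country/city example can be written down as a short Python sketch of an extensional class definition. The data structure and the function names (extensions, share_no_leaves) are illustrative assumptions, not part of the implemented system; the instances are the ones named in the example.

```python
from collections import defaultdict

# (named entity, class) pairs taken from the example in the text.
typed_entities = [
    ("France", "country"), ("Ireland", "country"), ("Brazil", "country"),
    ("Brussels", "city"), ("Iraklio", "city"), ("Belfast", "city"),
]

def extensions(pairs):
    """Group named entities by class: the extensional definition of each class."""
    ext = defaultdict(set)
    for entity, cls in pairs:
        ext[cls].add(entity)
    return dict(ext)

def share_no_leaves(ext, class_a, class_b):
    """True when the two classes have no instance (leaf) in common."""
    return ext[class_a].isdisjoint(ext[class_b])

ext = extensions(typed_entities)
relation = ("disjointWith(country, city)"
            if share_no_leaves(ext, "country", "city") else None)
```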
1.3 Overview of this document
The rest of the document is organized as follows. Section 2 gives a brief description of work related to
the generation of axioms. Section 3 describes the approach and the method to identify class expression
axioms. Section 4 presents the experiments carried out. Finally, Section 5
gives some conclusions and future work.
2 Related work
In order to provide a higher level of expressiveness to learned ontologies, several approaches have been
proposed for extending ontology learning tools. The first approaches were manual; frameworks and tools
such as Protege⁴, OntoEdit⁵, NeOn⁶, and KAON⁷ allow users or domain knowledge
engineers to add axioms. More recent approaches include some degree of automation in order to add axioms under the
evaluation and supervision of a knowledge expert. Approaches such as LExO [11], LeDA [13], and
ReLExO [12] use a sequence of linguistic analyzers. LExO starts by analyzing the syntactic structure
of an input sentence. The resulting dependency tree is transformed into a set of OWL axioms (concept
inclusion, transitivity, role inclusion, role assertions, concept assertions, and individual equalities) by means
of manually engineered transformation rules. ReLExO supports the acquisition and refinement of complex
class descriptions in order to identify passages of text that indicate the validity of certain knowledge.
Given that text can contain inconsistencies, LeDA allows the automatic generation of disjointness axioms
based on machine learning classification. The classifier, which determines disjointness for any given pair of
classes, is trained on a manually created gold standard of disjointness axioms.
In other cases, such as [2, 10], the methods are completely automated. In [2], an automatic axiom-learning
algorithm starts from a set of non-taxonomic relations. It uses the Web as a corpus and linguistic
techniques based on text patterns and statistical analysis of the distribution of web information. In the
YAGO project [10], facts are extracted from the category system and the infoboxes of Wikipedia and
combined with taxonomic relations from WordNet.
⁴ http://protege.stanford.edu/
⁵ http://www.semtalk.com/semnet files/POntoEdit.htm
⁶ http://www.neon-project.org/nw/Welcome to the NeOnP roject
⁷ http://kaon.semanticweb.org/
3 The method - Axiom learning
The disjointness of classes guarantees that an individual that is a member of one class cannot simultaneously
be an instance of a specified other class. Similarly, the equivalence of classes indicates that two
classes have precisely the same instances. Obtaining the instances of each class is therefore a key step in the
identification of disjointness or equivalence relations.
The proposed method starts at the instance level, where a NER tool extracts the named entities from
text. Later, at the class level, each class is characterised by the set of instances associated with it. The
NER tool provides a set of types (type/subtype) associated with each named entity. Using the type and the
linguistic context of each class, an axiomatic relation is identified. Figure 3 shows the general overview of
the proposed steps to extract axioms. The method is a bottom-up approach that follows
these steps:
1. Extraction of named entities. A NER tool obtains the named entities from text. The named entities
can correspond to one of the following types (defined by the tool): Person, Organization, State, City,
or Holiday, among others. NER tools that use Linked Data principles provide additional information
describing the named entities identified in text. According to Linked Data⁸ principles, a unique global
identifier defines an entity. Dereferencing such an identifier provides useful information about the
corresponding resource and links to other relevant identifiers. NER tools such as AlchemyAPI and
OpenCalais exploit the Linked Data principles.
2. Identification of instances. The relations of type instanceOf(named entity, class) between a named
entity and a class are obtained by two methods: 1) the type given by the NER tool and 2) the
context where the named entity and its class co-occur.
3. Building context. The sentences where a set of instances and its corresponding class occur are
grouped to determine whether there exists a relation between the contexts of two classes. A part-of-speech
(POS) tagger and a syntactic parser are used to get the linguistic context (i.e., representative elements
such as nouns, verbs, or adjectives and their grammatical relations).
⁸ T. Berners-Lee, Linked Data - Design Issues, http://www.w3.org/DesignIssues/LinkedData.html (2006)
The linguistic context supports the identification of relations based on entities, which are used to derive one of the
following axioms: sameIndividualAs, differentFrom, subClassOf, disjointWith, or equivalentClass.
To illustrate the method, consider the following set of sentences. They provide evidence for the
subClassOf relation between festival and event based on the lexical pattern <is a> and the instanceOf
relation:
• In Wexford the November Opera Festival is an international event.
• The Elephanta Festival is a classical dance and music event on Elephanta Island usually held in
February.
• The Grenada National Museum in the center of town incorporates an old French barracks dating
from 1704.
The festival class does not share a context (a sentence) with museum, theatre, or church. In
contrast, the festival class co-occurs with the class event, which indicates some relation between festival
and event. From the given sentences, with the proposed method, we can infer the following relations for the
festival class:
• instanceOf(November Opera Festival, festival),
• instanceOf(Elephanta Festival, festival),
• subClassOf(festival, event),
• disjointWith(festival, museum)
because: i) the named entities November Opera Festival and Elephanta Festival are instances of the festival
class; ii) the event class is more general than the festival class; iii) the festival and museum classes have different
instances.
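The notion of classes sharing a context (a sentence) can be sketched in Python as a simple co-occurrence count over the three example sentences. This is only an approximation under stated assumptions: class mentions are found by plain lowercase word matching rather than by the POS tagger and syntactic parser used in the method, and the function name shared_contexts is hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

sentences = [
    "In Wexford the November Opera Festival is an international event.",
    "The Elephanta Festival is a classical dance and music event on "
    "Elephanta Island usually held in February.",
    "The Grenada National Museum in the center of town incorporates an "
    "old French barracks dating from 1704.",
]
classes = ["festival", "event", "museum", "theatre", "church"]

def shared_contexts(sentences, classes):
    """Count, for each pair of classes, the sentences in which both are mentioned."""
    counts = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        present = sorted(c for c in classes if c in words)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

counts = shared_contexts(sentences, classes)
```

Here festival and event share two sentences, while festival and museum share none, which is consistent with the subClassOf and disjointWith relations inferred above.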
Figure 3: Method for learning axioms
4 Experiments and Results
For the experiments, we used the Lonely Planet⁹ dataset, which consists of 1801 files about the Tourism
domain. It covers a list of 96 classes, 278 named entities, and a taxonomy with 103 manually annotated
hierarchical relations. The measures used for the evaluation are precision (Equation 1), recall (Equation 2),
and F-measure (Equation 3). Precision is the number of knowledge entities
retrieved and accepted by a human team divided by the total number of knowledge entities retrieved.
Recall is the number of knowledge entities retrieved and accepted
by a human team divided by the total number of knowledge entities contained in the Lonely Planet dataset. The
F-measure is the harmonic mean of precision and
recall.
⁹ http://www.cimiano.de/doku.php?id=olp
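\[ \mathrm{Precision} = \frac{\mathrm{CorrectlySelectedEntities}}{\mathrm{TotalSelectedEntities}} \tag{1} \]
\[ \mathrm{Recall} = \frac{\mathrm{CorrectlySelectedEntities}}{\mathrm{TotalDomainEntities}} \tag{2} \]
\[ F = \frac{2 \cdot P \cdot R}{P + R} \tag{3} \]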
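Equations 1 to 3 translate directly into code. The following Python sketch uses assumed function and parameter names; the zero-denominator guards are an added convention that the equations themselves do not specify.

```python
def precision(correctly_selected, total_selected):
    """Equation 1: accepted entities over all retrieved entities."""
    return correctly_selected / total_selected if total_selected else 0.0

def recall(correctly_selected, total_domain):
    """Equation 2: accepted entities over all entities in the dataset."""
    return correctly_selected / total_domain if total_domain else 0.0

def f_measure(p, r):
    """Equation 3: harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Precision and recall reported for AlchemyAPI in Table 1.
f_alchemy = round(f_measure(0.6648, 0.4512), 4)
```

With the precision and recall reported for AlchemyAPI in Table 1, the function yields 0.5376, which agrees with the F-measure in that table.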
4.1 Named Entity Recognition
In the first step, we analyzed eight different NER tools. The aim was to test the named entity recognition
tools on the ontology population task, so the results retrieved by each tool were compared against a list of
named entities provided by a benchmark for ontology population. The purpose of this test was to find the best
NER tool for the automatic process of axiom learning. The classical tools evaluated were: 1) OpenNLP, 2)
PythonNLTK, and 3) StanfordNER; the tools based on Linked Data were: 4) AlchemyAPI, 5) OpenCalais,
6) DBpedia Spotlight, 7) Zemanta, and 8) Extractiv.
The tools in the second group address the
disambiguation problem in named entity detection: they analyse the input content to detect named
entities, assign each one a type weighted by a confidence score, and provide a list of URIs for
disambiguation. In addition, these tools are able to associate every entity with a type in a taxonomy of types.
Table 1 shows the evaluation results in terms of precision, recall, and F-measure;
AlchemyAPI presents the best precision. The obtained named entities were compared against a list of 278
named entities annotated manually in a sample corpus of 30 files.
Tool               Precision  Recall  F-measure
AlchemyAPI         0.6648     0.4512  0.5376
OpenCalais         0.6384     0.4079  0.4977
StanfordNER        0.5478     0.6389  0.5900
Zemanta            0.5404     0.4584  0.4960
OpenNLP            0.4873     0.2279  0.3540
PythonNLTK         0.4853     0.9061  0.6346
Extractiv          0.2767     0.5703  0.3726
DBpedia Spotlight  0.1168     0.4981  0.1893
Table 1: Evaluation of NER tools
4.2 Identification of instances
In this stage, the objective was to evaluate the identification of the instanceOf relation using AlchemyAPI
and OpenCalais, as these tools define a taxonomy of types. Table 2 presents the precision,
recall, and F-measure values on 42 instanceOf relations that were manually annotated. According to
the evaluation, AlchemyAPI had better precision than OpenCalais in this task. In more detail, Table 3
presents the performance of AlchemyAPI and OpenCalais in identifying instances that belong to
these classes: City, Continent, Country, Holiday, Person, Organization, and Region. The obtained results
were compared manually with 81 instanceOf relations from the Lonely Planet dataset, where 12 correspond to City,
2 to Continent, 35 to Country, 10 to Holiday, 6 to Person, 2 to Organization, and 14 to Region. In most
cases, AlchemyAPI showed the best precision; only for the Person class did OpenCalais have better precision
than AlchemyAPI.
Tool        Precision  Recall  F-measure
AlchemyAPI  0.4667     0.1667  0.2456
OpenCalais  0.4117     0.1667  0.2376
Table 2: Performance of NER tools - identified instances
Class         Tool        Precision  Recall  F-measure
City          AlchemyAPI  0.4000     0.5000  0.4444
              OpenCalais  0.3529     0.5000  0.4137
Country       AlchemyAPI  0.7631     0.8285  0.7945
              OpenCalais  0.7000     0.8000  0.7486
Holiday       AlchemyAPI  0.4285     0.3000  0.3529
              OpenCalais  0.4000     0.4000  0.4000
Person        AlchemyAPI  0.0667     0.3333  0.1111
              OpenCalais  0.2000     0.1667  0.1818
Organization  AlchemyAPI  0.1000     0.5000  0.1667
              OpenCalais  -          -       -
Region        AlchemyAPI  0.4000     0.3333  0.1111
              OpenCalais  0.2000     0.0714  0.1052
Table 3: Performance of NER tools - identified instances by class
Using the context, it can be seen that, in some cases, instances of different classes appear in the same
sentence, i.e. they co-occur. For extracting relations, the linguistic context of each extracted named
entity was analysed. Examples of patterns that identify the instanceOf relation are:
• <NE> is a <NP>. Example 1: The Donia is a traditional music festival, it is held on Nosy Be in
May-June.
• <NP>, <NE>. Example 2: Crete is Greece’s most southerly point, with its largest city, Iraklio,
situated in the middle of the north side of the island.
• <NE>: <NP>. Example 3: South Africa: the country offers everything from ostrich riding to the
world’s highest bungee jump!
• <NP> like <NE>, <NE>, ... and <NE>. Example 4: The usual Christian holidays like Easter
and Christmas are celebrated...
where NE is a named entity and NP is a noun phrase. In example 1, the Donia is an instance of the
festival class and the associated pattern is <NE is a NP>. In example 2, Iraklio is an instance of the city
class, where the pattern is <NP, NE>. In example 3, South Africa is an instance of country and the pattern
is <NE: NP>. Finally, in example 4, Easter and Christmas are instances of holiday; the identified
pattern is <NP like NE> and a more general pattern is <NP like NE, NE, ... and NE>.
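The four patterns can be approximated with regular expressions, as in the Python sketch below. It assumes that the named entity has already been found by a NER tool and is passed in as a string. The regular expressions, the choice of the last word of the noun phrase as the class, and the naive removal of a trailing "s" are simplifying assumptions for illustration; the method itself relies on a POS tagger and a parser.

```python
import re

def instance_of(sentence, ne):
    """Return the class noun linked to the named entity `ne`, or None."""
    e = re.escape(ne)
    patterns = [
        rf"{e}\s+is\s+an?\s+(?P<np>[^,.]+)",     # <NE> is a <NP>
        rf"(?P<np>\w+),\s+{e}\b",                # <NP>, <NE>
        rf"{e}:\s+(?:the|a|an)\s+(?P<np>\w+)",   # <NE>: <NP>
        rf"(?P<np>\w+)\s+like\s+[^.]*?\b{e}\b",  # <NP> like <NE>, ... and <NE>
    ]
    for pattern in patterns:
        match = re.search(pattern, sentence)
        if match:
            # Take the last word of the noun phrase as its head (assumption).
            head = match.group("np").split()[-1].lower()
            # Naive singularisation: drop a trailing "s" (assumption).
            return head[:-1] if head.endswith("s") else head
    return None

ex1 = "The Donia is a traditional music festival, it is held on Nosy Be in May-June."
ex4 = "The usual Christian holidays like Easter and Christmas are celebrated..."
festival_class = instance_of(ex1, "Donia")
holiday_class = instance_of(ex4, "Christmas")
```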
It is important to note that the context analysis can help deal with the problem of ambiguous
named entities. For example, given the Country class and the individual Australia, if both elements co-occur in
the same sentence, then Australia is resolved as an instance of Country, as in the sentence “A highly
developed country, Australia is the world’s 13th-largest economy”. If instead Australia co-occurs with
the Continent class, as in the sentence “Australia is the smallest continent and it is also an island”, then the
instanceOf(Australia, Continent) relation is established.
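A minimal Python sketch of this disambiguation step follows. It assumes the candidate classes of the entity are known in advance and simply checks which class name occurs in the same sentence; the function name resolve_class and the rule of returning no answer when zero or several candidates match are illustrative assumptions.

```python
import re

def resolve_class(sentence, entity, candidate_classes):
    """Pick the single candidate class whose name co-occurs with the entity."""
    if entity not in sentence:
        return None
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    found = [c for c in candidate_classes if c.lower() in words]
    # Stay undecided when no class or more than one class co-occurs.
    return found[0] if len(found) == 1 else None

candidates = ["Country", "Continent"]
s1 = "A highly developed country, Australia is the world's 13th-largest economy."
s2 = "Australia is the smallest continent and it is also an island."
first = resolve_class(s1, "Australia", candidates)
second = resolve_class(s2, "Australia", candidates)
```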
4.3 SameIndividualAs/differentFrom relation
At the instance level, two (or sometimes more) different named entities may identify the same resource.
Such named entities refer to the same individual, and so they can be related through the sameIndividualAs
constructor. Some examples of linguistic contexts where this relation occurs are:
• <NE> (<NE>). Example 1: Beit al-Sahel (Palace Museum) served as the Sultan’s residence until
1964 when the dynasty was overthrown.
• <NE> (also known as <NE>). Example 2: North-eastern Libya, the Jebel Akhdar area (also known
as the Green Mountains), is the most verdant and arguably the most beautiful part of the country.
• <NE>, also called <NE>. Example 3: Dominica’s national bird, the Sisserou, also called the Imperial
Parrot, is about 20in (50cm) long when full grown, the largest of all Amazon parrots.
In example 1, the Sultan’s residence is named Beit al-Sahel or Palace Museum. In example 2, the
North-eastern Libya area is called Jebel Akhdar or Green Mountains. Finally, in example 3, Dominica’s
national bird corresponds to Sisserou and Imperial Parrot. The identified patterns are <NE (NE)>, <NE
(also known as NE)>, and <NE, also called NE>, respectively. Conversely, when the sameIndividualAs
relation is not found between two or more named entities, the differentFrom relation is established.
For example, the relation sameIndividualAs(Beit al-Sahel, Palace Museum) is found in example 1,
whereas the differentFrom(Beit al-Sahel, Jebel Akhdar) relation is established.
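The three patterns and the fallback to differentFrom can be sketched in Python as follows. The sketch assumes that the named entities have already been extracted by a NER tool; the regular expressions (including the optional word between the entity and the parenthesis, needed for "the Jebel Akhdar area (also known as ...)") and the function names are illustrative assumptions. Transitive closure of sameIndividualAs is not handled.

```python
import re
from itertools import combinations

def same_individual(sentence, a, b):
    """True if `sentence` links named entities a and b with one of the patterns."""
    ea, eb = re.escape(a), re.escape(b)
    patterns = [
        rf"{ea}\s*\({eb}\)",                                        # <NE> (<NE>)
        rf"{ea}(?:\s+\w+)?\s*\(also known as\s+(?:the\s+)?{eb}\)",  # <NE> (also known as <NE>)
        rf"{ea},\s+also called\s+(?:the\s+)?{eb}\b",                # <NE>, also called <NE>
    ]
    return any(re.search(p, sentence) for p in patterns)

def link_entities(sentences, entities):
    """Split all entity pairs into sameIndividualAs and differentFrom lists."""
    same, different = [], []
    for a, b in combinations(entities, 2):
        linked = any(same_individual(s, a, b) or same_individual(s, b, a)
                     for s in sentences)
        (same if linked else different).append((a, b))
    return same, different

sentences = [
    "Beit al-Sahel (Palace Museum) served as the Sultan's residence until 1964.",
    "North-eastern Libya, the Jebel Akhdar area (also known as the Green "
    "Mountains), is the most verdant part of the country.",
    "Dominica's national bird, the Sisserou, also called the Imperial Parrot, "
    "is about 20in (50cm) long when full grown.",
]
entities = ["Beit al-Sahel", "Palace Museum", "Jebel Akhdar",
            "Green Mountains", "Sisserou", "Imperial Parrot"]
same, different = link_entities(sentences, entities)
```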
4.4 SubClassOf relation
At the class level, the subClassOf relation represents one of the main axioms; it structures the set
of classes into a taxonomy where a higher class is more general than a lower class. Hearst patterns
[13] are the most commonly used technique for extracting subClassOf relationships. Many other approaches for learning
subClassOf relationships exploit hyponymy relationships from WordNet. However, as this approach has
been shown to be limited, we propose the use of NER as an additional approach for identifying subClassOf
relations in text. The NER tool used for this was AlchemyAPI because it shows the best precision in
obtaining instances. AlchemyAPI obtains 15 types of instances and 54 subtypes on a sample
of 30 files from the Lonely Planet corpus. Table 4 shows the types of instances, some examples of
the subtypes obtained for each of them, and the number of correct subtype relations for each type.
For example, Location, CityTown, River, BodyOfWater, AdministrativeDivision, TouristAttraction, Island,
Mountain, and Lake are correct subtypes of the GeographicFeature type, and MilitaryPerson, Actor, FilmActor,
Monarch, MemberOfParlament, OperaCharacter, and Politician are correct subtypes of the Person type. In
other cases, the extracted subtype relation is not correct, such as Saint for Person, or MeteorologicalService,
GeographicFeature, HumanLanguage, FilmDirector, FilmArtDirector, Organization, and CompanyDivision
as subtypes of Country. A team of humans was asked to evaluate all extracted subtype relations, which
gave a precision of 70.37% for the extracted relations based on the subtype-type pairs identified by AlchemyAPI. For
the complete Lonely Planet corpus, the total number of classes identified was 441, from which 415 subtype-type
relations were obtained. According to the human team evaluation, 222 relations were correct, which gave
a precision of 47.22%.
Types              Subtypes                                                  Total of  Total of
                                                                             Subtypes  Corrects
Organization       -                                                         0         -
GeographicFeature  Location, Island, CityTown, BodyOfWater, ...              9         9
Country            Location, GovermentalJurisdiction, Kingdom, ...           19        8
City               AdministrativeDivision, GovermentalJurisdiction, ...      3         3
Region             Location                                                  1         0
Facility           -                                                         0         -
Holiday            -                                                         0         -
Sport              -                                                         0         -
Continent          Location                                                  1         0
StateOrCounty      Location, PoliticalDistrict, AdministrativeDivision, ...  5         4
Company            -                                                         0         -
NaturalDisaster    -                                                         0         -
Person             MilitaryPerson, Actor, Monarch, Politician, ...           8         7
HealthCondition    DiseaseOrMedicalCondition, CauseOfDeath, ...              8         7
FieldTerminology   -                                                         0         -
Table 4: Types/Subtypes identified by AlchemyAPI in 30 files of “Lonely Planet”
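As a worked check of the sample-corpus figure, the totals in Table 4 can be summed in a few lines of Python. The variable names are illustrative; the numbers are copied from the table, omitting the types without subtypes.

```python
# (type, total subtypes, correct subtypes) from Table 4; types with no
# subtypes are omitted because they contribute nothing to either total.
table4 = [
    ("GeographicFeature", 9, 9), ("Country", 19, 8), ("City", 3, 3),
    ("Region", 1, 0), ("Continent", 1, 0), ("StateOrCounty", 5, 4),
    ("Person", 8, 7), ("HealthCondition", 8, 7),
]

total_subtypes = sum(total for _, total, _ in table4)
total_correct = sum(correct for _, _, correct in table4)
precision_pct = round(100 * total_correct / total_subtypes, 2)
```

The sums give 54 subtypes, 38 of them correct, so the precision is 38/54, that is 70.37%, in agreement with the value reported for the 30-file sample.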
4.5 DisjointWith relation
A disjointWith relation states that one class has no instance in common with another class.
For learning the disjoint relationship between two classes, we consider named entities that co-occur in the
same context. For each (class1, class2) duple of NER types, the lists of instances were compared. If there is no
common named entity between the two classes, then the disjointWith(class1, class2) relation is established.
To illustrate the evaluation of disjointWith relation extraction, a sample corpus with 30 files was used;
here 5 types of instances without overlap between their sets of instances were obtained. Thus,
105 duples (class1, class2) were obtained. According to the evaluation of the human team, 88 of the
relationships correspond correctly to disjointWith(class1, class2) and the rest of them (17) have some other
relation. Figure 4 shows a fragment of the obtained duples and indicates which duples have a disjoint relation
between class1 and class2. For example, the Region and Holiday classes are disjoint, as are the Country and
Organization classes, the Person and Facility classes, the Country and Holiday classes, and the City and Holiday
classes. However, the Region and GeographicFeature classes are not necessarily disjoint. Even though,
according to the NER results, the sets of instances of Region and GeographicFeature were very different,
the classes are related by a subClassOf relation. The same occurs with the Region and Country classes. As
a result, the precision was 83.80% for the learned disjoint relations.
Figure 4: Example of disjoint classes
Using the complete Lonely Planet corpus, a total of 325 disjointWith relations were identified; 299 of
those relations were judged correct, which gave a precision of 92.0%.
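The pairwise comparison of instance lists can be sketched in Python as follows. The instance sets are toy data chosen only for illustration (they are not the sets obtained in the experiment), and the function names are assumptions.

```python
from itertools import combinations

def disjoint_pairs(ext):
    """Return the class pairs whose instance sets share no named entity."""
    return [(a, b) for a, b in combinations(sorted(ext), 2)
            if ext[a].isdisjoint(ext[b])]

def candidate_pair_count(n_classes):
    """Number of unordered (class1, class2) duples among n classes."""
    return n_classes * (n_classes - 1) // 2

# Toy instance sets (illustrative assumption, not the experimental data).
ext = {
    "Country": {"France", "Australia"},
    "City": {"Brussels", "Belfast"},
    "Continent": {"Australia"},
}
pairs = disjoint_pairs(ext)
```

In this toy data Country and Continent share Australia, so only the two pairs involving City are proposed as disjoint. Note also that 15 classes, the number of AlchemyAPI types reported for the sample corpus, give 105 unordered pairs, which matches the number of duples reported above.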
4.6 EquivalentClass relation
The equivalentClass relation is established between two classes when the class descriptions include the
same set of individuals. It is important to mention that class equality means that the classes have
the same intensional meaning i.e. denote the same concept. For learning equivalentClass relation, two
ontologies were considered and for each ontology class its set of instances obtained by two different
NER tools were compared, if the set of instances between two different classes is highly similar then
an equivalentClass(class1, class2) relation can be established. Highly similar means that almost the total
of named entities detected by the NER tool is the same in both classes, that is because the identification
OL from text: Method for learning axioms 15
Figure 5: Example of equivalent classes
of instances depends on the precision of the NER tool.
In this case, using the same sample corpus with 30 files, the AlchemyAPI and OpenCalais tools identify
15 and 17 classes, respectively. However, only 16 duples (AlchemyAPI:class1, OpenCalais:class2) of the
total (255) have overlap between their set of instances. According to the evaluation of the human team, 11
of the relationships correspond correctly to equivalentClass(class1, class2) and the rest of them have some
other relation. As a result, the precision was 68.75% for the learned equivalentClass relations.
Figure 5 shows some examples of duples. For classes such as AlchemyAPI:Organization /
OpenCalais:Organization, AlchemyAPI:Country / OpenCalais:Country, AlchemyAPI:Sport /
OpenCalais:SportsGame, and AlchemyAPI:HealthCondition / OpenCalais:MedicalCondition, an equivalence
relationship between them can clearly be determined. In contrast, other classes that belong to different
ontologies and have similar individuals are not in an equivalence relationship. Examples are the
duples AlchemyAPI:Organization / OpenCalais:Company and AlchemyAPI:Person / OpenCalais:Holiday,
which have similar individuals but are not equivalent.
The complete Lonely Planet corpus was also evaluated: 21 equivalentClass(class1, class2) duples
were identified as correct by the human team, which gave a precision of 80.73%.
Table 5 shows the precision (%) obtained for the axiom extraction task. This evaluation was done for two
corpora: the Tourism and Sport Event domains.
Axiom             Tourism   Sport Event
instanceOf        46.67     23.52
subClassOf        70.37     64.44
disjointWith      83.80     85.00
equivalentClass   92.00     93.33
Table 5: Results of axiom extraction task (precision, %)
5 Conclusions
The main goal of this research is to obtain a method for ontology learning from English textual resources
about a specific domain. This implies several challenges:
• obtain representative concepts
• find hierarchical relationships
• achieve a higher level of expressiveness (axioms)
Learning axiomatic relations is an important and very hard task in ontology learning. In this report,
an approach to discover axiomatic relations from text was described. The
approach is based on identifying named entities as class instances and comparing their textual context to
establish axiomatic relations such as sameIndividualAs, differentFrom, instanceOf, subClassOf, disjointWith,
and equivalentClass. New NER tools based on Linked Data can be useful in the process
of extracting axioms. Of the tested tools, AlchemyAPI showed the best performance in the
identification of instances; consequently, it was selected to verify the hypothesis
on learning axioms.
According to the experiments, we observed that the identified instances that belong to a specific class
can be considered the extensional definition of this class, which is then described by the named
entities associated with it. However, the method must take into account the fact that the incorrect
identification of instances can lead to erroneous disjointness or equivalence relations. For example,
relations that are actually subClassOf or partOf were learned as disjointWith relations. Such is the case
of the Organization/Company and Region/Country classes, which hold a subClassOf relation. Also, a specific
object property between Person and HealthCondition (Person has HealthCondition) was wrongly derived
as a disjointness relation. In another case, the AlchemyAPI:GeographicFeature / OpenCalais:NaturalFeature
and AlchemyAPI:Person / OpenCalais:Holiday classes, which hold a subClassOf or disjointWith relation, were
wrongly derived as equivalentClass relations.
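The sensitivity to NER errors noted above can be illustrated with a small sketch. It is not the method of this report: the decision rules, the function name, and the instance sets are hypothetical, chosen only to show how missing instances can turn a subClassOf case into an apparent disjointWith one.

```python
def extensional_relation(a, b):
    """Guess a relation between two classes from their instance sets alone.

    Hypothetical rules, for illustration only:
    equal sets -> equivalentClass, proper subset -> subClassOf,
    no shared instance -> disjointWith, anything else -> undecided.
    """
    if not a or not b:
        return "undecided"
    if a == b:
        return "equivalentClass"
    if a < b:
        return "subClassOf"
    if a.isdisjoint(b):
        return "disjointWith"
    return "undecided"


# Every company is an organization, so Company should come out as a subclass.
company = {"Aeromexico", "Pemex"}
organization_full = {"Aeromexico", "Pemex", "UNESCO", "Red Cross"}
print(extensional_relation(company, organization_full))  # subClassOf

# If the NER tool tags the companies only as Company, the shared
# instances disappear and the same rules report disjointness.
organization_seen = {"UNESCO", "Red Cross"}
print(extensional_relation(company, organization_seen))  # disjointWith
```

The second call shows why a purely extensional comparison depends so strongly on the quality of the instance identification.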
Figure 6: Schedule of activities for doctoral studies
The main activities achieved were obtaining and implementing a technique to get axioms from texts.
The proposed approach was evaluated with two corpora: the Tourism and Sport Event domains. According to the
schedule of activities (see Figure 6), these activities correspond to the third phase of our research work.
The state of the art about discovering axiomatic relations has been reported in one book chapter:
• Ana B. Rios-Alvarado, Ivan Lopez-Arevalo, and Victor Sosa-Sosa. “An overview on ontology learning
methods from textual resources towards the acquisition of axioms”, Innovative Ways of Knowledge
Representation and Management, Universidad de Medellín, Colombia, 2012.
In addition, the next phase of the doctoral studies considers:
• Adapt the method to extract relations as axioms
• Integrate the methods to obtain the ontology learning approach
• Write a thesis
• Submit a dissertation
Bibliography
[1] Philipp Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and
Applications. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[2] Luis Del Vasto Terrientes, Antonio Moreno, and David Sánchez. Discovery of relation axioms from
the web. In Proceedings of the 4th international conference on Knowledge science, engineering and
management, KSEM’10, pages 222–233, Berlin, Heidelberg, 2010. Springer-Verlag.
[3] A. Maedche and S. Staab. Ontology learning for the semantic web. Intelligent Systems, IEEE,
16(2):72–79, mar-apr 2001.
[4] Robert Neches, Richard Fikes, Tim Finin, Tom Gruber, Ramesh Patil, Ted Senator, and William R.
Swartout. Enabling technology for knowledge sharing. AI Mag., 12:36–56, September 1991.
[5] David Sánchez Ruenes. Domain ontology learning from the web. PhD thesis, Universitat Politècnica
de Catalunya, 2007.
[6] Alexander Schutz and Paul Buitelaar. Relext: A tool for relation extraction from text in ontology
extension. In International Semantic Web Conference, pages 593–606, 2005.
[7] Mehrnoush Shamsfard and Ahmad Abdollahzadeh Barforoush. Learning ontologies from natural
language texts. Int. J. Hum.-Comput. Stud., 60:17–63, January 2004.
[8] Steffen Staab and Rudi Studer. Handbook on Ontologies. Springer Publishing Company, Incorporated,
2nd edition, 2009.
[9] Rudi Studer, V. Richard Benjamins, and Dieter Fensel. Knowledge engineering: principles and methods.
Data Knowl. Eng., 25:161–197, March 1998.
[10] F.M. Suchanek, G. Kasneci, and G. Weikum. Yago: A large ontology from wikipedia and wordnet.
Web Semantics: Science, Services and Agents on the World Wide Web, 6(3):203–217, 2008.
[11] Johanna Völker, Pascal Hitzler, and Philipp Cimiano. Acquisition of owl dl axioms from lexical
resources. In Enrico Franconi, Michael Kifer, and Wolfgang May, editors, The Semantic Web: Research
and Applications, volume 4519 of Lecture Notes in Computer Science, pages 670–685. Springer Berlin
/ Heidelberg, 2007. doi:10.1007/978-3-540-72667-8_47.
[12] Johanna Völker and Sebastian Rudolph. Lexico-logical acquisition of owl dl axioms. In Raoul Medina
and Sergei Obiedkov, editors, Formal Concept Analysis, volume 4933 of Lecture Notes in Computer
Science, pages 62–77. Springer Berlin / Heidelberg, 2008. doi:10.1007/978-3-540-78137-0_5.
[13] Johanna Völker, Denny Vrandečić, York Sure, and Andreas Hotho. Learning disjointness. In
Proceedings of the 4th European conference on The Semantic Web: Research and Applications,
ESWC ’07, pages 175–189, Berlin, Heidelberg, 2007. Springer-Verlag.
[14] W. Wang, P.M. Barnaghi, and A. Bargiela. Probabilistic topic models for learning terminological
ontologies. IEEE Transactions on Knowledge and Data Engineering, 2009.
[15] Sung-Shun Weng, Hsine-Jen Tsai, Shang-Chia Liu, and Cheng-Hsin Hsu. Ontology construction for
information classification. Expert Systems with Applications, 31(1):1–12, 2006.