When citing this document, please always refer to the following URN: urn:nbn:de:hbz:5n-38938


Faculty of Mathematics and Natural Sciences - Year 2015


Title Semantic Interpretation of User Queries for Question Answering on Interlinked Data
Author Saeedeh Shekarpour
Publication type Dissertation
Abstract The Web of Data contains a wealth of knowledge belonging to a large number of domains. Retrieving data from such valuable interlinked knowledge bases remains difficult. By taking the structure of data into account, the upcoming generation of search engines is expected to move toward question answering systems, which directly answer user questions. However, developing a question answering system over these interlinked data sources is still challenging because of two inherent characteristics. First, different datasets employ heterogeneous schemas, and each one may contain only a part of the answer to a given question. Second, constructing a federated formal query across different datasets requires exploiting links between these datasets on both the schema and instance levels. In this respect, several challenges arise, such as resource disambiguation, vocabulary mismatch, inference, and link traversal. In this dissertation, we address these challenges in order to build a question answering system for Linked Data. We present our question answering system Sina, which transforms user-supplied queries (i.e. either natural language queries or keyword queries) into conjunctive SPARQL queries over a set of interlinked data sources. The contributions of this work are as follows:
1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation approach). We employed a Hidden Markov Model, whose parameters were bootstrapped with different distribution functions.
2. A novel method for constructing federated formal queries using the disambiguated resources and leveraging the linking structure of the underlying datasets. This approach essentially relies on a combination of domain and range inference as well as a link traversal method for constructing a connected graph, which ultimately renders a corresponding SPARQL query.
3. Regarding the problem of vocabulary mismatch, our contribution is divided into two parts. First, we introduce a number of new query expansion features based on semantic and linguistic inferencing over Linked Data. We evaluate the effectiveness of each feature individually as well as in combination, employing Support Vector Machines and Decision Trees. Second, we propose a novel method for automatic query expansion, which employs a Hidden Markov Model to obtain the optimal tuples of derived words.
4. We provide two benchmarks for two different tasks to the community of question answering systems. The first one is used for the task of question answering on interlinked datasets (i.e. federated queries over Linked Data). The second one is used for the vocabulary mismatch task.
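To illustrate the kind of decoding a Hidden Markov Model performs in the disambiguation step (contribution 1), the following is a minimal sketch of standard Viterbi decoding over a toy model. It is not the implementation from the dissertation: the resource names (`ex:Paris_city`, `ex:Paris_person`, `ex:mayor`), the keywords, and all probabilities are hypothetical values chosen for illustration, and the abstract does not specify how Sina bootstraps its parameters beyond mentioning distribution functions.

```python
from typing import Dict, List, Tuple


def viterbi(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely hidden-state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [
        {s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}
    ]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximises the path probability into s.
            column[s] = max(
                (best[-1][p][0] * trans_p[p].get(s, 0.0) * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob


# Hypothetical toy model: hidden states are candidate resources,
# observations are the keywords of the user query.
states = ["ex:Paris_city", "ex:Paris_person", "ex:mayor"]
start_p = {"ex:Paris_city": 0.4, "ex:Paris_person": 0.4, "ex:mayor": 0.2}
trans_p = {
    "ex:Paris_city": {"ex:Paris_city": 0.1, "ex:Paris_person": 0.1, "ex:mayor": 0.8},
    "ex:Paris_person": {"ex:Paris_city": 0.4, "ex:Paris_person": 0.4, "ex:mayor": 0.2},
    "ex:mayor": {"ex:Paris_city": 0.5, "ex:Paris_person": 0.5},
}
emit_p = {
    "ex:Paris_city": {"paris": 1.0},
    "ex:Paris_person": {"paris": 1.0},
    "ex:mayor": {"mayor": 1.0},
}

path, prob = viterbi(["paris", "mayor"], states, start_p, trans_p, emit_p)
print(path, prob)
```

In this toy setting the keyword "paris" is ambiguous between two candidate resources, and the transition probabilities are what tip the decoding toward the city reading once "mayor" is observed.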
We evaluate the accuracy of our approach using measures like mean reciprocal rank, precision, recall, and F-measure on three interlinked life-science datasets as well as DBpedia. The results of our accuracy evaluation demonstrate the effectiveness of our approach. Moreover, we study the runtime of our approach in its sequential as well as parallel implementations and draw conclusions on the scalability of our approach on Linked Data.
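For readers unfamiliar with the evaluation measures named above, the following minimal sketch computes them using their standard textbook definitions. The function names and the sample inputs are illustrative assumptions and are not taken from the dissertation; the F-measure shown is the balanced F1 variant, which the abstract does not explicitly specify.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def precision_recall_f1(
    retrieved: Iterable[str], relevant: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and balanced F-measure (F1)."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mean_reciprocal_rank(
    rankings: Sequence[List[str]], gold: Sequence[Set[str]]
) -> float:
    """Average of 1/rank of the first correct item per query (0 if none is found)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, correct in zip(rankings, gold):
        for rank, item in enumerate(ranking, start=1):
            if item in correct:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Illustrative inputs only: answers returned by a system vs. a gold standard.
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"])
mrr = mean_reciprocal_rank([["x", "a"], ["b"], ["y", "z"]], [{"a"}, {"b"}, {"q"}])
print(p, r, f1, mrr)
```

Precision, recall, and F-measure judge an unordered answer set, while mean reciprocal rank rewards placing the first correct candidate near the top of a ranked list, which is why the two kinds of measure are typically reported together.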
Table of contents: PDF document
Complete version: PDF document (5 MB)
© Universitäts- und Landesbibliothek Bonn | Published: 29.01.2015