I finally decided to put this on the web so that the hard work we once put into this project doesn’t die out and might come in handy for some random person I will never meet.
To be very honest, the content of this post is a mere copy-paste of some important portions of our utterly long report. Do ping me if you ever need the longer version and have a good reason. 🙂
What’s all the fuss about?
The aim was to create an application that allows a user to query an ontology using natural language, in the form of text, and receive the required response(s). For this we built a Dialogue Manager that processes the user’s queries (and later replies) in the form of text and generates appropriate responses.
What was the goal?
The goal of this project was to develop an interactive system in which a user’s text input is processed and an appropriate response is generated, helping the user find information in an ontology. This involved understanding user queries written in natural English, using various tools to work out the semantics of each sentence, searching the ontology for the required result, and forming and returning a response.
Why is this better?
Dialogue systems in the past used a rule-based approach to generate responses to user queries, but storing millions of records and comparing the input against them is an inefficient way to build a dialogue system. The text-based dialogue system that we propose uses the VCARD ontology as its Knowledge Base. The user’s query is processed, semantic analysis is done, and a SPARQL query is generated that searches the ontology for the required information; finally, the response is fed back to the user. The use of a dependency parser and an ontology makes this dialogue system efficient and accurate.
Our Proposed Approach
Input from the user:
The application takes input from the user in the form of text. The text passed into the system is then sent to the dialogue manager for preprocessing.
The first phase of semantic analysis converts words from plural to singular using the Java Inflector, which uses REGEX conditions. Before being sent to the inflector, the words were tokenized using Java’s StringTokenizer.
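As a rough illustration of the overall flow, the dialogue manager can be thought of as a chain of stages, each consuming the output of the previous one. This is a minimal sketch under that assumption: the class name, the four stage parameters and the placeholder lambdas are hypothetical and merely stand in for the Inflector, the Stanford tools and the Jena query step described in the following sections.

```java
import java.util.function.Function;

// Hypothetical skeleton of the dialogue manager: four String -> String stages
// composed into one function. The real stages are described in the sections below.
class DialoguePipeline {
    private final Function<String, String> stages;

    DialoguePipeline(Function<String, String> preprocess,
                     Function<String, String> analyse,
                     Function<String, String> queryOntology,
                     Function<String, String> formResponse) {
        // Run left to right: preprocess, analyse, query, then form the reply.
        this.stages = preprocess.andThen(analyse).andThen(queryOntology).andThen(formResponse);
    }

    String respond(String userInput) {
        return stages.apply(userInput);
    }

    public static void main(String[] args) {
        DialoguePipeline demo = new DialoguePipeline(
                s -> s.trim().toLowerCase(),   // placeholder preprocessing
                s -> "intent[" + s + "]",      // placeholder semantic analysis
                s -> "result-of(" + s + ")",   // placeholder ontology lookup
                s -> "Answer: " + s);          // placeholder response generation
        // prints "Answer: result-of(intent[find birthday of amit roushan])"
        System.out.println(demo.respond("  Find birthday of Amit Roushan  "));
    }
}
```

Composing the stages with `Function.andThen` keeps each step independently testable and lets one stage be swapped out without touching the others.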
Java String Tokenizer:
The StringTokenizer class allows an application to break a string into tokens. We used the delimiters “ .,;?” (space, period, comma, semicolon and question mark) to break the sentence into tokens, which were then sent to the inflector to be converted from plural to singular.
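A minimal sketch of this tokenization step, using `java.util.StringTokenizer` with the delimiter set given above; the class and method names are hypothetical. Because the apostrophe is not a delimiter, a possessive such as “Roushan's” stays a single token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class QueryTokenizer {
    // Delimiters from the text: space, period, comma, semicolon and question mark.
    static final String DELIMITERS = " .,;?";

    /** Splits a sentence into tokens; runs of delimiters never produce empty tokens. */
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(sentence, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("find birthdays of Amit Roushan?")); // [find, birthdays, of, Amit, Roushan]
    }
}
```

`StringTokenizer` is a legacy class; `String.split("[ .,;?]+")` is the usual modern alternative, though it yields a leading empty string when the input starts with a delimiter.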
Java Inflector:
The Java Inflector changes the plural form of words to singular using REGEX conditions.
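The rules of the actual Java Inflector are not reproduced in this post, so this sketch only illustrates the idea of regex-based singularization with a small, hypothetical rule list: rules are tried in order, most specific first, and the first one that matches the whole word decides the singular form. The class name, rules and exception list are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Singularizer {
    // Hypothetical rules: pattern -> suffix appended to group 1.
    // Insertion order matters: the most specific rule is tried first.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("(.*[^aeiou])ies"), "y");      // cities    -> city
        RULES.put(Pattern.compile("(.*(?:s|x|z|ch|sh))es"), ""); // addresses -> address
        RULES.put(Pattern.compile("(.*[^s'])s"), "");            // birthdays -> birthday
    }

    // Common words ending in "s" that are not plurals (illustrative, not complete).
    private static final Set<String> EXCEPTIONS =
            Set.of("is", "was", "has", "his", "this", "does", "yes", "us", "its");

    static String singularize(String word) {
        if (EXCEPTIONS.contains(word.toLowerCase())) {
            return word;
        }
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            Matcher m = rule.getKey().matcher(word);
            if (m.matches()) {                     // the rule must cover the whole word
                return m.group(1) + rule.getValue();
            }
        }
        return word;                               // no rule matched: leave it unchanged
    }
}
```

A real inflector needs a much longer rule list plus irregular forms (children, people) and uncountable words; with only these rules, a name such as “James” would wrongly become “Jame”.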
The Stanford Named Entity Recognizer (NER) was used to find phrases that contain, for instance, the names of persons or organizations. A person’s name, if found, was then stored in memory for future reference. Here, future reference means that if a pronoun appeared in the next few turns, the pronoun was replaced by the name obtained from the NER. Another purpose of finding a person’s or organization’s name was to query the ontology using that name: since the ontology stores information about persons, we used the person’s name as the identifier that uniquely picks out a node.
Sentences with the same meaning usually have similar dependencies, so we studied many cases and wrote conditions that map a set of dependencies to the function of a query.
One example to clarify this is given below (each dependency line lists a relation, followed by its governor and its dependent):
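A sketch of the “memory” of person names, assuming the NER output has been rendered as `word/LABEL` tokens with `PERSON` marking person names (the exact output format depends on how the Stanford NER is invoked). Adjacent `PERSON` tokens are joined into one name, and only the most recent few names are kept, mirroring the “next few turns” window; the class name and the capacity parameter are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

class EntityMemory {
    private final Deque<String> recentPersons = new ArrayDeque<>(); // most recent first
    private final int capacity;                                     // how many names to keep

    EntityMemory(int capacity) {
        this.capacity = capacity;
    }

    /** Reads NER output such as "of/O Amit/PERSON Roushan/PERSON" and stores each person name. */
    void remember(String nerTagged) {
        StringBuilder name = new StringBuilder();
        for (String token : nerTagged.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            String word = slash < 0 ? token : token.substring(0, slash);
            String label = slash < 0 ? "O" : token.substring(slash + 1);
            if (label.equals("PERSON")) {
                if (name.length() > 0) name.append(' ');
                name.append(word);   // extend the current multi-word name
            } else {
                store(name);         // a non-person token ends the name
            }
        }
        store(name);                 // the sentence may end with a name
    }

    private void store(StringBuilder name) {
        if (name.length() == 0) return;
        recentPersons.addFirst(name.toString());
        if (recentPersons.size() > capacity) recentPersons.removeLast(); // forget the oldest
        name.setLength(0);
    }

    /** The most recently mentioned person, if any. */
    Optional<String> lastPerson() {
        return Optional.ofNullable(recentPersons.peekFirst());
    }
}
```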
Query: find birthday of Amit Roushan.
root(ROOT, find)
dobj(find, birthday)
prep(birthday, of)
nn(Roushan, Amit)
pobj(of, Roushan)
Query: find Amit Roushan’s birthday and nickname.
root(ROOT, find)
nn(Roushan, Amit)
poss(birthday, Roushan)
possessive(Roushan, ’s)
dobj(find, birthday)
cc(birthday, and)
conj(birthday, nickname)
The meaning of the second query is to find Amit Roushan’s birthday and nickname. Using these kinds of dependencies, we formed several conditions, each matching one function that a user’s query can ask for. If a condition matches, we assume that the meaning of the sentence is the one that condition represents, and a query is then formed in the background and fired against the ontology.
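The condition matching can be sketched over dependency triples. This is a minimal, hypothetical version that covers only the two example queries: it requires `dobj(find, X)`, looks for the person either through `prep(X, of)` plus `pobj(of, P)` or through `poss(X, P)`, collects further attributes from `conj(X, Y)`, and rebuilds the full name from `nn` modifiers (the older Stanford Dependencies name used in the examples; Universal Dependencies calls it `compound`). The `Dep` and `Intent` types are assumptions written as Java 16+ records, not the project’s actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class DependencyMatcher {
    /** One typed dependency rel(gov, dep), e.g. dobj(find, birthday). */
    record Dep(String rel, String gov, String dep) {}

    /** What the user asked for: a person and the wanted attributes. */
    record Intent(String person, List<String> attributes) {}

    /** Dependent of the first dependency rel(gov, *), if present. */
    static Optional<String> dependentOf(List<Dep> deps, String rel, String gov) {
        return deps.stream()
                .filter(d -> d.rel().equals(rel) && d.gov().equals(gov))
                .map(Dep::dep)
                .findFirst();
    }

    /** Rebuilds "Amit Roushan" from the head "Roushan" and nn(Roushan, Amit). */
    static String fullName(List<Dep> deps, String head) {
        StringBuilder name = new StringBuilder();
        for (Dep d : deps) {
            if (d.rel().equals("nn") && d.gov().equals(head)) {
                name.append(d.dep()).append(' ');
            }
        }
        return name.append(head).toString();
    }

    static Optional<Intent> match(List<Dep> deps) {
        // Both conditions start from "find <attribute>".
        Optional<String> attribute = dependentOf(deps, "dobj", "find");
        if (attribute.isEmpty()) {
            return Optional.empty();
        }
        String attr = attribute.get();
        // Condition 1: "find <attr> of <person>" -> prep(attr, of) and pobj(of, person)
        // Condition 2: "find <person>'s <attr>"  -> poss(attr, person)
        Optional<String> person = dependentOf(deps, "prep", attr)
                .filter("of"::equals)
                .flatMap(of -> dependentOf(deps, "pobj", of))
                .or(() -> dependentOf(deps, "poss", attr));
        if (person.isEmpty()) {
            return Optional.empty();
        }
        List<String> attributes = new ArrayList<>();
        attributes.add(attr);
        // "birthday and nickname" -> conj(birthday, nickname) adds a second attribute
        for (Dep d : deps) {
            if (d.rel().equals("conj") && d.gov().equals(attr)) {
                attributes.add(d.dep());
            }
        }
        return Optional.of(new Intent(fullName(deps, person.get()), attributes));
    }
}
```

Fed the dependencies of the second query, `match` yields the person “Amit Roushan” with the attributes birthday and nickname; the first query yields the same person with birthday only.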
Stanford POS Tagger:
The main function of the POS Tagger was to read text and assign a part of speech, such as noun, verb or adjective, to each word (and other token). Our system then used the tags to replace pronouns with nouns, forming a sentence in the background for query processing. When the system detected a pronoun such as ‘he’ or ‘she’ and there was a person’s name in the history, it replaced the pronoun with that name so that the resulting sentence could be understood by the Dependency Parser in the next step. Replacing pronouns was important because the system could not process the dependencies of a sentence that still contained pronouns.
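A sketch of the pronoun replacement, assuming the tagger output has been written as `word_TAG` tokens using Penn Treebank tags, where `PRP` marks personal pronouns and `PRP$` possessive ones. Only third-person singular pronouns are replaced, and a possessive becomes the name plus “'s”, so “find his nickname” turns into a sentence with the same shape as the second example query. The class and method names are hypothetical.

```java
import java.util.Set;

class PronounResolver {
    // Third-person singular pronouns that can refer to a person (assumed scope).
    private static final Set<String> PERSONAL = Set.of("he", "she", "him", "her");
    private static final Set<String> POSSESSIVE = Set.of("his", "her");

    /**
     * Turns a POS-tagged sentence ("word_TAG" tokens) back into plain text,
     * replacing person pronouns with the remembered name when there is one.
     */
    static String resolve(String tagged, String lastPerson) {
        StringBuilder out = new StringBuilder();
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            String word = sep < 0 ? token : token.substring(0, sep);
            String tag = sep < 0 ? "" : token.substring(sep + 1);
            String lower = word.toLowerCase();
            String replacement = word;
            if (lastPerson != null) {
                if (tag.equals("PRP") && PERSONAL.contains(lower)) {
                    replacement = lastPerson;            // he -> Amit Roushan
                } else if (tag.equals("PRP$") && POSSESSIVE.contains(lower)) {
                    replacement = lastPerson + "'s";     // his -> Amit Roushan's
                }
            }
            if (out.length() > 0) out.append(' ');
            out.append(replacement);
        }
        return out.toString();
    }
}
```

Relying on the tag rather than the word alone is what separates “her” as an object (`PRP`) from “her” as a possessive (`PRP$`).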
Stanford Dependency Parser:
Here, semantic analysis refers to finding the meaning, or interpretation, of the user query from the dependencies between its words. It is the most important step in the dialogue system, since it finds the dependencies of a sentence and, from them, what the user actually wants to do.
The dialogue system makes use of various dependency relations, such as pobj (object of a preposition) and dobj (direct object), and has various conditions that check for dependencies between words. After finding the meaning of a sentence, it forms queries and fires them against the ontology to get the result.
For query formation, we used Apache Jena to read the models of an RDF graph.
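Once the person and the requested attributes are known, the SPARQL query can be assembled as a string. This sketch assumes that persons are identified by their `vcard:FN` (formatted name) literal, in line with using the person’s name as the identifier, and uses a hypothetical mapping from query words to vCard properties in the `http://www.w3.org/2001/vcard-rdf/3.0#` namespace; `OPTIONAL` keeps the person in the result even if one attribute is missing.

```java
import java.util.List;
import java.util.Map;

class SparqlBuilder {
    static final String VCARD_NS = "http://www.w3.org/2001/vcard-rdf/3.0#";

    // Hypothetical mapping from words in the user's query to vCard properties.
    static final Map<String, String> PROPERTY_FOR = Map.of(
            "birthday", "BDAY",
            "nickname", "NICKNAME",
            "email", "EMAIL");

    static String build(String personName, List<String> attributes) {
        // Escape backslashes and quotes so the name is a valid SPARQL string literal.
        String escaped = personName.replace("\\", "\\\\").replace("\"", "\\\"");
        StringBuilder select = new StringBuilder();
        StringBuilder optionals = new StringBuilder();
        for (String attr : attributes) {
            String prop = PROPERTY_FOR.get(attr);
            if (prop == null) {
                throw new IllegalArgumentException("Unknown attribute: " + attr);
            }
            select.append(" ?").append(attr);
            optionals.append("  OPTIONAL { ?person vcard:").append(prop)
                     .append(" ?").append(attr).append(" }\n");
        }
        return "PREFIX vcard: <" + VCARD_NS + ">\n"
             + "SELECT" + select + " WHERE {\n"
             + "  ?person vcard:FN \"" + escaped + "\" .\n"
             + optionals
             + "}\n";
    }
}
```

With Apache Jena on the classpath, the resulting string could be executed against a loaded `Model`, for example through `QueryExecutionFactory.create(queryString, model).execSelect()`.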
We stored the generic content of responses in XML files. To parse the XML and get the result, we used a SAX parser. We chose SAX over a DOM parser because SAX streams through the document and reports elements as events instead of building the whole tree in memory, which makes it faster and lighter, and because its API is simple and easy to learn.
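A sketch of loading response templates with the SAX parser that ships with the JDK. The XML schema (a `<responses>` root whose `<response id="birthday">` children hold `%s` placeholders) is a hypothetical stand-in for the project’s actual files. Text is accumulated in `characters`, since a parser may deliver one element’s text in several chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ResponseTemplates {
    /** Collects <response id="...">text</response> elements into an id -> template map. */
    static Map<String, String> load(InputStream xml) throws Exception {
        Map<String, String> templates = new HashMap<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(xml, new DefaultHandler() {
            private String currentId;                       // id of the <response> being read
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("response")) {
                    currentId = attrs.getValue("id");
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (currentId != null) {
                    text.append(ch, start, length);         // may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("response") && currentId != null) {
                    templates.put(currentId, text.toString().trim());
                    currentId = null;
                }
            }
        });
        return templates;
    }

    /** Convenience overload for XML held in a string. */
    static Map<String, String> load(String xml) throws Exception {
        return load(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A loaded template can then be filled with `String.format`, e.g. `String.format(templates.get("birthday"), name, value)`.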