Overview of main components
The textual question answering system has four main components: question analysis, document retrieval, document analysis, and answer selection.
- Given a natural language question posed by a user, the first step is to analyze the question itself. The question analysis component may include a morpho-syntactic analysis of the question. The question is also classified to determine what it is asking for. Depending on the morpho-syntactic analysis and the class of the question, a retrieval query is formulated and posed to the retrieval component.
- The retrieval component is generally a standard document retrieval system that identifies documents containing terms from a given query. It returns a set or a ranked list of documents, which are further analyzed by the document analysis component.
- The document analysis component takes as input documents that are likely to contain an answer to the original question, together with a specification of what types of phrases should count as correct answers. This specification is generated by the question analysis component. The document analysis component then extracts a number of candidate answers, which are sent to the answer selection component.
- The answer selection component selects the phrase that is most likely to be a correct answer from a number of phrases of the appropriate type, as specified by the question analysis component. It returns the final answer or a ranked list of answers to the user.
The main components are displayed in the pictures below.
The main function of the question analysis component is to understand the purpose of the question, i.e., the kind of information the question is asking for. To identify the purpose of a question, the question is analyzed in a number of ways.
- First, the question is assigned a class, or a number of classes, such as agent, aka, or date. Assigning question classes can be accomplished in a variety of ways. One of the simplest, yet quite effective, ways is to apply pattern matching to the question to identify its type.
- In addition to classifying the question, the question analysis component has to formulate the query that is posed to the retrieval component. There are many ways to formulate the query, depending on the functionality of the retrieval engine. Here we simply assume bag-of-words queries, where a query is an unordered list of single terms.
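
The two steps above can be illustrated with a minimal sketch. The class names (agent, aka, date) come from the text; the regular expressions, the stop word list, and the function names `classify_question` and `formulate_query` are assumptions made for illustration, not the system's actual rules.

```python
import re

# Hypothetical patterns: the class names come from the text, but the
# regular expressions themselves are illustrative assumptions.
QUESTION_PATTERNS = [
    ("date", re.compile(r"^(when|what year|in which year)\b", re.IGNORECASE)),
    ("agent", re.compile(r"^who\b", re.IGNORECASE)),
    ("aka", re.compile(r"\b(also known as|another name for)\b", re.IGNORECASE)),
]

# A small, assumed stop word list; a real system would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "of", "in", "on", "is", "was", "did", "do", "does",
    "who", "what", "when", "where", "which", "for", "as", "to",
}


def classify_question(question):
    """Return every class whose pattern matches the question."""
    return [label for label, pattern in QUESTION_PATTERNS if pattern.search(question)]


def formulate_query(question):
    """Build a bag-of-words query: unique lowercased content terms, unordered."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return {token for token in tokens if token not in STOP_WORDS}
```

A set is used for the query because the order of terms carries no meaning in a bag-of-words query. A question can match several patterns, in which case it is assigned several classes.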
The function of the document retrieval component is not to find actual answers to the question, but to identify documents that are likely to contain an answer. This process of pre-selecting documents is also known as pre-fetching.
Depending on the retrieval engine that is actually used, the retrieval component returns either an unordered set of documents that are likely to contain an answer, or a ranked list of documents, where the documents are ranked with respect to their likelihood of containing an answer.
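
As a rough illustration of the ranked-list case, the sketch below scores each document by the number of distinct query terms it contains. This term-overlap score is an assumed stand-in for whatever ranking function the retrieval engine actually provides, and the names `tokenize` and `retrieve` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query_terms, documents, top_k=3):
    """Rank documents by how many distinct query terms they contain.

    `documents` maps a document id to its text. Documents that share no
    term with the query are dropped. Ties are broken by document id so
    that the ranking is deterministic.
    """
    scored = []
    for doc_id, text in documents.items():
        overlap = len(set(query_terms) & set(tokenize(text)))
        if overlap > 0:
            scored.append((doc_id, overlap))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

Dropping the scores and the sort would give the unordered-set variant: every document with a non-zero overlap is simply returned.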
The document analysis component searches through the documents returned by the retrieval component to identify phrases that are of the appropriate type, as specified by the question analysis component. To this end, a named-entity recognizer is used to assign semantic types to phrases in the top documents.
The document analysis component passes the list of candidate answers on to the answer selection component, together with the way in which each candidate answer was linked to the question, i.e., whether it was due to analyzing the syntactic structure, application of pattern matching, lexical chaining, or proximity constraints.
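
The candidate extraction step might look roughly like the sketch below. It replaces the named-entity recognizer with two assumed regular expressions, so it is only a toy stand-in; the `Candidate` record and the `extract_candidates` function are hypothetical names. Each candidate carries a `link` field recording how it was connected to the question, as described above.

```python
import re
from dataclasses import dataclass

# Toy stand-in for a named-entity recognizer: each semantic type is
# detected by an assumed regular expression, not a trained model.
ENTITY_PATTERNS = {
    "date": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),
    "agent": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),
}


@dataclass(frozen=True)
class Candidate:
    phrase: str       # the candidate answer text
    entity_type: str  # semantic type assigned by the recognizer
    doc_id: str       # document the phrase was found in
    link: str         # how the candidate was linked to the question


def extract_candidates(documents, expected_type, link="pattern matching"):
    """Collect phrases of the expected type from (doc_id, text) pairs."""
    pattern = ENTITY_PATTERNS[expected_type]
    candidates = []
    for doc_id, text in documents:
        for match in pattern.finditer(text):
            candidates.append(Candidate(match.group(0), expected_type, doc_id, link))
    return candidates
```

The documents are passed as an ordered sequence of pairs so that candidates from higher-ranked documents come first in the result.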
The final component selects the phrase that is most likely to be the answer to the original question from the candidate answer phrases coming from the document analysis component.
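
The text does not say how the most likely phrase is determined. One simple possibility, shown here purely as an assumption, is to prefer the candidate phrase that occurs most often across the analyzed documents. The function name `select_answers` is hypothetical.

```python
from collections import Counter


def select_answers(candidate_phrases, top_n=1):
    """Rank candidate phrases by frequency, most frequent first.

    Phrases with equal counts keep the order in which they were first
    seen, which is the documented behavior of Counter.most_common.
    """
    counts = Counter(candidate_phrases)
    return counts.most_common(top_n)
```

With `top_n=1` this returns a single final answer; a larger `top_n` returns a ranked list of answers, matching the two output forms described for this component.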