Search Engine
A search engine is an information
retrieval system designed to help find information stored on a
computer system. Search engines help to minimize the time required
to find information and the amount of information which must be
consulted, akin to other techniques for managing information
overload.
The most popular form of search engine is the Web search engine,
which searches for information on the public World Wide Web. Other
kinds of search engines include enterprise search engines, which
search intranets, as well as personal search engines and mobile
search engines.
Querying
Search engines provide an interface to a group of items that enables
users to specify criteria about an item of interest and have the
engine find the matching items within the group.
In the most popular form of search, items are documents or web pages
and the criteria are words or concepts that the documents may
contain.
There are several varieties of syntax in which a search engine user
can express a query. Some are formalized and require a strict,
logical and algebraic syntax. Others are less strict and allow for a
less precisely defined query. One form of less-restricted query
syntax is referred to as Natural Language Search (NLS), a term
typically used to describe web search engines that apply natural
language processing of some form. For example, instead of searching
for one or two words, a query could consist of an English sentence
or paragraph. A natural language search engine then parses the query
into words and searches for those words. This places less burden on
the user to formulate a specific query in a restrictive, and
sometimes difficult to learn, syntax. A second definition of natural
language search reflects how the search engine performs indexing,
unrelated to the query syntax. This requires a semantic
understanding of the query in order to disambiguate the text.
Traditional search engines tend to use a non-linguistic model of
language, and the hypothesis is that NLS will provide better
results, that is to say, results that more accurately and
efficiently support a user's need.
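The first sense of natural language search, parsing a sentence into
words and searching for those words, can be illustrated with a
minimal sketch. The function name parse_query and the short
stop-word list are assumptions made for this example; they are not
taken from any particular search engine, and real systems apply far
more elaborate language processing.

```python
import re

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to",
              "is", "are", "what", "which", "who", "how"}


def parse_query(query):
    """Split a free-text query into lowercase terms, dropping stop words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [w for w in words if w not in STOP_WORDS]


terms = parse_query("What are the tallest mountains in South America?")
print(terms)  # ['tallest', 'mountains', 'south', 'america']
```

The user types an ordinary sentence, and the engine, not the user,
reduces it to the terms that are actually searched for.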
Ranking
A Boolean search for an item within a group of items will return
either the exactly matching item or nothing. This is a strict search
method in which the desired item and the actual item must match
exactly. In practice, it is often more useful to apply a looser
measure of similarity between the desired item(s) and the items that
exist in the group being searched.
For example, instead of finding only the exact book in a library, a
library search engine may return a list of 'similar' books, with the
exact book listed first.
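The difference between an exact match and a similarity-based match
can be sketched as follows. This is only an illustration: the
catalog titles are sample data, the function names and the 0.5
threshold are assumptions, and the string-similarity ratio from
Python's standard difflib module stands in for whatever similarity
measure a real library search engine would use.

```python
from difflib import SequenceMatcher

# Sample catalog for illustration.
CATALOG = ["Moby Dick", "Moby Duck", "The Whale", "Dracula"]


def exact_search(title, catalog):
    """Boolean-style search: the exact item or nothing."""
    return [t for t in catalog if t == title]


def similar_search(title, catalog, threshold=0.5):
    """Return titles similar to the query, most similar first."""
    scored = [
        (SequenceMatcher(None, title.lower(), t.lower()).ratio(), t)
        for t in catalog
    ]
    kept = [(s, t) for s, t in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in kept]


print(exact_search("Moby Dik", CATALOG))    # [] because of the typo
print(similar_search("Moby Dik", CATALOG))  # closest titles first
```

With the misspelled query, the exact search finds nothing, while the
similarity search still returns the intended book at the top of the
list.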
The list of items that meet the criteria specified by the query is
typically sorted, or ranked, by some measure so as to place the most
'relevant' items first. Placing the most relevant items first
reduces the time users need to determine whether one or more of the
resulting items are sufficiently similar to the query. It has become
common knowledge through the use of Web search engines that the
further down the list of matching items you browse, the less
relevant the items become.
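Ranking can be illustrated with a deliberately simple relevance
score: the number of times the query terms occur in each document.
This scoring rule, the function names, and the sample documents are
assumptions chosen for the example; production search engines use
considerably more sophisticated relevance measures.

```python
def score(query_terms, document):
    """Count how often the query terms occur in the document."""
    words = document.lower().split()
    return sum(words.count(term) for term in query_terms)


def rank(query_terms, documents):
    """Return matching documents, highest score first."""
    scored = [(score(query_terms, d), d) for d in documents]
    matches = [(s, d) for s, d in scored if s > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in matches]


docs = [
    "cats and dogs",
    "search engines rank search results",
    "a search for meaning",
]
ranked = rank(["search", "results"], docs)
print(ranked)
# ['search engines rank search results', 'a search for meaning']
```

Documents that do not mention any query term are dropped, and the
rest are ordered so that the strongest match appears first.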
Indexing
To provide a set of matching items quickly, a search engine will
typically collect information, or metadata, about the group of items
under consideration beforehand. For example, a library search engine
may determine the author of each book automatically and add the
author name to a description of each book. Users can then search for
books by the author's name. Other metadata in this example might
include the book title, the number of pages in the book, the date it
was published, and so forth.
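The library example can be sketched as a small metadata record per
book plus a search over the author field. The BookMetadata class,
its field names, and the placeholder records are all hypothetical;
they only mirror the kinds of metadata listed above.

```python
from dataclasses import dataclass


@dataclass
class BookMetadata:
    """Metadata collected ahead of time for one book."""
    title: str
    author: str
    pages: int
    year: int


def search_by_author(records, name):
    """Return records whose author field contains the given name."""
    needle = name.lower()
    return [r for r in records if needle in r.author.lower()]


# Placeholder records, not real bibliographic data.
records = [
    BookMetadata("Example Title A", "Author One", 200, 1990),
    BookMetadata("Example Title B", "Author Two", 350, 2005),
    BookMetadata("Example Title C", "Author One", 120, 2012),
]

for book in search_by_author(records, "author one"):
    print(book.title, book.year)
```

Because the author names were gathered beforehand, the search only
has to inspect the small metadata records rather than the books
themselves.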
The metadata collected about each item is typically stored on a
computer in the form of an index. The index typically requires less
computer storage than the items themselves and provides a way for
the search engine to calculate the relevance, or similarity, between
the query and the set of items.
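One common way to organize such an index is an inverted index, which
maps each word to the items that contain it. The text above does not
name a specific index structure, so the following is a sketch under
that assumption, with hypothetical function names and sample
documents.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, query_terms):
    """Return ids of documents that contain every query term."""
    sets = [index.get(term, set()) for term in query_terms]
    return set.intersection(*sets) if sets else set()


documents = {
    1: "web search engines index web pages",
    2: "library catalogs index books",
    3: "enterprise search on intranets",
}
index = build_index(documents)
print(lookup(index, ["search"]))           # {1, 3}
print(lookup(index, ["index", "books"]))   # {2}
```

The index is built once, ahead of time, so answering a query only
requires looking up each term instead of scanning every document.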