A web search engine is a software system
that is designed to search for information on the World Wide Web.
The search results are generally presented in a line of results often referred
to as search engine results pages (SERPs). The
information may be a mix of web pages,
images, and other types of files. Some search engines also look for information
available in databases or open directories.
Unlike web directories,
which are maintained only by human editors, search engines also maintain real-time
information by running algorithms on the data gathered by a web crawler.
The very first tool used for searching on the Internet was Archie. The name stands for
"archive" without the "v". It was created in 1990 by Deutsch,
computer science students at a University in Montreal. The program downloaded the directory listings of
all the files located on public anonymous FTP (File
Transfer Protocol) sites, creating a searchable database of file names; however, Archie
did not index the contents of these sites since the amount of data was so
limited it could be readily searched manually.
The rise of “Gopher” led to two new search
programs, Veronica and Jughead. Like Archie, they
searched the file names and titles stored in Gopher index systems. Veronica (Very
Easy Rodent-Oriented Net-wide Index to Computerized
Archives) provided a keyword search of most Gopher menu titles in the
entire Gopher listings. Jughead (Jonzy's Universal Gopher Hierarchy
Excavation And Display) was a tool for obtaining menu
information from specific Gopher servers. While the name of the search engine
"Archie" was not a reference to the Archie comic book series, "Veronica" and "Jughead" are characters in
the series, thus referencing their predecessor.
In the summer of 1993, no search engine existed for the web, but
numerous specialized catalogues were maintained by hand. Perl scripts written by Oscar Nierstrasz formed the basis for W3Catalog, the web's first primitive search engine, released
on September 2, 1993.
Soon after, many search engines appeared and became very popular. These
included Magellan, Excite, Northern
Light, and AltaVista. Yahoo! was among the most popular ways for people to find
web pages of interest, but its search function operated on its web directory, rather than its full-text
copies of web pages. Information seekers could also browse the
directory instead of doing a keyword-based search.
Around 2000, Google's
search engine became
extremely popular. The company got better results for many searches with an
innovation called PageRank, an algorithm that ranks web pages based on the number and PageRank
of the other web sites and pages that link to them, on the premise that good or
desirable pages are linked to more than others. Google also maintained a
minimalist interface to its search engine.
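As a rough sketch of the idea behind PageRank, not Google's actual implementation, a simplified iteration over a tiny, made-up link graph might look like the following Python snippet; the graph, damping factor, and iteration count are all illustrative assumptions.

```python
# Simplified PageRank sketch: repeatedly distribute each page's score
# across its outgoing links. The link graph and parameters below are
# illustrative assumptions, not Google's production algorithm.

def pagerank(links, damping=0.85, iterations=20):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue  # dangling pages distribute nothing in this sketch
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    graph = {
        "home": ["about", "blog"],
        "about": ["home"],
        "blog": ["home", "about"],
    }
    for page, score in sorted(pagerank(graph).items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

Pages that attract links from other well-ranked pages end up with higher scores, which is the premise described above.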
By 2000, Yahoo! was providing search
services based on Inktomi's search engine. Yahoo! acquired Inktomi in 2002, and
Overture (which owned AlltheWeb and AltaVista) in 2003. Yahoo! used
Google's search engine until 2004, when it launched its own search engine based
on the combined technologies of its acquisitions.
Microsoft first launched MSN Search in 1998 using search
results from Inktomi. In early 1999 the site began to display listings from Looksmart, blended with results from Inktomi. For a short
time in 1999, MSN Search used results from AltaVista instead. In 2004, Microsoft began a transition to its own search technology,
powered by its own web crawler (called msnbot).
Microsoft's renamed search engine, Bing, was launched on June 1, 2009. On July 29, 2009,
Yahoo! and Microsoft finalized a deal in which Yahoo! Search
would be powered by Microsoft Bing technology.
Examples of web search engines include:
Bing.com
Google.com
Ask.com
AOLsearch.com
The three essential features of a web search engine are crawling,
indexing, and searching.
Web search engines work by storing information about many web pages,
which they retrieve from the HTML markup of the pages. These pages are retrieved by
a Web crawler (sometimes also known as a
spider), an automated program that follows every link on the site. The
site owner can exclude specific pages by using robots.txt.
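A minimal crawling sketch along these lines is shown below; the start URL is a hypothetical placeholder, robots.txt is only checked for the starting site, and link extraction is deliberately simple.

```python
# Minimal crawler sketch: fetch pages, respect robots.txt, follow links.
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of anchor tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=10):
    # Read robots.txt for the starting site so excluded pages are skipped.
    root = "{0.scheme}://{0.netloc}".format(urlparse(start_url))
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(urljoin(root, "/robots.txt"))
    robots.read()

    seen, queue, pages = set(), [start_url], {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in seen or not robots.can_fetch("*", url):
            continue  # skip already-seen pages and pages excluded by robots.txt
        seen.add(url)
        with urllib.request.urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        queue.extend(urljoin(url, link) for link in extractor.links)
    return pages


if __name__ == "__main__":
    crawled = crawl("https://example.com/")  # placeholder start URL
    print(len(crawled), "pages fetched")
```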
The search engine then analyzes the contents of each page to determine
how it should be indexed (for example, words can be
extracted from the titles, page content, or headings). Information about web
pages is stored in an index database for later use. A request from a user can
be a single word. The index helps find information relating to the request as
quickly as possible. Some search engines, such as Google, store all or part of the source page (referred to
as a cache) as well as information
about the web pages, whereas others, such as AltaVista, store every word of every page they find. This
cached page always holds the actual search text since it is the one that was
actually indexed, so it can be very useful when the content of the current page
has been updated and the search terms are no longer in it. Cached pages are also
useful because they may contain data that is no longer available elsewhere.
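As a sketch of the indexing step described above, under deliberately naive assumptions (lowercased words split on non-alphanumeric characters, no stemming), an inverted index mapping each word to the pages that contain it could be built like this; the function and the pages dictionary are illustrative, not any engine's actual API.

```python
# Inverted index sketch: map each word to the set of pages containing it.
# Tokenization is deliberately naive (lowercase, alphanumeric runs only).
import re
from collections import defaultdict


def build_index(pages):
    """pages maps a URL to the extracted text of that page."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index
```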
When a user looks for information in a search engine (typically by using
keywords), the engine examines its index and provides a listing of
best-matching web pages according to its criteria, usually with a short summary
containing the document's title and sometimes parts of the text. The usefulness of the index
depends on the information stored with the data and on the method by which that
information is indexed.
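Continuing the sketch above, a keyword query could then be answered by intersecting the postings for each word and ranking the matches with a crude occurrence count; the search function and the text snippet used as a summary are illustrative assumptions.

```python
# Query sketch: intersect postings for each keyword, rank by a crude
# occurrence count, and return each match with a short text snippet.
def search(index, pages, query, snippet_len=80):
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))

    def score(url):
        text = pages[url].lower()
        return sum(text.count(word) for word in words)

    ranked = sorted(matches, key=score, reverse=True)
    return [(url, pages[url][:snippet_len]) for url in ranked]
```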
Since 2007, the Google.com search engine has allowed users
to search by date. The engine can also look for the words or phrases exactly as
entered. Some search engines provide an advanced feature called proximity
search,
which allows users to define the distance between keywords. Natural
language requests, by contrast, allow the user to type a question in the same form one would
ask it of a human; Ask.com is an example of such a site.
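As a minimal sketch of such a proximity check, the function below reports whether two keywords occur within a given number of words of each other; the tokenization and the default distance are assumptions, not any particular engine's behaviour.

```python
# Proximity search sketch: do two keywords occur within `max_distance`
# words of each other in a page's text?
import re


def within_proximity(text, word_a, word_b, max_distance=5):
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, token in enumerate(tokens) if token == word_a.lower()]
    positions_b = [i for i, token in enumerate(tokens) if token == word_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)
```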
The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of
web pages that include a particular word or phrase, some pages may be more
relevant, popular, or authoritative than others. Most search engines employ
methods to rank the results to provide the
"best" results first. How a search engine decides which pages are the
best matches, and what order the results should be shown in, varies widely from
one engine to another.
The methods also change over time as Internet usage
changes and new techniques evolve. Two main types of search engine
have evolved: one is a system of predefined keywords that humans have programmed extensively;
the other is a system that analyzes the texts it locates.
Most Web search engines are commercial ventures supported by advertising revenue, and for this reason
some of them allow advertisers to have their listings ranked higher in search results for a
fee. Search engines that do not accept money for their search results make
money by running search-related ads alongside the regular search engine results. The search engines make
money every time someone clicks on one of these ads.