CSC 379 SUM2008:Week 2, Group 3: Difference between revisions


Revision as of 19:46, 18 July 2008

Search Engines

Search engines fill an important role in our lives, helping us locate information within a wide array of multimedia. However, many ethical considerations are involved in their operation: the ordering of rankings, the range of content indexed (or not), and how advertisements are incorporated are a few. Broadly examine the ethics of search engine operation and use.

Function

A search engine is an information retrieval system that matches queries against an index it creates. Search engines consist of four essential modules:

  1. Document Processor - this prepares, processes, and inputs the documents, pages, or sites that users search against.
  2. Query processor - this consists of seven possible steps
    • Tokenizing, usually by breaking the input into strings separated by white space.
    • Parsing operators, such as reserved punctuation or reserved terms in a specialized format (e.g., AND, OR). These may also include Boolean, adjacency, or proximity operators.
    • Stop-list filtering and stemming: a stop list removes very common words from the query, and stemming reduces words to their root forms. Engines may drop these two steps.
    • Creating the query depends on the method used to do the matching.
    • Query expansion employs synonyms to optimize the search results.
    • Query term weighting is used to judge the importance of each term in the query.
  3. Search and matching function - this depends on the theoretical model of information retrieval that underlies the system's design.
  4. Ranking capability - this is done in several ways, based on:
    • Term frequency
    • Location of terms
    • Link analysis
    • Popularity
    • Date of Publication
    • Length
    • Proximity of query terms
    • Proper nouns
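
The query-processing steps and the simplest ranking signal listed above (term frequency) can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular engine works: the stop list, the operator set, the crude suffix-stripping stemmer, and all function names are assumptions made for the example, and query expansion and term weighting are left out.

```python
import re
from collections import Counter

# Hypothetical stop list and operator set, chosen only for illustration.
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "is"}
OPERATORS = {"AND", "OR", "NOT"}


def tokenize(query):
    """Break the input into strings separated by white space."""
    return query.split()


def parse_operators(tokens):
    """Separate reserved operator terms (e.g. AND, OR) from search terms."""
    operators = [t for t in tokens if t in OPERATORS]
    terms = [t.lower() for t in tokens if t not in OPERATORS]
    return terms, operators


def remove_stop_words(terms):
    """Drop commonly occurring words that carry little meaning."""
    return [t for t in terms if t not in STOP_WORDS]


def stem(term):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def process_query(query):
    """Tokenize, parse operators, filter stop words, then stem."""
    terms, operators = parse_operators(tokenize(query))
    terms = [stem(t) for t in remove_stop_words(terms)]
    return terms, operators


def term_frequency_score(terms, document):
    """Count how often the stemmed query terms occur in the document."""
    words = [stem(w) for w in re.findall(r"[a-z0-9]+", document.lower())]
    counts = Counter(words)
    return sum(counts[t] for t in terms)


def rank(query, documents):
    """Return matching documents, highest term-frequency score first."""
    terms, _ = process_query(query)
    scored = [(term_frequency_score(terms, d), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored if score > 0]
```

Calling `rank("the ethics of search AND engines", documents)` on a small list of strings returns the documents that contain the stemmed query terms, ordered by how often those terms appear. The operator list returned by `process_query` is ignored here; a real matching function would use it to combine terms.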

History

Issues

Algorithms

Politics