CSC 379 SUM2008:Week 2, Group 3

Latest revision as of 18:09, 19 July 2008

History

Search engines fill an important role in our lives, helping us locate information within a wide array of multimedia. However, many ethical considerations are involved in their operation: the ordering of rankings, the range of content indexed (or not), and how advertisements are incorporated are a few. Broadly examine the ethics of search engine operation and use.

Evolution of the search engine

Tim Berners-Lee came up with the first way of searching the web, which was to keep a complete list of web servers on the CERN web server. The list became hard to keep up to date as the number of web servers grew.

Alan Emtage, at McGill University in Montreal, developed “Archie” in 1990. The program downloaded the listings of files available on public anonymous FTP sites and built a database from them. The contents of the files were not indexed, and therefore the database was searchable only by file name.

Mark McCahill at the University of Minnesota developed “Gopher” in 1991. It was a listing very similar to Archie, but two programs, “Veronica” and “Jughead”, made it possible to search the listing more efficiently.

In 1993 the first web crawlers started to come out. The first was Wandex, developed by Matthew Gray at MIT. Aliweb and JumpStation came out later. Searches were still limited to the titles of pages.

WebCrawler, which came out in 1994, was the first search engine that could search the full text of pages. Lycos, developed at Carnegie Mellon University, came out later the same year. Since then, web crawlers have been the standard for all search engines.

The Google search engine rose to prominence around 2000 and became a pioneer in web searching. It used a new method called PageRank, an iterative algorithm that ranks a given web page based on the rankings of the pages and sites that link to it.
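
The iterative idea behind PageRank can be illustrated with a minimal sketch. This is not Google's actual implementation: the function name `pagerank`, the damping value of 0.85, the fixed iteration count, and the three-page example graph are all assumptions chosen for illustration. Each page repeatedly passes a share of its current rank to the pages it links to, so a page linked from highly ranked pages ends up highly ranked itself.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate page ranks by repeated redistribution.

    links maps each page name to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
example_ranks = pagerank(example_links)
```

In this toy graph, page c is linked from both a and b, so it ends up ranked above b, which is linked only from a.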

During 2002 and 2003 Yahoo acquired Inktomi and Overture and built a search engine that combined the two technologies. In 2004 Microsoft developed its own web crawler, called “msnbot”, for the MSN search engine, which was founded in 1998. Later, a number of search engines specific to a particular country became popular. Among these are Baidu in China and Guruji in India.

Function

A search engine is an information retrieval system that matches queries against an index that it creates. Search engines consist of four essential modules:

  1. Document Processor - this prepares, processes, and inputs the documents, pages, or sites that users search against.
  2. Query Processor - this consists of seven possible steps
    • Tokenizing, usually by breaking inputs into strings separated by white space.
    • Parsing operators, such as reserved punctuation or reserved terms in a specialized format (e.g., AND, OR). These may also include Boolean, adjacency, or proximity operators.
    • Stop-listing and stemming. The stop list might contain words from commonly occurring querying phrases. Engines may drop these two steps.
    • Creating the query depends on the method used to do the matching.
    • Query expansion employs synonyms to optimize the search results.
    • Query term weighting is used to judge the importance of each term in the query.
  3. Search and Matching Function - this is based on which theoretical model of information retrieval underlies the system's design philosophy.
  4. Ranking Capability - this is done in several ways
    • Term frequency
    • Location of terms
    • Link analysis
    • Popularity
    • Date of Publication
    • Length
    • Proximity of query terms
    • Proper nouns
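
A few of the steps listed above can be sketched in code. The example below is only an illustration under stated assumptions: the stop list, the operator set, the function names `process_query` and `rank_by_term_frequency`, and the sample documents are all hypothetical, and stemming, query expansion, and term weighting are left out. It tokenizes a query on white space, separates reserved operators from search terms, drops stop words, and then ranks documents by term frequency alone.

```python
import re
from collections import Counter

# Illustrative assumptions: a tiny stop list and a tiny set of operators.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is"}
OPERATORS = {"AND", "OR", "NOT"}


def normalize(token):
    """Lowercase a token and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", token.lower())


def process_query(query):
    """Tokenize on white space, pull out operators, drop stop words."""
    terms = []
    operators = []
    for token in query.split():
        if token in OPERATORS:
            # Reserved terms are only recognized in upper case.
            operators.append(token)
            continue
        word = normalize(token)
        if word and word not in STOP_WORDS:
            terms.append(word)
    return terms, operators


def rank_by_term_frequency(terms, documents):
    """Order documents by how often the query terms occur in them."""
    scores = {}
    for name, text in documents.items():
        counts = Counter(normalize(token) for token in text.split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scores[name] = score
    # Highest score first; ties are broken by document name.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical documents standing in for an index.
documents = {
    "d1": "Search engines index the web. Search results are ranked.",
    "d2": "Ethics of search",
    "d3": "Cooking recipes",
}
terms, operators = process_query("ethics AND the search engines")
ranking = rank_by_term_frequency(terms, documents)
```

A real engine would combine term frequency with the other signals in the list, such as link analysis and the location of terms, rather than rely on a single count.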

Issues

Algorithms

The algorithms that were initially used were fairly basic and objective. They were based on such criteria as the number of visits to a page or the number of pages that link to a certain page. The success and popularity of Google is attributed to its more subjective approach. Google implemented an algorithm that is meant to find what users are looking for instead of offering the most popular results, which is not always the same thing. Because their algorithms vary, different search engines produce different results. The secrecy of its algorithm is what gives Google its competitive edge.

Advertisements

Including advertisements on a search engine is nothing new. For years, these ads had no effect on search results. Ads have been given three classifications:

  • Paid placement is advertising that is outside of the editorial content of the search results, sometimes above or below the editorial content, or in a sidebar.
  • Paid inclusion is advertising within the editorial content of the search results, though it does not necessarily guarantee a certain position within the results.
  • Paid submission is the practice of requiring payment to speed up the processing of a listing, though it rarely guarantees that a site will in fact be listed by the search engine.

Consumer groups have objected only to the first two practices, claiming that ads should clearly be identified as ads. The inclusion of advertisements in search results has given rise to many fair use and comparative advertising issues. Another conflict lies in the use of common words, as opposed to trademarks, in queries. Should advertisers be allowed to include ads in broader searches? Can one advertiser prevent another from being included in search results for its trademark? One such case arose when Estee Lauder sued iBeauty and Excite because a search for 'Clinique' would result in an ad and site for iBeauty. iBeauty eventually agreed to the terms set by Estee Lauder. Critics of the way Estee Lauder reacted would argue that selling the ad space is the search engine's right. Revenue generated from paid listings allows search engines to provide unpaid editorial listings to searchers for free.

Politics

Nowadays search engines play the role of gatekeeper to the ocean of information on the web. For that reason they play a critical role in society, both ethically and politically. It is the search engines that decide what is relevant and what is not when one does a search. They also decide the order and ranking of the websites in the search results.

Google has claimed that it keeps a clear separation between editorial content and advertising content, in the same way that newspapers and magazines have editorial staffs and advertising staffs.

There were rumors that some companies would charge those who wanted their website to have a different ranking, or would sometimes manipulate rankings by hand based on their own assumptions about what should come first. For this reason, many search engine companies now claim that their rankings are produced entirely by algorithms that their developers have written and debugged, so that no one can buy a ranking. They also will not reveal their algorithms, since there is a chance that they would be abused.

Under the Digital Millennium Copyright Act, copyright holders can file a complaint letter when material that belongs to them is displayed on other websites. These letters can be found on the Chilling Effects website. Should search engines still display these pages in their search results? Google handles this by removing the pages against which complaints have been filed, noting the removal at the bottom of the results page, and directing users to the Chilling Effects website, where they can find all of the complaints that have been filed.

External Links

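How a Search Engine Works (http://www.infotoday.com/searcher/may01/liddy.htm)

Search Engine Ads Stir Controversy (http://realtytimes.com/rtpages/20010807_searchads.htm)

iBeauty and Estee Lauder make up (http://news.cnet.com/2100-1017-244217.html)

Buying Your Way In: Search Engine Advertising Chart (http://searchenginewatch.com/showPage.html?page=2167941)

Esse est indicato in Google: Ethical and Political Issues in Search Engines (http://www.i-r-i-e.net/inhalt/003/003_hinman.pdf)

The Ethics of Search Engines (http://www.i-r-i-e.net/inhalt/003/003_editorial.pdf)

US-funded health search-engine censors all results for searches on "abortion" (http://www.boingboing.net/2008/04/04/usfunded-health-sear.html)

The Ethics and Politics of Search Engines (http://www.scu.edu/ethics/publications/submitted/search-engine-panel.html)

Search Engines on Wikipedia (http://en.wikipedia.org/wiki/Search_engine)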