As the largest interconnection of computers and computer networks, the Internet makes information widely accessible, but information integrity and management remain key issues for individuals and firms using this platform. It can provide a wealth of information, but the credibility and accuracy of that information depend entirely on the source, and finding credible information can be time-consuming, requiring hours of sorting through largely irrelevant sites. These difficulties often arise because search engines, while widely used, are often not wisely used. For many people, arriving at the information desired, rather than at thousands of irrelevant hyperlinks, remains more an art form than a science. This article provides some practical insight on using the Internet as a knowledge management tool by analyzing the features of different search engines.
How They Work: Spiders and Catalogues
Modern-day search capabilities have essentially developed from the work of the major players of the early 1990s: Altavista, Excite, Yahoo, Webcrawler, Lycos, Open Text Index, and Infoseek (listed in order of their popularity). According to results from the Search Engine Strategy 2000, these companies and their relative popularity rankings have changed significantly over the decade. Among other things, considerable progress has been made in improving the low signal-to-noise ratio of these search engines. The engines have also become more differentiated.
In the early days, all search engines basically did the same thing in slightly different ways. While some of these techniques have changed fundamentally, many are essentially the same, only more sophisticated. Most search engines use “spiders” or “web robots” which traverse the Web’s hypertext structure, retrieving web pages, indexing them in a database, and then recursively retrieving the documents linked to those web pages. Spiders are not autonomous agents that travel between sites and decide when to move and what to do. Instead, they simply visit sites by requesting documents from them. Spiders usually allow search engines and other databases to be updated automatically at regular intervals so that “dead” links in the databases can be detected and removed. They operate continually, often over months, and feature parallel retrieval.
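The retrieve, index, and follow-links cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TOY_WEB` dictionary and the `example` URLs are hypothetical stand-ins for real pages, and the sketch uses a queue rather than literal recursion. A real spider would request each document over the network and fetch many pages in parallel.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs its page links to.
# A real spider would request each document from its site instead.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://dead.example/"],
    "http://c.example/": ["http://a.example/"],
}

def crawl(seed_urls, fetch_links):
    """Visit pages breadth-first and return (index, dead_links)."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    index = {}        # URL -> outgoing links found on that page
    dead_links = []   # URLs that could not be retrieved
    while queue:
        url = queue.popleft()
        links = fetch_links(url)
        if links is None:
            # Nothing came back: record the link as dead so it can be removed.
            dead_links.append(url)
            continue
        index[url] = links
        for link in links:
            if link not in seen:   # never queue the same document twice
                seen.add(link)
                queue.append(link)
    return index, dead_links

# dict.get returns None for unknown URLs, which plays the role of a dead link.
index, dead_links = crawl(["http://a.example/"], TOY_WEB.get)
```

Starting from a single seed, the sketch indexes the three reachable pages and reports the one unreachable URL as dead.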
In general, spiders start from a historical list of Uniform Resource Locators (URLs) that are popular (as measured by click-throughs) and have many links. Most search engines and indexing services also allow users to submit URLs manually. These are then queued and visited by the spider. Other sources of URLs are sometimes used as well, such as Usenet postings and published mailing lists.
Internet catalogues, such as Yahoo, work differently. These systems are essentially searchable directories. They are designed in a tree structure allowing users to delve deeper into a particular category until they find the right topic. At the lowest level, the user receives a series of links relating to that topic. Catalogues usually don’t index the linked web pages but are indexes of specific topics that are then related to certain sites. One advantage of this structure is the ability to refine searches to include only those topics of interest. However, the disadvantages are that the search can be slow (depending on the number of branches) and imprecise if categories don’t match the ones requested. Some modern search engines employ both techniques.
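The tree structure of a catalogue can be pictured with nested dictionaries. The categories and URLs below are hypothetical, invented purely for illustration, and the sketch makes no claim about how any particular directory stores its data.

```python
# Hypothetical catalogue: nested dicts are categories, lists hold site links.
CATALOGUE = {
    "Recreation": {
        "Automotive": {
            "Classic Cars": ["http://classics.example/", "http://clubs.example/"],
            "Maintenance": ["http://repair.example/"],
        },
    },
    "Business": {
        "E-Commerce": ["http://shop.example/"],
    },
}

def browse(catalogue, path):
    """Follow a list of category names down the tree and return what is found."""
    node = catalogue
    for category in path:
        if not isinstance(node, dict) or category not in node:
            return None   # no category matches the one requested
        node = node[category]
    return node

links = browse(CATALOGUE, ["Recreation", "Automotive", "Classic Cars"])
missing = browse(CATALOGUE, ["Recreation", "Cars"])
```

The second call returns nothing because "Cars" is not one of the editor-defined categories, which mirrors the imprecision described above when the requested wording does not match the catalogue's own.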
Making the Most of Content and Features
Most search services now have advanced search feature pages designed to give you more control over your search by using more complex queries, but these pages are often hard to find. Table 1 shows search engines with advanced search features, and Table 2 compares searching features for different engines. These pages provide less noise and greater accuracy if the practitioner knows how to use them. Furthermore, since the industry is constantly changing, checking appropriate sites (Table 3) for updates and reviews is the easiest way to keep current.
Search Engines with Advanced Features
|Search Engine||URL (Hyperlink)|
|AOL Search Options Page||adp|
|AltaVista Advanced Search|
|Excite Power Search|
|Infoseek Advanced Search|
|MSN Search Advanced|
|Snap Advanced Search|
|Yahoo Advanced Search Options|
Engines with News Searching Features
|Search Engine||Browse News||News-Only Searching||Free News Clipping: Saved Searches||Video/Audio Search|
Net Review and Other Sites for Search Engine Updates
|MSN Exchange Position Agent||PositionAgent by Link Exchange monitors website search engine rankings and reports all search engine listings for a selected URL and keyword for the top 10 search engines.|
|The PIPER Letter||Provides updates on the engines which allow users to search across all dimensions of the Web.|
|All-in-One Search Page||Provides over 500 of the best engines, indexes, directories and catalogs in one place.|
|Search Engine Watch||A site devoted to how search engines work, search engine news, search engine information and tips on using search engines|
|Worldwide Web Search Engines||Information on the continued evolution of search engines and its impact on ecommerce|
The best information management techniques include understanding the different types of searches available, knowing how to use search engine math, and learning the basics of net search language.
Optimizing Searches by Type
Based on the technical structures described earlier, major search engines have features which allow different types of searches. Understanding these mechanisms is important for finding and managing the information requested.
Related searches are the most common feature and are designed to help the practitioner narrow down his or her search. The related search feature is available in different places depending upon the search engine chosen. Altavista, AOL, Excite, Go (Infoseek), Hotbot, Snap and Yahoo all have related search features. For example, if you used the key word ‘cars’ in your search, these engines would provide a list of related terms near the search box. They might show ‘car maintenance, automobiles, classics, clubs, etc.’
Results clustering is a feature that prevents all of the top results from linking to just one site. Clustering allows only one page per site to be represented in the top results. Therefore, there is greater variety and a better probability of quickly finding what is being queried. Major services offering clustering include Altavista, Go (Infoseek), Google, Goto, Hotbot, MSN and Northern Light.
Depending on how the search engine is set up (spider or catalogue), you can either perform a search-within-a-search or go to a category and search within these broader areas. This capability will be evident on each page of the results if it is available. Also, some search engines allow the user to sort by date or to display only pages within a particular date range. In addition, several allow the user to change settings so that more than 10 results are displayed at a time.
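The one-page-per-site rule can be sketched as follows. The sketch assumes that "site" means the host name in the URL, which is a simplification, and the ranked list of `example` URLs is hypothetical.

```python
from urllib.parse import urlsplit

def cluster_by_site(ranked_urls):
    """Keep only the highest-ranked page from each host, preserving order."""
    seen_hosts = set()
    clustered = []
    for url in ranked_urls:
        host = urlsplit(url).hostname
        if host not in seen_hosts:
            seen_hosts.add(host)
            clustered.append(url)
    return clustered

# Hypothetical ranked results in which one site dominates the top positions.
ranked = [
    "http://cars.example/a",
    "http://cars.example/b",
    "http://autos.example/",
    "http://cars.example/c",
    "http://clubs.example/join",
]
top = cluster_by_site(ranked)
```

After clustering, three different sites appear in the top results instead of one site occupying three of the first four positions.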
Search Engine Math and Special Commands
Knowing where to go to optimize results based on search type is the first step. The next is to have a working knowledge of math and language features required for more search precision. This not only speeds up and narrows down the results but provides greater accuracy.
Practically all of the major search engines support search engine math. This is basically a very simple query language using math symbols to add (+), subtract (-) or multiply (“ ”) words or phrases. The plus and minus signs indicate the inclusion or exclusion of words, respectively. Quotation marks are usually used to designate a phrase, allowing that group of words to be searched as a complete phrase rather than as separate words. Altavista is a little unusual in that it has automatic phrase detection, so the user doesn’t need to use quotation marks. In some engines, such as Google, the practitioner can use the “+” symbol within quotation marks, allowing even greater specificity in search terms.
Snap, Lycos, MSN Search, Netscape Search and Yahoo are primarily directories of the catalogue variety. Their editors list sites by category and give a short description of each site. Therefore, when you search a directory, you are first shown matching web sites. Then, you are shown matching web pages that come from search engine listings. Search engine math can still be used for directories even though they differ from the spider-driven engines that crawl the web. The only difference is that catalogue-type engines require precise wording and are less flexible. These mathematical methods for adding, subtracting and multiplying terms work for approximately 90% of search engine users. Furthermore, these capabilities can be enhanced using additional concepts beyond the math.
Search engines have a variety of ways to refine and control searches. Some have menu systems to assist, but most allow the use of special commands as part of the query. For example, “Match Any” will return pages that contain any of the search terms provided. In contrast, “Match All”, or a series of “+” symbols, will return only pages that match all of the search terms provided. Practically all engines support phrase and title searches as well as searches restricted to particular hosts. Host restriction is a very powerful exclusionary method. For example, if someone were searching for information regarding e-commerce, but only wanted it from academic sources, the following format would exclude items from the private or government sectors: “e-commerce”+host:edu.
The same technique can also be used to exclude particular domains. For example, if information regarding the Mars landing were requested, but public opinion rather than NASA documents was sought, the following query would exclude all NASA sites: “mars landing” -site:nasa.gov
Using similar methodology, most engines offer the ability to search within the text of a URL. In fact, one of the most powerful features available is the ability to control which sites are included in or excluded from a search and then to search within the specific sites discovered. Several of the major search engines offer new and additional commands that allow searching by media type or on other criteria. There are also desktop web searching utilities that allow you to send a query to multiple engines and sort the results. These utilities are available for Windows platforms and allow the user to download actual web pages. Table 4 shows the most widely used desktop utilities available from the Internet.
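To make the add, subtract and multiply rules concrete, here is a minimal sketch of how such a query might be interpreted. It is an assumption-laden illustration, not any engine's actual parser: the function names `parse_query` and `matches` are hypothetical, straight quotation marks stand in for typed quotes, and terms are matched as simple case-insensitive substrings.

```python
import re

# An optional sign, followed by either a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Split a query into (required, excluded, optional) lowercase terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if not term:
            continue
        if sign == "+":
            required.append(term)    # "add": the term must appear
        elif sign == "-":
            excluded.append(term)    # "subtract": the term must not appear
        else:
            optional.append(term)    # unsigned: at least one should appear
    return required, excluded, optional

def matches(text, query):
    """Return True if the text satisfies the query (substring matching only)."""
    required, excluded, optional = parse_query(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False
    if not all(term in text for term in required):
        return False
    return not optional or any(term in text for term in optional)

query = '+"classic cars" -rental'
hit = matches("Classic cars for sale", query)
excluded_hit = matches("Classic cars rental", query)
split_phrase = matches("Cars that are classic", query)
```

The third page contains both words but not the exact phrase, so it is rejected; that is the effect the quotation marks are meant to have.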
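The “Match Any”, “Match All” and host commands can be illustrated with a small filter over a handful of pages. Everything in this sketch is hypothetical: the `PAGES` list, its `example` URLs, and the helper names are invented for illustration, and real engines apply these tests to their own indexes rather than to raw text.

```python
from urllib.parse import urlsplit

# Hypothetical indexed pages.
PAGES = [
    {"url": "http://www.nasa.gov/mars", "text": "mars landing mission report"},
    {"url": "http://forum.example.com/space", "text": "my view of the mars landing"},
    {"url": "http://business.example.edu/ecom", "text": "e-commerce research on pricing"},
    {"url": "http://shop.example.com/", "text": "e-commerce store pricing"},
]

def match_any(page, terms):
    """True if the page contains at least one of the terms."""
    return any(term in page["text"] for term in terms)

def match_all(page, terms):
    """True if the page contains every one of the terms."""
    return all(term in page["text"] for term in terms)

def host_endswith(page, suffix):
    """True if the page's host is the suffix itself or a subdomain of it."""
    host = urlsplit(page["url"]).hostname or ""
    return host == suffix or host.endswith("." + suffix)

# Academic sources only, in the spirit of the host:edu example.
academic = [p["url"] for p in PAGES
            if match_all(p, ["e-commerce"]) and host_endswith(p, "edu")]

# Everything except NASA, in the spirit of the -site:nasa.gov example.
non_nasa = [p["url"] for p in PAGES
            if match_all(p, ["mars landing"]) and not host_endswith(p, "nasa.gov")]

# Pages containing either term.
either = [p["url"] for p in PAGES if match_any(p, ["mars", "research"])]
```

The same host test serves both purposes: used positively it restricts results to a domain, and negated it excludes that domain.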
Search Engine Desktop Utility Packages
|Search Utilities||URL (Hyperlinks)||Description|
|Alexa||Site discovery tool that works with your browser. Suggests interesting sites based on the ones you are visiting.|
|BullsEye||Includes the ability to spell check a query or display related words and homonyms. Also highlights the search terms on pages if they are downloaded for viewing.|
|Copernic||http://www.copernic.com||Offers powerful options in an easy-to-use interface, plus a great selection of specialty search options.|
|Infoseek Express||http://express/infoseek.com||Impressive package that pulls results from all major search engines.|
|Mata Hari||http://www.thewebtools.com||This program is designed so that the user can learn one set of power search commands that the software will translate for each search service.|
|WebFerret||http://www.ferretsoft.com/||Performs metasearches via an extremely simple interface; free for use with more features on the paid version.|
|EchoSearch||http://www.iconovex.com/products/||A metasearch tool for data management as well.|
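The core idea behind these metasearch utilities, sending one query to several engines and sorting the combined results, can be sketched as below. The two engine functions are hypothetical stand-ins that return fixed lists, and the scoring rule (credit of 1/position, summed across engines) is an assumption chosen for simplicity, not a description of how any listed product ranks results.

```python
def metasearch(query, engines):
    """Send one query to every engine and merge results, best combined rank first."""
    scores = {}
    for search in engines.values():
        for rank, url in enumerate(search(query)):
            # Earlier positions earn more credit; duplicates across engines add up.
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)
    # Sort by descending score, breaking ties alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical stand-ins for real search engines.
def engine_a(query):
    return ["http://x.example/", "http://y.example/"]

def engine_b(query):
    return ["http://y.example/", "http://z.example/"]

merged = metasearch("mars landing", {"A": engine_a, "B": engine_b})
```

A page returned by both engines rises to the top of the merged list, and each page appears only once.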
The best knowledge management practice is to regularly explore the advanced search forms and search assistant pages provided by individual engines. For business practitioners focusing on e-commerce, regularly updating Internet skills is a critical component of remaining competitive. This includes constantly finding new and better ways to position a firm in the techno-savvy cyber-economy of 21st-century America.
Knowledge Management and Application to the Firm
The key to managing information and knowledge on the Internet is understanding and effectively using search engines. As firms increasingly embrace Internet business, and begin building a presence on the World Wide Web, search engines will play an increasingly important role. Not only do they serve a critical purpose in knowledge management and information access but they also serve an essential purpose in marketing. Search engines direct traffic on the Internet. Therefore, in addition to direct advertising and listing the site, it is essential to design the site so that search engines will easily detect it.
In this area, specializations are quickly emerging, such as how to create meta tags and doorway pages that can enhance search engine retrieval. In addition, understanding the basic elements of smart submission to the major spider-driven engines, as well as to human-powered services such as Yahoo, can make a critical difference in listing and traffic. Finally, the savvy netropreneur must be knowledgeable in the use of search engines to access information and must understand the growing importance of using search engines to enhance competitive positions in Net-based business.
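As a small illustration of the meta tags mentioned above, the following sketch generates the description and keywords tags for a page. The function name `build_meta_tags` and the sample text are hypothetical, and how much weight a given engine gives to these tags is not something this sketch assumes.

```python
from html import escape

def build_meta_tags(description, keywords):
    """Return HTML meta tags that describe a page to search engine spiders."""
    keyword_list = ", ".join(keywords)
    return "\n".join([
        # escape() protects characters such as & and " inside attribute values.
        f'<meta name="description" content="{escape(description, quote=True)}">',
        f'<meta name="keywords" content="{escape(keyword_list, quote=True)}">',
    ])

tags = build_meta_tags("Classic car parts & accessories", ["classic cars", "parts"])
```

The resulting lines would be placed inside the head section of the page.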