NOTE: This document is now ancient history but I leave it here for historical interest. Some of the advice is still actually useful - but I would recommend Google over Altavista and Yahoo for any general-purpose search. Many of the other links in the article are now outdated.
How to Settle Bets on the Internet
Jim Bumgardner, 10/23/98, revised 5/17/00
If you're reading this, you probably don't need it - because you've already demonstrated your searching ability by finding my website. However, you probably have some newbie/loser friends who can never seem to find what they're looking for on the Internet. They ask you to help and you find it in ten seconds. This document is for those friends. Cut out this paragraph and present it to them as a gift. They'll be grateful.
I've seen a few tutorials on the net that aim to teach how to search the Internet effectively, but most of them focus on the mechanics of constructing Boolean search strings for Yahoo and AltaVista. Fascinating, but not particularly helpful, in my opinion.
This particular tutorial covers how to choose the right search engine based on the question you are trying to answer. It also provides a few useful tips for making up queries that are more likely to produce the hits you want. I'm not going to tell you much about the specific syntax and mechanics of each search engine; however, I've provided links to the relevant sites at the end of this article. Each site contains copious help information.
Accuracy versus Overkill
While writing this tutorial, I perused the web, looking for similar tutorials. I found that many of them suggest writing down every possible word relating to your search topic and constructing your search queries from these. Yahoo suggests using both catalog-style and index-style search engines. In addition, there are even search engines (called meta-search engines) that will search a whole list of search engines for you.
In my opinion these broad strategies will return too many results. You'll have to wade through a mountain of trash in order to find the nugget of useful information you want.
I guess it depends on what you're trying to accomplish. If you want to read everything there is on the web that is remotely related to Classical Music (and believe me, there's a lot of it), then the above strategies are for you. However, if, like me, you seek the answer to a specific question, say, to settle a bet, then you need a little more precision in your life.
What I intend to demonstrate is how to find only the pages that provide the answers you're looking for (or at least to come very close to that kind of accuracy). This is the searching equivalent of a "smart bomb" as opposed to a nuke.
There are basically two things to learn: How to choose an appropriate search engine, and what to search for once you get there.
The two main search engines covered are Yahoo (a catalog) and AltaVista (an index). In addition, some other sites, such as Deja-News, the Internet Movie Database and Amazon.com are covered as examples of special interest sites which can answer more focused questions.
Since I originally wrote this document, I've begun making heavy use of a newer search engine called Google (at Google.com). Google is now covered as well, as it is an excellent all-around search engine.
I'm well aware that there are zillions of general-purpose search engines out there besides Yahoo, AltaVista and Google (Lycos, Hotbot, Webcrawler, Ask Jeeves, Goto.com, Excite and Infoseek, to name a few). Most of the major ones are suitable replacements for AltaVista - that is, they are full-text indices of the web. I don't care which search engine you use; just use the right kind of engine for the job.
Using the right kind of search engine
Most commonly, to answer a question on the Internet, you'll use a search engine. There are specific kinds of questions (movies and books being good examples) where it might be more useful to go straight to a website which contains information about these specific subjects. A few of these kinds of sites are discussed below, but first we'll talk about general-purpose search engines.
After you've been searching the Internet for a while, finding an appropriate search engine will become second nature. But when you're new, it's difficult to know where to begin - you may not even know what a search engine is.
A search engine (also called a portal) is a website which contains information (typically links to other websites, but not necessarily) and allows you to search that information in various ways.
Here are links to most of the search engines and websites discussed in depth in this tutorial:
Yahoo and AltaVista - General Purpose Engines
There are many other general-purpose search engines besides Yahoo and AltaVista; however, these two will be discussed here as representative of two general types: the Catalog (of which Yahoo is a good example) and the Index (of which AltaVista is a good example).
Yahoo is a catalog of websites. Each website is listed with a short description. Unlike search engines like AltaVista and Lycos, Yahoo is meticulously organized in a hierarchical manner. It contains brief descriptive information about each site. Yahoo lists sites that have been explicitly submitted to Yahoo, and not all the submissions are accepted. People (generally the operators of the sites) fill out the submission forms because they want the general public to visit. This means that many sites (for example, the syllabus of a college class) are not listed. Because of these policies the content in Yahoo is of a fairly high quality, even though there is less of it.
Note: Since that last paragraph was written, Yahoo has added full-text (AltaVista-like) searching as well - so if your search fails to generate any hits in Yahoo's catalog, it then does a full-text search.
There are two ways to search Yahoo: You can type in a search query, like "romantic literature", as you can with most search engines, or you can click your way through, from the home page through the hierarchical categories that lead to your topic. In the case of Romantic Literature, you would click on the following in order:
Arts & Humanities
Periods and Movements
I generally prefer to type in a search query rather than click on the menus, for a few reasons. It's faster. It produces more hits, because some related pages are stored in multiple categories. Finally, I find that browsing through the categories looking for what I want can be frustrating because I can't always predict how the topic was filed. You'll find that sometimes links which are seemingly in the same category are filed under different areas (probably by different people).
AltaVista is an index of websites. The AltaVista people (like those at Lycos, Webcrawler and others) use a program called a "web spider" to actively comb the web searching for sites to add to their database. They also accept submissions, like Yahoo. As a result, they list far more web pages than Yahoo does. Rather than providing only short descriptions and keywords, AltaVista indexes the full text of most of the sites that it lists, so you can search for any word that appears on the page, such as "circumnavigate".
This sounds good in theory. If you search for "Jane Austen" you are going to get far more hits than you would if you searched Yahoo. On the other hand, with Yahoo you are going to find "hub" sites like the Jane Austen Society faster. In a search of "romantic literature" that I performed in late October 1998, Yahoo produced 58 hits, while AltaVista produced 1929. In this case, less is probably more because the sites that Yahoo produced are more likely to be focused on the topic at hand.
AltaVista has a feature that allows you to ask it questions in plain English, like "What is the capital of France?" I've rarely found this feature to be useful because the kinds of questions I have are almost never in their database of questions. You may get lucky, however - the capital of France certainly is in their database.
Note: If you like asking questions in "plain English" then check out the "Ask Jeeves" site - that is its specialty. Again, I personally don't find it very effective.
Yahoo has advantages and disadvantages compared with AltaVista. Each serves different purposes. Although you can use either one with equal success for some searches, for most searches it is generally better to use one over the other.
In general, Yahoo is better when you are looking for general information (or sites focused on a particular topic), while AltaVista is better when you are looking for more specific information. Why? Because if you search for a general topic on AltaVista you will get far too many hits. On the other hand, if you search for the answer to a specific question on Yahoo, you probably won't be directed to the exact page that answers your question (because Yahoo only contains general descriptions of the websites it catalogs).
An appropriate search query for Yahoo would be something like "Romantic Literature", which will produce a list of websites which are related to the topic of romantic literature. The same search on AltaVista will produce a lot of websites which aren't closely related to the subject, but happen to contain the words "romantic" or "literature" (nonetheless, AltaVista does a pretty good job of putting the better sites at the top of the list). Also, since AltaVista catalogs individual web pages from the same web site, a lot of the hits you get in AltaVista are duplicates.
AltaVista is good when you are trying to answer a more specific question, like "How many novels did Jane Austen write?" This is because you can include words in your search that occur on the web page, but don't necessarily occur in the description of the web site. So, for example, you can include the words "wrote" and "novels", which won't necessarily occur in brief descriptions of Jane Austen related websites. In addition, AltaVista indexes multiple pages from each website, so it's more likely to take you directly to the exact page that answers your question, rather than just the home page of the Jane Austen Society.
AltaVista is also excellent when you need to look up a specific quote or lyric. Simply put the text you are searching for inside quotation marks ("like so") and it will be able to find it. I've used AltaVista a few times, for example, when looking up bits of Shakespeare.
To provide an example of the contrast between the two sites, Yahoo would be appropriate if you wanted to find some websites which are devoted to the rocker Ozzy Osbourne, but AltaVista would be more appropriate if you wanted to know exactly what kinds of live animals Ozzy has been rumored to have eaten on stage.
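For the programmers in the audience, the catalog-versus-index contrast above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the two sites, their descriptions and their page text are made up for illustration, and real search engines are far more elaborate.

```python
# Toy data (hypothetical): each site has a short catalog description
# and the full text of its individual pages.
SITES = {
    "Jane Austen Society": {
        "description": "Society devoted to the author Jane Austen",
        "pages": {
            "home": "Welcome to the Jane Austen Society",
            "biography": "Jane Austen wrote six novels",
        },
    },
    "Regency Recipes": {
        "description": "Cooking from the Regency period",
        "pages": {
            "home": "Recipes from the time of Jane Austen and her novels",
        },
    },
}


def words(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def catalog_search(query):
    """Catalog style: match only against each site's short description."""
    wanted = words(query)
    return [name for name, site in SITES.items()
            if wanted <= words(site["description"])]


def fulltext_search(query):
    """Index style: match against the full text of every page."""
    wanted = words(query)
    return [(name, page)
            for name, site in SITES.items()
            for page, text in site["pages"].items()
            if wanted <= words(text)]
```

With this data, catalog_search("Jane Austen") finds only the one site devoted to her, while fulltext_search("Jane Austen") returns three separate pages. Adding "wrote" and "novels" to the query empties the catalog result but lets the full-text search land directly on the biography page.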
Google - Great All-around searching
Since I originally wrote this article, I have begun using Google (at Google.com) as my primary search engine. Why? Because it combines some of the best features of Yahoo and Altavista, providing breadth AND accuracy.
Like AltaVista, Google uses a full-text index of the sites it lists. It also keeps a cached copy of each of these sites, so you can look at older versions of pages which may no longer be available at the actual website.
Google has a unique feature which improves its accuracy. It examines the links of the pages it finds, and puts the pages which are being "pointed at" the most at the top of the list. As a result, the first pages listed are more likely to be "central" or "key" sites for the topic you are searching. For example, if you look up "sendmail" (a unix utility), the first page listed is the "sendmail home page".
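To make the "pointed at the most" idea concrete, here is a rough Python sketch that orders matching pages by how many other pages link to them. It is a simplification for illustration, not Google's actual algorithm (which does more than count links), and the page names and link graph are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "sendmail-home": ["sendmail-faq"],
    "sendmail-faq": ["sendmail-home"],
    "unix-tips": ["sendmail-home", "sendmail-faq"],
    "my-homepage": ["sendmail-home", "unix-tips"],
}


def rank_by_inbound_links(matches, links):
    """Order matching pages so the most linked-to pages come first."""
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # a page linking to itself doesn't count
                inbound[target] += 1
    # sorted() is stable, so pages with equal counts keep their original order.
    return sorted(matches, key=lambda page: inbound[page], reverse=True)
```

Given all four pages as matches, the page with three inbound links ("sendmail-home") sorts to the top, which mirrors the sendmail example above.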
Finally, Google (at the moment) is fast and the page isn't cluttered with a lot of ads and stuff.
Google is so good that I now use it for almost all the searches for which I would otherwise have used Yahoo or AltaVista. However, it doesn't entirely replace these services. First of all, Google can't search for text inside of quoted strings (like quotations). I still use AltaVista for that (if Google supported quoted string searches, I'd probably stop using AltaVista entirely). I still use Yahoo for basic category searches, but sometimes I'll use Google first.
Now we'll discuss some more specific information sources that can be incredibly helpful for certain kinds of questions.
Deja-News - When you need opinions and advice
Probably the single greatest source of information I use on the web after Yahoo and AltaVista is Deja-News - yet I find that a lot of people are unfamiliar with this great resource.
Deja-News archives and indexes all the traffic that occurs on Usenet, which contains thousands of Internet discussion groups, such as rec.food.preserving and alt.music.marilyn-manson. For almost every interest (broad or specific) under the sun, there is a discussion group in Usenet, and its messages are archived by Deja-News.
Although I rarely use an actual news reader to read these groups (I used to subscribe to a lot of them, but find I no longer have the time), I frequently use Deja News to search them when I am trying to answer specific questions.
Discussion groups contain exactly that: discussion. Therefore, Deja-News is particularly useful when you seek the kinds of information that happen in discussions: e.g. opinions, advice, recipes, directions, gossip, pop-culture and rumors.
Deja-News contains information that people won't necessarily publish on a web page, but don't mind discussing on Usenet, which has a more ephemeral nature. Therefore it's more likely to contain information that people don't consider to be important, or information that is controversial.
Among the kinds of questions I've found answers for on Deja-News:
"What's a good recipe for low fat Thai curry?"
"What's a good Windows screen capture program?"
As a programmer, I've found Deja-News particularly useful for complicated technical questions about programming. I rarely have to ask these questions (by posting a message), because I almost always find that someone else has already posted the question (numerous times) and it has already been answered.
Not only are there a lot of newbies out there asking stupid questions, there are also a lot of knowledgeable (and not-so-knowledgeable) people who get a kick out of helping others.
Keep in mind that Deja-News contains topical information, and that it normally only searches "recent" messages. For the kinds of searches I do (e.g. settling bets), I almost always find it more beneficial to search the "old" messages - you can accomplish this by using a pulldown menu on the results screen. The "Power Search" feature is also handy, allowing you to narrow down the search to a specific period of time, and to specific newsgroups.
Finally, Deja-News has a feature in which you can read all the messages in a particular "thread". This is particularly useful, because you may have only hit upon one of the messages in the thread, but others in the thread actually answer your question.
NOTE: Unfortunately, since this article was first written, DejaNews has "expanded" their service by adding a lot of less-than-useful features (that is, they have turned it into a general-purpose portal) which makes it harder for newbies to find the usenet search area.
The good part is here: http://www.deja.com/usenet/
News Search - When you need the facts, just the facts
Most websites that carry news (in the form of current events, not Usenet), such as CNN.COM or Yahoo, offer the ability to search the news articles of the past few days by keyword. On Yahoo, for example, if you click on one of today's headlines, a search box is provided that allows you to search for additional news items. CNN.COM has a search box right on their home page, which is even more convenient.
This is a great way to answer questions such as "Is the rumor true that Michael Jackson recently announced he is going to cryogenically freeze his body?" If the search on "Michael Jackson" only shows a concert in Budapest, it's fairly likely that he's going to be cremated and launched into space, like everybody else.
The Internet Movie Database
The Internet Movie Database is a wonderful resource that can be used to answer any of the following kinds of questions:
"Who is that character actor who played the angel in 'It's a Wonderful Life'?"
"What other movies did I see him in?"
"How many movies did India make in 1989?"
"Who were the 'additional second assistant directors' for the 1998 movie 'Beloved'?"
If you love movies, and haven't been to this site before, plan on spending an hour there just browsing.
The Library of Congress and Amazon.com
The Library of Congress provides an online catalog of all its books, which is very useful for getting exhaustive lists of books in and out of print by particular authors, among other things.
Amazon.com (www.amazon.com) can be even more useful, particularly if you are only interested in information about books in print. Among other things, Amazon.com provides descriptions of the books they sell, as well as user-provided critiques (both positive and negative).
Since this article was written, lots of other book-sellers have cropped up, some with better prices. Although I don't generally *buy* from Amazon.com anymore (I use the comparison shopping sites to get the best price), I still use Amazon for the book reviews.
At this point you may be wondering how you're ever going to remember the URL for the Library of Congress. Don't bother - it's quite easy to find it from Yahoo - exactly the kind of thing that Yahoo is good for!
Did some poor kid die from mixing Pop-Rocks with Pepsi? An urban legends website is a good place to find out, as well as to get the answers to other equally entertaining / horrifying rumors (most of them false). The folklorists who maintain these pages have a vested interest in rabidly disproving any unfounded rumor that snakes its way across the net (or any other medium).
There are a number of "meta-search engines" which promise to simplify your searching by searching a whole collection of search engines for you. I found a list of them here by searching for the phrase "multiple search engines" on AltaVista. MetaCrawler is a good example.
I personally don't use these kinds of services because I prefer to apply more intelligence to my searches - getting narrower results that fit my search criteria more closely. These engines also tend to be a bit slower, since they can only be as fast as the slowest service they are searching.
However, if you're searching a really obscure topic and coming up empty, then something like MetaCrawler is probably a good last resort. If you don't mind the sluggishness, you may even prefer to use MetaCrawler or another meta-search engine as a substitute for an index like AltaVista.
Exercise: Choosing a site:
For each of the following questions, indicate which of the above search engines would be most appropriate to use first in attempting to answer the question.
A) Where can I learn some general information about Jane Austen?
B) How many novels did Jane Austen write?
C) Are there any currently active Jane Austen sites out there?
D) What was that Jane Austen movie that Kate Winslet was in?
E) How many Dover Jane Austen books are in print?
Use the right strategy for the database
This next section is about what kind of strategy to use for constructing your search queries, once you've chosen the correct search engine to use.
First let's learn by looking at some counterexamples. Here are some searches that won't work as expected:
Mary, an interior decorator, is interested in finding an online gallery that sells oil paintings by women artists in the contemporary southwestern style. So she hikes on over to AltaVista and enters the following search string: "art".
Mary is guilty of underspecificity. A search of "art" on AltaVista is going to return 23,281,130 hits! Things won't be improved much by searching for art on Yahoo. Mary is in the right place - she just needs to add some more specifics to her search string. Words like "southwest", "gallery" and "paintings" will eventually lead her to the site she's looking for.
Tom, a psychology student, is interested in finding published papers on gender-identity issues in adolescent children. He goes to DejaNews and looks up "sex".
In this case, Tom is guilty of underspecificity AND is probably using the wrong search engine. He is searching a relatively obscure topic and may find it useful to search with something like MetaCrawler.
Tom, seeing the error of his ways, goes to AltaVista and looks up "+gender-identity +adolescent +published".
Now Tom is probably guilty of overspecificity. By requiring the word "published", he is eliminating a number of pages that are relevant to his topic.
Jack, an undergraduate, is cheating on a take-home test, and needs to know the answer to the question "How many novels did Mark Twain write?" So he goes to AltaVista and enters the following for his search string: "How many novels did Mark Twain write?"
Jack is guilty of computer worship. He is assuming the computer has a far greater ability to understand natural language than it actually has. Although AltaVista can answer questions like "What is the capital of France?" it has trouble when the questions are more specific (as the useful ones often are). Jack's search will produce some useful hits, but there are far faster ways to answer the question. Unfortunately, Jack is too stupid (from cheating at tests) to use the more effective search string "+"Mark Twain" +wrote +novels".
NOTE: Since I wrote the above paragraph, Ask Jeeves has appeared, which is a little better at answering English Language queries. It's still not great though. I asked it the above Mark Twain question, and it wasn't able to answer it, although it was able to point me to a list of Twain books in print.
Next we'll discuss some effective search strategies that do work:
Find unique pairings
This strategy is primarily useful when searching on a full-text index, like AltaVista or Deja-News, and you need to narrow down the search.
What you want to do is isolate two or more phrases associated with your search subject which aren't normally associated together, like "Eggplant" and "Beer".
For example, recently I needed the full text of the "To be or not to be" speech from Hamlet, so I could write a parody of it. I knew that if I searched for "Shakespeare Hamlet" I could find it fast enough, but I found it much faster by searching for the following:
hamlet "to be or not" "slings and arrows"
You'll note that I included "slings and arrows" (one of the more distinctive bits of the speech that happened to have stuck in my mind) so that I wouldn't merely get hits on pages which only quoted the first line. This is a good example of finding unique pairings: in this case, the pairing of "to be or not to be" and "slings and arrows" proved quite effective. In fact, the word "hamlet" was quite unnecessary.
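The effect of pairing two phrases can be sketched with a toy phrase matcher in Python. The three sample pages are made up for illustration, and the matching rules (ignore case and punctuation, require every quoted phrase to appear) are my assumptions for the sketch, not a description of how AltaVista works internally.

```python
import re
import shlex

# Hypothetical pages: one quotes only the first line, one has the whole
# speech, and one is about something else entirely.
PAGES = {
    "quote-of-the-day": "To be or not to be, that is the question.",
    "hamlet-act3": ("To be, or not to be, that is the question: whether 'tis "
                    "nobler in the mind to suffer the slings and arrows of "
                    "outrageous fortune"),
    "medieval-weapons": "A history of slings and arrows in warfare.",
}


def normalize(text):
    """Lowercase the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def matches(query, text):
    """True if every quoted phrase (or bare word) in the query occurs in the text."""
    haystack = f" {normalize(text)} "
    terms = shlex.split(query)  # keeps "quoted phrases" together as one term
    return all(f" {normalize(term)} " in haystack for term in terms)


def search(query):
    """Return the names of all pages that match the query."""
    return [name for name, text in PAGES.items() if matches(query, text)]
```

Either phrase on its own matches two pages here, but the pair together matches only the page containing the full speech.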
Forecast your target
This strategy is also primarily useful when searching on a full-text index, like AltaVista or Deja-News.
What you do is forecast the target by imagining how the text on your target sites might read.
For example, let's say you want to answer the question "What year did Ferdinand Magellan's ships finish circumnavigating the globe?" You imagine that a page that contains the answer might read "In the year XXXX, Ferdinand Magellan's ships completed circumnavigating the globe", so you use the search string "year Ferdinand Magellan completed".
One of the nice things about forecasting is that it causes you to use specific word tenses ("completed" instead of "complete") which help narrow down the search to pages that address your specific question. Forecasting also helps prevent searches from being over- or under-specific.
Of course, this technique is only as good as your forecasting ability. If you find yourself on the edge of your seat at 6:25 wanting to know how "Gilligan's Island" is going to end, then you may find this difficult.
Create your own personalized search site
Most people have their web browser set to automatically go to a particular website (such as Yahoo) when they fire it up. I don't do this; instead I have my web browser configured to load a web page from my own hard disk. This web page contains fill-in forms for each of the major search engines I use (Yahoo, Google, AltaVista, Deja-News, Internet Movie Database, etc.) as well as links to a few of my favorite websites. There are a few advantages to using this system.
Perhaps you're wondering how to do it. I've provided the source code for my own personalized search-site here. You can modify it to display your name or your favorite background picture, as well as your favorite bookmarks.
To use it, save the HTML source into a file on your hard disk and call it something like "default.html". Then use the "Open->Browse" or "Open Page->Choose File" command on the File menu of your web browser and load it in - you should see the forms appear on the screen. Then do the following to make it the default home page for your web browser (so it loads automatically each time you start the browser).
If using Internet Explorer, select "Internet Options" on the View menu. Then in the General panel, hit the "Use Current" button in the "Home Page" area. Then close the options window by hitting the OK button.
If using Netscape Navigator, use the "Preferences" command on the Edit Menu. Then in the "Navigator" section, hit the "Use Current Page" button in the "Home Page" area. Then close the preferences window by hitting the OK button.
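If you would rather generate such a page than edit HTML by hand, a small Python script can do it. This is a minimal sketch and not the author's page: the function names are hypothetical, and the form URLs and query parameter names below are my assumptions about present-day search endpoints, so check each one before relying on it.

```python
from html import escape

# (label, form action URL, query parameter name).
# ASSUMPTION: these endpoints reflect present-day sites and are not taken
# from the original personalized page; verify them before use.
ENGINES = [
    ("Google", "https://www.google.com/search", "q"),
    ("Yahoo", "https://search.yahoo.com/search", "p"),
    ("Internet Movie Database", "https://www.imdb.com/find", "q"),
]


def build_page(engines, title="My Search Page"):
    """Return an HTML page containing one GET search form per engine."""
    forms = []
    for label, action, param in engines:
        forms.append(
            f'<form method="get" action="{escape(action)}">\n'
            f'  <label>{escape(label)}: '
            f'<input type="text" name="{escape(param)}" size="40"></label>\n'
            f'  <input type="submit" value="Search">\n'
            f'</form>'
        )
    body = "\n".join(forms)
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{escape(title)}</title></head>\n"
        f"<body>\n<h1>{escape(title)}</h1>\n{body}\n</body>\n</html>\n"
    )


def write_page(path, engines=ENGINES):
    """Write the generated page to a file such as default.html."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_page(engines))
```

Calling write_page("default.html") produces a file you can open in your browser and set as the home page using the steps above. Adding another engine is a matter of appending one more tuple to the list.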
Use Boolean Logic Effectively
Different search engines have different ways of specifying how words are to be combined in a search. When you search for John Smith do you mean you want sites that contain either the word John or the word Smith (but not necessarily both)? Sites that contain both words? Sites that contain both words in sequence?
When searching Yahoo, this isn't (or shouldn't be) much of a problem because you're only searching for general topics. But in a full-text index like AltaVista or Deja-News you should know how the search engine is interpreting your queries. Here are some sample searches in AltaVista and Deja-News with interpretations:
All the major search engines have extensive help facilities - look at these if you want more information.
As a final aside, I should mention that I rarely need to do complex Boolean searches, because I find the other strategies usually get me what I'm looking for. For example, instead of using the NOT keyword to eliminate unwanted hits, I can usually add a keyword to isolate the sites I want.
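The three readings of a query like John Smith (either word, both words, both words in sequence) can be illustrated with a short Python sketch. The documents are made up, and the sketch does not model any particular engine's syntax; each engine has its own way of selecting between these interpretations.

```python
# Hypothetical documents for illustration.
DOCS = {
    "a": "John Smith founded the colony",
    "b": "Smith and his friend John went fishing",
    "c": "John Robinson wrote a book",
    "d": "Granny Smith apples are tart",
}


def tokens(text):
    """Lowercase a string and split it into a list of words."""
    return text.lower().split()


def match_any(query, text):
    """OR: at least one query word appears in the text."""
    return any(word in tokens(text) for word in tokens(query))


def match_all(query, text):
    """AND: every query word appears somewhere in the text."""
    return all(word in tokens(text) for word in tokens(query))


def match_phrase(query, text):
    """Phrase: the query words appear next to each other, in order."""
    q, t = tokens(query), tokens(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


def search(matcher, query):
    """Return the ids of the documents accepted by the given matcher."""
    return [doc_id for doc_id, text in DOCS.items() if matcher(query, text)]
```

Here the "either word" reading returns all four documents, the "both words" reading returns two, and the phrase reading returns only the one document that is actually about John Smith.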
Real Life Examples
I'm now going to provide a couple of real-life examples of searches I've performed recently - showing the reasoning behind the decisions I made.
Example 1: Correcting a rumor
In late October, 98, I received an email from a co-worker which contained the following quote attached to his signature at the bottom of the letter.
"Public media should not contain explicit or implied descriptions of sex acts. Our society should be purged of the perverts who provide the media with pornographic material while pretending it has some redeeming social value under the public's 'right to know.' Pornography is pornography, regardless of the source." Kenneth Starr, 1987, interview with Dianne Sawyer - 60 minutes
Upon reading this, I recalled reading in a mailing list that this quote was actually a hoax. I began wondering if indeed the quote was a hoax, and if so, how I could succinctly and definitively show this guy that it was a hoax, so he would stop perpetuating it.
Which search engine to use? Since I had already seen evidence (from the mailing list) that people were talking about this quote, and because it was topical, I decided that the first place I should go was Deja-News, rather than AltaVista or another search engine. This turned out to be a good choice.
What to search for? I knew that a lot of people would be talking about Kenneth Starr on Deja-News, and of course a search for "pornography" was going to turn up scads of unrelated documents. The key here was "Dianne Sawyer" - a name not often used in conjunction with Kenneth Starr. So I searched for the following:
"Kenneth Starr" "Dianne Sawyer"
You'll note that I used quotes to turn the names into complete phrases; otherwise I might have gotten additional unrelated hits (352 instead of the 26 I actually got). For example, if you enter "John Robinson" without the quotes, you'll get every document that contains either John or Robinson, which is not what you want.
So, as I said, I got 26 messages. I looked at the first message and saw that it included the following phrase:
"First, this supposed quotation is a hoax"
Which, although inconclusive, at least gave me some hints that I might be barking up the right tree. Now I wanted to find an authoritative web site that might contain some information that disproved the hoax. I knew that this kind of information is commonly collected in the various "urban legends" web sites, so, using a bit of target forecasting, I modified my search as follows:
"Kenneth Starr" "Dianne Sawyer" Urban
This narrowed it down to 12 messages. The first message included the following:
>Urban Legend. See:http://www.snopes.com/spoons/noose/starr.htm >
And a quick glance at the website (a site I had visited before, it turned out) showed that indeed, the Starr rumor had been investigated and shown to be false by those folklore enthusiasts. In my next email to my co-worker, I included the following at the end of my message:
Btw, check this out: http://www.snopes.com/spoons/noose/starr.htm
I later found him checking out the other interesting reading material at the snopes site, to his enjoyment.
Example 2: Falsifying personal information
I don't like receiving junk e-mails (commonly known as "spam"). So when I fill out forms on the Internet, unless absolutely necessary, I tend to falsify the personal information. I like to be creative about how I do this.
Recently, when filling out a marketing survey, I needed a fake phone number. I decided to use the number of a local psychiatric hospital. This left me with the problem, "How can I find the number of a local psychiatric hospital without getting up from my chair?" A search of the web seemed like a natural choice.
Now I knew that on Yahoo, and in other areas, there were copious white pages where I could look up a local hospital, if I just knew the name of it. Unfortunately, I didn't know the names of any of the local psychiatric hospitals.
So instead, I went to AltaVista and employed the principle of unique pairings. My search string was as follows:
+818 +psychiatric +hospital
The very first hit was a list of hospitals of which four were psychiatric hospitals in the 818 area code.
Appendix: Search Engine List
Here are links to most of the search engines and websites discussed in depth in this tutorial: