Module 6

Searching the Web

This module covers how to search for information on the Web.

When you complete this module, you will be able to:

- recognize the organization of the Web.
- identify how search engines work.
- recognize predatory journals.
- create better search strategies for search engines.

 

Anyone can publish information on the Web. However, no one controls the content, so there is no guarantee of its accuracy. Carefully evaluate information you find on the Web (or anywhere) and use good judgement when using the Web for research. In Module 7, we will discuss how to evaluate information.

A URL (Uniform Resource Locator) is the address of a web page. Part of the URL is the domain name (e.g. peru.edu), which is unique to the host of the website. The top-level domain (e.g. .edu) indicates the type of organization hosting the website.

The most common top-level domains are:
.edu = education (restricted)
.gov = U.S. government entities (restricted)
.mil = U.S. military entities (restricted)
.com = businesses (unrestricted)
.net = network infrastructures (unrestricted)
.org = organizations (unrestricted)
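
As a rough illustration of how a URL breaks down into these parts, the sketch below uses Python's standard `urllib.parse` module to pull out the domain name and its top-level domain. The function name `describe_url`, the `TLD_MEANINGS` table, and the page path in the sample address are assumptions made for this example only.

```python
from urllib.parse import urlparse

# Meanings copied from the list above; the table is illustrative, not complete.
TLD_MEANINGS = {
    "edu": "education (restricted)",
    "gov": "U.S. government entities (restricted)",
    "mil": "U.S. military entities (restricted)",
    "com": "businesses (unrestricted)",
    "net": "network infrastructures (unrestricted)",
    "org": "organizations (unrestricted)",
}


def describe_url(url):
    """Return (domain name, top-level domain, meaning) for a web address."""
    host = urlparse(url).hostname or ""
    # The top-level domain is whatever follows the last dot in the host name.
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return host, tld, TLD_MEANINGS.get(tld, "not in the list above")


# The path "/example-page" is made up for this example.
print(describe_url("https://peru.edu/example-page"))
```

Running it prints the domain name `peru.edu`, the top-level domain `edu`, and the matching description from the list.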

The Web has three levels to it: Surface, Deep, and Dark.

Use the image of an iceberg to visualize the levels.

surface portion of an iceberg

The Surface Web (also called the Visible Web and Open Web) consists of sources openly available to anyone. These are the contents that can be found using search engines such as Google, Bing, and Yahoo. Only a small portion, an estimated 4% or less, of all the content on the Web is part of the Surface Web. Like an iceberg, the majority of the content is beneath the surface.

Strengths - current information, billions of webpages

Weaknesses - not reviewed for accuracy or bias, not organized

deep portion of an iceberg

The Deep Web (also called the Invisible Web and Hidden Web) consists of sources that cannot be indexed by search engines. Password protection, unlinked content, dynamic content, and non-HTML/text content are a few of the reasons why a webpage may not be indexed. The library catalog, library databases, and your email account are all part of the Deep Web because they are password-protected.

Strengths - library pays for access to databases with full-text articles and eBooks, reviewed information, organized

Weaknesses - may not include current information, limited information for some topics

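dark portion of an iceberg

The Dark Web is sometimes confused with the Deep Web, but it is only a small part of it. Dark Web sites are intentionally hidden and require special software, such as the Tor browser, to access. This level is known for illegal information and activities.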

Search engines are tools that help you find information on the Surface Web. They use programs (often called spiders or crawlers) that follow links from page to page to compile a database of webpages and other accessible documents.
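
To make the idea of a crawler more concrete, here is a minimal sketch in Python. It does not touch the real Web: the `SITE` dictionary is a made-up four-page website, and the page names and the `needs_login` flag are assumptions for illustration. It shows why a password-protected page and a page that nothing links to never reach the search engine's database.

```python
from collections import deque

# A made-up website: each page lists the pages it links to.
SITE = {
    "home": {"links": ["about", "catalog"], "needs_login": False},
    "about": {"links": ["home"], "needs_login": False},
    "catalog": {"links": [], "needs_login": True},          # password-protected
    "orphan": {"links": ["home"], "needs_login": False},    # no page links here
}


def crawl(site, start):
    """Follow links breadth-first and return the pages a crawler could index."""
    indexed = []
    queue = deque([start])
    visited = {start}
    while queue:
        name = queue.popleft()
        page = site[name]
        if page["needs_login"]:
            continue  # the crawler cannot read past a login screen
        indexed.append(name)
        for target in page["links"]:
            if target in site and target not in visited:
                visited.add(target)
                queue.append(target)
    return indexed


print(crawl(SITE, "home"))
```

Starting from `home`, only `home` and `about` are indexed. The `catalog` page is skipped because it requires a login, and `orphan` is never discovered because no other page links to it.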

When you use a search engine, it scans its database to match your words against the text of the webpages it has collected. It then ranks the matching items by relevance. Multiple factors influence the ranking, including:

- the distribution of words and how frequently they are used on each webpage.
- link analysis, which reveals whether a webpage is an authority (many other webpages link to it) or a hub (it links to many other webpages).
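
The two ranking factors above can be sketched in a few lines of Python. This is a toy model only: real search engines weigh many more signals, and the three pages in `PAGES`, the function names, and the simple "word count plus inbound links" score are all assumptions chosen to keep the example small.

```python
from collections import Counter

# Three made-up pages: their text and the pages each one links to.
PAGES = {
    "a": {"text": "star wars ethics and moral philosophy", "links": ["b"]},
    "b": {"text": "star wars movies star wars characters", "links": []},
    "c": {"text": "ethics in modern movies", "links": ["b", "a"]},
}


def build_index(pages):
    """Map each word to a Counter of {page: times the word appears}."""
    index = {}
    for name, page in pages.items():
        for word in page["text"].lower().split():
            index.setdefault(word, Counter())[name] += 1
    return index


def inbound_links(pages):
    """Count how many other pages link to each page (a crude authority signal)."""
    counts = Counter({name: 0 for name in pages})
    for name, page in pages.items():
        for target in page["links"]:
            if target != name:
                counts[target] += 1
    return counts


def search(query, pages):
    """Rank matching pages by word frequency plus inbound links, best first."""
    index = build_index(pages)
    authority = inbound_links(pages)
    scores = Counter()
    for word in query.lower().split():
        for name, freq in index.get(word, {}).items():
            scores[name] += freq
    ranked = [(name, score + authority[name]) for name, score in scores.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


print(search("star wars", PAGES))  # [('b', 6), ('a', 3)]
```

Page `b` ranks first because it uses the search words more often and two other pages link to it. Page `c` does not appear at all because it contains neither word.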

Research Tip

You may want to use more than one search engine or a metasearch engine, which is explained later in this module.

No search engine has indexed the entire surface web, and each one has its own unique database that you search.

By using more than one search engine or a metasearch engine, your searches will cover more of the surface web rather than just a portion of it.
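
The benefit of combining search engines can be sketched as follows. The result lists are invented (`engine_one`, `engine_two`, and the `.example` site names are placeholders), and the interleaving strategy is just one simple way a metasearch tool might merge results, not a description of how any particular product works.

```python
def merge_results(*result_lists):
    """Combine ranked result lists, keeping the first appearance of each site."""
    seen = set()
    merged = []
    # Interleave by rank so each engine's top results stay near the top.
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Made-up results from two search engines with different databases.
engine_one = ["site-a.example", "site-b.example", "site-c.example"]
engine_two = ["site-b.example", "site-d.example"]

print(merge_results(engine_one, engine_two))
# ['site-a.example', 'site-b.example', 'site-d.example', 'site-c.example']
```

Neither engine alone returns all four sites, but the merged list does, with the duplicate removed.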

There are different types of search engines, including general, metasearch, subject-specific, and information-specific search engines.

The list of search engines below is not all-inclusive. There are many more, in various languages, that you can use.

  • Bing - search engine provided by Microsoft
  • DuckDuckGo - no user tracking; your privacy is protected
  • Google - one of the most popular search engines
  • Qwant - no user tracking; your privacy is protected
  • Yahoo! - directory-based; allows you to browse subjects

Metasearch engines search more than one search engine database at once.

  • Dogpile - searches Google and Yahoo! with a single search
  • Info.com - searches Web and social media
  • WebCrawler - searches Google, Yahoo! and other search engines at once
  • Yippy - research-oriented with suggestions on words, sources, sites, and publication dates to focus your search; no user tracking

Subject-specific search engines focus on a single subject area.

  • ERIC - education search engine

Information-specific search engines focus on one type of content.

  • Bing Images - provides filters such as license (e.g. find only public domain images) to help find images

Google Scholar and other search engines that focus on scholarly research are great for weeding out commercial sites. However, make sure that the scholarly source is credible.

Unlike library databases, a human isn't deciding and selecting what publications to search when you use Google Scholar. A spider is browsing the Web and indexing what it finds for you to search.

That would be fine except there are predatory journals.

Predatory journals charge publishing fees to authors and have no review process with an editorial board. The publishers mimic titles, layouts, designs, and websites of reputable journals to give an appearance of authenticity. They may even list academics and experts on their editorial board without those individuals' permission.

Because it isn't reviewed, the information in predatory journals can be biased, inaccurate, or simply made up.

Evaluate the information to ensure it is reliable.

Searching the Web is different from searching a library catalog and library databases because of the way search engines work and how they build their databases.

However, there are ways to improve your searches.

Think like the writers of the webpages you want to find. Unlike searching library resources, you want to use the words that would appear on the webpage itself.

Remember that algorithms are indexing the Web. A librarian isn't adding subjects and other descriptors to a record to help you find the information.

Be specific.

Use nouns and unique words specific to your research.

Use multiple words.

For example, the topic "human morality in Star Wars" could be searched using words such as morals, moral philosophy, ethics, movies, star wars ...

Use phrase searching.

Place quotes ("") around a string of words so the search engine searches for the phrase rather than each separate word.

For example, "Star Wars Episode VII"
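
The difference between a phrase search and a search for separate words can be sketched like this. The two sample sentences and the function names are made up for illustration, and the matching is deliberately simplified (lowercase, split on spaces).

```python
def words(text):
    """Split text into lowercase words."""
    return text.lower().split()


def matches_all_words(query, text):
    """True if every query word appears somewhere in the text, in any order."""
    page_words = set(words(text))
    return all(word in page_words for word in words(query))


def matches_phrase(phrase, text):
    """True only if the words appear together and in the same order."""
    target, page = words(phrase), words(text)
    return any(
        page[i:i + len(target)] == target
        for i in range(len(page) - len(target) + 1)
    )


# Two made-up snippets of webpage text.
in_order = "Star Wars Episode VII opens in theaters"
scattered = "Episode VII of the star saga ends the wars"

query = "Star Wars Episode VII"
print(matches_all_words(query, scattered))  # True
print(matches_phrase(query, scattered))     # False
print(matches_phrase(query, in_order))      # True
```

Without quotes, the second snippet still matches because it contains all four words. With quotes, only the snippet containing the exact phrase matches.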

Word order matters. Search engines pay attention to the order in which you enter words.

For example, Star Wars will retrieve different results than Wars Star.

Use a plus sign (+) in front of required words.

Search engines remove common words (e.g. an, and, the, I, where, what, how) from a search, so placing a plus sign in front of a common word tells the search engine to keep it.

For example, star-wars may +the force
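
As a sketch of the behavior described above, the following Python function drops common words from a query unless they carry a plus sign. The `STOP_WORDS` set reuses the examples given in this module, and the function name `clean_query` is an assumption. Actual search engines may treat common words and the plus sign differently.

```python
# Common words taken from the examples in this module (not a complete list).
STOP_WORDS = {"an", "and", "the", "i", "where", "what", "how"}


def clean_query(query):
    """Drop common words unless they are marked with a leading plus sign."""
    kept = []
    for term in query.split():
        if term.startswith("+") and len(term) > 1:
            kept.append(term[1:].lower())  # required word: keep even if common
        elif term.lower() not in STOP_WORDS:
            kept.append(term.lower())
    return kept


print(clean_query("star-wars may the force"))   # ['star-wars', 'may', 'force']
print(clean_query("star-wars may +the force"))  # ['star-wars', 'may', 'the', 'force']
```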

Use a minus sign (-) in front of words and/or domains to exclude them.

For example, star-wars -trek -com (Results with trek and .com webpages will not be retrieved.)
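
The exclusion rule can be sketched the same way. This is a simplified model under stated assumptions: the function names and the sample pages are hypothetical, and treating an excluded term as either a word on the page or the page's top-level domain follows the description above rather than any particular search engine's documented behavior.

```python
from urllib.parse import urlparse


def split_query(query):
    """Separate wanted terms from terms marked with a leading minus sign."""
    wanted, excluded = [], []
    for term in query.lower().split():
        if term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            wanted.append(term)  # a hyphen inside a word (star-wars) is kept
    return wanted, excluded


def is_excluded(url, text, excluded):
    """True if the page contains an excluded word or has an excluded domain."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    page_words = set(text.lower().split())
    return any(term == tld or term in page_words for term in excluded)


wanted, excluded = split_query("star-wars -trek -com")
print(wanted, excluded)  # ['star-wars'] ['trek', 'com']

# Made-up pages used only to exercise the filter.
print(is_excluded("https://fans.example.com/", "star-wars news", excluded))     # True
print(is_excluded("https://films.example.org/", "star-wars history", excluded)) # False
```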

Also check the advanced search features of the search engine to discover more searching strategies.

Learn more by viewing the video "Web Search Strategies in Plain English" by Lee & Sachi LeFever on The Commoncraft Show.

You have completed Module 6. You should now:

- recognize the organization of the Web.
- identify how search engines work.
- recognize predatory journals.
- create better search strategies for search engines.

You are ready for Module 7 - Evaluating Information.