Searching the Web
When you complete this module, you will be able to:
- recognize the organization of the Web.
- identify how search engines work.
- recognize predatory journals.
- create better search strategies for search engines.
Anyone can publish information on the Web. However, no one controls that content,
so there is no guarantee of its accuracy. It is a good idea to carefully
evaluate information you find on the Web (or anywhere) and to use good judgement when
using the Web for research. In Module 7, we will discuss how to evaluate information.
A URL (Uniform Resource Locator) is the address of a web page. Part of the URL is the domain name (e.g. peru.edu), which is unique to the host of the website. The top-level domain, the last part of the domain name (e.g. .edu), depends on the type of organization hosting the website.
The most common are:
.edu = education (restricted)
.gov = U.S. government entities (restricted)
.mil = U.S. military entities (restricted)
.com = businesses (unrestricted)
.net = network infrastructures (unrestricted)
.org = organizations (unrestricted)
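As a rough illustration of how the parts of a URL fit together, the short Python sketch below pulls the host name and top-level domain out of a URL and looks the top-level domain up in the list above. The function name describe_url and the /library path are made up for this example; only the peru.edu domain comes from the text.

```python
from urllib.parse import urlparse

# Labels copied from the list above; the mapping is illustrative, not exhaustive.
TLD_MEANINGS = {
    "edu": "education (restricted)",
    "gov": "U.S. government entities (restricted)",
    "mil": "U.S. military entities (restricted)",
    "com": "businesses (unrestricted)",
    "net": "network infrastructures (unrestricted)",
    "org": "organizations (unrestricted)",
}


def describe_url(url):
    """Return (host, top-level domain, meaning) for a URL.

    describe_url is a hypothetical helper written for this example.
    """
    host = urlparse(url).hostname or ""
    # The top-level domain is whatever follows the last dot in the host name.
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return host, tld, TLD_MEANINGS.get(tld, "not in the list above")


# The /library path is invented for the example.
print(describe_url("https://www.peru.edu/library"))
```

Running the sketch on other addresses shows that the top-level domain is always the final part of the host name, whatever comes before or after it in the URL.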
The Web is organized into levels. Use the image of an iceberg to visualize them.
The Surface Web (also called the Visible Web and Open Web) consists of sources openly
available to anyone. These are the contents that can be found using search engines such as
Google, Bing, and Yahoo. Only a small portion, 4% or less, of all the content on the
Web is part of the Surface Web. Like an iceberg, the majority of the content is beneath
the surface.
Strengths - current information, billions of webpages
Weaknesses - not reviewed for accuracy or bias, not organized
The Deep Web (also called the Invisible Web and Hidden Web) consists of sources that search engines cannot index. Password protection, unlinked content, dynamic content, and non-HTML/text content are a few of the reasons why a webpage may not be indexed. The library catalog and library databases are part of the Deep Web, as is your email account, since they are password-protected.
Strengths - library pays for access to databases with full-text articles and eBooks, reviewed information, organized
Weaknesses - may not include current information, limited information for some topics
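The Dark Web is sometimes confused with the Deep Web, but it is only a small part of it. It can be reached only with special software, and it is the level associated with illegal information and activities.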
Search engines are tools that help you find information on the surface web. They use programs (often called spiders or crawlers) to compile a database of webpages and other accessible documents.
When you use a search engine, it scans its database to match your words against the
text retrieved from the webpages it has collected. It then ranks the matching items
by relevance. Multiple factors influence the ranking, including:
- the distribution of your words and how frequently they are used on each webpage.
- link analysis, which reveals whether a webpage is an authority (many other webpages link to it) or a hub (it links to many other webpages).
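The matching and ranking steps described above can be sketched in a few lines of Python. This is a toy illustration, not how any real search engine works: the three pages, their text, their links, and the scoring rule (number of word matches plus number of inbound links) are all made up for the example.

```python
from collections import Counter

# Hypothetical three-page "web": page name -> (page text, pages it links to).
PAGES = {
    "a.example": ("star wars ethics and the force", ["b.example"]),
    "b.example": ("ethics in movies ethics in star wars", []),
    "c.example": ("movie reviews and fan links", ["a.example", "b.example"]),
}


def build_index(pages):
    """Crawler step: record how often each word appears on each page."""
    return {name: Counter(text.lower().split()) for name, (text, _) in pages.items()}


def inbound_links(pages):
    """Link analysis: count how many other pages link to each page."""
    counts = Counter()
    for name, (_, links) in pages.items():
        for target in links:
            if target != name:
                counts[target] += 1
    return counts


def search(query, pages):
    """Return matching page names, best first (assumed toy scoring rule)."""
    index = build_index(pages)
    authority = inbound_links(pages)
    words = query.lower().split()
    scored = []
    for name, freq in index.items():
        matches = sum(freq[word] for word in words)
        if matches:  # pages with no matching words are left out
            scored.append((matches + authority[name], name))
    # Highest score first; ties are broken alphabetically by page name.
    return [name for _, name in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


print(search("ethics star wars", PAGES))
```

In this sketch, b.example ranks first because it uses the search words more often and more pages link to it, which makes it the stronger authority. The page c.example acts as a hub, since it links to the other two, but it is not retrieved because none of the search words appear on it.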
You may want to use more than one search engine or a metasearch engine, which is explained below.
No search engine has indexed the entire surface web, and each one has its own unique database that you search.
By using more than one search engine or a metasearch engine, your searches will cover more of the surface web rather than just a portion of it.
The list of search engines below is not all-inclusive. There are many more that you can use, in various languages.
Metasearch engines search more than one search engine database at once.
- Dogpile - searches Google and Yahoo! with a single search
- Info.com - searches Web and social media
- WebCrawler - searches Google, Yahoo! and other search engines at once
- Yippy - research-oriented with suggestions on words, sources, sites, and publication dates to focus your search; no user tracking
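To make the idea of a metasearch concrete, the Python sketch below combines ranked result lists from two engines into one list without duplicates. It is only an assumption about how such merging could work; the source does not describe how Dogpile, WebCrawler, or any other metasearch engine actually merges results. The function merge_results and the example.org addresses are invented for the illustration.

```python
def merge_results(*result_lists):
    """Interleave ranked lists from several engines, dropping duplicate URLs.

    merge_results is a hypothetical helper; real metasearch engines
    may combine results very differently.
    """
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        # Take the result at this rank from each engine in turn.
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Placeholder results from two imaginary engines.
engine_one = ["https://example.org/a", "https://example.org/b"]
engine_two = ["https://example.org/b", "https://example.org/c", "https://example.org/d"]

print(merge_results(engine_one, engine_two))
```

The merged list contains four pages even though neither engine alone returned more than three, which is the sense in which a metasearch covers more of the surface web than a single search engine.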
Unlike with library databases, no human decides which publications are searched when you use Google Scholar. A spider browses the Web and indexes what it finds for you to search.
That would be fine if there were no predatory journals.
Predatory journals charge publishing fees to authors and have no review process with an editorial board. The publishers mimic titles, layouts, designs, and websites of reputable journals to give an appearance of authenticity. They may even list academics and experts on their editorial board without permission of the individual.
Because the information in predatory journals isn't reviewed, it can be biased, inaccurate, or simply made up.
Evaluate the information to ensure it is reliable.
However, there are ways to improve your searches.
Think like the writers of the webpages you want to find. Unlike when you search library resources, you want to use words that would appear on the webpage itself.
Remember that algorithms are indexing the Web. A librarian isn't adding subjects and other descriptors to a record to help you find the information.
Use nouns and unique words specific to your research.
Use multiple words.
For example, searches on the topic "human morality in Star Wars" could use morals, moral philosophy, ethics, movies, star wars ...
Use phrase searching.
Place quotes ("") around a string of words so the search engine searches for the phrase rather than each separate word.
For example, "Star Wars Episode VII"
Word order matters. Search engines pay attention to the order you enter words.
For example, Star Wars will retrieve different results than Wars Star.
Use a plus sign (+) in front of required words.
Search engines remove common words (e.g. an, and, the, I, where, what, how) from a
search, so placing a plus sign in front of a common word tells the search engine to keep it.
For example, star-wars may +the force
Use a minus sign (-) in front of words and/or domains to exclude them.
For example, star-wars -trek -com (Results with trek and .com webpages will not be retrieved.)
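The quote, plus, and minus rules above can be pictured with a small Python sketch that sorts the pieces of a query into phrases, required words, excluded words, and ordinary keywords. This is only a simplified model of the rules as described in this module; actual search engines differ in which operators they support. The function parse_query is invented for the illustration, and the list of common words is taken from the examples in the text.

```python
import re

# Assumed list of common words, copied from the examples in the text above.
COMMON_WORDS = {"an", "and", "the", "i", "where", "what", "how"}


def parse_query(query):
    """Sort the parts of a query into phrases, required, excluded, and keywords.

    parse_query is a hypothetical helper, not part of any search engine.
    """
    parsed = {"phrases": [], "required": [], "excluded": [], "keywords": []}
    # Each match is either a quoted phrase or a single run of non-space characters.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            parsed["phrases"].append(phrase)
        elif word.startswith("+") and len(word) > 1:
            parsed["required"].append(word[1:])
        elif word.startswith("-") and len(word) > 1:
            parsed["excluded"].append(word[1:])
        elif word.lower() not in COMMON_WORDS:
            # Common words without a plus sign are dropped, as described above.
            parsed["keywords"].append(word)
    return parsed


print(parse_query('"Star Wars" +the force -trek'))
```

In this model, the quoted words stay together as one phrase, "the" is kept only because of the plus sign, and "trek" is marked for exclusion.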
Also check the advanced search features of the search engine to discover more searching options.
Learn more by viewing the below video "Web Search Strategies in Plain English" by Lee & Sachi LeFever on The Commoncraft Show.
Now that you have completed this module, you should be able to:
- recognize the organization of the Web.
- identify how search engines work.
- recognize predatory journals.
- create better search strategies for search engines.
You are ready for Module 7 - Evaluating Information.