Chapter 3: Data (Types and Collection Methods) - Statistics for LIS with Open Source R

Data collection is an important step in the research process. When planning it, we need to keep in mind both what our research question is about and how we will analyze the data we collect. Before gathering any information, however, we need to identify the source of the data and, based on that knowledge, decide which methodology to use to collect it. This chapter discusses some of the methodologies behind data collection.

The first questions you as a researcher need to address are: What is the source of your data? How will you collect your data?
Researchers have historically used four types of data collection sources:
(1) data distributed by an organization or individual,
(2) an experiment that they designed,
(3) surveys, or
(4) observation.
With the growth of the Internet, however, we often encounter a fifth kind of source, in which the researcher collects data from multiple sources and tries to make sense of them by analyzing the correlations between them. This type of source is called:
(5) a data correlation.

In Chapter 1, we introduced the idea of the sample as part of a population being studied. Rather than taking a complete census of the whole population, statistical sampling procedures focus on collecting a small representative group of the larger population. We will discuss two types of sampling methods: probability samples and non-probability samples.

The advantages of sampling are evident: it makes the research feasible, lowers costs, saves time, and makes the work easier to organize. The chapter covers two types of sampling: probability sampling, which can be done with or without replacement, and non-probability sampling. Each type of sampling addresses specific needs and has its own advantages and disadvantages.
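The difference between sampling with and without replacement can be sketched in a few lines. The example below is only an illustration in Python: the frame of ten ID numbers and the seed are invented for the purpose, and nothing here is taken from the chapter itself. In R, the `sample()` function plays the same role through its `replace` argument.

```python
import random

# Hypothetical frame of ten ID numbers, invented for illustration only.
frame = list(range(1, 11))

rng = random.Random(42)  # fixed seed so the draw can be repeated

# Without replacement: each item can be selected at most once.
without_replacement = rng.sample(frame, k=5)

# With replacement: an item is returned to the frame after each draw,
# so the same item may be selected more than once.
with_replacement = rng.choices(frame, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Without replacement, the five selected items are always distinct. With replacement, repeats are possible but not guaranteed in any single draw.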

Sampling begins with a frame. A frame is a list of all the items that compose the population; it is the source material or device from which a sample is drawn.
In practice, the sampling frame can differ from the population. By its design, it may exclude members of the population who are nonetheless potential users of a proposed service.
A simple random sample is one in which every individual or item in a frame has the same chance of selection as every other individual or item in that frame. We return to these terms in Chapters 7, 8, and 9.
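The relationship between population, frame, and simple random sample can be illustrated with a short sketch. Everything in it is hypothetical: the population of twenty people, the frame of twelve, the helper name `simple_random_sample`, and the seed are assumptions made for the example, written here in Python.

```python
import random
from collections import Counter

# Hypothetical population of twenty people, invented for illustration.
population = [f"person_{i}" for i in range(1, 21)]

# Hypothetical frame: a list that covers only part of the population.
frame = population[:12]


def simple_random_sample(frame, n, rng):
    """Draw n distinct items so that every item in the frame is equally likely."""
    return rng.sample(frame, n)


rng = random.Random(7)  # fixed seed so the draw can be repeated
sample = simple_random_sample(frame, 4, rng)
print("one sample:", sample)

# Members of the population who are missing from the frame can never be drawn.
excluded = set(population) - set(frame)
print("excluded members drawn:", len(excluded.intersection(sample)))

# Repeating the draw many times: each frame member's share of selections
# should settle near n / len(frame), here 4 / 12.
trials = 6000
counts = Counter()
for _ in range(trials):
    counts.update(simple_random_sample(frame, 4, rng))
shares = {member: counts[member] / trials for member in frame}
print("lowest share:", min(shares.values()), "highest share:", max(shares.values()))
```

The repeated draws show the "same chance" property empirically, while the excluded set shows why a frame that leaves out part of the population limits what the sample can say about it.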


A Primer for Using Open Source R Software for Accessibility and Visualization