Statistics has often been divided into two areas of investigation: theoretical and applied. Over time, however, another classification developed that distinguishes two types of statistical approaches: descriptive and inferential.
Descriptive statistics deals with organizing, displaying, and summarizing a dataset, using measures of central tendency, frequency, variation, and shape.
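These descriptive measures are all built into base R. As a minimal sketch, assuming a small made-up vector of scores (the values here are illustrative only), the common summaries can be computed directly:

```r
# A small illustrative dataset (hypothetical exam scores)
scores <- c(70, 75, 80, 80, 85, 90, 95)

mean(scores)      # central tendency: arithmetic mean
median(scores)    # central tendency: middle value
table(scores)     # frequency of each distinct value
var(scores)       # variation: sample variance
sd(scores)        # variation: sample standard deviation
range(scores)     # spread: minimum and maximum
```

The `summary()` function bundles several of these measures into a single call.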
Inferential statistics, in contrast, uses sample results to help make decisions or predictions about a population.
Throughout the book, we will discuss the subject of sample versus population. The term population refers to all members of the group that the researcher is interested in studying. But what happens when the population is very large and the researcher has neither the access nor the resources to measure every single member of this huge collection?
A sample is a subset of people, items, or events from a larger population that you collect and analyze to make inferences. To represent the population well, a sample should be randomly collected and adequately large.
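R's `sample()` function draws a random subset like the one described above. The sketch below assumes a hypothetical population of the numbers 1 through 1000; the seed is set only so the draw is reproducible:

```r
set.seed(42)                  # fix the random seed for reproducibility
population <- 1:1000          # hypothetical population
my_sample  <- sample(population, size = 10)  # random sample of 10, without replacement

my_sample                     # the 10 randomly selected members
mean(my_sample)               # sample mean, an estimate of the population mean
mean(population)              # true population mean, 500.5
```

With a larger `size`, the sample mean will typically land closer to the population mean, which is why an adequately large, randomly collected sample represents the population well.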
In addition, we will discuss three important terms: data, measurement, and variable. Data, in this book, is defined as the recorded, usually numerical, representation of a collection of facts. Measurement is defined as the way we determine the size, length, or amount of something. A variable is defined as an attribute that describes a person, place, thing, or idea.
Long before computers, societies recorded their data on paper in order to facilitate analysis and help them make better decisions. The change from paper ledger to computer spreadsheet came in 1978, when Daniel Bricklin, a Harvard Business School student, came up with the idea for an interactive visible calculator. His product, VisiCalc, was the first computer-based spreadsheet application. During the 1980s, Microsoft, co-founded by Bill Gates, introduced Microsoft Excel.

Today we are no longer limited to Excel spreadsheets. This book presents an alternative way to work with data based on open-source R. R is a powerful language and environment for statistical computing and graphics, and its popularity reflects a shift in the type of software used inside corporations. As Ashlee Vance reported in The New York Times (2009) article “Data Analysts Captivated by R’s Power,” R is free, and it brings together statisticians, engineers, and scientists who are constantly improving R or writing variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs, and mining techniques to dig deeper into databases.
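As a brief illustration of the package system mentioned above, a contributed package is installed once from CRAN and then loaded in each session. The package named here, `ggplot2`, is only one popular example of a graphics package, not a requirement of this book:

```r
# One-time installation from CRAN (requires an internet connection)
install.packages("ggplot2")

# Load the package in the current session to use its functions
library(ggplot2)
```

Thousands of such packages extend base R in exactly the ways the article describes.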
Next, Chapter 2 covers Research Design.