Chapter 14: Time Series and Predictive Analytics

Time series models have been the basis for the study of a behavior or metrics over a period of time. In decisions that involve a factor of uncertainty about the future, time series models have been found to be one of the most effective methods of forecasting. We often encounter different time series models in sales forecasting, weather forecasting, inventory studies, and so on. In the field of information science, Jeong and Kim (2010) reviewed a selected annotated bibliography of core books in order to conduct a time series analysis.

Often the first step in a time series analysis is to plot the data and observe any patterns that might occur over time by using a plot. This helps us to determine whether there appears to be a long-term upward or downward movement or whether the series seems to vary around a horizontal line over time.

The following scenario will show how the plot illustrates the direction of data. In this example, we follow the number of books borrowed from the library from 1982 to 2014 as found in the public library log in New York City. Table 14.1.

YearNumbers of books borrowed from the library

You will import the file to R by the following command:
>library_borrowing <-read.csv(“C:/Table14.1″, header=T, dec=”,”, sep=”;”)
Note that paths use forward slashes “/” instead of backslashes
>plot(library_borrowing[, 5], type=”1″, lowd=2, col=”red”, xlab=Years”, ylab=”Number of books”, main=”Number of books borrowed from the library” xl)

The result:

The number of books borrowed from the library.






As we can see, this illustration provides an indication and direction of the decline of book borrowing from the library. This trend started in 2000 and stopped in 2009.

The goal of the time series can be classified in five steps:
1. Descriptive: identifying patterns in correlated data—trends and seasonal variations
2. Explanation: understanding and modeling the data
3. Forecasting: prediction of short-term trends from previous patterns
4. Intervention analysis: discovering if a single event changes the time series
5. Quality control: deviations of a specified size indicate a problem

Another aspect of time series is smoothing, a common technique. Smoothing always involves some form of local averaging of data such that the nonsystematic components (variations away from main trend) of individual observations cancel each other out. The most common technique is moving average smoothing, which replaces each element of the series by either the simple or weighted average of n surrounding elements, where n is the width of the smoothing “window.” All of these methods will filter out the noise and convert the data into a smooth curve that is relatively unbiased by outliers.

Among the methods for fitting a straight line to a series of data, Least Square Method is the one used most frequently. The equation of a straight line is Y = a + bx where x is the time period, say year, and Y is the value of the item measured against time, a is the Y intercept, and b is the coefficient of x indicating slope of the trend line. In order to find a and b, the following two equations are calculated:

ΣY = ax + b Σx
ΣXY = a Σx + bΣx2

The code in R:
Based on the library’s example above, we selected the first five numbers:
>time <- c(1986, 1987, 1988, 1989, 1999)
>number <- c(270, 285, 295, 315, 330)
(Intercept) 1933.293
Number 0.189

In order to calculate the smoothing of Least Square data, we often use a linear operation. This process converts a single time series {yt} into another time series {xt} by a linear operation. The formula for smoothing using Moving Average:

Smoothing by a Moving Average
Smoothing by a Moving Average





where the analyst chooses the value of q for smoothing. Since averaging is a smoothing process, we can see why the moving average smooths out the noise in a time series. As a result, we will find there exist many variations of the filter described here.

Next, Chapter 15, Visualization Display
Previous, Chapter 13, Analysis of Variances and Chi-Square Test



A Primer for Using Open Source R Software for Accessibility and Visualization