R How To: Read CSV, plot rate in second resolution

Assuming a simple but large set of data such as the one below (1 or 2 second resolution for 8 hours is roughly 14k-28k rows):

  1. How to read from a CSV and create a plot where multiple rates are plotted against the y-axis and time is on the x-axis.
  2. How to summarize the x-axis in 15 minute or 30 minute intervals.
  3. How to filter the set so I can zoom into a specific section of the data.
  4. How to calculate an average for the previous x values (i.e. 10 seconds) so that the very low values are effectively ignored in ValueB.

As someone who's programmed in a number of languages over 30 years, I find it ridiculously difficult to work with dates in R. The questions and answers I've seen in other posts seem very specific to particular solutions (i.e. become a master of this whole package), which makes them hard to apply elsewhere. What am I missing? This feels like it should be stupid simple, but it's not. I've seen bits and pieces of the answers for all of the above, but putting them all together doesn't seem to be workable.

Are there key concepts and terms that I should be reading about?

time,ValueA,ValueB
3/12/2014 11:12:14,15222,3882
3/12/2014 11:12:16,5462,9832
3/12/2014 11:12:18,8432,12281
3/12/2014 11:12:20,15325,19928
3/12/2014 11:12:22,17458,29382
3/12/2014 11:12:24,6541,12
3/12/2014 11:12:26,8287,17822
3/12/2014 11:12:28,14278,504
3/12/2014 11:12:30,11854,848
3/12/2014 11:12:32,7495,17899
3/12/2014 11:12:34,6387,38822
3/12/2014 11:12:36,12354,7732
3/12/2014 11:12:38,15422,2003
3/12/2014 11:12:40,8452,2
3/12/2014 11:12:42,5845,18388

Answers


I'll give a complete answer to all your questions here. I agree that working with dates is not very easy (in any language, I think: time zone headaches, date conversions, ...). You are definitely looking for the xts/zoo packages. They are specialized for time series, very fast, efficient and widely used. Of course you can do all of this in base R, but it is easier once you master the xts package.

Read and plot the data:
library(xts)
## replace text = with file = 'yourfile.csv' to read from your own file
dts <- read.zoo(text='time,ValueA,ValueB
3/12/2014 11:12:14,15222,3882
3/12/2014 11:12:16,5462,9832
3/12/2014 11:12:18,8432,12281
3/12/2014 11:12:20,15325,19928
3/12/2014 11:12:22,17458,29382
3/12/2014 11:12:24,6541,12
3/12/2014 11:12:26,8287,17822
3/12/2014 11:12:28,14278,504
3/12/2014 11:12:30,11854,848
3/12/2014 11:12:32,7495,17899
3/12/2014 11:12:34,6387,38822
3/12/2014 11:12:36,12354,7732
3/12/2014 11:12:38,15422,2003
3/12/2014 11:12:40,8452,2
3/12/2014 11:12:42,5845,18388',header=TRUE,tz='',
         sep=',',format='%d/%m/%Y %H:%M:%S')
myColors <- c("red", "darkgreen")
plot(x = dts, xlab = "Time", ylab = "Value",
     main = "plot rate in second resolution", col = myColors, screens = 1)
legend(x = "topleft", legend = c("ValueA", "ValueB"),
       lty = 1, col = myColors)

How to summarize the x-axis

You should use period.apply and endpoints to create time intervals. Here I am showing it using seconds as the unit, but in your case you should use minutes.

period.apply(dts,endpoints(dts,'seconds',k=15),mean)
                      ValueA ValueB
2014-12-03 11:12:14 15222.00   3882
2014-12-03 11:12:28 10826.14  12823
2014-12-03 11:12:42  9687.00  12242
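
On the real 8-hour data set, the same pattern with minute-based endpoints gives the 15 minute or 30 minute summaries (a small sketch, assuming dts holds the full data):

period.apply(dts, endpoints(dts, 'minutes', k = 15), mean)  # 15-minute means
period.apply(dts, endpoints(dts, 'minutes', k = 30), mean)  # 30-minute means
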
Zooming into the data:

You can use xts's time-of-day subsetting:

dts <- as.xts(dts)
dts['T11:12:16/T11:12:19']

                    ValueA ValueB
2014-12-03 11:12:16   5462   9832
2014-12-03 11:12:18   8432  12281
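
To zoom into a specific range of timestamps rather than a time of day, xts also supports from/to subsetting with date-time strings (a sketch, using the dates as parsed above):

dts['2014-12-03 11:12:20/2014-12-03 11:12:30']
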
How to calculate an average for the previous x values
rollmean(dts,k=10)
                     ValueA  ValueB
2014-12-03 11:12:22 11035.4 11239.0
2014-12-03 11:12:24 10151.9 14733.0
2014-12-03 11:12:26 10841.1 14523.0
2014-12-03 11:12:28 11540.1 13495.2
2014-12-03 11:12:30 10852.8 11502.6
2014-12-03 11:12:32  9691.5 10403.2
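
Note that rollmean is centre-aligned by default. Since the question asks for an average of the previous values, a right-aligned rolling mean may be closer to what is wanted (sketch):

rollmeanr(dts, k = 10)     # mean of the current and previous 9 observations
rollapplyr(dts, 10, mean)  # equivalent, via the more general rollapply
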

1. How to read from a csv and create a plot where multiple rates are plotted against the y-axis and time is on the x-axis.

> dat <- read.csv(header = TRUE, 
          text = "time,ValueA,ValueB
          3/12/2014 11:12:14,15222,3882
          3/12/2014 11:12:16,5462,9832
          3/12/2014 11:12:18,8432,12281
          3/12/2014 11:12:20,15325,19928
          ... ", sep = ",")
> dat$time <- strptime(dat$time, format = "%m/%d/%Y %H:%M:%S")
> sapply(dat, class)
## $time 
## [1] "POSIXlt" "POSIXt" 
## $ValueA
## [1] "integer"
## $ValueB 
## [1] "integer"

@agstudy's plot is quite nice.
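
If you prefer base graphics on the plain data frame, a minimal version of the same plot might look like this (column names taken from the data above; adjust as needed):

plot(as.POSIXct(dat$time), dat$ValueA, type = "l", col = "red",
     xlab = "Time", ylab = "Value", ylim = range(dat$ValueA, dat$ValueB))
lines(as.POSIXct(dat$time), dat$ValueB, col = "darkgreen")
legend("topleft", legend = c("ValueA", "ValueB"), lty = 1,
       col = c("red", "darkgreen"))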

2. How to summarize the x-axis in 15 minute or 30 minute intervals.

Here I'll do seconds, since the data is set up that way already.

> spl <- split(dat , cut(dat$time, "15 secs"))
> spl 
## $`2014-03-12 11:12:14`
##                  time ValueA ValueB 
## 1 2014-03-12 11:12:14  15222   3882 
## 2 2014-03-12 11:12:16   5462   9832 
## 3 2014-03-12 11:12:18   8432  12281
## 4 2014-03-12 11:12:20  15325  19928 
## 5 2014-03-12 11:12:22  17458  29382 
## 6 2014-03-12 11:12:24   6541     12 
## 7 2014-03-12 11:12:26   8287  17822 
## 8 2014-03-12 11:12:28  14278    504


## $`2014-03-12 11:12:29`
##                  time ValueA ValueB 
## 9  2014-03-12 11:12:30  11854    848
## 10 2014-03-12 11:12:32   7495  17899 
## 11 2014-03-12 11:12:34   6387  38822 
## 12 2014-03-12 11:12:36  12354   7732 
## 13 2014-03-12 11:12:38  15422   2003 
## 14 2014-03-12 11:12:40   8452      2
## 15 2014-03-12 11:12:42   5845  18388
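
To turn that split into one summary row per interval, one option is to take column means over each piece (a sketch, reusing spl from above):

t(sapply(spl, function(d) colMeans(d[c("ValueA", "ValueB")])))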

3. How to filter the set so i can zoom into a specific section of the data

Get values of ValueA greater than 10,000, for example.

> dat[dat$ValueA > 1e4, ]

##                   time ValueA ValueB
## 1  2014-03-12 11:12:14  15222   3882
## 4  2014-03-12 11:12:20  15325  19928
## 5  2014-03-12 11:12:22  17458  29382
## 8  2014-03-12 11:12:28  14278    504
## 9  2014-03-12 11:12:30  11854    848
## 12 2014-03-12 11:12:36  12354   7732
## 13 2014-03-12 11:12:38  15422   2003
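
To zoom into a specific time window instead, you can compare against POSIXct bounds (the bounds below are just an example):

st <- as.POSIXct("2014-03-12 11:12:20")
en <- as.POSIXct("2014-03-12 11:12:30")
dat[dat$time >= st & dat$time <= en, ]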

4. How to calculate an average for the previous x values (i.e. 10 seconds) so that the very low values are effectively ignored in ValueB

Split the data into 10 second intervals and find the mean of the value columns.

spl <- split(dat , cut(dat$time, "10 secs"))
do.call(rbind, lapply(1:length(spl), function(i){
     A <- mean(spl[[i]]$ValueA)
     B <- mean(spl[[i]]$ValueB)
     data.frame(A, B)
     }))
##         A       B  
## 1 12379.8 15061.0  
## 2  9691.0  7417.0  
## 3  9692.0 13389.4
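
If what you want is a running mean of the previous k observations rather than non-overlapping blocks, a base R sketch with stats::filter would be (k = 5 here only because the sample is short):

k <- 5
runB <- stats::filter(dat$ValueB, rep(1/k, k), sides = 1)  # trailing k-point mean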

I would also recommend reading about difftime, as.Date, and all the linked functions in those help files. Sorry this is so long! Hope it helps.


First define the data for reproducibility:

Lines <- "time,ValueA,ValueB
3/12/2014 11:12:14,15222,3882
3/12/2014 11:12:16,5462,9832
3/12/2014 11:12:18,8432,12281
3/12/2014 11:12:20,15325,19928
3/12/2014 11:12:22,17458,29382
3/12/2014 11:12:24,6541,12
3/12/2014 11:12:26,8287,17822
3/12/2014 11:12:28,14278,504
3/12/2014 11:12:30,11854,848
3/12/2014 11:12:32,7495,17899
3/12/2014 11:12:34,6387,38822
3/12/2014 11:12:36,12354,7732
3/12/2014 11:12:38,15422,2003
3/12/2014 11:12:40,8452,2
3/12/2014 11:12:42,5845,18388
"

Now we perform the 4 steps.

library(zoo)
library(ggplot2)

## 1 - Read in and create some plots.
## Use something like file="myfile.csv" in place of text=Lines on real data.

fmt <- "%m/%d/%Y %H:%M:%S"
z <- read.zoo(text = Lines, header = TRUE, sep = ",", format = fmt, tz = "")

plot(z)
plot(z, screen = 1, col = 1:2)

autoplot(z)
autoplot(z) + facet_free()
autoplot(z, facet = NULL)

## 2 - Aggregate to 15 sec.  cut produces factor so convert back to POSIXct.
## Use "15 min" instead on real data.

z15 <- aggregate(z, as.POSIXct(cut(time(z), "15 sec")), mean)

## 3 - Subset to a window of times.
## Modify st and en as desired for real data.

st <- as.POSIXct("2014-03-12 11:12:10")
en <- as.POSIXct("2014-03-12 11:12:40")
zw <- window(z, start = st, end = en)

## 4 - Average last k points.
## Use k <- 10 on real data

k <- 3
rollmeanr(z, k)
rollapplyr(z, k, by = k, mean) # or do this for every kth point

Note that every xts object is also a zoo object, and another answer already gives the xts solution. One can go back and forth between zoo and xts, e.g. x <- as.xts(z).

There are 5 vignettes (PDF documents) that come with zoo, plus the help files/reference manual; they are listed under Documentation on zoo's CRAN page.
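
To find them from within R, something like this should work (vignette names may vary between zoo versions):

vignette(package = "zoo")  # list the vignettes installed with zoo
vignette("zoo-quickref")   # open one of them, if present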

