summarize by time interval not working

I have the following data: a vector of POSIXct times spanning one month, each of which represents a bike delivery. My aim is to find the average number of bike deliveries per ten-minute interval over a 24-hour period (producing a total of 144 rows). First, all of the trips need to be binned into intervals and summed, then divided by the number of days. So far, I've managed to write code that sums trips per 10-minute interval, but it produces incorrect values. I am not sure where it went wrong.

The data looks like this:

head(start_times)
[1] "2014-10-21 16:58:13 EST" "2014-10-07 10:14:22 EST" "2014-10-20 01:45:11 EST"
[4] "2014-10-17 08:16:17 EST" "2014-10-07 17:46:36 EST" "2014-10-28 17:32:34 EST"
length(start_times)
[1] 1747


The code looks like this:

library(lubridate)
library(dplyr)

tripduration <- floor(runif(1747) * 1000)

time_bucket <- start_times - minutes(minute(start_times) %% 10) - seconds(second(start_times))

df <- data.frame(tripduration, start_times, time_bucket)
summarized <- df %>%
  group_by(time_bucket) %>%
  summarize(trip_count = n())
summarized <- as.data.frame(summarized)
out_buckets <- data.frame(out_buckets = seq(as.POSIXlt("2014-10-01 00:00:00"), as.POSIXct("2014-10-31 23:0:00"), by = 600))
out <- left_join(out_buckets, summarized, by = c("out_buckets" = "time_bucket"))
out$trip_count[is.na(out$trip_count)] <- 0

head(out)
          out_buckets trip_count
1 2014-10-01 00:00:00          0
2 2014-10-01 00:10:00          0
3 2014-10-01 00:20:00          0
4 2014-10-01 00:30:00          0
5 2014-10-01 00:40:00          0
6 2014-10-01 00:50:00          0
dim(out)
[1] 4459    2

test <- format(out$out_buckets,"%H:%M:%S")
test2 <- out$trip_count
test <- cbind(test, test2)
colnames(test)[1] <- "interval"
colnames(test)[2] <- "count"
test <- as.data.frame(test)
test$count <- as.numeric(test$count) 
test <- aggregate(count~interval, test, sum)
head(test, n = 20)
   interval count
1  00:00:00    32
2  00:10:00    33
3  00:20:00    32
4  00:30:00    31
5  00:40:00    34
6  00:50:00    34
7  01:00:00    31
8  01:10:00    33
9  01:20:00    39
10 01:30:00    41
11 01:40:00    36
12 01:50:00    31
13 02:00:00    33
14 02:10:00    34
15 02:20:00    32
16 02:30:00    32
17 02:40:00    36
18 02:50:00    32
19 03:00:00    34
20 03:10:00    39

But this is impossible, because when I sum the counts

sum(test$count)
[1] 7494

I get 7494, whereas the total should be 1747.

I'm not sure where I went wrong, or how to simplify this code so that it gives the correct result.

Answers


I've done what I can, but I can't reproduce your issue without your data.
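
That said, one possible cause of the inflated total, which I can't verify against your real data: `cbind(test, test2)` builds a character matrix, and in R versions before 4.0 `as.data.frame()` turned character columns into factors by default. Calling `as.numeric()` on a factor returns the internal level codes rather than the original numbers. A minimal sketch with made-up toy values, where `stringsAsFactors = TRUE` is set explicitly to mimic the old default:

```r
# Toy data (hypothetical values, not the asker's real counts)
interval <- c("00:00:00", "00:10:00", "00:00:00", "00:10:00")
count    <- c(0, 4, 1, 0)

# cbind() on a character and a numeric vector yields a character matrix
m <- cbind(interval, count)

# stringsAsFactors = TRUE mimics the default behaviour before R 4.0
d <- as.data.frame(m, stringsAsFactors = TRUE)

codes  <- as.numeric(d$count)                # factor level codes, not the values
values <- as.numeric(as.character(d$count))  # the original numbers

sum(codes)   # no longer matches sum(count)
sum(values)  # matches sum(count)
```

If this is what happened, building the data frame directly with `data.frame(interval = ..., count = ...)` instead of going through `cbind()` would keep `count` numeric.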

library(dplyr)

I created the full sequence of 10 minute blocks:

# October has 31 days, so 31 * 24 * 6 ten-minute blocks
blocks.of.10mins <- data.frame(out_buckets=seq(as.POSIXct("2014/10/01 00:00"), by="10 mins", length.out=31*24*6))

Then I split the start_times into the same bins. Note: I added a baseline time of midnight to force the blocks to align to 10-minute boundaries. Removing it later is an exercise for the reader. I also changed one of your data points so that there is at least one example of multiple records in the same bin.

start_times <- as.POSIXct(c("2014-10-01 00:00:00", ## added
                            "2014-10-21 16:58:13",
                            "2014-10-07 10:14:22",
                            "2014-10-20 01:45:11",
                            "2014-10-17 08:16:17",
                            "2014-10-07 10:16:36", ## modified
                            "2014-10-28 17:32:34"))

trip_times <- data.frame(start_times) %>% 
    mutate(out_buckets = as.POSIXct(cut(start_times, breaks="10 mins")))
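
If the added midnight baseline is unwanted, one alternative is to floor each timestamp to its 10-minute boundary directly. A minimal base R sketch, assuming the times are in UTC (the helper name `floor_10min` is hypothetical and is not used elsewhere in this answer):

```r
# Hypothetical helper: floor POSIXct times to the start of their 10-minute block
floor_10min <- function(x) {
  secs <- floor(as.numeric(x) / 600) * 600  # 600 seconds = 10 minutes
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

times <- as.POSIXct(c("2014-10-21 16:58:13",
                      "2014-10-07 10:14:22",
                      "2014-10-07 10:16:36"), tz = "UTC")

bins <- floor_10min(times)
format(bins, "%Y-%m-%d %H:%M:%S")
```

The arithmetic works on seconds since the epoch, so it lines up with clock time only when the time zone's UTC offset is a whole multiple of 10 minutes.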

The start_times and all the 10-minute intervals can then be merged:

trips_merged <- merge(trip_times, blocks.of.10mins, by="out_buckets", all=TRUE)

These can then be grouped by 10-minute block and counted:

trips_merged %>% filter(!is.na(start_times)) %>% 
  group_by(out_buckets) %>% 
  summarise(trip_count=n())

Source: local data frame [6 x 2]

          out_buckets trip_count
               (time)      (int)
1 2014-10-01 00:00:00          1
2 2014-10-07 10:10:00          2
3 2014-10-17 08:10:00          1
4 2014-10-20 01:40:00          1
5 2014-10-21 16:50:00          1
6 2014-10-28 17:30:00          1    

If we instead consider only the time of day, not the date:

trips_merged2 <- trips_merged
trips_merged2$out_buckets <- format(trips_merged2$out_buckets, "%H:%M:%S")

trips_merged2 %>% filter(!is.na(start_times)) %>% 
  group_by(out_buckets) %>% 
  summarise(trip_count=n())

Source: local data frame [6 x 2]

  out_buckets trip_count
        (chr)      (int)
1    00:00:00          1
2    01:40:00          1
3    08:10:00          1
4    10:10:00          2
5    16:50:00          1
6    17:30:00          1
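
To get from these counts to the average per interval that the question asks for, the counts still need to be divided by the number of days covered. A sketch in base R, reusing the seven sample times from this answer and assuming the data spans all 31 days of October in UTC; `n_days`, `all_intervals` and `avg` are names made up for illustration:

```r
start_times <- as.POSIXct(c("2014-10-01 00:00:00",
                            "2014-10-21 16:58:13",
                            "2014-10-07 10:14:22",
                            "2014-10-20 01:45:11",
                            "2014-10-17 08:16:17",
                            "2014-10-07 10:16:36",
                            "2014-10-28 17:32:34"), tz = "UTC")

# Floor each time to its 10-minute block and keep only the time of day
secs <- floor(as.numeric(start_times) / 600) * 600
tod  <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S")

# All 144 ten-minute intervals of a day, so that empty intervals show up as 0
all_intervals <- format(seq(as.POSIXct("2014-10-01 00:00:00", tz = "UTC"),
                            by = "10 mins", length.out = 144), "%H:%M:%S")

n_days <- 31  # assumption: the data covers every day of October 2014

counts <- table(factor(tod, levels = all_intervals))

avg <- data.frame(interval   = all_intervals,
                  trip_count = as.vector(counts),
                  avg_trips  = as.vector(counts) / n_days)

nrow(avg)            # one row per interval
sum(avg$trip_count)  # should equal length(start_times)
```

Checking that `sum(avg$trip_count)` equals `length(start_times)` is a quick way to confirm that no trips were lost or double-counted along the way.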
