SAS: generating an output database following several proc procedures
I am still new at SAS and I was wondering how I can do the following:
Say that I have a database with the following info:
Time_during_the day date prices volume_traded 930am sep02 42 300 10am sep02 41 200 ..4pm sep02 40 200 930am sep03 40 500 10am sep03 41 100 ..4pm sep03 40 350 .....
What I want is to take the average of the total daily volume and divide this number by 50 (always). So say avg.daily vol./50 = V; and what I want is to record the price/time/date at every interval of size V. Now, say that V=500, I start by recording the first price,time,and date in my database and then record the same info 500 volume trade later. It is possible that on one day that the traded volume is say 300 and half of it will cover the v=500, the other 150 will be use to fill up up the following interval.
How can I get this information in one database? Thank you!
Assume your input dataset is called tick_data, and that it is sorted by both date and time_during_the_day. Then here's what I got:
%LET n = 50; /* Calculate V - the breakpoint size */ PROC SUMMARY DATA=tick_data; BY date; OUTPUT OUT = temp_1 SUM (volume_traded)= volume_traded_agg; RUN; DATA temp_2 ; SET temp_1; V = volume_traded_agg / &n; RUN; /* Merge it into original dataset so that it is available */ DATA temp_3; MERGE tick_data temp_2; BY date; RUN; /* Final walk through tick data to output at breakpoints */ DATA results /* Comment out the KEEP to see what is happening under the hood */ (KEEP=date time_during_the_day price volume_traded) ; SET temp_3; /* The IF FIRST will not work without the BY below */ BY date; /* Stateful counters */ RETAIN volume_cumulative breakpoint_next breakpoint_counter ; /* Reset stateful counters at the beginning of each day */ IF (FIRST.date) THEN DO; volume_cumulative = 0; breakpoint_next = V; breakpoint_counter = 0; END; /* Breakpoint test */ volume_cumulative = volume_cumulative + volume_traded; IF (breakpoint_counter <= &n AND volume_cumulative >= breakpoint_next) THEN DO; OUTPUT; breakpoint_next = breakpoint_next + V; breakpoint_counter = breakpoint_counter + 1; END; RUN;
The key SAS language feature to keep in mind for the future is the use of BY, FIRST, and RETAIN together. This enables stateful walks through data like this one. Conditional OUTPUT also figures here.
Note that whenever you use BY <var>, the dataset must be sorted on a key that includes <var>. In the case of tick_data and all intermediate temporary tables, it is.
Additional: Alternative V
In order to make V equal the (average total daily volume / n), replace the matching code block above with this one:
. . . . . . /* Calculate V - the breakpoint size */ PROC SUMMARY DATA=tick_data; BY date; OUTPUT OUT = temp_1 SUM (volume_traded)= volume_traded_agg; RUN; PROC SUMMARY DATA = temp_1 OUTPUT OUT = temp_1a MEAN (volume_traded_agg) =; RUN; DATA temp_2 ; SET temp_1a; V = volume_traded_agg / &n; RUN; /* Merge it into original dataset so that it is available */ DATA temp_3 . . . . . . . . . . . .
Basically you just insert a second PROC SUMMARY to take the mean of the sums. Notice how there is no BY statement because we're averaging over the whole set, not by any groupings or buckets. Also notice the MEAN (...) = without a name after the =. That will make the output variable have the same name as the input variable.