Rcpp: Recommended code structure when using data frames with Rcpp (inline)
[I had this sketched out as a comment elsewhere but decided to create a proper question...]
What is currently considered "best practice" in terms of code structuring when using data frames in Rcpp? The ease with which one can "beam over" an input data frame from R to the C++ code is remarkable, but if the data frame has n columns, is the current thinking that this data should be split up into n separate (C++) vectors before being used?
The response to my previous question on making use of a string (character vector) column in a data frame suggests to me that yes, this is the right thing to do. In particular, there doesn't seem to be support for a notation such as df.name[i] to refer to the data frame information directly (as one might have in a C structure), unless I'm mistaken.
However, this leads us into a situation where subsetting down the data is much more cumbersome - instead of being able to subset a data frame in one line, each variable must be dealt with separately. So, is the thinking that subsetting in Rcpp is best done implicitly, via boolean vectors, say?
To summarise: I want to check my understanding that, although a data frame can be beamed over to the C++ code, there is no way to refer directly to the individual elements of its columns in a "df.name[i]" fashion, and no simple method of generating a sub-data-frame of the input df by selecting rows that satisfy simple criteria (e.g. df.date being in a given range).
Because data frames are in fact represented internally as a list of vectors, access by column vector really is the best you can do. There is no direct way to subset a whole data frame by row at the C or C++ level; each column has to be filtered separately.
There was a good discussion about that on r-devel a few weeks ago in the context of a transpose of a data.frame (which you cannot do 'cheaply' for the same reason).
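To make the column-wise layout concrete, here is a minimal sketch in plain standard C++ (deliberately not the Rcpp API). The `Frame` struct, its `name` and `date` columns, and the helper functions are all hypothetical names chosen for illustration. It assumes you have already pulled each column out into its own vector, which in Rcpp would be something along the lines of `NumericVector x = df["x"];`. An element is then reached as `df.name[i]`, and a row subset is done by building one logical mask and applying it to every column in turn.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a two-column data frame: one vector per column,
// mirroring the "list of vectors" layout R uses internally.
struct Frame {
    std::vector<std::string> name;
    std::vector<int> date;  // assumed here to be an integer day count
};

// Build a logical mask marking the rows whose date lies in [lo, hi].
std::vector<bool> date_in_range(const Frame& df, int lo, int hi) {
    std::vector<bool> keep(df.date.size());
    for (std::size_t i = 0; i < df.date.size(); ++i) {
        keep[i] = (df.date[i] >= lo && df.date[i] <= hi);
    }
    return keep;
}

// Apply the mask to a single column, copying only the kept elements.
template <typename T>
std::vector<T> filter_column(const std::vector<T>& col,
                             const std::vector<bool>& keep) {
    assert(col.size() == keep.size());  // mask must match the column length
    std::vector<T> out;
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (keep[i]) {
            out.push_back(col[i]);
        }
    }
    return out;
}

// "Row subsetting" is just the same mask applied to each column separately.
Frame subset_rows(const Frame& df, const std::vector<bool>& keep) {
    return Frame{filter_column(df.name, keep), filter_column(df.date, keep)};
}
```

For example, with dates {10, 20, 30, 40} and the range [15, 30], `date_in_range` marks the second and third rows, and `subset_rows` returns a two-row `Frame`. The point of the sketch is that every column gets its own pass over the data, which is also why a row-wise operation such as a transpose cannot be done cheaply.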