How to import excel data into R using column name and row name
I am a R novice and was wondering how to import excel data into R using row names and column names. Specifically i require a subset of the data in a number of worksheet within one excel file. Can i use row names and column names to identify and extract certain cells of data to R ?
Worksheet 1 ---------- * X Y Z A 1 2 2 B 1 1 1 C 1 3 4 D 4 2 2 E 2 2 2 ---------- Worksheet 2 ---------- * X Y1 Z1 A 1 2 2 B 1 2 3 C 1 3 4 D 4 1 1 E 2 1 1
For example in the above spreadsheet how could i extract the data (2,2,2,2) using the row and column names (D,Y) (D,Z) (E,Y) (E,Z) in worksheet 1
how could i extract the data (1,1,1,1) using the row and column names (D,Y1) (D,Z1) (E,Y1) (E,Z1) in worksheet 2 ?
Thanks for any help provided
@Andrie mentionned the XLConnect package, it's a very useful package for I/O between R and Excel with the possibility to select region in Excel sheet.
I created an Excel file like yours in my Dropbox public folder, you can download the example.xls file here.
require(XLConnect) ## A5:C5 correspond to (D,Y) (D,Z) (E,Y) (E,Z) in your example selectworksheet1 <- readWorksheetFromFile("/home/ahmadou/Dropbox/Public/example.xls", sheet = "Worksheet1", region = "A5:C5", header = FALSE) selectworksheet1 ## Col0 Col1 Col2 ## 1 2 2 2 ## B4:C5 correspond to (D,Y1) (D,Z1) (E,Y1) (E,Z1) in the second example selectworksheet2 <- readWorksheetFromFile("/home/ahmadou/Dropbox/Public/example.xls", sheet = "Worksheet2", region = "B4:C5", header = FALSE) selectworksheet2 ## Col0 Col1 ## 1 1 1 ## 2 1 1 unlist(selectworksheet2) ## Col01 Col02 Col11 Col12 ## 1 1 1 1
There are several packages which provide functions to import Excel data to R; see the R data import/export documentation.
I've found the xlsx package to be useful (it will read both .xls and .xlsx files). I don't believe that it will accept row/column names as input, but it will accept their numerical value (row 1, column 4 for example). In your case, something like this should work, assuming that X, Y and Z correspond to columns 1-3:
library(xlsx) # first example subset; call it ss1 # assume first row is not a header; otherwise requires header = T ss1 <- read.xlsx("myfile.xlsx", sheetIndex = 1, rowIndex = 4:5, colIndex = 2:3) # second example subset; call it ss2 # just the same except worksheet index = 2 ss2 <- read.xlsx("myfile.xlsx", sheetIndex = 2, rowIndex = 4:5, colIndex = 2:3)
However, you will need to experiment with your own file until things work as expected. You can also specify sheetName but I find sheetIndex normally works more reliably, once you figure out the correct index for each sheet. And take care if the first row is a header.
Having said all that: my preferred option would be to export the sheet to a text format such as CSV, use shell tools (cut, head, tail etc.) to get the required rows/columns and import that to R.