
2.3) Importing Data Frames (cont'd)


The recommended format for data is one column per variable, with one or more additional columns for indexing variables if the data come from different groups (see the function stack() in Session 4). Missing data are given the value NA (“not available”). NA is a special value that R recognizes; if you run a statistical model on data that include NAs, the rows where they occur will normally be omitted.
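As a rough illustration of this layout, the sketch below takes a small made-up data frame with one column per group and reshapes it with stack() into one column of values plus one indexing column. The names wide, long, control and treated are invented for the example and are not part of the course data. It then fits a simple model to show that the row containing the NA is left out under R's default settings.

```r
# Made-up data: one column per group, with one missing value
wide <- data.frame(
  control = c(5.1, 4.8, NA),
  treated = c(6.3, 6.0, 5.9)
)

# stack() gives one column of values and one indexing column (ind)
long <- stack(wide)
long

# With the default na.action, lm() omits the row that contains the NA
fit <- lm(values ~ ind, data = long)
nrow(long)  # rows in the data, including the NA row
nobs(fit)   # rows actually used by the model
```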

Having imported a dataset as a data frame, it’s important to check its structure. The whole data frame may be too big to examine on the screen, so there are some functions to summarise data frames. Try dim(), colnames(), str() (i.e. "structure") and summary() – putting your data frame name in the brackets in each case.
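A minimal sketch of these checks, using a tiny hypothetical data frame called df (invented here so the example runs on its own; substitute the name of your imported data frame):

```r
# Hypothetical data frame, made up for illustration
df <- data.frame(
  height = c(12.1, 15.3, NA, 9.8),
  group  = c("a", "a", "b", "b")
)

dim(df)       # number of rows, then number of columns
colnames(df)  # the column (variable) names
str(df)       # compact overview: type and first values of each column
summary(df)   # per-column summaries, including a count of NAs
```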

Click on Appendix on the left and download the CSV file. Import it into R, calling it GRASS. Explore the data frame structure. What are its mode and its class?
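One possible way to approach the import is sketched below. The name and contents of the downloaded file are not given here, so the sketch writes a small stand-in CSV to a temporary file; csv_path and the columns plot and yield are assumptions made only so the code runs, and you would point csv_path at the file you actually downloaded.

```r
# Stand-in file so the sketch is self-contained (assumed contents);
# replace csv_path with the path to the CSV you downloaded
csv_path <- tempfile(fileext = ".csv")
writeLines(c("plot,yield", "1,4.2", "2,NA", "3,5.1"), csv_path)

# Import the file as a data frame called GRASS
GRASS <- read.csv(csv_path)

# Explore the structure, then query the mode and class
str(GRASS)
mode(GRASS)
class(GRASS)
```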

A useful argument in functions like sum(), mean(), etc. is na.rm=. Set it to TRUE if you want any NAs to be ignored; otherwise, if the data contain any NAs, the result will be NA.
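For example, with a short made-up vector x (not part of the course data):

```r
# Made-up vector containing one missing value
x <- c(2, 4, NA, 6)

sum(x)                 # NA, because the missing value propagates
sum(x, na.rm = TRUE)   # 12, the NA is dropped before summing
mean(x, na.rm = TRUE)  # 4, the mean of the three remaining values
```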