Extras

Extracting summary statistics from a model fit in R

If you are new to R, this section runs through the types of object the model output is stored in and how to access specific elements of them. This can be helpful for writing automated analysis scripts. Because it needs to hold different types of data in different formats and structures, the output of the regression model fit is stored in a bespoke object, with slots for the different parts of the output. These slots are named and can be accessed using the $ operator. For example, to extract just the table of estimated regression coefficients, which is named coefficients, we use the following command:

%%R
model <- lm(bmi ~ age + sex, demoDat)
summary(model)$coefficients
               Estimate Std. Error    t value     Pr(>|t|)
(Intercept) 24.22402808 2.26589809 10.6906962 3.581804e-22
age          0.04090023 0.04560467  0.8968430 3.706760e-01
sexmale      0.07188731 0.52593207  0.1366856 8.913907e-01
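
The reason $ works here is that a fitted model object is, underneath, a named list with a class label attached. The sketch below illustrates this; it assumes the built-in mtcars data as a stand-in for demoDat, so the names fit, mpg, wt and hp are illustrative only and not part of this tutorial's dataset.

```r
## Illustrative sketch: mtcars ships with R and stands in for demoDat.
fit <- lm(mpg ~ wt + hp, data = mtcars)

class(fit)    # "lm": the label R uses to choose print/summary methods
is.list(fit)  # TRUE: underneath, the fit is a named list

## $name and [["name"]] retrieve the same named component.
identical(fit$coefficients, fit[["coefficients"]])  # TRUE

## str() with max.level = 1 gives a one-line overview of each component
## of the summary object without printing their full contents.
str(summary(fit), max.level = 1)
```

The [["name"]] form is handy in scripts because the component name can be held in a character variable.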

We can determine the type of object the coefficients table is stored in using the function class().

%%R
class(summary(model)$coefficients)
mode(summary(model)$coefficients)
[1] "numeric"

The output of class() tells us it is stored in a matrix, which is a data type in R with rows and columns. (Only the result of the last command in the cell, mode(), is displayed above.) A similar data type is the data.frame. The difference between the two is that a matrix can only contain one type of data, which we can determine with the function mode(). Here it contains exclusively numeric values. In contrast, each column of a data frame can be a different data type. Our demoDat data is stored in a data.frame, and the output of the str() function tells us the data type assigned to each column.
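
To make the matrix versus data.frame distinction concrete, here is a minimal sketch. It again assumes the built-in mtcars data in place of demoDat, and the objects fit, coefTab, mixed and withNote are hypothetical names introduced only for this illustration.

```r
## Illustrative sketch: mtcars stands in for demoDat.
fit <- lm(mpg ~ wt + hp, data = mtcars)
coefTab <- summary(fit)$coefficients

is.matrix(coefTab)  # TRUE: the coefficients table is a matrix
mode(coefTab)       # "numeric": every entry shares one data type

## A data.frame can hold a different type in each column.
mixed <- data.frame(id = c("a", "b"), value = c(1.5, 2.5),
                    stringsAsFactors = FALSE)
sapply(mixed, class)  # id is "character", value is "numeric"

## Binding a text column onto a matrix converts every entry to text,
## because a matrix can only hold one type.
withNote <- cbind(coefTab, note = "x")
mode(withNote)  # "character"

## str() lists the type assigned to each column of a data.frame.
str(mtcars)
```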

Let’s say we want to extract a single value from this matrix. There are a number of commands we can use. For example, let’s extract the p-value for the age regression slope parameter using the slicing function [.

We can provide the row (2nd) and column (4th) number of the matrix entry we want:

%%R
summary(model)$coefficients[2,4]
[1] 0.370676

Alternatively, we can specify the row or column by name instead of by number:

%%R
summary(model)$coefficients["age",4]
[1] 0.370676
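
Names can be used for both dimensions at once, which tends to be safer in automated scripts because it does not depend on the order of the rows or columns. A minimal sketch, assuming the built-in mtcars data in place of demoDat (the names fit, coefTab, pByName and pByPos are illustrative):

```r
## Illustrative sketch: mtcars stands in for demoDat.
fit <- lm(mpg ~ wt + hp, data = mtcars)
coefTab <- summary(fit)$coefficients

## dimnames() shows the row and column names available for indexing.
dimnames(coefTab)

## Index by row name and column name, then by position, and compare.
pByName <- coefTab["wt", "Pr(>|t|)"]
pByPos  <- coefTab[2, 4]
identical(pByName, pByPos)  # TRUE: both select the same entry

## Leaving the row index empty returns the whole column,
## here the p-values for every term in the model.
coefTab[, "Pr(>|t|)"]
```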

We can see a list of all the components we can extract from the output of lm() by running names() on the lm object. All of these can be extracted with the $ operator.

%%R
names(model)
model$call ## example of how to extract any of the components listed by the previous command.
Call:
lm(formula = bmi ~ age + sex, data = demoDat)

Coefficients:
(Intercept)          age      sexmale  
   24.22403      0.04090      0.07189  

Similarly, we can run names() on the summary(lm) object, as shown here, to get a list of all the slots available from that object.

%%R
names(summary(model))
 [1] "call"          "terms"         "residuals"     "coefficients" 
 [5] "aliased"       "sigma"         "df"            "r.squared"    
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled" 

Note that these differ from those available for the model object. For example, \(R^{2}\) and the adjusted \(\overline{R}^{2}\) can only be extracted from the summary(model) object.

%%R
summary(model)$r.squared
summary(model)$adj.r.squared ## only this last value is displayed below
[1] -0.004636059
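
One way to see exactly which components differ between the two objects is to compare their names with setdiff(). This is a sketch only, assuming the built-in mtcars data in place of demoDat; fit, fitSummary, onlyInSummary and onlyInModel are hypothetical names.

```r
## Illustrative sketch: mtcars stands in for demoDat.
fit <- lm(mpg ~ wt + hp, data = mtcars)
fitSummary <- summary(fit)

## Components that exist only in the summary object.
onlyInSummary <- setdiff(names(fitSummary), names(fit))
onlyInSummary

## Components that exist only in the lm object.
onlyInModel <- setdiff(names(fit), names(fitSummary))
onlyInModel

## Asking the lm object for a component it does not have returns NULL
## rather than an error, so a mistyped name can pass unnoticed.
is.null(fit[["r.squared"]])  # TRUE
fitSummary$r.squared         # the value is stored in the summary
```

Because a missing component comes back as NULL, it is worth checking names() first when a script produces unexpectedly empty results.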

Note that as well as directly accessing these slots using the $ operator, there are also some predefined functions for extracting commonly requested outputs from the model fit. We have already taken advantage of one of these, summary(); others include coef(), effects(), residuals(), fitted() and predict.lm().
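
A short sketch of some of these extractor functions follows. It assumes the built-in mtcars data in place of demoDat, and the new observation passed to predict() (wt = 3, hp = 110) is made up purely for illustration.

```r
## Illustrative sketch: mtcars stands in for demoDat.
fit <- lm(mpg ~ wt + hp, data = mtcars)

## coef() returns the same values as the coefficients component.
coef(fit)
all.equal(coef(fit), fit$coefficients)  # TRUE

## residuals() and fitted() each return one value per observation.
head(residuals(fit))
head(fitted(fit))

## Fitted values plus residuals reconstruct the observed outcome.
all.equal(unname(fitted(fit) + residuals(fit)), mtcars$mpg)  # TRUE

## predict() on an lm object dispatches to predict.lm(); newdata must
## contain a column for every predictor in the model formula.
newObs <- data.frame(wt = 3, hp = 110)  # hypothetical new observation
predict(fit, newdata = newObs)
```

Calling the generic predict() rather than predict.lm() directly means the same line of code keeps working if the model type changes later.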