Extras

Extras#

Extracting summary statistics from a model fit in R#

If you are new to R, here we will just run through some details on the type of objects these data are stored in and how to access specific elements. This can be helpful for writing automated analysis scripts. Due to the need to contain different types of data in different formats and structures, the output of the regression model fit is stored in a bespoke object, with slots for the the different parts of the output. These slots are named and can be assessed using the $. For example to extract just the table of estimated regression coefficients, which are named coefficients we use the following command:

%%R
model <- lm(bmi ~ age + sex, demoDat)
summary(model)$coefficients

               Estimate Std. Error    t value     Pr(>|t|)
(Intercept) 24.22402808 2.26589809 10.6906962 3.581804e-22
age          0.04090023 0.04560467  0.8968430 3.706760e-01
sexmale      0.07188731 0.52593207  0.1366856 8.913907e-01

We can determine the type of object the coefficients table is stored in, using the function class().

%%R
class(summary(model)$coefficients)
mode(summary(model)$coefficients)

[1] "numeric"

The output of the command tells us it is stored in a matrix, which is a data-type in R, where you have rows and columns. A similar data-type is called a data.frame. The difference between these two data-types is that matrices can only contain one data type, which we can determine with the function mode(). Here it contains exclusively numeric values. In constrast, in a data frame each column can be a different data type. Our demoDat data is stored in a data.frame and the output of the str() function, tells us the data type assigned to each column.

Let’s say we wanted to extract a single value from this matrix, there are a number of commands we can use. For example, let’s extract the p-value for the age regression slope parameter using the slicing function [.

We can provide the row (2nd) and column (4th) number of the matrix entry we want:

%%R
summary(model)$coefficients[2,4]

[1] 0.370676

Alternatively we can specify either the column or row name:

%%R
summary(model)$coefficients["age",4]

[1] 0.370676

We can see a list of all components we can extract from the output of lm() by running names() on the lm object. All of these can be extracted with the $.

%%R
names(model)
model$call ## example of how to extract any of the components listed by the previous command.

Call:
lm(formula = bmi ~ age + sex, data = demoDat)

Coefficients:
(Intercept)          age      sexmale  
   24.22403      0.04090      0.07189  

Similarly we can run names() on the summary(lm) object as showed here to get a list of all the slots available from that object.

%%R
names(summary(model))

 [1] "call"          "terms"         "residuals"     "coefficients" 
 [5] "aliased"       "sigma"         "df"            "r.squared"    
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled" 

Note these are different to those available for the model object, for example the $R^{2}$ and $\overline{R}^{2}$ are only extractable from the summary(model) object.

%%R
summary(model)$r.squared
summary(model)$adj.r.squared

[1] -0.004636059

Note that as well as directly assessing these slots using the $ command, there also exist some predefined functions to extract the commonly requested outputs from the model fit. We have already taken advantage of one of these, summary(), others include coef(), effects(), residuals(), fitted() and predict.lm().

Simple Linear Regression Link between F-test and T-test#

We can also use an F-test to test a single predictor variable in our model.

%%R
model<-lm(weight ~ height)
summary(model)
anova(model)

Analysis of Variance Table

Response: weight
           Df Sum Sq Mean Sq F value    Pr(>F)    
height      1  46302   46302  295.74 < 2.2e-16 ***
Residuals 248  38828     157                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In the summary(model) output we can see at the bottom that the results of testing the full model with an F-test. If we want to see the full table of sums of squares statistics we can use the anova() function on our fitted regression model.

Comparing this table with the coefficients table, we can see that the p-value from the t-test of the age regression parameter and the F-test for the full model are identical. This is not a coincidence and is always true for the specific case of simple linear regerssion.

Extracting Variance Explained Statistics#

Finally, we will look at the $R^{2}$ and $\overline{R}^{2}$ statistics. We can see from the summary(model) output above these are automatically calculated. For the simple linear regression model we have fitted

$R^{2}$ = r summary(model)$r.squared

$\overline{R}^{2}$ = r summary(model)$adj.r.squared

Extras

Contents

Extras#

Extracting summary statistics from a model fit in R#

Simple Linear Regression Link between F-test and T-test#

Extracting Variance Explained Statistics#