Extras#
Extracting Summary Statistics From a Model Fit in R#
If you are new to R, here we will just run through some details on the type of objects these data are stored in and how to access specific elements. This can be helpful for writing automated analysis scripts. Because the output of a regression model fit contains different types of data in different formats and structures, it is stored in a bespoke object with slots for the different parts of the output. These slots are named and can be accessed using the `$` operator. For example, to extract just the table of estimated regression coefficients, which is named `coefficients`, we use the following command:
%%R
model <- lm(bmi ~ age + sex, demoDat)
summary(model)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.22402808 2.26589809 10.6906962 3.581804e-22
age 0.04090023 0.04560467 0.8968430 3.706760e-01
sexmale 0.07188731 0.52593207 0.1366856 8.913907e-01
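Because this table is an ordinary R matrix, it can be saved to a variable and written out for later use, which is handy when writing automated analysis scripts. Below is a minimal sketch; the file name model_coefficients.csv is just a placeholder.
%%R
coefTab <- summary(model)$coefficients        # store the coefficients matrix in a variable
write.csv(coefTab, "model_coefficients.csv")  # write it out; file name is a placeholder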
We can determine the type of object the coefficients table is stored in using the function `class()`.
%%R
class(summary(model)$coefficients)
mode(summary(model)$coefficients)
[1] "numeric"
The output of the command tells us it is stored in a matrix, which is an R data type with rows and columns. A similar data type is the data.frame. The difference between the two is that a matrix can only contain a single type of data, which we can determine with the function `mode()`; here it contains exclusively numeric values. In contrast, each column of a data frame can be a different data type. Our demoDat data is stored in a data.frame, and the output of the `str()` function tells us the data type assigned to each column.
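For example, assuming the demoDat data frame from the earlier sections is still loaded in the R session, we can confirm this as follows:
%%R
class(demoDat)   # should report "data.frame"
str(demoDat)     # lists each column with its data type and first few values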
Let’s say we want to extract a single value from this matrix; there are a number of ways to do this. For example, let’s extract the p-value for the age regression slope parameter using the square-bracket subsetting operator `[`.
We can provide the row (2nd) and column (4th) number of the matrix entry we want:
%%R
summary(model)$coefficients[2,4]
[1] 0.370676
Alternatively, we can refer to a row or column by its name rather than its number:
%%R
summary(model)$coefficients["age",4]
[1] 0.370676
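We can also reference both the row and the column by name, using the column headings shown in the coefficients table above:
%%R
summary(model)$coefficients["age", "Pr(>|t|)"]   # same p-value, selected entirely by name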
We can see a list of all the components we can extract from the output of `lm()` by running `names()` on the model object. Any of these can be extracted with the `$` operator.
%%R
names(model)
model$call ## example of how to extract any of the components listed by the previous command.
Call:
lm(formula = bmi ~ age + sex, data = demoDat)
Coefficients:
(Intercept) age sexmale
24.22403 0.04090 0.07189
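As a further illustration, a couple of the other components listed by `names(model)` can be pulled out in the same way (a sketch, assuming the model object fitted above):
%%R
head(model$residuals)       # first few residuals from the fitted model
head(model$fitted.values)   # first few fitted (predicted) BMI values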
Similarly, we can run `names()` on the `summary(model)` object, as shown here, to get a list of all the slots available from that object.
%%R
names(summary(model))
[1] "call" "terms" "residuals" "coefficients"
[5] "aliased" "sigma" "df" "r.squared"
[9] "adj.r.squared" "fstatistic" "cov.unscaled"
Note that these are different from those available for the model object; for example, the \(R^{2}\) and \(\overline{R}^{2}\) statistics can only be extracted from the `summary(model)` object.
%%R
summary(model)$r.squared
summary(model)$adj.r.squared
[1] -0.004636059
Note that as well as directly accessing these slots using the `$` operator, there are also some predefined functions for extracting the most commonly requested outputs from the model fit. We have already taken advantage of one of these, `summary()`; others include `coef()`, `effects()`, `residuals()`, `fitted()` and `predict.lm()`.
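A brief sketch of a few of these extractor functions, applied to the model fitted above:
%%R
coef(model)              # the estimated coefficients, as in the Estimate column above
head(residuals(model))   # equivalent to model$residuals
head(fitted(model))      # equivalent to model$fitted.values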
Simple Linear Regression: Link Between the F-test and t-test#
We can also use an F-test to test a single predictor variable in our model.
%%R
model <- lm(weight ~ height)
summary(model)
anova(model)
Analysis of Variance Table
Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
height 1 46302 46302 295.74 < 2.2e-16 ***
Residuals 248 38828 157
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In the `summary(model)` output, we can see at the bottom the results of testing the full model with an F-test. If we want to see the full table of sums of squares statistics, we can use the `anova()` function on our fitted regression model.
Comparing this table with the coefficients table, we can see that the p-value from the t-test of the height regression parameter and the p-value from the F-test for the full model are identical. This is not a coincidence: in simple linear regression the F statistic is exactly the square of the t statistic for the slope, so the two tests are equivalent.
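We can check this numerically by extracting both statistics (a sketch, assuming the weight ~ height model fitted above is still in the session):
%%R
tval <- summary(model)$coefficients["height", "t value"]   # t statistic for the height slope
fval <- summary(model)$fstatistic[1]                       # F statistic for the full model
c(t_squared = tval^2, F = unname(fval))                    # the two values should match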
Extracting Variance Explained Statistics#
Finally, we will look at the \(R^{2}\) and \(\overline{R}^{2}\) statistics. We can see from the `summary(model)` output above that these are automatically calculated. For the simple linear regression model we have fitted, they can be extracted directly with:
\(R^{2}\): `summary(model)$r.squared`
\(\overline{R}^{2}\): `summary(model)$adj.r.squared`
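As a cross-check, \(R^{2}\) can also be computed by hand from the ANOVA table above, since it is the proportion of the total sum of squares explained by the model:
%%R
ss <- anova(model)[["Sum Sq"]]   # sums of squares: model (height) term, then residuals
ss[1] / sum(ss)                  # model SS / total SS, equal to summary(model)$r.squared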