Plotting using `predict`

September 12, 2016

fork this on gitlab

One of my favorite functions in R is predict, and so in response to this recent question on SlackRs#plots (“Anyone know how to plot an interaction at a moderators mean, +1SD, and -1SD?”) I thought I’d write up a quick demo of it’s usefulness.

Let’s assume a simple interaction effect between two continuous variables:

set.seed(92299)
N <- 500
b0 <- 0
b1 <- .3
b2 <- .5
b3 <- -.2

aDF <- within(data.frame(x1=rnorm(N)), 
     {
	     x2 <- rnorm(N)
	     x1x2 <- x1*x2
	     y <- b0+b1*x1+b2*x2+b3*x1x2+rnorm(N,0,1)
     })

head(aDF)
##           x1           y        x1x2          x2
## 1 -0.4732279  0.79910376 -0.55652246  1.17601350
## 2  0.2699430  0.18102388 -0.31458272 -1.16536700
## 3 -1.1218179  1.15182465 -1.24691895  1.11151635
## 4  0.1047639  1.58974408  0.06749747  0.64428201
## 5 -0.7177402 -1.80769438 -0.14404438  0.20069152
## 6  1.5776371  0.07835544 -0.10464178 -0.06632817

Now that we’ve generated data, we can fit a model:

aMod <- lm(y~1+x1*x2, aDF)
summary(aMod)
## 
## Call:
## lm(formula = y ~ 1 + x1 * x2, data = aDF)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2827 -0.6737  0.0105  0.6840  3.1072 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.05362    0.04495  -1.193    0.233    
## x1           0.30942    0.04223   7.327 9.58e-13 ***
## x2           0.51358    0.04289  11.975  < 2e-16 ***
## x1:x2       -0.16785    0.03765  -4.458 1.02e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.001 on 496 degrees of freedom
## Multiple R-squared:  0.3147,	Adjusted R-squared:  0.3106 
## F-statistic: 75.93 on 3 and 496 DF,  p-value: < 2.2e-16

How do we plot the relationship between y and x1 at different levels of x2? We can use predict to get the expected y for every x1 at the mean and +/-1 SD of x2 by creating a new data frame with those values.

minx1 <- min(aDF$x1)
maxx1 <- max(aDF$x1)
steps <- 100 #we want to get y values for 100 values in the real range of x1 
x1_values <- seq(minx1, 
		 maxx1, 
		 length.out=steps)
#We'll use `rep` to repeat these values for every value of x2 we want below...
x2.sd <- sd(aDF$x2)
x2.mean <- mean(aDF$x2)

newData <- data.frame(x1=rep(x1_values, 3), #for each val of x2
		      x2=rep(c(x2.mean-x2.sd,
			       x2.mean,
			       x2.mean+x2.sd), 
			     each=steps),
		      x2_level=rep(c('-1 SD',
				     'Mean',
				     '+1 SD'),
				   each=steps)) #each val of x2 for all x1
head(newData)
##          x1        x2 x2_level
## 1 -2.992855 -1.065048    -1 SD
## 2 -2.931769 -1.065048    -1 SD
## 3 -2.870684 -1.065048    -1 SD
## 4 -2.809598 -1.065048    -1 SD
## 5 -2.748513 -1.065048    -1 SD
## 6 -2.687427 -1.065048    -1 SD

Now use predict to get y values.

newData$y <- predict(aMod,newdata = newData) 
head(newData)
##          x1        x2 x2_level         y
## 1 -2.992855 -1.065048    -1 SD -2.061668
## 2 -2.931769 -1.065048    -1 SD -2.031847
## 3 -2.870684 -1.065048    -1 SD -2.002026
## 4 -2.809598 -1.065048    -1 SD -1.972205
## 5 -2.748513 -1.065048    -1 SD -1.942384
## 6 -2.687427 -1.065048    -1 SD -1.912564

Let’s plot it using ggplot2.

library(ggplot2)
ggplot(newData, aes(x=x1, y=y, group=x2_level, color=x2_level))+
	geom_line()

center

Plotting using `predict` - September 12, 2016 - John C. Flournoy