The goal of this exercise is to explore and perform basic inference on the ToothGrowth data set available in R. To investigate the effect of treatment factors on the tooth growth response, two-sample t-tests and 95% confidence intervals are used to determine which differences are significant.
The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of Vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods (orange juice or ascorbic acid), giving 10 animals per treatment. A boxplot of the response against the two treatment factors shows that the response for a given treatment (dosage + delivery method) is roughly symmetric about the median.
library(ggplot2)
data("ToothGrowth")
ToothGrowth$dose<-as.factor(ToothGrowth$dose)
q<-ggplot(ToothGrowth,aes(x=dose, y=len, fill=supp)) +
labs(title="ToothGrowth Response vs. Dose ") +
xlab("Dose (mg/day)") + ylab("Growth Length (mm)")
q+geom_boxplot()
From the plot above, it appears likely that there are statistically different effects for the different treatments. The combination of dosage and delivery method generates 6 independent treatments, each having a sample size of 10. The effect of a treatment is measured by taking the mean of the response. Since we don’t know the underlying variance, comparing sample means relies on confidence intervals using the t-statistic.
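As a quick check on the design described above, the group sizes, means, and standard deviations can be tabulated per treatment. This is a minimal sketch; `tg`, `n_per_group`, `mean_per_group`, and `sd_per_group` are names introduced here for illustration, and the data are read from `datasets::ToothGrowth` into a local copy so that the factor conversion made earlier is left alone.

```r
# Local copy of the data set (tg is a hypothetical name used only in this sketch)
tg <- datasets::ToothGrowth

# 2 x 3 tables: rows are delivery methods (OJ, VC), columns are doses (0.5, 1, 2)
n_per_group    <- tapply(tg$len, list(tg$supp, tg$dose), length)
mean_per_group <- tapply(tg$len, list(tg$supp, tg$dose), mean)
sd_per_group   <- tapply(tg$len, list(tg$supp, tg$dose), sd)

print(n_per_group)     # sample size of each treatment
print(mean_per_group)  # estimated effect (mean length) of each treatment
print(sd_per_group)    # spread of the response within each treatment
```

Each cell of `n_per_group` should equal 10 if the design is balanced as described.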
First I subset the response by treatment to make doing t-tests simpler.
vc05<-subset(ToothGrowth, supp=="VC" & dose=="0.5",len)$len
oj05<-subset(ToothGrowth, supp=="OJ" & dose=="0.5",len)$len
vc10<-subset(ToothGrowth, supp=="VC" & dose=="1",len)$len
oj10<-subset(ToothGrowth, supp=="OJ" & dose=="1",len)$len
vc20<-subset(ToothGrowth, supp=="VC" & dose=="2",len)$len
oj20<-subset(ToothGrowth, supp=="OJ" & dose=="2",len)$len
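The six subsets can also be built in one step with `split()`. The sketch below is an alternative to the assignments above, not a replacement for them; `tg` and `len_by_treatment` are names assumed here for illustration.

```r
# Local copy of the data set (hypothetical name)
tg <- datasets::ToothGrowth

# One numeric vector of lengths per supplement/dose combination.
# Element names take the form "<supp>.<dose>", e.g. "OJ.0.5" or "VC.2".
len_by_treatment <- split(tg$len, list(tg$supp, tg$dose))

names(len_by_treatment)

# Same quantity as mean(oj05) - mean(vc05)
mean(len_by_treatment[["OJ.0.5"]]) - mean(len_by_treatment[["VC.0.5"]])
```

Keeping the groups in a single list makes it easier to loop over comparisons later, at the cost of slightly longer expressions than the six named vectors.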
The mean response of each group estimates its treatment effect. At the 0.5 mg dose, the estimated effect of orange juice relative to ascorbic acid is:
mean(oj05)-mean(vc05)
## [1] 5.25
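This difference is significant at the 95% confidence level if the 95% CI for the effect excludes 0 (here, if it lies entirely above 0). If we assume that the response variances are the same, then to get the 95% confidence interval we use the pooled variance and the t-statistic with 20 - 2 = 18 degrees of freedom. Because both groups contain 10 observations, the pooled variance is the simple average of the two sample variances, and the standard error carries the factor sqrt(1/10 + 1/10) = sqrt(1/5). The interval computed this way lies entirely above 0, so the effect is statistically significant.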
mean(oj05)-mean(vc05) +c(-1,1)*qt(.975,20-2)*sqrt((sd(vc05)^2+sd(oj05)^2)/2)*sqrt(1/5)
## [1] 1.770262 8.729738
Note that we can arrive at the same result using the t.test function with equal variances.
t.test(oj05, vc05, var.equal=TRUE)
##
## Two Sample t-test
##
## data: oj05 and vc05
## t = 3.1697, df = 18, p-value = 0.005304
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.770262 8.729738
## sample estimates:
## mean of x mean of y
## 13.23 7.98
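For the sake of completeness, if we cannot assume that the variances are the same, then we have to use a different formula to create a 95% CI: the standard error is built from the two separate sample variances, and the degrees of freedom come from the Welch-Satterthwaite approximation. This interval is a little wider than the previous one, but still shows a significant effect.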
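# Welch-Satterthwaite approximation to the degrees of freedom.
# Named welch_df to avoid masking stats::df (the F density).
welch_df <- function(sdx, sdy, nx, ny) {
num<-(sdx^2/nx + sdy^2/ny)^2
den<-((sdx^2/nx)^2)/(nx-1) + ((sdy^2/ny)^2)/(ny-1)
return(num/den)
}
# 95% CI for the difference in means without assuming equal variances
mean(oj05)-mean(vc05) +c(-1,1)*qt(.975,welch_df(sd(oj05),sd(vc05),10,10)) * sqrt((sd(vc05)^2)/10+(sd(oj05)^2)/10)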
## [1] 1.719057 8.780943
This can also be recreated from the t.test function without the equal-variance assumption (which is also the default behavior).
t.test(oj05, vc05, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: oj05 and vc05
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean of x mean of y
## 13.23 7.98
I will assume that tooth growth variance is primarily due to the underlying population variance, and will use the pooled variance in my CI estimates from here on. (Note, I make no attempt to thoroughly investigate or justify this assumption.)
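For readers who want an informal look at the equal-variance assumption, one possible check is to compare the two sample variances directly and run an F test with `var.test` from the stats package. This is only a sketch and not part of the analysis: the F test itself assumes normally distributed responses, and with 10 observations per group it has limited power. The names `tg`, `oj`, `vc`, and `vt` are introduced here for illustration.

```r
# Local copy of the data and the two 0.5 mg groups (hypothetical names)
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]

# Sample variances side by side
c(var_oj = var(oj), var_vc = var(vc))

# F test of the null hypothesis that the variance ratio equals 1
vt <- var.test(oj, vc)
print(vt)
```

A large p-value here would not prove that the variances are equal; it would only mean that these data do not contradict the assumption.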
Next, consider increasing the orange juice dose from 0.5 mg to 1 mg. The relative effect is:
mean(oj10)-mean(oj05)
## [1] 9.47
And this effect appears to be significant at the 95% level:
mean(oj10)-mean(oj05) +c(-1,1)*qt(.975,20-2)*sqrt((sd(oj10)^2+sd(oj05)^2)/2)*sqrt(1/5)
## [1] 5.529186 13.410814
For increasing the orange juice dose from 1 mg to 2 mg, the relative effect is:
mean(oj20)-mean(oj10)
## [1] 3.36
And this effect also appears to be significant at the 95% level:
mean(oj20)-mean(oj10) +c(-1,1)*qt(.975,20-2)*sqrt((sd(oj20)^2+sd(oj10)^2)/2)*sqrt(1/5)
## [1] 0.2194983 6.5005017
So we have found that increasing the dosage of Vitamin C via orange juice has a statistically significant increasing effect on tooth growth.
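The two dose-step intervals for orange juice can also be obtained from t.test directly, which avoids retyping the pooled-variance formula. A minimal sketch, with `tg`, `oj_by_dose`, `ci_05_to_1`, and `ci_1_to_2` as illustrative names:

```r
# Local copy of the data, restricted to the orange juice groups
tg <- datasets::ToothGrowth
is_oj <- tg$supp == "OJ"

# List of length vectors keyed by dose: "0.5", "1", "2"
oj_by_dose <- split(tg$len[is_oj], tg$dose[is_oj])

# Pooled-variance 95% CIs for each step up in dose (higher dose minus lower dose)
ci_05_to_1 <- t.test(oj_by_dose[["1"]], oj_by_dose[["0.5"]], var.equal = TRUE)$conf.int
ci_1_to_2  <- t.test(oj_by_dose[["2"]], oj_by_dose[["1"]],   var.equal = TRUE)$conf.int

print(ci_05_to_1)
print(ci_1_to_2)
```

Both intervals should match the manually computed ones above, since the groups are of equal size.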
Next, compare the two delivery methods at the 1 mg dose. The effect of orange juice relative to ascorbic acid is:
mean(oj10)-mean(vc10)
## [1] 5.93
This effect also appears to be significant at the 95% level:
mean(oj10)-mean(vc10) +c(-1,1)*qt(.975,20-2)*sqrt((sd(oj10)^2+sd(vc10)^2)/2)*sqrt(1/5)
## [1] 2.840692 9.019308
At the 2 mg dose, the effect of orange juice relative to ascorbic acid is:
mean(oj20)-mean(vc20)
## [1] -0.08
There is no statistically significant difference in these effects, because the 95% CI contains 0:
mean(oj20)-mean(vc20) +c(-1,1)*qt(.975,20-2)*sqrt((sd(oj20)^2+sd(vc20)^2)/2)*sqrt(1/5)
## [1] -3.722999 3.562999
In other words, we fail to reject the null hypothesis that there is no difference in the effect.
We have found that at doses of 0.5 mg and 1.0 mg, there is a greater effect when the dose is delivered via orange juice than when it is delivered via ascorbic acid. At 2.0 mg we have not found any difference in the effect.
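The three delivery-method comparisons can be collected into one table, which makes the pattern across doses easier to see. This is a sketch under the same pooled-variance assumption; `tg` and `ci_by_dose` are names introduced here for illustration.

```r
# Local copy of the data (hypothetical name)
tg <- datasets::ToothGrowth

# One row per dose: OJ minus VC difference in means and its pooled 95% CI
ci_by_dose <- t(sapply(c(0.5, 1, 2), function(d) {
  oj <- tg$len[tg$supp == "OJ" & tg$dose == d]
  vc <- tg$len[tg$supp == "VC" & tg$dose == d]
  ci <- t.test(oj, vc, var.equal = TRUE)$conf.int
  c(dose = d, diff = mean(oj) - mean(vc), lower = ci[1], upper = ci[2])
}))

print(ci_by_dose)
```

Rows whose `lower` and `upper` bounds have the same sign correspond to differences that are significant at the 95% level.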
In this analysis, the response of tooth growth in guinea pigs to various treatments was investigated using the t-statistic to create 95% confidence intervals that summarize effects. It was found that there are significant response differences between dosages and delivery methods. Increasing the dosage of Vitamin C delivered by orange juice increased tooth growth at each step across the three dosages tested. It was also found that for a given dosage, the response from orange juice was greater than that from ascorbic acid at the two lower dosages, while no difference was observed at the highest dosage.