LIS 4273 Final Project
Hypothesis:
- Null Hypothesis (H₀): Mean math scores are equal across the three program types (General, Academic, Vocational).
- Alternative Hypothesis (H₁): Mean math scores differ by program type, with students in the Academic program expected to score highest. Compared to General or Vocational programs, which concentrate on general education or career-specific skills, Academic programs offer more demanding courses and resources. Students in Academic programs are therefore predicted to score higher on math tests.
Research Implications:
- Confirming this hypothesis could inform educational policy, guide resource allocation across program types, and motivate curriculum changes to improve student outcomes.
Related to Classwork:
- One-way ANOVA assumes normally distributed residuals and equal variances across groups. These assumptions must hold for the test results to be valid; the boxplots of math scores by program type in this analysis give a visual check on spread and outliers.
- Post-hoc tests, such as Tukey's HSD, identify which specific groups differ after ANOVA detects an overall effect, while controlling the family-wise Type I error rate that multiple comparisons would otherwise inflate.
- The study goes beyond class exercises by incorporating post-hoc analysis and visualizations to deepen understanding of group differences. This comprehensive approach mirrors real-world applications of statistical methods in educational research.
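The two ANOVA assumptions named above (normal residuals and equal group variances) can also be tested formally in base R. The following is a minimal sketch, not part of the original analysis. It assumes a synthetic stand-in data frame called `demo` with the same group sizes as hsb2 but made-up scores, so its printed results say nothing about the real data.

```r
# ASSUMPTION: `demo` is synthetic stand-in data (made-up scores), not hsb2.
# It only mirrors the group sizes (45 / 105 / 50) so the sketch is runnable.
set.seed(42)
demo <- data.frame(
  prog = factor(
    rep(c("general", "academic", "vocational"), times = c(45, 105, 50)),
    levels = c("general", "academic", "vocational")
  ),
  math = round(rnorm(200, mean = 52, sd = 9))
)

# Fit the same one-way model used in the write-up.
fit <- aov(math ~ prog, data = demo)

# Normality of residuals: Shapiro-Wilk test (H0: residuals are normal).
normality <- shapiro.test(residuals(fit))

# Homogeneity of variance: Bartlett's test (H0: group variances are equal).
homogeneity <- bartlett.test(math ~ prog, data = demo)

print(normality)
print(homogeneity)
```

To apply this to the real data, replace `demo` with `hsb2`. Small p-values in either test would indicate a violated assumption. Bartlett's test is itself sensitive to non-normality, so Levene's test (for example `car::leveneTest`, which needs the car package) may be a more robust alternative.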
Solution:
- The dataset consists of 200 observations with 11 variables, including math scores and program types (General, Academic, Vocational). Data quality was ensured by checking for missing values and outliers, and group sizes were sufficient to meet the assumptions of one-way ANOVA.
- The dataset was checked for missing values and none were found (sum(is.na(hsb2)) returned 0), so no rows had to be removed or imputed. The na.omit() call in the script is kept as a safeguard and leaves all 200 observations in place.
- The dataset was examined for outliers using boxplots and descriptive statistics, which revealed minimal extreme values, supporting the appropriateness of using ANOVA.
Findings:
> # Load the dataset
> load("hsb2.rda")
>
> # View structure of the data
> str(hsb2)
tibble [200 × 11] (S3: tbl_df/tbl/data.frame)
$ id : int [1:200] 70 121 86 141 172 113 50 11 84 48 ...
$ gender : chr [1:200] "male" "female" "male" "male" ...
$ race : chr [1:200] "white" "white" "white" "white" ...
$ ses : Factor w/ 3 levels "low","middle",..: 1 2 3 3 2 2 2 2 2 2 ...
$ schtyp : Factor w/ 2 levels "public","private": 1 1 1 1 1 1 1 1 1 1 ...
$ prog : Factor w/ 3 levels "general","academic",..: 1 3 1 3 2 2 1 2 1 2 ...
$ read : int [1:200] 57 68 44 63 47 44 50 34 63 57 ...
$ write : int [1:200] 52 59 33 44 52 52 59 46 57 55 ...
$ math : int [1:200] 41 53 54 47 57 51 42 45 54 52 ...
$ science: int [1:200] 47 63 58 53 53 63 53 39 58 50 ...
$ socst : int [1:200] 57 61 31 56 61 61 61 36 51 51 ...
>
> # Check for missing values
> sum(is.na(hsb2))
[1] 0
>
> # Summary of the data
> summary(hsb2)
       id             gender              race              ses         schtyp            prog             read            write
 Min.   :  1.00   Length:200         Length:200         low   :47   public :168   general   : 45   Min.   :28.00   Min.   :31.00
 1st Qu.: 50.75   Class :character   Class :character   middle:95   private: 32   academic  :105   1st Qu.:44.00   1st Qu.:45.75
 Median :100.50   Mode  :character   Mode  :character   high  :58                 vocational: 50   Median :50.00   Median :54.00
 Mean   :100.50                                                                                    Mean   :52.23   Mean   :52.77
 3rd Qu.:150.25                                                                                    3rd Qu.:60.00   3rd Qu.:60.00
 Max.   :200.00                                                                                    Max.   :76.00   Max.   :67.00
      math          science          socst
 Min.   :33.00   Min.   :26.00   Min.   :26.00
 1st Qu.:45.00   1st Qu.:44.00   1st Qu.:46.00
 Median :52.00   Median :53.00   Median :52.00
 Mean   :52.65   Mean   :51.85   Mean   :52.41
 3rd Qu.:59.00   3rd Qu.:58.00   3rd Qu.:61.00
 Max.   :75.00   Max.   :74.00   Max.   :71.00
>
> # Drop rows with a missing value in any column (a no-op here, since no NAs were found)
> hsb2 <- na.omit(hsb2)
>
> # Check for outliers using boxplots
> boxplot(hsb2$math ~ hsb2$prog, main = "Boxplot of Math Scores by Program Type",
+ xlab = "Program Type", ylab = "Math Scores")
>
> # Perform ANOVA
> anova_results <- aov(math ~ prog, data = hsb2)
> summary(anova_results)
             Df Sum Sq Mean Sq F value   Pr(>F)
prog          2   4002  2001.1   29.28 7.36e-12 ***
Residuals   197  13464    68.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> # Tukey's HSD test
> tukey_results <- TukeyHSD(anova_results)
> print(tukey_results)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = math ~ prog, data = hsb2)
$prog
                          diff        lwr        upr     p adj
academic-general      6.711111   3.232598 10.1896239 0.0000271
vocational-general   -3.602222  -7.613839  0.4093944 0.0882872
vocational-academic -10.313333 -13.667890 -6.9587771 0.0000000
>
> plot(tukey_results)
>
> # Calculate mean math scores by program type
> mean_scores <- aggregate(math ~ prog, data = hsb2, FUN = mean)
>
> # Bar chart
> # install.packages("ggplot2")  # run once, only if ggplot2 is not already installed
> library(ggplot2)
Warning message:
package ‘ggplot2’ was built under R version 4.4.2
> ggplot(mean_scores, aes(x = prog, y = math)) +
+ geom_bar(stat = "identity", fill = "lightblue") +
+ labs(title = "Mean Math Scores by Program Type",
+ x = "Program Type", y = "Mean Math Scores") +
+ theme_minimal()
- The ANOVA revealed a significant effect of program type on math scores (F(2, 197) = 29.28, p < 0.001).
- Post-hoc comparisons using Tukey's HSD test showed that Academic students scored significantly higher than General students (mean difference = 6.71, adjusted p < 0.001) and Vocational students (mean difference = 10.31, adjusted p < 0.001).
- No significant difference was found between the General and Vocational groups (mean difference = 3.60, adjusted p = 0.088).
- These results suggest that Academic programs prepare students better in math than General and Vocational programs, possibly due to their focus on advanced coursework. This highlights the need to evaluate the curricula of General and Vocational programs to enhance math education.
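The reported statistics can be cross-checked by hand from the ANOVA and Tukey output above. The sketch below copies the rounded values from that output and recomputes the F statistic and its p-value, then confirms that the three pairwise differences agree with one another. The eta-squared effect size is an addition that was not part of the original analysis, and because the sums of squares are rounded in the printed table, the recomputed values are approximate.

```r
# Values copied from the printed ANOVA table (rounded as displayed).
ss_prog  <- 4002    # between-group (prog) sum of squares
ss_resid <- 13464   # residual sum of squares
df_prog  <- 2
df_resid <- 197

# F = mean square between / mean square within.
f_value <- (ss_prog / df_prog) / (ss_resid / df_resid)

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_value, df_prog, df_resid, lower.tail = FALSE)

# Eta squared: share of total variance in math scores explained by prog.
# ASSUMPTION: this effect size is not reported in the original write-up.
eta_sq <- ss_prog / (ss_prog + ss_resid)

# Tukey mean differences copied from the printed output.
d_academic_general    <- 6.711111
d_vocational_general  <- -3.602222
d_vocational_academic <- -10.313333

# Consistency check: (voc - gen) - (aca - gen) should equal (voc - aca).
implied_voc_aca <- d_vocational_general - d_academic_general

cat("F =", round(f_value, 2), " p =", signif(p_value, 3),
    " eta squared =", round(eta_sq, 3), "\n")
cat("Implied vocational-academic difference:", implied_voc_aca, "\n")
```

The recomputed F lands close to the reported 29.28. An eta squared of roughly 0.23 would mean that program type accounts for a little under a quarter of the variance in math scores, which leaves most of the variation to other factors.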
Limitations:
- While significant differences were found, the dataset is limited to one survey and may not generalize to broader populations. Additionally, other factors, such as teaching quality and socioeconomic status, were not controlled for in this analysis.
Abstract:
- This study uses data from the High School and Beyond survey to investigate the association between students' math performance and the type of educational program they attend (Academic, General, or Vocational). A one-way ANOVA showed a significant effect of program type on math scores, F(2, 197) = 29.28, p < 0.001. Post-hoc Tukey comparisons showed that Academic students performed better than both General (p < 0.001) and Vocational (p < 0.001) students, while the difference between the General and Vocational programs was not significant (p = 0.088).
- One plausible explanation is that Academic programs emphasize advanced material, leaving their students better prepared in math. General and Vocational curricula may need to be revised to improve math instruction and bring outcomes closer to parity across program types. The analysis is limited by its dependence on a single dataset and by the fact that it does not account for confounding variables. Investigating these factors further would give a more nuanced understanding of program efficacy.