LIS 4273 Final Project

Hypothesis 

- Null Hypothesis (H₀): There is no difference in mean math scores among the three program types (General, Academic, Vocational).

- Alternative Hypothesis (H₁): Program type affects students' academic achievement, specifically their math scores. Compared with general or vocational programs, which concentrate on general education or career-specific skills, academic programs offer more demanding courses and resources. Students in academic programs are therefore predicted to score higher on math tests.

Research Implications:

- Confirming this hypothesis could inform educational policy, guide resource allocation across program types, and motivate curriculum changes to improve student outcomes.

Related to Classwork:

- One-way ANOVA assumes normally distributed data within each group and equal variances across groups. These assumptions were checked to support the validity of the test results.

- Post-hoc tests, such as Tukey's HSD, are crucial for identifying which specific groups differ after ANOVA detects an overall effect, while keeping the risk of Type I errors from multiple comparisons under control.

- The study goes beyond class exercises by incorporating post-hoc analysis and visualizations to deepen understanding of group differences. This comprehensive approach mirrors real-world applications of statistical methods in educational research.
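
One possible way to check the two ANOVA assumptions named above is sketched here. It is not taken from the original analysis: the helper name `check_anova_assumptions` is hypothetical, and the data frame `sim` is simulated (only the group sizes 45, 105, and 50 follow the real dataset; the scores are random draws with arbitrary means). The sketch uses the Shapiro-Wilk test on the model residuals for normality and Bartlett's test for equal variances, both from base R.

```r
# Simulated stand-in for hsb2 (NOT the real data): group sizes match the
# dataset, but the scores are random draws with arbitrary means.
set.seed(42)
sim <- data.frame(
  prog = factor(rep(c("general", "academic", "vocational"),
                    times = c(45, 105, 50)),
                levels = c("general", "academic", "vocational")),
  math = c(rnorm(45, mean = 50, sd = 8),
           rnorm(105, mean = 55, sd = 8),
           rnorm(50, mean = 47, sd = 8))
)

# Hypothetical helper: returns both assumption tests for a data frame
# that has a numeric `math` column and a factor `prog` column.
check_anova_assumptions <- function(df) {
  fit <- aov(math ~ prog, data = df)
  list(
    # Normality of the residuals (Shapiro-Wilk)
    normality = shapiro.test(residuals(fit)),
    # Homogeneity of variances across groups (Bartlett)
    equal_variance = bartlett.test(math ~ prog, data = df)
  )
}

checks <- check_anova_assumptions(sim)
print(checks$normality$p.value)
print(checks$equal_variance$p.value)
```

A p-value above the chosen significance level (commonly 0.05) in either test gives no evidence against that assumption. Running `check_anova_assumptions(hsb2)` on the real data would apply the same checks there.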

Solution:

- The dataset consists of 200 observations with 11 variables, including math scores and program types (General, Academic, Vocational). Data quality was ensured by checking for missing values and outliers, and group sizes were sufficient to meet the assumptions of one-way ANOVA.

- The check for missing values found none (sum(is.na(hsb2)) returned 0), so no entries had to be removed or imputed. na.omit() was still applied as a safeguard and left all 200 observations in place.

- The dataset was examined for outliers using boxplots and descriptive statistics, which revealed minimal extreme values, supporting the appropriateness of using ANOVA.
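
The boxplot check described above can also be done numerically. The sketch below is an illustration rather than part of the original analysis: `outliers_by_group` is a hypothetical helper, and the `toy` data frame contains made-up scores. It relies on `boxplot.stats()`, which flags the same points a boxplot draws beyond its whiskers (more than 1.5 times the interquartile range from the hinges).

```r
# Hypothetical helper: list the boxplot-rule outliers within each group.
outliers_by_group <- function(df) {
  lapply(split(df$math, df$prog), function(x) boxplot.stats(x)$out)
}

# Made-up scores for illustration only (NOT values from hsb2)
toy <- data.frame(
  prog = factor(rep(c("a", "b"), each = 8)),
  math = c(40, 42, 45, 47, 50, 52, 55, 95,   # group a: 95 is far from the rest
           48, 50, 51, 53, 54, 56, 58, 60)   # group b: no extreme values
)

res <- outliers_by_group(toy)
print(res)
```

Here group `a` reports 95 as an outlier and group `b` reports none. Calling `outliers_by_group(hsb2)` would list any flagged math scores per program type.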

Findings:

> # Load the dataset
> load("hsb2.rda")
> 
> # View structure of the data
> str(hsb2)
tibble [200 × 11] (S3: tbl_df/tbl/data.frame)
 $ id     : int [1:200] 70 121 86 141 172 113 50 11 84 48 ...
 $ gender : chr [1:200] "male" "female" "male" "male" ...
 $ race   : chr [1:200] "white" "white" "white" "white" ...
 $ ses    : Factor w/ 3 levels "low","middle",..: 1 2 3 3 2 2 2 2 2 2 ...
 $ schtyp : Factor w/ 2 levels "public","private": 1 1 1 1 1 1 1 1 1 1 ...
 $ prog   : Factor w/ 3 levels "general","academic",..: 1 3 1 3 2 2 1 2 1 2 ...
 $ read   : int [1:200] 57 68 44 63 47 44 50 34 63 57 ...
 $ write  : int [1:200] 52 59 33 44 52 52 59 46 57 55 ...
 $ math   : int [1:200] 41 53 54 47 57 51 42 45 54 52 ...
 $ science: int [1:200] 47 63 58 53 53 63 53 39 58 50 ...
 $ socst  : int [1:200] 57 61 31 56 61 61 61 36 51 51 ...
> 
> # Check for missing values
> sum(is.na(hsb2))
[1] 0
> 
> # Summary of the data
> summary(hsb2)
       id            gender              race               ses         schtyp            prog          read           write      
 Min.   :  1.00   Length:200         Length:200         low   :47   public :168   general   : 45   Min.   :28.00   Min.   :31.00  
 1st Qu.: 50.75   Class :character   Class :character   middle:95   private: 32   academic  :105   1st Qu.:44.00   1st Qu.:45.75  
 Median :100.50   Mode  :character   Mode  :character   high  :58                 vocational: 50   Median :50.00   Median :54.00  
 Mean   :100.50                                                                                    Mean   :52.23   Mean   :52.77  
 3rd Qu.:150.25                                                                                    3rd Qu.:60.00   3rd Qu.:60.00  
 Max.   :200.00                                                                                    Max.   :76.00   Max.   :67.00  
      math          science          socst      
 Min.   :33.00   Min.   :26.00   Min.   :26.00  
 1st Qu.:45.00   1st Qu.:44.00   1st Qu.:46.00  
 Median :52.00   Median :53.00   Median :52.00  
 Mean   :52.65   Mean   :51.85   Mean   :52.41  
 3rd Qu.:59.00   3rd Qu.:58.00   3rd Qu.:61.00  
 Max.   :75.00   Max.   :74.00   Max.   :71.00  
> 
> # Remove rows with any missing values (none here, so all 200 rows remain)
> hsb2 <- na.omit(hsb2)
> 
> # Check for outliers using boxplots
> boxplot(hsb2$math ~ hsb2$prog, main = "Boxplot of Math Scores by Program Type",
+         xlab = "Program Type", ylab = "Math Scores")
> 
> # Perform ANOVA
> anova_results <- aov(math ~ prog, data = hsb2)
> summary(anova_results)
             Df Sum Sq Mean Sq F value   Pr(>F)    
prog          2   4002  2001.1   29.28 7.36e-12 ***
Residuals   197  13464    68.3                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> # Tukey's HSD test
> tukey_results <- TukeyHSD(anova_results)
> print(tukey_results)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = math ~ prog, data = hsb2)

$prog
                          diff        lwr        upr     p adj
academic-general      6.711111   3.232598 10.1896239 0.0000271
vocational-general   -3.602222  -7.613839  0.4093944 0.0882872
vocational-academic -10.313333 -13.667890 -6.9587771 0.0000000

> 
> plot(tukey_results)
> 
> # Calculate mean math scores by program type
> mean_scores <- aggregate(math ~ prog, data = hsb2, FUN = mean)
> 
> # Bar chart (requires the ggplot2 package to be installed)
> library(ggplot2)
> ggplot(mean_scores, aes(x = prog, y = math)) +
+   geom_bar(stat = "identity", fill = "lightblue") +
+   labs(title = "Mean Math Scores by Program Type",
+        x = "Program Type", y = "Mean Math Scores") +
+   theme_minimal()


- The ANOVA revealed a significant effect of program type on math scores, F(2, 197) = 29.28, p < 0.001.

- Post-hoc comparisons using Tukey's HSD test showed that Academic students scored significantly higher than General students (mean difference = 6.71, p < 0.001) and Vocational students (mean difference = 10.31, p < 0.001).

- No significant difference was found between the General and Vocational groups (mean difference = 3.60, p = 0.088).
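
As a cross-check, the F statistic can be recomputed from the sums of squares in the ANOVA table, and the Tukey intervals can be read for significance. This is a minimal sketch, not part of the original analysis: all numbers are copied from the console output above, and the eta-squared effect size is an addition the original did not report.

```r
# Values copied from the summary(anova_results) table
ss_between <- 4002;  df_between <- 2     # prog row
ss_within  <- 13464; df_within  <- 197   # Residuals row

ms_between <- ss_between / df_between    # mean square for prog
ms_within  <- ss_within / df_within      # mean square for residuals
f_stat     <- ms_between / ms_within     # about 29.28
p_value    <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Eta squared: share of total variance in math scores explained by prog
eta_sq <- ss_between / (ss_between + ss_within)   # about 0.229

# Tukey 95% intervals copied from print(tukey_results)
tukey <- data.frame(
  comparison = c("academic-general", "vocational-general", "vocational-academic"),
  lwr = c(3.232598, -7.613839, -13.667890),
  upr = c(10.1896239, 0.4093944, -6.9587771)
)
# A pair differs significantly when its interval excludes zero
tukey$significant <- tukey$lwr > 0 | tukey$upr < 0

print(round(c(F = f_stat, eta_sq = eta_sq), 3))
print(p_value)
print(tukey)
```

Only the vocational-general interval contains zero, which matches its adjusted p-value of 0.088 in the Tukey output.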




Practical Implications:

- These results suggest that students in Academic programs perform better in math than students in General and Vocational programs, possibly because Academic programs focus on advanced coursework. This highlights the need to evaluate the curricula of General and Vocational programs to enhance math education.

Limitations:

- Although significant differences were found, the data come from a single survey and may not generalize to broader populations. In addition, other factors, such as teaching quality and socioeconomic status, were not controlled for in this analysis.

Abstract: 

- This study uses data from the High School and Beyond Survey to investigate the association between students' math performance and the type of educational program they attend (academic, general, or vocational). A one-way ANOVA showed that program type had a significant effect on math performance, F(2, 197) = 29.28, p < 0.001. Post-hoc comparisons showed that Academic students performed better than General (p < 0.001) and Vocational (p < 0.001) students, while General and Vocational students did not differ significantly (p = 0.088).

- One possible explanation is that academic programs emphasize advanced material, leaving students better prepared in math. General and vocational curricula may need to be revised to improve math instruction and bring outcomes closer to parity across program types. The analysis is limited, however, by its dependence on a single dataset and its inability to account for confounding variables. Investigating these factors further would give a more nuanced understanding of program efficacy.
