Predictive Modelling for Graduate Placement Success: A Comparative Analysis of Ensemble Methods, Feature Engineering, and Business Impact

Abstract

Predicting graduate student placement is critical for educational institutions aiming to enhance student outcomes and optimize resource allocation. This study develops and evaluates machine learning models for graduate placement prediction through a multi-faceted approach that integrates methodological rigor with practical business applications¹.

Utilizing a dataset of 215 graduate students, we systematically compared the performance of individual classifiers against advanced ensemble methods. The methodology is distinguished by its use of domain-driven feature engineering to create novel predictors, such as an Experience-Weighted Test Score, and a comprehensive cost-benefit framework to quantify the financial impact of model implementation.

The results reveal that while a well-tuned Logistic Regression model achieved the highest individual accuracy (86.51%), the Voting Ensemble provided the best balance of performance and reliability, with a superior AUC of 0.932. Crucially, the analysis uncovered powerful and counter-intuitive predictors, including a strong negative correlation between prior work experience and placement odds, and a significant placement advantage for students from an Arts academic stream.

Introduction

In the highly competitive landscape of higher education, the reputation and success of graduate business programs are intrinsically linked to the career outcomes of their graduates². For prospective students, placement rates represent the tangible return on a significant investment, while for institutions, strong placement outcomes are vital for attracting top-tier applicants and securing prominent positions in university rankings.

The ability to accurately predict student placement has emerged as a critical strategic imperative, leading to a growing body of research in Educational Data Mining (EDM)³. While numerous studies have applied standard machine learning classifiers to predict student performance⁴⁵, several critical gaps remain in the literature.

Research Objectives

This research undertakes a comprehensive investigation into predicting graduate placement success, guided by four primary objectives:

  1. Systematic Performance Comparison - Compare ensemble methods against individual machine learning classifiers to determine the boundaries of ensemble effectiveness
  2. Domain-Driven Feature Engineering - Design and validate novel engineered features that outperform traditional academic metrics
  3. Business Value Translation - Develop a cost-benefit framework that translates model performance into quantifiable ROI
  4. Pattern Discovery - Identify counter-intuitive predictive patterns that challenge conventional wisdom about placement factors

Methodology

Research Design and Approach

This study employed a supervised machine learning methodology to create a binary prediction system for student placement outcomes. The approach incorporated data preprocessing, feature engineering, algorithmic model building, ensemble learning methods, and structured business impact assessment⁶.

All experiments were conducted using WEKA 3.8.6, a powerful open-source machine learning workbench⁷. WEKA’s comprehensive interface and extensive classifier library made it highly effective for analyzing tabular educational datasets while ensuring reproducibility and standardization.

Dataset Description and Preparation

The dataset consisted of academic performance history, demographic profiles, educational details, and performance indicators from 215 graduate students. Key variables included:

  • Academic Performance: Secondary education percentage, higher secondary percentage, degree percentage
  • Educational Background: Board of education, academic specialization, degree type
  • Demographics: Gender, work experience status
  • Target Variable: Binary placement outcome (placed/not placed)

Data Preprocessing

The preliminary data audit revealed missing salary observations specifically in the “Not Placed” category. To prevent data leakage, the salary feature was removed entirely, as its missingness pattern perfectly correlated with the target variable⁸.

Categorical Encoding: All nominal variables were systematically encoded using binary transformation, including gender, education boards, academic streams, specializations, and work experience indicators. This expanded the dataset to 18 transformed features while maintaining interpretability.

Feature Engineering Strategy

Three novel engineered features were developed to capture domain-specific insights:

1. Composite Academic Score

A weighted average across all educational levels: SSC (20%) + HSC (30%) + Degree (30%) + Graduate (20%). This feature rewards recent academic performance while maintaining historical context, aligning with recruiter preferences for recent credentials⁹.

2. Experience-Weighted Test Score

This feature combines aptitude testing with professional experience context. Students with work experience received a 20% boost to their test scores, reflecting the complementary nature of cognitive ability and real-world exposure that recruiters value¹⁰.

3. Academic Trajectory Score

Calculated as the average improvement across educational transitions: ((HSC-SSC) + (Degree-HSC) + (Graduate-Degree))/3. This captures longitudinal academic development patterns, identifying students with consistent improvement trends.

Algorithm Selection and Evaluation

Individual Classifiers

Five diverse machine learning algorithms were selected to ensure comprehensive evaluation:

  • Logistic Regression: Chosen for interpretability and reliability in binary classification
  • Random Forest: Selected for handling mixed data types and providing feature importance rankings
  • Multilayer Perceptron: Included to capture complex non-linear relationships
  • Decision Tree (REPTree): Implemented for transparency and reduced overfitting through pruning
  • k-Nearest Neighbors: Retained for modeling local data structures and identifying outliers

Ensemble Methods

Two advanced ensemble strategies were implemented:

Voting Ensemble (Soft Voting): Combined probability distributions from four high-performing base classifiers using average aggregation. This approach leverages confidence levels rather than simple majority voting, making it particularly effective for imbalanced datasets¹¹.

Stacking Ensemble: Employed a two-level meta-learning architecture with Logistic Regression as the meta-learner. This method learns optimal combination strategies from base classifier predictions through 10-fold internal cross-validation¹².

Evaluation Framework

Cross-Validation Strategy: All models were evaluated using stratified 10-fold cross-validation with fixed random seeds (S=1) to ensure reproducibility and fair comparison. Stratification maintained the 2:1 class ratio across all folds.

Performance Metrics: Comprehensive evaluation included accuracy, precision, recall, F1-score, AUC, and Kappa statistics. Special attention was paid to class-specific performance given the inherent class imbalance in placement data.

Statistical Significance: McNemar’s test was employed to verify that performance improvements over baseline were statistically significant rather than due to random chance¹³.

Results and Analysis

Overall Model Performance

The machine learning models demonstrated significant improvements over the baseline classifier across all performance metrics. The baseline ZeroR classifier achieved 68.84% accuracy with an AUC of 0.48, representing random performance.

Best Individual Performance: Logistic Regression achieved the highest accuracy at 86.51% with an AUC of 0.926, demonstrating a substantial 17.67% improvement over baseline. This translates to correctly predicting an additional 38 student placements.

Ensemble Performance: The Voting Ensemble achieved 85.58% accuracy with the highest AUC of 0.932, while the Stacking Ensemble matched the accuracy at 85.58% with an AUC of 0.906. Although ensemble methods typically outperform individual classifiers, the minimal differences suggest that Logistic Regression captured most of the predictive signal in the dataset.

Feature Importance and Business Insights

The feature importance analysis revealed several counter-intuitive findings that challenge conventional assumptions:

Top Positive Predictors

  • Experience-Weighted Test Score (Odds Ratio: 1.65): The 20% boost for work experience significantly improved placement odds
  • Arts Academic Background (Odds Ratio: 2.91): Arts students showed nearly 3x higher placement probability
  • Commerce & Management Degree (Odds Ratio: 2.64): Strong advantage for business-oriented educational backgrounds

Surprising Negative Predictors

  • Work Experience Presence (Odds Ratio: 0.009): Students with prior work experience were dramatically less likely to secure placement
  • Female Gender (Odds Ratio: 0.24): Concerning evidence of systematic gender bias in recruitment
  • Science & Technology Background (Odds Ratio: 0.40): Technical backgrounds showed placement disadvantages

The most surprising finding was the strongly negative coefficient for work experience (-4.672). This suggests that companies may prefer fresh graduates they can train according to their preferences, or that the quality of work experience matters more than its mere presence¹⁴.

Ensemble Model Analysis

The Stacking Ensemble’s meta-learner revealed how different models contributed to final predictions:

  • Random Forest: Dominant contributor (Weight: 1.94, Odds Ratio: 6.96)
  • Logistic Regression: Strong secondary contributor (Weight: 0.84, Odds Ratio: 2.31)
  • Decision Tree: Moderate influence (Weight: 0.34)
  • Neural Network: Minimal contribution (Weight: 0.16)

Despite the sophisticated ensemble architecture, performance gains over the best individual model were minimal. However, the Voting Ensemble achieved the highest AUC of 0.932, demonstrating superior ranking ability across probability thresholds.

Business Impact and Cost-Benefit Analysis

The economic analysis highlights substantial value from implementing predictive modeling. Using conservative cost estimates of $8,000 per false positive (wasted recruitment resources) and $15,000 per false negative (missed talent opportunity), the financial impact was significant:

Return on Investment

  • Logistic Regression: Annual savings of $334,000 with 568% first-year ROI
  • Voting Ensemble: Annual savings of $304,000 with 305% first-year ROI

Over three years, these exceptional figures increase to 1902% and 1115% ROI respectively, assuming implementation costs of $50,000-$75,000 including development, validation, and deployment¹⁵.

Student Profiling and Intervention Strategies

The analysis enabled evidence-based student profiling for targeted interventions:

High Placement Probability Students (85-95% chance)

  • Experience-Weighted Test scores above 75
  • Arts academic backgrounds
  • Commerce & Management specializations
  • Marketing & Finance focus areas

These students require minimal intervention but benefit from advanced placement preparation and networking opportunities.

At-Risk Student Profiles (40-60% chance)

  • Students with work experience (paradoxically)
  • Female students facing systematic barriers
  • Science & Technology graduates
  • Below-average test performance

Intervention Framework: Approximately 25 high-risk students require early career counseling with expected 25-40% placement improvement at $3,000 investment per student. Female students need enhanced career support to counter systematic barriers, while technical graduates benefit from cross-functional business training¹⁶.

Conclusions and Implications

Key Findings Summary

This research successfully developed a robust framework for predicting graduate placement outcomes while uncovering several counter-intuitive insights. The Voting Ensemble emerged as the optimal model, balancing accuracy (85.58%) with superior reliability (AUC: 0.932) for practical deployment.

The domain-driven feature engineering proved highly effective, with the Experience-Weighted Test Score becoming a top predictor. However, the most significant finding was the negative impact of work experience on placement odds, challenging conventional wisdom and highlighting the complexity of recruitment dynamics.

Business Value and Strategic Impact

The cost-benefit analysis demonstrated exceptional ROI, with projected annual savings exceeding $300,000 and first-year returns above 300%. This provides a compelling business case for implementing predictive analytics in educational settings.

Practical Recommendations

For Institutions:

  • Deploy the Voting Ensemble as a decision-support system for early identification of at-risk students
  • Develop targeted intervention programs addressing gender bias and technical graduate challenges
  • Investigate the work experience paradox through qualitative research with recruiters

For Students:

  • Arts and Commerce graduates should leverage their inherent placement advantages
  • Students with work experience need specialized coaching to articulate their value effectively
  • Female students require additional career support and confidence-building programs

Future Research Directions

Future work should focus on multi-institutional validation to confirm the generalizability of these findings, particularly the counter-intuitive work experience effect. Longitudinal studies tracking cohorts over time would provide dynamic insights into changing recruitment patterns.

Additionally, causal inference techniques should be employed to move beyond correlation and establish whether specific interventions directly improve placement outcomes¹⁷.


References

  1. Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of Winning. Harvard Business Review Press.

  2. Pascarella, E. T., & Terenzini, P. T. (2005). How College Affects Students: A Third Decade of Research. Jossey-Bass.

  3. Alturki, S., Alturki, N., & Stuckenschmidt, H. (2021). Using Educational Data Mining to Predict Students’ Academic Performance for Applying Early Interventions. Journal of Information Technology Education: Innovations in Practice, 20, 121-137.

  4. Shukla, M., & Malviya, A. K. (2018). Analysis and Comparison of Classification Algorithms for Student Placement Prediction. International Journal of Computer Sciences and Engineering, 6(6), 69-81.

  5. Çakıt, E., & Dağdeviren, M. (2021). Predicting the percentage of student placement: A comparative study of machine learning algorithms. Education and Information Technologies, 26, 6817-6839.

  6. Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

  7. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD explorations newsletter, 11(1), 10-18.

  8. Benabou, A., & Touhami, F. (2025). Integration of Artificial Intelligence in Human Resource Management: Analyzing Opportunities, Challenges, and Human-AI Collaboration. Management and Accounting Review, 24(1).

  9. Heaton, J. (2016). An empirical analysis of feature engineering for predictive modeling. Proceedings of the IEEE SoutheastCon, 1-7.

  10. Jalal, N., Mehmood, A., Choi, G. S., & Ashraf, I. (2022). A novel improved random forest for text classification using feature ranking and optimal number of trees. Journal of King Saud University-Computer and Information Sciences, 34(6), 2733-2742.

  11. Usha, P. M., & Balaji, N. V. (2021). A comparative study on machine learning algorithms for employee attrition prediction. IOP Conference Series: Materials Science and Engineering, 1085(1), 012029.

  12. Barman, S., Biswas, M. R., Sultana, M., Nahar, N., Imam, M. H., Mahmud, T., Kaiser, M. S., Hossain, M. S., & Andersson, K. (2024). A Two-Stage Stacking Ensemble Learning for Employee Attrition Prediction. Lecture Notes in Networks and Systems, 119-132.

  13. Maaliw, R. R., Lagman, A. C., Ugalde, B. H., Ballera, M. A., & De Leon, M. A. (2022). Employability Prediction of Engineering Graduates Using Ensemble Classification Modeling. 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), 1-6.

  14. Nair, S. K., & Ghosh, S. (2006). Factors Affecting the Placement Prospects of Students: An Exploratory Study. Vision: The Journal of Business Perspective, 10(1), 41-49.

  15. Adekunle, B. I., Chukwuma-Eke, E. C., Balogun, E. D., & Ogunsola, K. O. (2021). A Predictive Modeling Approach to Optimizing Business Operations: A Case Study on Reducing Operational Inefficiencies through Machine Learning. International Journal of Multidisciplinary Research and Growth Evaluation, 2(1), 791-799.

  16. Nasrin Akhter, M. S. A., Muhammad Siddique, Rukhsana Abbas. (2022). Exploring the Role and Importance of Career Counselling in Developing Awareness of Graduate Students’ Career Choices during Covid 19. Multicultural Education, 7(11), 603.

  17. Naser, M. Z. (2024). Causality and causal inference for engineers: Beyond correlation, regression, prediction and artificial intelligence. WIREs Data Mining and Knowledge Discovery, e1533.