Background: Diabetes mellitus (DM) is highly prevalent and often remains undiagnosed until complications appear, especially in low‑ and middle‑income countries. Simple tools that use routinely collected clinical and demographic variables may support earlier identification of individuals at increased risk. Objective: This study aimed to build a supervised machine‑learning model to classify individuals as diabetic or non‑diabetic using a large publicly available dataset, and to identify which variables contributed most to the model's decisions. Methods: We analysed a cleaned subset of 89,540 records from a Kaggle diabetes dataset. A multilayer perceptron artificial neural network (ANN) was trained and tested on separate subsets. Model performance was evaluated by overall accuracy and misclassification rates, and post‑hoc variable importance scores were used to summarise the contribution of each predictor. Results: The ANN achieved an overall prediction accuracy of 96.8% in both the training and testing samples. Most records were correctly classified, although the error pattern suggested that non‑diabetic cases were recognised more easily than diabetic cases. Blood glucose, HbA1c and body mass index (BMI) showed the highest importance values, whereas demographic and lifestyle variables contributed less to the classification. Conclusion: In this dataset, an ANN based on simple clinical and demographic variables was able to distinguish between diabetic and non‑diabetic records with high internal accuracy and a plausible pattern of variable importance. The model could form the basis for a practical screening aid, but it requires external validation and further work on handling class imbalance and explainability before use in routine care.
Key words: Diabetes mellitus, Machine learning, Artificial neural network, Risk prediction, Feature importance
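The distinction drawn in the Results between overall accuracy and class-specific error can be illustrated with a minimal sketch. The function name `classification_summary` and the toy label vectors below are hypothetical and are not taken from the study data; the sketch only shows how overall accuracy, sensitivity (diabetic class) and specificity (non-diabetic class) can be derived from the same set of predictions, and why a high overall accuracy may coexist with weaker recognition of the minority class.

```python
def classification_summary(y_true, y_pred, positive=1):
    """Return overall accuracy, sensitivity and specificity for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of truly positive (diabetic) records that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of truly negative (non-diabetic) records that were detected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical toy labels (1 = diabetic, 0 = non-diabetic); NOT study data.
y_true = [0] * 18 + [1] * 2
y_pred = [0] * 18 + [1, 0]  # one of the two diabetic records is missed

summary = classification_summary(y_true, y_pred)
print(summary)
```

In this toy example the overall accuracy is high because the majority class dominates the count, while only half of the positive records are detected. Reporting sensitivity and specificity alongside accuracy is one common way to make such an imbalance visible.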