Background: In everyday oncology practice, routine blood tests and a small panel of tumour markers are almost always requested when a patient is being worked up for possible malignancy. With the rise of machine‑learning methods, it is tempting to treat these familiar measurements as raw material for automated cancer prediction. It is not clear, however, how far such data can help once a patient has already entered a specialist pathway, and careless modelling can easily introduce target leakage.
Objective: To explore how cancer diagnosis (cancer vs no cancer) relates to a set of basic demographic, haematological, biochemical, and clinical variables in a cancer‑enriched cohort, and to consider what this means for neural‑network‑based prediction.
Methods: We undertook a secondary, analytical cross‑sectional study using 1,000 cases from the Cancer Risk Stratification Using Lab Parameters dataset. Age, sex, smoking status, family history, complete blood count indices, blood glucose, CA‑125, PSA, CEA, risk level, stage, treatment outcome and survival were extracted. Cancer status was coded as cancer vs no cancer. Neural network analysis was conducted.
Results: Just over four out of five patients in the cohort had a malignant diagnosis. Apart from haemoglobin, most routine laboratory and biochemical values did not differ significantly between cancer and non‑cancer cases, and the raw means generally lay within conventional reference ranges in both groups. Haemoglobin was modestly but significantly lower in patients with cancer. Cancer status was, by definition, strongly linked to stage and showed a weaker relationship with survival.
Conclusion: In this referred, cancer‑enriched population, familiar clinical risk factors and single routine laboratory parameters offered little additional discrimination beyond the simple presence or absence of a cancer stage. Haemoglobin behaved as a non‑specific indicator of overall illness rather than a diagnostic test.
Key words: Cancer diagnosis, routine laboratory tests, neural networks, target leakage, risk stratification.
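The target‑leakage concern raised above can be illustrated with a minimal sketch on simulated data. Nothing here is taken from the study dataset: the function names, the 80% cancer rate, and the haemoglobin means and cut‑off are all illustrative assumptions. The point is only that a variable recorded exclusively after diagnosis, such as stage, reproduces the label exactly, whereas a single routine laboratory value does not.

```python
import random


def make_synthetic_cohort(n=1000, cancer_rate=0.8, seed=0):
    """Build a toy cohort. Every value is simulated, not study data."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        has_cancer = rng.random() < cancer_rate
        cohort.append({
            "cancer": has_cancer,
            # Stage is assigned only after a cancer diagnosis,
            # so its mere presence encodes the outcome.
            "stage": rng.choice(["I", "II", "III", "IV"]) if has_cancer else None,
            # Assumed small downward shift in haemoglobin (g/dL) with cancer.
            "haemoglobin": rng.gauss(12.8 if has_cancer else 13.4, 1.5),
        })
    return cohort


def accuracy(cohort, predict):
    """Fraction of patients for whom the rule matches the true label."""
    return sum(predict(p) == p["cancer"] for p in cohort) / len(cohort)


cohort = make_synthetic_cohort()

# Leaky rule: "a stage is recorded" is equivalent to the label itself.
leaky_acc = accuracy(cohort, lambda p: p["stage"] is not None)
# Baseline: always predict cancer in a cancer-enriched cohort.
majority_acc = accuracy(cohort, lambda p: True)
# Single routine parameter with a hypothetical 13.0 g/dL cut-off.
hb_acc = accuracy(cohort, lambda p: p["haemoglobin"] < 13.0)

print(f"stage present (leaky): {leaky_acc:.3f}")
print(f"always predict cancer: {majority_acc:.3f}")
print(f"haemoglobin cut-off:   {hb_acc:.3f}")
```

In a sketch like this, any model given the stage variable would appear to perform perfectly, which is why post‑diagnostic fields such as stage, treatment outcome and survival would normally be excluded from the predictors before a neural network is fitted.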