Manuscript submitted · Code release upon publication

Routine laboratory trajectories encode the onset of organ-level complications in cancer

A transformer trained on longitudinal lab data from 3,905 patients predicts 162 treatment-associated complications up to two years before clinical onset, using only routine blood work already collected in standard oncological care.

Authors
Jannik Lübberstedt, Krischan Braitsch, Jacqueline Lammert, Christof Winter, Florian Gabriel, Tristan Lemke, Christopher Zirn, Markus Graf, Friedrich Puttkammer, Hartmut Häntze, Johannes Moll, Anirudh Narayanan, Andrei Zhukov, Fabian Drexel, Zeineb Ben Chaaben, Sebastian Ziegelmayer, Su Hwan Kim, Marion Högner, Jan Kirschke, Florian Bassermann, Marcus Makowski, Christian Wachinger, Lisa Adams* & Keno Bressem*
TUM University Hospital, Klinikum rechts der Isar · German Heart Center Munich · Charité Berlin
3,905 patients
2.8M lab measurements
162 predicted diagnoses
39 routine analytes

Routine laboratory panels drawn during cancer treatment constitute longitudinal physiological recordings of organ function, yet their temporal structure is discarded by single-timepoint prognostic tools. A transformer trained on 2,777,595 laboratory measurements from 3,905 patients with multiple myeloma or ovarian cancer predicted the two-year onset of 162 treatment-associated complications, including therapy-related myelodysplastic syndromes, spanning eight clinical categories, achieving 1.5- to 6.1-fold enrichment above prevalence at the group level. It matched or outperformed non-sequential baselines across grouped endpoints (AUROC gains up to +0.11), demonstrating that longitudinal laboratory trajectories capture evolving complication-specific physiology inaccessible from isolated measurements. Predictions generalised across both cancers, divergence concentrating in disease-specific complications, and biomarker masking recovered signatures consistent with established pathophysiology. External validation on MIMIC-IV and MMRF CoMMpass confirmed transferability across independent healthcare systems (AUROC up to 0.85). Routine oncological laboratory data encode organ deterioration weeks to months before clinical onset, enabling complication-specific surveillance without additional testing infrastructure.

Temporal signal scales with biological tempo
The transformer's advantage over cross-sectional baselines is largest for slowly evolving complications: +0.11 AUROC for MDS, +0.09 for fungal infections, +0.05 for cardiovascular disease. These are exactly the endpoints offering the longest window for preventive intervention.
Biological coherence across cancers
Five of eight grouped endpoints generalised across myeloma and ovarian cancer with AUROC differences ≤0.07. Where predictions diverged, they tracked observable biology: peritoneal metastases dominate in ovarian cancer (28% vs 1% prevalence), whereas pulmonary aspergillosis is confined to myeloma.
Interpretable biomarker signatures
Feature masking recovers diagnosis-specific patterns aligned with pathophysiology, without prior biological constraints. Glucose dominates diabetes, filtration markers (eGFR) dominate kidney disease, red blood cell count dominates MDS, and age leads cardiovascular and mortality predictions.
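As a rough illustration of the feature-masking idea described above, the sketch below measures how much AUROC drops when one input is replaced by its cohort mean. The page does not specify how masking is implemented, so the mean-replacement strategy, the `masking_importance` helper, the toy scorer and the four toy patients are all assumptions made for illustration, not the authors' method or data.

```python
# Minimal sketch of masking-based feature importance (assumed procedure).
# All names and numbers below are hypothetical toy values.

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def masking_importance(model, rows, labels, feature_names):
    """Return the AUROC drop caused by masking each feature with its cohort mean."""
    base = auroc(labels, [model(r) for r in rows])
    drops = {}
    for j, name in enumerate(feature_names):
        mean_j = sum(r[j] for r in rows) / len(rows)
        masked = [r[:j] + [mean_j] + r[j + 1:] for r in rows]
        drops[name] = base - auroc(labels, [model(r) for r in masked])
    return drops


FEATURES = ["glucose", "rbc"]
ROWS = [[5.0, 4.5], [5.2, 4.0], [9.1, 4.4], [8.7, 4.1]]  # toy patients
LABELS = [0, 0, 1, 1]                                     # toy diabetes labels


def toy_model(row):
    """Stand-in scorer that depends only on glucose."""
    return row[0]


if __name__ == "__main__":
    for name, drop in masking_importance(toy_model, ROWS, LABELS, FEATURES).items():
        print(f"{name}: AUROC drop {drop:.2f}")
```

In this toy setup, masking glucose removes all discrimination while masking red blood cell count changes nothing, which is the kind of diagnosis-specific ranking the masking analysis is meant to surface.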
Cross-system external validation
The model was applied without retraining to 1,626 MIMIC-IV and 1,143 MMRF CoMMpass patients. Renal and metabolic endpoints transferred most strongly: individual CKD stages reach AUROC up to 0.85, and type 2 diabetes matches internal discrimination across both cohorts.
No additional testing required
All predictions derive from 39 routine laboratory analytes already collected at every clinical encounter. The data infrastructure for anticipatory toxicity management is already in place in standard oncological care.
Endpoint                    AUROC   AP      Enrichment   Δ vs. best baseline
Myelodysplastic syndromes   0.71    0.081   6.1×         +0.11
Fungal infections           0.75    0.101   3.1×         +0.09
Kidney disease              0.74    0.372   2.5×         0.00
Metastatic disease          0.73    0.483   2.1×         0.00
All-cause mortality         0.70    0.213   1.8×         +0.04
Type 2 diabetes             0.65    0.074   1.8×         +0.04
Cardiovascular diseases     0.68    0.480   1.6×         +0.05
Bacterial infections        0.66    0.403   1.5×         +0.01
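One way to read the enrichment column is as average precision (AP) divided by endpoint prevalence. That definition is an assumption here, since the page only speaks of "enrichment above prevalence". Under that reading, the prevalence implied by each row can be back-calculated as AP divided by enrichment, as in this short sketch (the helper names are hypothetical, and the AP and enrichment values are copied from the table above):

```python
# Assumption: enrichment = average precision (AP) / endpoint prevalence.
# (AP, enrichment) pairs are copied from the results table.
RESULTS = {
    "Myelodysplastic syndromes": (0.081, 6.1),
    "Fungal infections": (0.101, 3.1),
    "Kidney disease": (0.372, 2.5),
    "Metastatic disease": (0.483, 2.1),
    "All-cause mortality": (0.213, 1.8),
    "Type 2 diabetes": (0.074, 1.8),
    "Cardiovascular diseases": (0.480, 1.6),
    "Bacterial infections": (0.403, 1.5),
}


def enrichment_over_prevalence(ap, prevalence):
    """Fold enrichment of AP over the precision of a random ranking (the prevalence)."""
    return ap / prevalence


def implied_prevalence(ap, enrichment):
    """Prevalence implied by a reported AP and enrichment, under the assumed definition."""
    return ap / enrichment


if __name__ == "__main__":
    for name, (ap, enr) in RESULTS.items():
        print(f"{name:28s} implied prevalence ~ {implied_prevalence(ap, enr):.3f}")
```

Because the table values are rounded, the implied prevalences are approximate. They mainly show why a modest AP for a rare endpoint such as MDS can still correspond to the largest enrichment.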
Kidney disease: 0.69 / 0.67 (MIMIC / MMRF · internal 0.74)
Type 2 diabetes: 0.65 / 0.68 (MIMIC / MMRF · internal 0.65)
CKD stage 3–5: 0.83–0.85 (MIMIC · individual endpoints)
All-cause mortality: 0.64 / 0.69 (MIMIC / MMRF · internal 0.70)
Metastatic disease: 0.67 (MIMIC · internal 0.73)
Myelodysplastic syndrome: 0.66 (MIMIC · internal 0.71 · 2.6× enrichment)
01 Lab Trajectories: 39 routine analytes segmented into 10–30 day intervals
02 Imputation: Transformer-based masked prediction fills missing values
03 Encoder: Bidirectional transformer (Qwen-3 architecture) processes sequences
04 Prediction: Multi-label sigmoid head predicts 162 diagnoses within 2 years
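The first and last pipeline steps can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the released code: the fixed 30-day interval width, the averaging of repeated draws within an interval, the `segment` and `multilabel_probs` helpers, and the toy measurements are all hypothetical. Missing analytes are left as `None`, which is where the masked-prediction imputation step would operate, and the encoder itself is omitted.

```python
import math
from collections import defaultdict


def segment(measurements, analytes, width_days=30):
    """Bin (day, analyte, value) triples into fixed-width intervals.

    Returns one row per interval with one slot per analyte. Repeated draws in
    an interval are averaged (an assumption); None marks a missing analyte.
    """
    buckets = defaultdict(list)
    for day, analyte, value in measurements:
        buckets[(day // width_days, analyte)].append(value)
    if not buckets:
        return []
    n_bins = max(b for b, _ in buckets) + 1
    rows = []
    for b in range(n_bins):
        row = []
        for a in analytes:
            vals = buckets.get((b, a))
            row.append(sum(vals) / len(vals) if vals else None)
        rows.append(row)
    return rows


def multilabel_probs(logits):
    """Independent sigmoid per diagnosis: probabilities need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


if __name__ == "__main__":
    toy = [(0, "creatinine", 1.0), (12, "creatinine", 1.2),
           (45, "creatinine", 1.6), (45, "glucose", 5.4)]
    for row in segment(toy, ["creatinine", "glucose"]):
        print(row)
    print(multilabel_probs([0.0, 2.0, -2.0]))
```

A sigmoid head, unlike a softmax, scores each of the 162 diagnoses independently, so a patient can be flagged for several complications at once.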
@article{lubberstedt2026labtrajectory,
  title   = {Routine laboratory trajectories encode the onset of organ-level complications in cancer},
  author  = {L{\"u}bberstedt, Jannik and Braitsch, Krischan and Lammert, Jacqueline and Winter, Christof and Gabriel, Florian and Lemke, Tristan and Zirn, Christopher and Graf, Markus and Puttkammer, Friedrich and H{\"a}ntze, Hartmut and Moll, Johannes and Narayanan, Anirudh and Zhukov, Andrei and Drexel, Fabian and Ben Chaaben, Zeineb and Ziegelmayer, Sebastian and Kim, Su Hwan and H{\"o}gner, Marion and Kirschke, Jan and Bassermann, Florian and Makowski, Marcus and Wachinger, Christian and Adams, Lisa and Bressem, Keno},
  journal = {Manuscript submitted},
  year    = {2026}
}