Take-home message

Clustering analysis of very old patients in critical condition reveals predictive phenotypes on admission to the intensive care unit. Information about geriatric characteristics can be used for planning tailored interventions.

Introduction

The outcome of very old patients admitted to the intensive care unit (ICU) is determined by both the severity of the acute condition and the age-related decline of resilience to stress. However, that decline is characterised by substantial inter-individual heterogeneity which, of note, is considered a hallmark of the ageing process [1]. Thus, it is unsurprising that disease severity scores developed for critical conditions in younger cohorts show poorer prognostic discrimination in cohorts of very old individuals [2]. Geriatric conditions in very old patients, such as frailty, multimorbidity, cognitive impairment and functional deficits in daily life, have been shown to delineate groups with different prognoses in that population [3,4,5,6]. The clinical frailty scale (CFS) was introduced into clinical practice as a prognostic aid to triaging individuals for ICU admission during the initial surge of the coronavirus disease 2019 (COVID-19) pandemic [7]. However, the variability and complexity of very old patients appear to require a more holistic approach, using a broader spectrum of patient characteristics for prognostic stratification [8].

Substantial heterogeneity within a patient population and non-linear relationships between patient characteristics may interfere with the accuracy of techniques for predictive modelling [9, 10]. Clustering analysis is an intuitive approach to utilise these features advantageously to classify individuals by their similarity [11]. This technique is gaining popularity since it is an efficient approach to exploring high-dimensional datasets for identifying phenotypes (clusters) of patients with distinct characteristics, indicating specific needs and divergent outcomes in healthcare [12,13,14]. The additional information gained from clusters may, thus, enhance prognostic accuracy and help to develop interventions targeting the deficiencies shared by the members of a particular phenotype [15,16,17,18,19,20,21,22]. For example, Seymour et al. [21] identified four sepsis phenotypes with 28-day mortality varying from 5 to 40%. One of these phenotypes representing 27% of patients was characterised by an enhanced inflammatory response which might benefit from tailored interventions in the future. Very few of the past studies, however, focused on very old patients with critical conditions [5].

The VIP2 study examined the association of survival with acute and geriatric features in ICU patients aged 80 years or older [23]. The current study investigated the heterogeneity within that population by clustering analysis of patient characteristics recorded on admission to the ICU. We hypothesised that acute and geriatric characteristics delineate distinct phenotypes which can assist early prognostication and indicate options for anticipatory interventions, such as early rehabilitation in frail patients with reversible conditions or palliative care at the predicted end of life. To this end, particular emphasis was put on the robustness and information content of phenotypes.

Patients and methods

Patients

The Very elderly Intensive care Patient (VIP)2 study was a prospective observational study in patients aged 80 years or older admitted as emergencies to ICUs in 22 countries [23]. The study objective was to investigate the impact of acute and geriatric patient characteristics on survival in ICU and after 30 days. National coordinators were responsible for registering ICUs and obtaining ethics committee approval in their countries. Participating units recruited consecutive patients who met the above demographic and clinical criteria during any 6-month period between May 2018 and May 2019. This new study included all patients from the VIP2 dataset with complete data and, initially, without limitations (withholding or withdrawing) of life-sustaining treatment (LST) (Figure S1, supplementary material). The latter criterion was introduced to minimise the impact of subjective and variable decision-making on outcome [24]. For sensitivity analysis, we later included patients with limitations of LST.

To validate the findings in the VIP2 population with an independent dataset, we extracted a subset of patients from the COVID-19 in VIP (COVIP) study, a prospective observational study of old patients admitted to the ICU with COVID-19 in 14 countries between March and December 2020 [25]. This subset encompassed all patients aged 80 years and older with complete data for age, sequential organ failure assessment (SOFA) score, CFS and outcome and without limitations of LST (Figure S1, supplementary material).

Measurements

The following patient characteristics were recorded in the VIP2 study on admission to the ICU [23] and used for phenotyping in the current study: age, gender, residence prior to admission, reason for admission to ICU, SOFA score and its sub-scores (Table 1) as well as the following geriatric features: CFS, informant questionnaire on cognitive decline in the elderly (IQCODE), Katz score to assess activities of daily living, comorbidity and polypharmacy score (CPS) (Sect. 2 in the supplementary material). Ventilatory and vasopressor support and renal replacement therapy were recorded as interventions. Death in ICU and within 30 days, regardless of place, and length of stay in ICU were documented as outcome. The COVIP study recorded a similar set of variables, except IQCODE and CPS [25].

Table 1 Admission characteristics, interventions and outcome for patients aged 80 years or older without limitations of life-sustaining treatment from the VIP2 and COVIP studies

Phenotyping by clustering analysis

Clustering groups patients and assigns them labels based on the (dis)similarity of their characteristics [11]. We implemented a variant of the K-means algorithm, which partitions a patient population into a pre-defined number K of clusters such that the distances between patients' data points within each cluster are minimised [11]. This algorithm was embedded in a three-step procedure to obtain robust clusters (phenotypes): (1) estimation of an upper bound K_UB for the total number of clusters, (2) initial clustering of the patient population with K = K_UB and (3) repeated re-sampling by bootstrapping and re-clustering to identify those clusters which are robust against random variations of the data. Although the initial number K of clusters was set, the final number of phenotypes was determined by the intrinsic properties of the data. Moreover, we examined whether the number of phenotypes could be reduced without substantial loss of information by grouping clusters according to their similarity. Additional details about these methods are provided in the supplementary material.
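The clustering and re-sampling steps can be illustrated with a minimal sketch. It is not the authors' implementation, whose details are given in the supplementary material. It assumes plain Lloyd's K-means, takes K as given rather than estimating an upper bound, and scores robustness as the mean best-match Jaccard overlap between each initial cluster and the clusters obtained after bootstrap re-sampling. The function names, the Jaccard criterion and the toy data are assumptions made for illustration only.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd's K-means (an assumed stand-in for the paper's variant)."""
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:  # assignments no longer change
            break
        centroids = updated
    return centroids

def jaccard(a, b):
    """Overlap of two sets of patient indices (0 = disjoint, 1 = identical)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def members_of(labels, k):
    """One set of patient indices per cluster."""
    return [{i for i, lab in enumerate(labels) if lab == j} for j in range(k)]

def cluster_stability(points, k, n_boot=50, seed=0):
    """Initial labels plus, per cluster, the mean best-match Jaccard over bootstraps."""
    rng = random.Random(seed)
    ref_labels = assign(points, kmeans(points, k, rng))
    ref_sets = members_of(ref_labels, k)
    totals = [0.0] * k
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]  # re-sample with replacement
        # Re-cluster the bootstrap sample, then label the original patients with it.
        boot_sets = members_of(assign(points, kmeans(sample, k, rng)), k)
        for j in range(k):
            totals[j] += max(jaccard(ref_sets[j], b) for b in boot_sets)
    return ref_labels, [t / n_boot for t in totals]

# Toy data (invented): two well-separated groups on a single standardised feature.
toy = [(0.1 * i,) for i in range(10)] + [(10 + 0.1 * i,) for i in range(10)]
labels, stability = cluster_stability(toy, k=2)
```

In such a scheme, clusters whose stability falls below a chosen threshold would be discarded, which is one way the final number of phenotypes can end up smaller than the initial K. The threshold itself is a design choice that this sketch leaves open.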

For internal validation, supervised classifiers (gradient boosting) were used to verify the discrimination of phenotypes. The classifiers were trained on the phenotype labels using the same set of patient characteristics as for clustering. The area under the receiver operating characteristic curve (AUROC) was obtained as a measure of discrimination between patients of a specific phenotype and all other patients. External validation of predictive phenotyping was based on comparing the mortality of VIP2 phenotypes with that of similar patients from the COVIP study. Similarity was approximated by measuring the distance from each COVIP patient’s data point to the centroids of the VIP2 phenotypes in the data space, using three simple and informative patient characteristics (age, SOFA, CFS) that had the smallest number of missing values [23].
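The discrimination measure used for internal validation can be sketched as follows. The example assumes that some classifier has already produced one score per patient for each phenotype; the gradient-boosting step itself is omitted, and the function names and toy scores are hypothetical. AUROC is computed through its rank interpretation: the probability that a randomly chosen member of the phenotype receives a higher score than a randomly chosen non-member, with ties counted as one half.

```python
def auroc(scores, is_member):
    """AUROC for one phenotype versus all other patients (pairwise-comparison form)."""
    pos = [s for s, m in zip(scores, is_member) if m]
    neg = [s for s, m in zip(scores, is_member) if not m]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    # Count member/non-member pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auroc(score_table, labels):
    """score_table[name] holds one classifier score per patient for phenotype `name`."""
    return {name: auroc(scores, [lab == name for lab in labels])
            for name, scores in score_table.items()}

# Toy example (invented scores for four patients and two phenotype labels).
toy_labels = ["A", "A", "G", "G"]
toy_scores = {
    "A": [0.9, 0.4, 0.6, 0.2],  # score for "belongs to A"
    "G": [0.1, 0.2, 0.8, 0.7],  # score for "belongs to G"
}
toy_auroc = one_vs_rest_auroc(toy_scores, toy_labels)
```

The pairwise form is quadratic in the number of patients, which is acceptable for a sketch; a rank-based implementation would be preferable for larger cohorts.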

Statistical analysis

Descriptive characteristics were reported as medians with inter-quartile ranges (IQRs) for continuous variables and proportions for nominal variables. One-way ANOVA was used to examine differences in continuous variables, and Fisher’s exact test and its variants were used for nominal variables. The resulting p values were adjusted to control the false discovery rate by applying the Benjamini–Hochberg procedure [27]. Odds ratios with 95% confidence intervals were calculated for binary variables (outcome, interventions) for each phenotype using the phenotype with the lowest mortality as reference [28]. All analyses were performed using Python 3 (Python Software Foundation, Beaverton, OR, USA).
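Two of these steps lend themselves to a short illustration. The sketch below shows the Benjamini–Hochberg adjustment of a list of p values and an odds ratio with a 95% confidence interval against a reference group. The confidence interval uses the log-odds (Woolf) approximation with a 0.5 continuity correction for empty cells; that choice is an assumption here, since the text does not state which method was applied. All counts and p values are invented.

```python
import math

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def odds_ratio_ci(events, n, ref_events, ref_n, z=1.96):
    """Odds ratio versus a reference group with a Woolf-type 95% interval (assumed method)."""
    a, b = events, n - events              # group of interest: events / non-events
    c, d = ref_events, ref_n - ref_events  # reference group: events / non-events
    if 0 in (a, b, c, d):
        # Continuity correction so the ratio stays defined when a cell is empty.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Invented inputs: four raw p values, and 10/20 deaths versus 5/20 in the reference.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
or_value, or_low, or_high = odds_ratio_ci(10, 20, 5, 20)
```

With small groups, the standard error term is dominated by the smallest cell count, which is why intervals widen sharply when a phenotype contains few patients or few events.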

Results

Table 1 depicts the characteristics of the patient cohorts from the VIP2 (n = 1977) and COVIP (n = 280) studies without limitations of LST. Based on the complete set of acute and geriatric characteristics recorded on admission to ICU, clustering analysis of the VIP2 cohort identified seven phenotypes encompassing 26% of this patient population (Table 2). Internal validation of phenotyping by supervised classification yielded AUROCs greater than 0.95 for distinguishing each of these phenotypes from all other patients in that cohort. Figure S7 (supplementary material) shows the distribution of countries with respect to these phenotypes.

Table 2 Admission characteristics, interventions and outcome for phenotypes in the VIP2 study cohort

Phenotypes F and G (Table 2, Fig. 1A) are considered ‘geriatric’ since most of these patients are frail (CFS ≥ 5), have limitations in their activities of daily living (Katz < 5) or cognitive impairments (IQCODE ≥ 3.5). However, these two phenotypes significantly differ in their SOFA scores (p < 0.001). In contrast, phenotype A is characterised by the absence of geriatric features and very low SOFA scores. The other ‘non-geriatric’ phenotypes B–E are distinguishable by age (phenotype C), respiratory and cardiovascular (phenotypes B and D) or renal SOFA sub-scores (phenotype E). Forty-five percent of patients in the latter phenotype received renal replacement therapy in the ICU.

Fig. 1

Profiles of patient characteristics (median values) for all patients and phenotypes A–G, displayed as clock hours in each plot, in the VIP2 cohort without limitations of LST (A) and in the cohort encompassing all VIP2 patients including those with limitations of LST (B). The grey plots on the right side depict the number of patients in each phenotype. The magnitude of the measurement is displayed between circle centres and the 3 o'clock positions as respective values

Sensitivity analyses were performed with respect to the inclusion of patients with limitations of LST and the number of phenotypical categories. Phenotyping of all VIP2 patients with complete data, i.e. after adding patients with limitations of LST (n = 895) to the main study cohort (n = 1977), resulted in the same number of phenotypes with similar profiles (Fig. 1B). Major differences were only found for phenotype D where the inclusion of patients with limitations of LST led to an increase of the fraction of patients with severe respiratory or cardiovascular failure. An attempt to reduce the total number of phenotypical categories by relaxing the reproducibility criterion for clusters and grouping overlapping clusters caused a loss of information about geriatric characteristics (Sect. 3, supplementary material).

Phenotyping was based on information available on admission to ICU and did not consider later events. The phenotypes, however, were found to be associated with variable rates of organ support in ICU as well as different ICU and 30-day mortality (Tables 2 and 3, Fig. 2). Phenotype A was characterised by very low mortality (2% in ICU, 3% after 30 days) and the lowest rates of organ support. The highest mortality was found for phenotype G (50% in ICU, 57% within 30 days), which also showed the highest rate of respiratory and cardiovascular support. Odds ratios for mortality and organ support were calculated with phenotype A as reference (Table 3). Note that the large odds ratios and wide confidence intervals for phenotype G are related to relatively high event rates in a small sample.

Table 3 Odds ratios with 95% confidence intervals for invasive ventilation, vasopressor support, mortality in ICU and within 30 days analysed as binary variables with phenotype A as reference

Fig. 2

Mortality in ICU (A) and within 30 days (B) for the phenotypes from the VIP2 cohort (dots) and patients from the COVIP study (triangles) who are similar to a specific VIP2 phenotype according to age, CFS and SOFA score. The box plots show the inter-quartile distribution of mortality for VIP2 patients who are similar to the respective phenotype

We also investigated whether the mortality estimates for phenotypes can be used for patients who are similar but not identical to these phenotypes. Figure 2 depicts the distributions of ICU and 30-day mortality for all clusters obtained during repeated bootstrapping runs and associated with specific phenotypes. These clusters can be considered random samples of the immediate neighbourhood of phenotypes within the data space. The small width of these distributions indicates that there is only a small variation of mortality in this neighbourhood suggesting that patients who are formally outside but still similar to these phenotypes have a comparable outcome.

Finally, we examined whether outcome predictions for phenotypes in the VIP2 dataset are generalisable. In the absence of another study with an identical design and set of patient characteristics, we chose the dataset of the recent COVIP study [25] for that purpose. Patients from the COVIP study, who were 80 years or older and admitted to the ICU with COVID-19, were associated with the closest VIP2 phenotype based on similarity with regard to three admission characteristics (age, SOFA, CFS). ICU and 30-day mortality in these COVIP sub-groups is depicted in Fig. 2. Although the absolute mortality differed between the phenotypes from VIP2 and the associated sub-groups from COVIP, the trend, especially with phenotype A representing the lowest and phenotype G the highest mortality, was reproducible.
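The association of patients with their closest phenotype can be sketched as a nearest-centroid assignment followed by a mortality count per group. The example assumes Euclidean distance after dividing each characteristic by a per-feature scale; the distance measure, the scaling, the function names and every number below are hypothetical and are not taken from the study tables.

```python
import math

def nearest_phenotype(patient, centroids, scales):
    """Name of the centroid closest to the patient on the scaled features."""
    def distance(centroid):
        return math.sqrt(sum(((patient[f] - centroid[f]) / scales[f]) ** 2
                             for f in scales))
    return min(centroids, key=lambda name: distance(centroids[name]))

def mortality_by_phenotype(patients, centroids, scales):
    """Observed mortality among the patients assigned to each centroid."""
    counts = {}
    for p in patients:
        name = nearest_phenotype(p, centroids, scales)
        deaths, total = counts.get(name, (0, 0))
        counts[name] = (deaths + p["died"], total + 1)
    return {name: deaths / total for name, (deaths, total) in counts.items()}

# Invented centroids for two illustrative phenotypes "X" and "Y".
centroids = {
    "X": {"age": 84, "sofa": 3, "cfs": 3},
    "Y": {"age": 86, "sofa": 11, "cfs": 7},
}
# Invented per-feature scales so that no single variable dominates the distance.
scales = {"age": 5.0, "sofa": 4.0, "cfs": 2.0}
# Invented validation patients; "died" is 1 for death and 0 for survival.
patients = [
    {"age": 82, "sofa": 4, "cfs": 2, "died": 0},
    {"age": 85, "sofa": 2, "cfs": 4, "died": 0},
    {"age": 88, "sofa": 12, "cfs": 6, "died": 1},
    {"age": 83, "sofa": 9, "cfs": 7, "died": 0},
]
group_mortality = mortality_by_phenotype(patients, centroids, scales)
```

Because every patient is assigned to some centroid regardless of how far away it lies, a practical variant might add a maximum distance beyond which a patient is left unassigned.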

Discussion

This study investigated the heterogeneity of very old patients on admission to the ICU. Using clustering analysis with rigorous criteria for reproducibility, we identified seven phenotypes in patients from the VIP2 study. These phenotypes represent distinct profiles of acute and geriatric characteristics and are associated with different mortality. Considering that up to five distinct phenotypes are required to describe populations of younger patients with sepsis [19, 21], seven phenotypes appear to be an appropriate number to reflect the heterogeneity in very old patients admitted to the ICU with a broad spectrum of conditions. Sensitivity analyses further validated that number. Although the fraction of patients formally represented by these phenotypes was rather small, we demonstrated that their prognostic information can also be used for additional patients who have similar characteristics on admission.

There are considerable differences in outcome across the spectrum of phenotypes with phenotype A having the lowest ICU and 30-day mortality. The two phenotypes with a prominent geriatric profile (phenotypes F and G) exhibit the highest mortality. The combination of a geriatric profile with a high SOFA score (phenotype G) resulted in an ICU mortality of 50% emphasising the major impact of reduced resilience in geriatric patients. Phenotype C includes the oldest old patients with low SOFA scores and only very mild geriatric characteristics. A 30-day mortality of < 10% in this group underlines that advanced age alone should not be considered a negative prognosticator. Younger patients with substantial organ failure but without frailty or other major chronic impairments were found in phenotypes B, D and E with a 30-day mortality between 12 and 14%. These findings underline that mortality in very old ICU patients is determined by geriatric vulnerabilities and the acute illness. Both types of conditions need to be considered when prognosticating in that heterogeneous population.

Previous studies on old patients in ICU did not record the ensemble of geriatric and acute characteristics to a similar extent as the VIP2 study did. In one of the few other investigations, Ferrante et al. [4] showed an association of outcome with markers of organ failure, cognitive impairment and functional disabilities. However, that study did not distinguish between patients with limitations of LST and those without. Survival data were thus impacted by subjective decisions about LST. In contrast, the design of the VIP2 study allowed us to identify and exclude patients with limitations of LST from outcome analysis [23]. Our findings, therefore, mainly reflect the natural course of critical conditions in very old individuals without major interference by the variable assessment of decision-makers [24]. In comparison to VIP2, the COVIP study [25] had a similar design but recorded a smaller set of geriatric features. Using three simple admission characteristics (age, CFS, SOFA), the trend of mortality differences between phenotypes found in VIP2 could be replicated in the COVIP dataset. This indicates generalisability of our findings despite marked differences in absolute mortality between VIP2 and COVIP, which can be attributed to the substantial severity of COVID-19 within the population of very old individuals [29]. The reductionist approach of approximating similarity between patients with only three easy-to-obtain characteristics may facilitate comparisons in future studies but requires further validation. Of note, phenotyping based on this limited set of variables would not result in identical clusters, since several phenotypes, e.g. renal phenotype E, are mostly defined by variables not included in that set.

How can the knowledge about phenotypes be used in clinical practice? A predicted 30-day mortality below 10% permits early prognostication of a favourable outcome for phenotypes A and C. These phenotypes represent patients of all age groups with low SOFA scores and without marked geriatric characteristics. Phenotypes B, D and E are close to that prognostic category. Frail patients with advanced cognitive and functional impairments (phenotypes F and G) have a guarded prognosis, especially those with a substantial degree of organ failure (phenotype G). Importantly, phenotype G has an ICU mortality of approximately 50%, i.e. prognostication on admission equals random guessing. This uncertainty is reflected by the substantial variability of ICU length of stay in these patients. In this situation, additional information, such as the individual’s response to treatment during a time-limited trial (TLT), will be necessary for more informative outcome predictions [30, 31]. However, the specific management of these patients, including decisions about ICU admission and TLT, may depend on the preferences within a particular healthcare system.

The geriatric components of the phenotypes provide actionable information for anticipatory management in ICU. The higher mortality of phenotypes with marked geriatric features reflects the enhanced vulnerability and reduced resilience to stress [32]. This situation should trigger targeted interventions [13] which may become part of standard care in ICU. This could especially be pertinent to the prevention and management of delirium in patients with pre-existing cognitive impairments in phenotypes F and G [33], dealing with problems of polypharmacy by thoroughly reviewing prescriptions in individuals with multimorbidity, e.g. in phenotype F [34], and planning early rehabilitation for patients with frailty and functional impairments [35]. For example, the 82-year-old patient with frailty (CFS 6) who requires only limited ventilatory but no other organ support (phenotype F) may substantially benefit from these interventions to return to the previous level of functioning. Importantly, the 95-year-old patient with a low SOFA score, who is only borderline frail (phenotype C) and has an excellent prognosis, might be the one who benefits most from early rehabilitation. Specific interventions, though, will differ from those for phenotype F as suggested by different functional baselines (Katz score). Preparing discussions about palliative care might be indicated for individuals matching phenotype G, such as an 85-year-old ventilated patient with cardiovascular as well as renal failure and CFS 7, who already required substantial assistance at home [36]. Importantly, these interventions should be coordinated at an interdisciplinary level and require organisational adjustments, e.g. by facilitating ICU discharge to suitable step-down units. Whether these measures can eventually offer a benefit for survival and functional outcome remains to be elucidated in prospective trials. Recruitment to such trials can now focus on specific phenotypes [37]. This situation is analogous to the current research into new sepsis phenotypes [21].

The most important limitation of our study is the focus on mortality in the short term. Despite the consensus that quality of life is the most important outcome for very old ICU patients, short-term mortality still plays an important role. First, evaluating quality of life presupposes survival, and short-term mortality remains important for benchmarking [38]. Second, in various countries, triage criteria for admission to ICU during periods of resource constraints, in particular during the COVID-19 pandemic, included the probability of short-term survival [39]. More precise predictions based on both acute and geriatric characteristics are necessary to support more accurate decision-making in that situation [40].

In conclusion, we detected reproducible phenotypes in very old ICU patients based on both acute and geriatric characteristics available on admission to ICU. These phenotypes are associated with substantial differences in short-term mortality. Future studies can focus on specific interventions tailored for specific phenotypes and their impact on long-term outcome measures.