Introduction

The discovery in 2005 of the JAK2V617F mutation1 represented a major breakthrough in the understanding of the molecular pathogenesis of Philadelphia chromosome-negative chronic myeloproliferative neoplasms (MPN).2 The JAK2V617F mutation is harbored by nearly all polycythemia vera (PV) patients and 50–60% of patients with essential thrombocythemia (ET) and primary myelofibrosis (PMF).3 The resulting constitutively activated JAK/STAT signaling is considered central to the pathogenesis4 and phenotype of MPN and therefore serves as a rational drug target for therapy.5 Additional mutations were described at MPL codon 515 (W515L/K/A) in 5–8% of ET and PMF patients.6, 7, 8 However, the significant proportion of JAK2V617F- and MPLW515L-negative MPN cases required additional effort to identify novel genetic lesions contributing to disease pathogenesis. Recent studies based on single-nucleotide polymorphism (SNP) array-based karyotyping detected several copy number alterations, such as loss of heterozygosity and copy-neutral loss of heterozygosity, in genomic regions containing multiple members of the polycomb repressive complex 2 (PRC2)9, 10, 11, 12 and other genes previously implicated in different hematological malignancies.13 Moreover, via candidate gene sequencing, several novel bona fide somatic mutations14 were detected at frequencies ranging from 1% to 20–30% in genes frequently mutated in other myeloid neoplasms, such as myelodysplastic syndromes (MDS) and acute myeloid leukemia, and some of these mutations have been positively correlated with clinical outcome.15, 16, 17, 18 Nevertheless, because a significant proportion of MPN cases are negative for known molecular aberrations, a complete portrait of MPN genetic abnormalities remains to be drawn.
The two major drawbacks to the identification of new somatic mutations, one theoretical and one technical, are the huge number of genes potentially involved in MPN tumorigenesis and the limited availability of ‘pure’ germline control DNA, respectively. Buccal swabs and saliva have generally been considered readily available sources of non-hematopoietic DNA, but detection of the JAK2V617F mutation in at least some of these samples suggested the presence of myeloid cell contamination. In addition, no evidence for germline transmission of the JAK2V617F mutation has been reported to date.

Given these previous results and with the goal of further exploring the molecular complexity of MPN, we investigated the incidence of mutations in genes already known to be implicated in cancer pathogenesis. To this end, we designed a two-tiered next-generation sequencing (NGS) study. We first evaluated the somatic mutational status of a broadly inclusive list of known cancer-related genes in a learning set of 25 MPN samples. We then tested the recurrence of the truly somatic variants in a broader validation set of 189 patients via an amplicon-sequencing NGS approach.

Materials and methods

Patients and samples

Patients were diagnosed as having PV, PMF or post-PV MF according to the World Health Organization (WHO)19 and the International Working Group for Myelofibrosis Research and Treatment (IWG-MRT) criteria.20 All subjects provided informed written consent, and the study was performed under the Florence Institutional Review Board’s approved protocol. The study was conducted in accordance with the Declaration of Helsinki. The presence of the JAK2V617F and MPLW515L mutations and the mutated allele burden were determined via quantitative real-time PCR (QRT–PCR), as previously described.21

The mutational status of ASXL1, EZH2, IDH1, IDH2, SRSF2, TET2, DNMT3A and CBL was assessed using Sanger sequencing, as previously described.18 Cytogenetic analysis was performed on Giemsa-stained slides. All patients were examined within 1 year of diagnosis.

Granulocyte and CD3+ T cell purification and genomic DNA extraction

Granulocytes were obtained via density gradient centrifugation of peripheral blood samples, and CD3+ cells were immunomagnetically selected (Miltenyi Biotec GmbH, Bergisch Gladbach, Germany) from the peripheral blood mononuclear cell fraction recovered from the density gradient. After sorting, CD3+ cells were expanded in vitro. The purity of the CD3+ cells was determined using flow cytometry. The culture conditions as well as the DNA extraction quality control procedures are fully described in the Materials and Methods section of the Supplementary Information.

Target enrichment and 454 sequencing

A solution-based capture custom panel was designed for target enrichment according to the NimbleGen (Roche NimbleGen, Inc., Madison, WI, USA) guidelines. The final list of genes and miRNAs was obtained by combining the complete list of mutations present in the latest release of the Sanger Institute Cancer Gene Census Database (www.sanger.ac.uk/genetics/CGP/Census) with the most inclusive list of DNA repair genes present in the OMIM Database (www.ncbi.nlm.nih.gov/omim) as well as a manually curated literature screening. The custom design was inserted into a 5-Mb SeqCap EZ Choice Library (Roche NimbleGen, Inc., Madison, WI, USA) containing the exonic portions of 1400 genes and 600 miRNA coding sequences (Supplementary Table 1).

Sample libraries were prepared using 500 ng of DNA from the granulocyte and CD3+ T-cell samples. Each step of the procedure for performing sequencing runs on the Roche 454 GS FLX platform is fully described in the Materials and Methods section of the Supplementary Information.

Variant detection, filtering and classification

The processing of the samples, evaluation of genuine somatic mutations and their classification are entirely described in the Materials and Methods section of the Supplementary Information.

Recurring variant validation test

Recurrence testing for genuine somatic variants was performed using Ion AmpliSeq technology with an Ion Torrent Personal Genome Machine (PGM) platform. The Ion AmpliSeq panel design, sample processing, barcoding and sequencing are fully described in the Materials and Methods section of the Supplementary Information.

NRAS c.35 G>A mutation analysis

An independent set of 139 patients with PMF were recruited from the archive in Florence and used for NRAS and KRAS mutation analysis (see Materials and Methods section of the Supplementary Information for details).

Statistical analysis

The χ2 test, Fisher’s exact test (2 × 2 table) or the χ2 test for trend (larger contingency table) was used as appropriate to compare variables from different patient groups that had been categorized according to mutational status. The analysis of continuous variables among the groups was performed using the Mann–Whitney U test (two groups) or the Kruskal–Wallis test with the Dunn method for multiple comparisons. P<0.05 was considered to indicate statistical significance; all tests were two-tailed. Data were processed using SPSS Version 19.0 software (StatSoft, Tulsa, OK, USA).
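
As an illustration of the 2 × 2 comparison described above, the following minimal Python sketch computes a two-tailed Fisher’s exact P-value directly from the hypergeometric distribution. It is not the SPSS procedure used in the study; the function name is hypothetical, and the counts in the usage note are invented for illustration only.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact P-value for the 2 x 2 table [[a, b], [c, d]].

    Rows could be, for example, mutated vs. unmutated patients and columns
    two clinical categories (an assumption made for illustration).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the
    # observed one; the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
```

For example, `fisher_exact_two_tailed(3, 0, 0, 3)` evaluates a hypothetical table in which all three mutated subjects fall in one category and all three unmutated subjects in the other.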

Results

A total of 25 tumor granulocyte and paired germline samples comprised the learning cohort. The NGS samples had been collected at the time of diagnosis in 9 PV subjects and 11 PMF subjects, while the additional 5 DNA samples were obtained from 5 of the 9 PV patients at the time that they evolved to post-PV myelofibrosis.

The learning cohort for PMF was deliberately selected as being predominantly JAK2V617F-negative (9/11 subjects), with only 2 JAK2V617F-positive patients. The stratification of patients according to the dynamic international prognostic scoring system (DIPSS)22 and other clinical features of the cohort are summarized in Supplementary Table 2.

Identification of genuine somatic mutations in coding sequences and microRNAs

According to the NimbleGen procedure, the tumor granulocytic, germline salivary and CD3+ lymphocytic gDNA samples were sheared, barcoded and mixed in an appropriate number of template libraries that were subsequently captured. At the end of the gDNA sequence selection procedure, all the DNA libraries were checked using QRT–PCR to verify the relative fold enrichment of the panel; all libraries produced a fully satisfactory result and confirmed successful captures (data not shown). Next, each library was sequenced with 454 Titanium technology using the Roche GS FLX platform. To exclude unbalanced libraries, that is, libraries composed of a nonequimolar quantity of the samples, we checked for any possible unequal sequencing depth in the barcoded samples. As exemplified for the library shown in Supplementary Figure 1, balanced quantities of samples were pooled, captured and then processed.

DNA libraries were sequenced until a median 30-fold coverage was reached for each tumor or control sample. A very high capture specificity was observed (94% of unique reads in the target region), with similar uniformity across the chromosomes (average standard deviation of 1.6) (Figure 1).

Figure 1

Enrichment uniformity landscape. The X-axis graphs the number of target regions included in the NimbleGen capture ‘cancer exome’ panel (approximately 29600 target regions in total). The light bars represent the number of target regions within the design, whereas the dark bars correspond to the effective number of enriched target regions for each chromosome. The Y-axis displays the chromosomes.

Genuine germline sample selection

To identify somatic mutations in the MPN learning dataset, we sequenced paired DNA samples from granulocytes and germline cells. The candidate mutations detected in the granulocytes were then screened by subtracting those occurring in the germline to enable the identification of the variants that could be reliably considered truly acquired somatic variants.

In the first set of experiments, we employed DNA obtained from paired saliva samples, but this DNA consistently presented variants belonging to the neoplastic clone, notably the JAK2V617F mutation, with a comparable allele burden. These results prompted us to consider that saliva samples could not be considered genuine germline sources due to contamination by myeloid cells. Thus, we replicated the experiments using expanded CD3+ T cell DNA for control samples, and a very low level of somatic contamination was found in just 1 DNA sample from CD3+ T cells. Table 1 displays the mutational burden comparison for JAK2, MPL and IDH2 in libraries prepared from different sources of DNA (granulocytes, saliva and CD3+ T cells). As a result, we discarded the data obtained from the salivary samples and selected for further analyses only those obtained from CD3+ T cells, which were definitively considered germline control samples.

Table 1 Comparison of allele burden for selected mutations in libraries prepared from different DNA sources

Somatic mutation identification in genes and microRNAs in the learning cohort

The tumor and paired germline sample data were both mapped against the human reference genome (hg19). A total of 11006 and 9691 unique variants in 1057 and 1039 genes for the germline and somatic samples, respectively, were detected, as shown in Supplementary Table 3.

The somatic variant identification procedure was intended to minimize the false-positive somatic mutation rate. To this end, we used a stringent two-step approach. First, using a ‘somatic’ filter, we selected only DNA variants in tumor samples with no mutated reads either in the paired germline counterpart or in any other germline sample of the cohort. A ‘functional’ filter was then applied to eliminate the synonymous variants and all variants annotated in the 1000 Genomes database as having a frequency higher than 1% (see Materials and Methods for details). Supplementary Table 3 summarizes the narrowing of the unique detected variants after each of the analysis steps described above. Then, to evaluate possible sequencing errors and the false-positive rate, we validated these variants using a different type of NGS technology; specifically, we designed an Ion AmpliSeq panel containing all detected variants, which was employed to re-sequence the same patient cohort via Ion Torrent PGM (1000-fold coverage).
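
The two filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study (which is described in the Supplementary Information): the `Variant` record, its field names, the function names and the example entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Variant:
    gene: str
    change: str
    synonymous: bool
    freq_1000g: float  # 1000 Genomes allele frequency; 0.0 if not annotated
    # Mutated-read counts in every germline sample of the cohort, including
    # the patient's own paired control (sample id -> number of mutated reads).
    germline_alt_reads: Dict[str, int] = field(default_factory=dict)


def passes_somatic_filter(v: Variant) -> bool:
    """Keep a tumor variant only if no germline sample shows a mutated read."""
    return all(reads == 0 for reads in v.germline_alt_reads.values())


def passes_functional_filter(v: Variant, max_freq: float = 0.01) -> bool:
    """Drop synonymous variants and variants above 1% in 1000 Genomes."""
    return (not v.synonymous) and v.freq_1000g <= max_freq


def genuine_somatic(variants: List[Variant]) -> List[Variant]:
    """Apply the 'somatic' filter first, then the 'functional' filter."""
    step1 = [v for v in variants if passes_somatic_filter(v)]
    return [v for v in step1 if passes_functional_filter(v)]


# Hypothetical records, for illustration only.
EXAMPLE = [
    Variant("GENE_A", "p.X1Y", False, 0.0, {"ctrl_1": 0, "ctrl_2": 0}),
    Variant("GENE_B", "p.X2Y", False, 0.0, {"ctrl_1": 0, "ctrl_2": 3}),
    Variant("GENE_C", "p.X3X", True, 0.0, {"ctrl_1": 0, "ctrl_2": 0}),
    Variant("GENE_D", "p.X4Y", False, 0.05, {"ctrl_1": 0, "ctrl_2": 0}),
]
```

In this toy input only `GENE_A` survives: `GENE_B` has mutated reads in a germline sample, `GENE_C` is synonymous and `GENE_D` exceeds the 1% population frequency threshold.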

Using this strategy, we estimated a very low sequencing error rate (<1%), and we finally confirmed 136 genuine somatic, non-synonymous mutations affecting 121 genes. Twenty-five percent of these mutations are indexed in the dbSNP archive, and 2% of these specific variants are listed in the COSMIC catalogue. The majority of mutations (89%) were estimated to be ‘damaging’ by at least 1 of the 5 algorithms that we used to investigate disease-causing potential (PolyPhen2, SIFT, Provean, Mutation Taster, LRT) (Supplementary Table 4 and Figure 2).
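
The ‘damaging by at least 1 of the 5 algorithms’ criterion amounts to a simple any-of rule over per-tool calls. The sketch below assumes that each predictor’s output has already been reduced to a true/false ‘damaging’ call; the function names and the example calls are hypothetical and do not reproduce the study’s data.

```python
from typing import Dict, List


def is_damaging(predictions: Dict[str, bool], min_tools: int = 1) -> bool:
    """True if at least `min_tools` predictors call the variant damaging."""
    return sum(1 for called in predictions.values() if called) >= min_tools


def damaging_fraction(all_predictions: List[Dict[str, bool]]) -> float:
    """Fraction of variants called damaging by at least one predictor."""
    if not all_predictions:
        return 0.0
    flagged = sum(1 for calls in all_predictions if is_damaging(calls))
    return flagged / len(all_predictions)


# Hypothetical calls for two variants, keyed by predictor name.
EXAMPLE_CALLS = [
    {"SIFT": True, "PolyPhen2": False},
    {"SIFT": False, "PolyPhen2": False},
]
```

Raising `min_tools` gives a stricter consensus rule; the any-of rule used here is the most permissive choice.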

Figure 2

Circular diagram of mutations found in MPN. Chromosomes are illustrated in the outer perimeter. Grey dots show the ‘cancer exome’ regions of the NimbleGen panel, whereas the histograms show the captured (blue) and failed (red) target regions. MicroRNA or Gene Symbol with amino acidic change refers to the variants found in our cohort.

The vast majority of the identified somatic mutations were missense (92%), whereas the minority (8%) were indels (small insertions and deletions). Although patients harbored different numbers of somatic mutations, ranging from 1 to 21 variants (Table 2), only 14 genes appeared to be recurrently mutated (Figure 3), that is, mutated in at least two patients. Notably, in patients for whom samples were available at both disease phases, the acquisition of additional mutations and/or the loss of some mutations at the time of evolution from PV to post-PV myelofibrosis suggested the occurrence of sub-clone selection during disease evolution (Supplementary Table 5).

Table 2 Number of genuine somatic, non-synonymous mutations harboured by each patient. Variants in MPN known mutated genes are shown in bold
Figure 3

Heatmap of the known variants found and of the genes presenting one or more variants in two or more patients in the data set. The horizontal axis presents the complete dataset of sequenced patients, with the PV samples grouped on the left, the samples taken at evolution to post-PV MF (PPV) in the center and the PMF samples on the right. The vertical axis lists the recurrently mutated genes, as explained in the legend.

Five missense variants were identified in the 600 microRNA coding sequences tested. These missense variants are summarized in Table 3. The MIR662, MIR663 and MIR542 sequences harbored missense mutations in their stem-loop coding region, and the MIR17 mutation was shown to affect the miR-17-5p mature miRNA sequence 5 bases downstream of its seed region.

Table 3 Somatic mutations in microRNA coding regions

Somatic mutation recurrence in the validation cohort

To distinguish, among the identified somatic variants, possible novel clonal drivers from clonal passenger mutations, we tested the recurrence of the above-described variants in a broader cohort of 189 patients diagnosed with PMF (91 samples, 48.2%), PV (50 patients, 26.4%) or post-PV myelofibrosis (48 samples, 25.4%). We utilized the Ion AmpliSeq panel and PGM sequencing as previously described to perform ultra-deep amplicon sequencing (see Supplementary Information for technical details) of the 141 variants, achieving a sample median of 1000-fold coverage. The clinical parameters of the patients comprising the validation set are summarized in Supplementary Table 2.

Excluding the known JAK2, MPL, IDH2, ASXL1, TET2, CBL and DNMT3A variants, 80 patients (42% of total) harbored at least 1 of the 141 somatic mutations tested for recurrence. Thirty somatic mutations (18.9% of the total) were found in at least 1 of the 189 patients; these were all missense mutations with the exception of a single frameshift mutation occurring in the BRD4 gene (Table 4).
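
Recurrence figures of this kind are simple tallies over per-patient variant calls. The following sketch shows one way to compute them, assuming the calls are available as a mapping from patient to detected variants; the function names, patient identifiers and variant labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def recurrence_percentages(calls_by_patient: Dict[str, List[str]]) -> Dict[str, float]:
    """Percentage of patients carrying each variant."""
    n_patients = len(calls_by_patient)
    if n_patients == 0:
        return {}
    counts: Counter = Counter()
    for calls in calls_by_patient.values():
        counts.update(set(calls))  # count each variant once per patient
    return {variant: 100.0 * c / n_patients for variant, c in counts.items()}


def carrier_percentage(calls_by_patient: Dict[str, List[str]]) -> float:
    """Percentage of patients carrying at least one tested variant."""
    n_patients = len(calls_by_patient)
    if n_patients == 0:
        return 0.0
    carriers = sum(1 for calls in calls_by_patient.values() if calls)
    return 100.0 * carriers / n_patients


# Hypothetical four-patient cohort, for illustration only.
EXAMPLE_COHORT = {
    "P1": ["var_1", "var_2"],
    "P2": ["var_1"],
    "P3": [],
    "P4": ["var_1", "var_1"],
}
```

With the figures reported in the text, 80 carriers among 189 patients gives 100 × 80 / 189, or roughly 42%.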

Table 4 Recurrent mutations in the validation dataset

In addition, 8 genes (SCRIB, MIR662, BARD1, TCF12, FAT4, DAP3, POLG and NRAS) appeared as recurrently mutated in the cohort, some of which (SCRIB 7.9%, MIR662 7.4% and BARD1 5.3%) were more frequently mutated than previously identified, well-known mutational hotspots18 (Table 4).

Correlations between recurrent mutations and clinical features: NRAS c.35 G>A mutation analysis

The groupwise associations between recurrent mutations and clinical and biological features were assessed using the χ2 test or Fisher’s exact test. Possibly because of the small number of subjects harboring each individual mutation, no significant association with clinical features was found, with the exception of mutations at codon 12 of NRAS (NRASG12V and NRASG12D), which occurred in 5 out of 102 PMF patients included in the learning and validation cohorts. This association resulted in P-values <0.05 for the highest DIPSS-plus score categories. DIPSS-plus23 effectively combines prognostic information from the DIPSS with karyotype, platelet count and transfusion status to predict overall survival in PMF. This evidence prompted us to screen the NRAS gene for mutations in codon 12 in an independent cohort of 66 PMF patients via high-resolution melting analysis followed by Sanger sequencing validation, finding an additional 3 mutated subjects. Moreover, because NRAS and KRAS mutations have been described as possibly mutually exclusive, we also tested this cohort for KRAS mutations. Overall, we found 8 of 168 MF patients (4.7%) harboring a heterozygous NRAS mutation in codon 12 (5 harbored the NRASG12V mutation and 3 harbored the NRASG12D mutation). In addition, 3 of the 168 patients evaluated (1.6%) harbored heterozygous KRAS mutations (G12R, G12S and G13D); the patient carrying the KRASG13D mutation also harbored the NRASG12V variant. Of note, NRAS variants preferentially clustered among JAK2 wild-type subjects, since only 1 of the 8 NRAS-mutated patients also harbored the JAK2V617F mutation.

Finally, we confirmed a significant association of NRAS variants with DIPSS-plus scoring (P=0.022), since all 8 mutated patients were included in the highest (intermediate-2 and high) risk categories, as shown in Table 5.

Table 5 Clinical features of PMF patients screened for NRAS mutations

Discussion

The discovery of the JAK2V617F mutation represents the single most significant contribution to the characterization of the pathophysiology of MPN thus far and also has major implications for the treatment of these malignancies. Mutations in genes other than JAK2 involve less than 20% of MPN patients and often co-occur with the JAK2 mutation; thus, even though the impact of the mutational status of a specific set of genes (ASXL1, EZH2, SRSF2 and IDH) on disease outcome has been demonstrated in patients with PMF,18 a comprehensive molecular landscape of MPN has not yet been depicted. Here, we performed the first large targeted NGS analysis aimed at exploring the mutational status of the broadest panel of known cancer-associated genes in MPN. This is the first report describing a robust NGS study design and an accurate data analysis pipeline aimed at minimizing the somatic mutation false-positive rate. We also demonstrated that saliva samples are often heavily contaminated by myeloid cells and that CD3+ T cells expanded in culture therefore serve as the most reliable germline control for identifying true somatic mutations in MPN. We set up an analysis pipeline using the most stringent procedure to avoid false-positive calls of somatic mutations. In particular, paired germline and somatic DNA samples of the learning dataset were sequenced to the same fold coverage. Moreover, we called somatic only those variants with no mutated reads in either the paired germline DNA or any other germline sample of the cohort. To these stringent ‘somatic’ filters, two additional controls were added to discard possible polymorphisms (only variants with a frequency <1% in the 1000 Genomes database were retained) as well as possible benign mutations (only non-synonymous variants were retained). Finally, all filtered variants were annotated with the functional effect predictions of five different algorithms.

Using this multistep bioinformatics pipeline, we finally identified 141 ‘genuine’ somatic non-synonymous mutations affecting 121 genes and 5 miRNAs that were then tested for recurrence in a larger cohort of 189 patients. The variants found in the SCRIB, MIR662, BARD1, TCF12, FAT4, DAP3, POLG and NRAS genes were recurrent with a frequency higher than 3%. In particular, SCRIB, MIR662 and BARD1 showed frequencies of 7.9, 7.4, and 5.3%, respectively, which were higher than those described for some well-known mutational hotspots.18

Some findings appear to link several of these genes, suggesting a potential role for them in the pathogenesis of MPN. SCRIB and FAT4 are two proteins that regulate planar cell polarity differentiation. SCRIB encodes a cytoplasmic scaffolding protein consisting of leucine-rich repeats and PDZ domains that regulates protein–protein interactions,24 while FAT4 belongs to the E-cadherin family and may control noncanonical Wnt/planar cell polarity signaling;25 both pathways play a crucial role in the regulation of polarity and tissue homeostasis. Interestingly, the TCF12 protein, also known as HEB, could be linked with this pathway. TCF12 forms heterodimers with other bHLH E-proteins and with the chimeric protein AML1-ETO26, 27, 28 and works as a transcriptional repressor of E-cadherin, thus playing an important role in cancer cell progression by enhancing the epithelial–mesenchymal transition process.29 The endothelial–mesenchymal transition is a form of the more widely known epithelial–mesenchymal transition; like the epithelial–mesenchymal transition, the endothelial–mesenchymal transition can be induced by transforming growth factor-β and allows a polarized cell, which normally interacts with the basement membrane via its basal surface, to undergo multiple changes that enable it to assume a mesenchymal cell phenotype, which includes enhanced migratory capacity, invasiveness, elevated resistance to apoptosis and an increased ability to induce fibrosis. Interestingly, the endothelial–mesenchymal transition and the resulting endothelial cell fate have been recently implicated in the pathogenesis of PMF.30, 31, 32

The SCRIBH1217P variant is a missense mutation predicted to be damaging by SIFT and PolyPhen2. In particular, missense mutations in the same SCRIB C-terminal domain region significantly disrupt the membrane subcellular localization of the protein, and this was suggested to be one possible pathogenic mechanism for planar cell polarity alterations in mammals.33, 34 Similarly, the FAT4R175L mutation is located in the extracellular cadherin domain, and both PolyPhen2 and MutationTaster predict that this FAT4 mutation adversely affects protein function and alters the planar cell polarity environment. Furthermore, studies in Drosophila and mammalian cell lines have shown that SCRIB loss and RAS activation cooperate (interclonally or intraclonally) to promote invasion.35, 36, 37 In Drosophila, interclonal cooperation in RasV12 and Scrib-minus tumor clones revealed a two-level mechanism in which Scrib-minus cells promote the neoplastic development of RasV12 cells. Specifically, this mechanism involves (1) the spread of stress-induced JNK activity from Scrib-minus cells to RasV12-activated cells followed by (2) the expression of JAK/STAT-activating cytokines downstream of JNK.38

Further insights into the molecular and pathogenetic complexity of MPN emerged from the discovery of BARD1, POLG and DAP3 mutations. All three of these variants are predicted to be damaging, and while DAP3 and POLG are mitochondrial proteins involved in DNA repair and apoptosis pathways,39, 40 BARD1 is a nuclear BRCA1-independent mediator between genotoxic stress and p53-dependent apoptosis41 (see the Discussion section of the Supplementary Information for details).

In addition to the above considerations, we were intrigued by the fact that the two PMF patients in the learning cohort carrying the NRASG12V mutation showed a rapid progression into an accelerated form of the disease. Therefore, we attempted to validate this association in an additional 66 PMF cases, composing a final cohort of 168 PMF patients. Moreover, because NRAS and KRAS mutations have been described as potentially mutually exclusive,42, 43 we tested this additional cohort for KRAS mutations to verify mutual exclusivity. We found that 4.7% of the PMF patients harbored a heterozygous NRAS mutation in codon 12. Considering all 8 NRAS-mutated patients together, we found that this mutation was associated with a poorer prognosis, as supported by the fact that the 8 patients clustered in the intermediate-2 and high-risk categories of the DIPSS-plus score. Only 2% of all PMF patients harbored heterozygous KRAS mutations (G12R, G12S and G13D), while the co-occurrence of NRASG12V and KRASG13D or JAK2V617F mutations was observed only for a single patient (we were unable to determine whether the two mutations arose from the same or different clones since frozen cell samples were not available for this patient). The low number of patients harboring KRAS mutations alone precluded any meaningful analysis of the association with disease progression. Overall, these data suggest that NRAS mutations specifically associate with a poorer outcome, although the molecular mechanism remains to be investigated.

Mutations in microRNAs warrant a separate discussion. Even though additional specific studies are necessary to support a functional role of the MIR662 variant rs74656628, some interesting features are worth mentioning. The human MIR662 is an intragenic microRNA that resides in a non-coding exon sequence of the mesothelin-like gene. The variant is a heterozygous missense mutation that occurs in the precursor sequence of the microRNA.38 A bioinformatics prediction analysis run with In-Silico-Dicer and RNAfold suggested that this mutation could modify the RNA secondary structure and lead to the production of a different mature miRNA (see Supplementary Information for details).

In summary, this NGS study presents new data that contribute to elucidating the very high genomic complexity in MPN disorders and identifies new variants in cancer-related genes that are potentially involved in the pathogenesis of the disease and may deserve further studies.