Introduction

Common variable immunodeficiency (CVID, OMIM 240500) is the most frequent symptomatic primary immunodeficiency. It is characterized by recurrent infections and deficiencies of immunoglobulin (Ig)A and IgG and, in half of the patients, IgM. The phenotype also includes autoimmune disorders in about 25% of patients, and a similar fraction suffer from gastrointestinal diseases. A smaller proportion have granulomatous disease, and there is also an increased risk of malignancies.1, 2 The standard treatment for CVID is Ig substitution. The incidence of CVID is estimated to be between 1/25 000 and 1/66 000,3, 4 making it much less common than selective IgA deficiency (IgAD), which has an incidence between 1/600 and 1/800. Patients are given a diagnosis of CVID when there is no known cause for the hypogammaglobulinemia. Thus, the etiology is probably diverse and likely to encompass both monogenic and polygenic disorders, and to be influenced by environmental factors, including chronic infections.

Genetic studies of both CVID and IgAD show that 20–25% of cases are familial,5 with a predominance of autosomal-dominant over recessive inheritance.5, 6, 7 In addition, CVID and IgAD commonly occur in different members of the same family and occasionally IgAD progresses to CVID.8, 9, 10, 11 Most previous genetic studies of CVID or IgAD have concentrated on the HLA region on chromosome 6. Some of these studies used a case–control design,12, 13, 14, 15, 16 while others used a genetic linkage or haplotype sharing analysis.5, 17, 18 Vořechovský et al18 designated the HLA susceptibility locus IGAD1.19 A previous genome-wide study in our laboratory on a three-generation German CVID family in 2002 revealed linkage to chromosome 5p,20 but the disease-causing gene remains elusive.

Candidate gene approaches have recently been successful in the identification of the molecular cause of some patients with CVID. Salzer et al21 and Castigli et al22 found that approximately 10% of CVID patients have either heterozygous or homozygous mutations of the TNFRSF13B gene, which encodes TACI. One patient has been reported with a homozygous mutation of TNFRSF13C, which encodes the BAFF-R (Warnatz K et al. XIth Meeting of the European Society for Immunodeficiency, Versailles, 2004; abstract #B72). Nine CVID patients share an identical homozygous deletion in the ICOS gene,23 and at least four patients have been identified with a homozygous mutation in CD19.24 In addition, a few patients originally diagnosed with CVID have later been shown to be affected by X-linked lymphoproliferative syndrome (OMIM 308240) caused by mutations in SH2D1A25, 26 or X-linked agammaglobulinemia (OMIM 307200) caused by mutations in BTK.27, 28, 29

In this paper, we present a genetic linkage study in the largest CVID family published to date.30 The family, NL1, exhibits autosomal-dominant inheritance with reduced penetrance and cases of CVID (six individuals), IgAD (five individuals), and dysgammaglobulinemia (three individuals). We show that the phenotype of CVID or IgAD in this family is linked to a locus on chromosome 4q. In an attempt to replicate this finding, we also present linkage analysis on 32 multiplex families with at least one CVID case, referred to below as the ‘EU cohort’. The EU families come from a larger family collection that was ascertained for IgAD and previously used to map the IGAD1 locus on chromosome 6.18, 19

Patients, materials, and methods

Patients and diagnostic measurements

The five-generation NL1 family with 54 individuals was described and illustrated in reference30 (Figure 1). Of the 54 individuals, six adults have CVID and five of these are alive and had previously donated blood, which could be used for genotyping. There are an additional eight individuals with a heterogeneous assortment of dysgammaglobulinemias (Table 1); five of these have IgAD (defined as a serum IgA concentration of <0.9 g/l), while three have IgA concentrations in the normal range. One of the five IgAD individuals is both the son and father of CVID patients, so if the hypothesis of dominant inheritance suggested in reference30 is correct, he is an obligate carrier of the CVID-associated mutation.

Figure 1

Pedigree of NL1. Filled symbols illustrate CVID patients and gray symbols individuals with dysgammaglobulinemia and IgAD. Circles represent female individuals, and squares represent male. A crossing line indicates deceased individuals. Individuals 25, 34, 38, 40 and 51 are IgA deficient. Genotyped individuals have an asterisk next to their number.

Table 1 Immunoglobulin levels [g/l] in family NL1

Vořechovský et al5, 18, 19 described 101 families with multiple cases of immunodeficiency, where 43 families had at least one case of CVID. Among the 43 families, 34 had sufficient sample material available for the genotyping. Two families were found to have distinct mutations in the candidate gene TNFRSF13B, which encodes TACI,21 leaving 32 EU families for this study.

There is no consensus in the literature on what level of serum IgA is compatible with a diagnosis of CVID/IgAD. Some patients in a large CVID cohort reported by Cunningham-Rundles and Bodian31 had levels above 0.5 g/l. In the EU families, the diagnosis of IgAD required a level below 0.05 g/l, a threshold considered necessary to reduce uncertainty about affection status in small families. A higher threshold of 0.9 g/l was used for the NL1 family30 because the obligate carrier had an IgA concentration of 0.76 g/l and also had IgM deficiency. None of the five CVID patients in this family had an IgA concentration <0.06 g/l.

Informed written consent was obtained from each individual prior to participation under an internal ethics review board-approved clinical study protocol (#239/99 for BG and 435/99 for LH). Twenty-eight of the 54 individuals provided samples for genotyping (indicated in Figure 1).

Genotyping

A total of 324 microsatellite markers were genotyped on the NL1 family for the genome-wide scan and fine mapping. A total of nine microsatellite markers on chromosome 4q, overlapping with the best markers in the NL1 family, were genotyped in the EU families. Markers for fine mapping were selected with the aid of the Marshfield map32 where the given intermarker distances were used in multipoint genetic linkage analysis. The EU families had been genotyped across the genome as described previously.18 However, the nearest usable markers from that data are D4S398 (72.5 cM from the 4p telomere) and D4S430 (126.1 cM), which appear to fall outside the linkage interval of the NL1 family.

Primers and other reagents for genotyping were purchased from Invitrogen Research Genetics (Karlsruhe, Germany), biomers.net (Ulm, Germany) and Qiagen (Hilden, Germany). The polymerase chain reactions (PCR) for genotyping were performed according to the protocols accompanying the reagents. The PCR products were sequenced on an ABI377 sequencer (PE Applied Biosystems, Foster City, CA, USA) using the COLLECTION and ANALYSIS software. Integer allele lengths were assigned in a semiautomatic manner using the GENOTYPER (PE Applied Biosystems) software package.

Genetic linkage analysis

Genetic linkage analysis of both the NL1 and EU families was carried out with a model-based approach assuming dominant inheritance in all analyses, and variable penetrance in some. One- and two-marker LOD scores were computed using the FASTLINK software package.33, 34, 35 Four-marker multipoint LOD scores for the NL1 family were computed using VITESSE.36 Full (all nine markers) multipoint LOD scores for the EU families were computed with GENEHUNTER.37, 38 The possibility of locus heterogeneity for the EU families was evaluated with HOMOG39 for single-marker LOD scores and with GENEHUNTER for the full multipoint LOD scores. LOD scores achieved under different models of affection status in the NL1 family were assessed with the simulation software FASTSLINK.34, 40, 41 By generating and analyzing all linked replicates, we computed the highest LOD score achievable. By simulating with all unlinked replicates, we estimated empirical P-values for the true LOD scores.
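For readers less familiar with LOD scores, the quantity computed by the linkage packages can be illustrated with the textbook special case of phase-known, fully informative meioses. The following is a minimal sketch under that simplifying assumption; it is not the algorithm used by FASTLINK or VITESSE, which handle unknown phase, missing genotypes, and penetrance classes on the full pedigree. The function name and the example counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n ),
    where r is the number of recombinant meioses out of n.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")

    nonrecombinants = meioses - recombinants
    log_l_linked = 0.0
    if recombinants:
        if theta == 0.0:
            # Any recombinant excludes complete linkage.
            return -math.inf
        log_l_linked += recombinants * math.log10(theta)
    if nonrecombinants:
        # theta <= 0.5, so 1 - theta is never zero here.
        log_l_linked += nonrecombinants * math.log10(1.0 - theta)

    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Hypothetical example: 10 informative meioses, none recombinant.
# At theta = 0 each meiosis contributes log10(2), about 0.301.
max_lod = two_point_lod(recombinants=0, meioses=10, theta=0.0)
```

In this simplified setting, the highest achievable score for a fixed pedigree is reached when the marker cosegregates perfectly with the disease, which is the intuition behind the "maximum achievable" scores obtained from linked replicates.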

All analyses shown here used a disease allele frequency of 0.001. For the NL1 family, we used all equal marker allele frequencies due to the small sample size. For the EU families, marker allele frequencies were estimated using the downfreq program.42 All analyses assigned individuals to either a penetrance class with no uncertainty (encoded in LINKAGE notation as 0.0, 1.0, 1.0) or considerable uncertainty (0.05, 0.75, 0.75), where the latter class represents a 5% phenocopy rate and 75% penetrance. We used a phenocopy rate much larger than the population incidence to allow for substantial locus heterogeneity, since this has been established for CVID.18, 20, 21, 22, 23
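The practical effect of the two penetrance classes can be illustrated with a single application of Bayes' rule. The sketch below assumes an individual whose prior probability of carrying the disease allele is 0.5 (for example, the child of one heterozygous carrier and one non-carrier) and ignores homozygotes, which are negligible at a disease allele frequency of 0.001. The class and function names are hypothetical and are not part of the LINKAGE package; a full likelihood computation would also use marker data and the rest of the pedigree.

```python
from typing import NamedTuple


class PenetranceClass(NamedTuple):
    """P(affected | genotype) in LINKAGE order: 0, 1, or 2 disease alleles."""
    f0: float  # non-carrier (phenocopy rate)
    f1: float  # heterozygous carrier
    f2: float  # homozygous carrier


CERTAIN = PenetranceClass(0.0, 1.0, 1.0)
EQUIVOCAL = PenetranceClass(0.05, 0.75, 0.75)


def carrier_posterior(pen: PenetranceClass, affected: bool,
                      prior_carrier: float = 0.5) -> float:
    """Posterior probability of being a heterozygous carrier given phenotype.

    Assumption: only non-carrier and heterozygous genotypes are considered.
    """
    like_carrier = pen.f1 if affected else 1.0 - pen.f1
    like_noncarrier = pen.f0 if affected else 1.0 - pen.f0
    numerator = prior_carrier * like_carrier
    denominator = numerator + (1.0 - prior_carrier) * like_noncarrier
    if denominator == 0.0:
        raise ValueError("phenotype is impossible under this penetrance class")
    return numerator / denominator


# An unaffected individual in the equivocal class may still be a carrier,
# whereas the certain class rules this out completely.
p_equivocal_unaffected = carrier_posterior(EQUIVOCAL, affected=False)
p_certain_unaffected = carrier_posterior(CERTAIN, affected=False)
p_equivocal_affected = carrier_posterior(EQUIVOCAL, affected=True)
```

Under these assumptions, an unaffected individual in the equivocal class retains a carrier probability of roughly 0.21, so an apparent recombinant in such a person is penalized far less than it would be in the class with no uncertainty.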

In the NL1 family, we show results on two different assignments for the seven hypogammaglobulinemic individuals who are not obligate carriers: (A) all unknown (0) status, and (B) IgAD individuals affected in the equivocal class and non-IgAD individuals unknown. For the EU families, all phenotyped individuals who were neither married in nor CVID-affected were assigned to the equivocal class and assigned affection status 1 (unaffected) if there was no evidence of IgAD, and affection status 2 (affected) if they had IgAD.

The extension from model A to model B was carried out because if the NL1 family maps to the same genetic locus as some of the EU families, which were ascertained for IgAD, then deficiency of IgA ought to be considered as affected. Unlike the NL1 family, the EU families do not include healthy individuals identified as having only IgG or IgM deficiency and normal IgA. Therefore, we assigned the unknown status (0) to the three hypogammaglobulinemic adults in the NL1 family who have normal IgA concentrations.

Candidate gene sequencing

Candidate genes on chromosome 4 were evaluated by sequencing either the coding regions of the genes on genomic DNA (for NFκB1, SCYE1, CASP6, and DAPP1) or the respective cDNA (BANK1). All primers were sought with the aid of the Primer Select software (PE Applied Biosystems); sequences are available upon request. RNA was reverse transcribed into cDNA using the ImProm-II Reverse Transcription System (Promega Corporation, Madison, WI, USA); the cDNA was amplified by PCR and subsequently sequenced with the amplification primers. After gel electrophoresis on an ABI Prism™ 377 DNA Sequencer, the data were analyzed by the DNA Sequencing Analysis software, version 3.4 (PE Applied Biosystems) and Sequencher™, version 3.4.1 (Gene Codes Corporation, Ann Arbor, MI, USA).

Results

Genetic linkage analysis of the NL1 family

We initially tried an affection status assignment where every individual with any form of hypogammaglobulinemia was considered affected and any individual without hypogammaglobulinemia was considered unaffected. This approach yielded no promising loci for two reasons. First, different individuals are deficient in nonoverlapping subsets of Ig subtypes, so the simple definition that hypogammaglobulinemic=affected is internally inconsistent. Second, the broad definition does not follow autosomal-dominant inheritance, while the narrow definition of CVID does follow dominant inheritance, except for one obligate carrier. Therefore, the analyses shown here use stricter criteria for affected status. CVID-affected individuals and the one obligate carrier were always assigned in a class with no uncertainty as affected (2) and married-in individuals were always assigned as unaffected (1), also in a penetrance class with no uncertainty. Unphenotyped individuals used solely to connect the families were assigned an unknown (0) affection status. In the NL1 family, and for the analyses shown here, an individual who had no signs of CVID or IgAD was assigned an unaffected (1) status only if that individual was the full sibling of a CVID-affected individual, or otherwise was assigned an unknown affection status (0). This reduces power, but greatly reduces the risk of confounding due to polygenic effects; it makes the analysis close to an affected-only model, which is often preferred for complex diseases such as CVID.

Single-marker analysis with a cautious model in which only CVID-affected individuals and one obligate carrier are affected, and the unaffected siblings are put in the equivocal penetrance class shows promising single-marker scores along a broad interval on chromosome 4 (Table 2). Using the consecutive markers D4S1534, D4S423, D4S1572, and D4S2623 in a multipoint analysis yields a LOD score of 2.30, which is the maximum achievable for these affection and penetrance assignments.

Table 2 Markers in the linkage region on chromosome 4 and single marker LOD scores under a cautious model where only CVID patients and one obligate carrier are treated as affected

To get a better idea of the most likely linkage interval, we then moved to a stricter model with full penetrance, but no additional assignments of affected or unaffected status over the cautious model. The stricter model yields the LOD scores shown in Table 3. The four-point LOD score is 2.71, again the maximum achievable.

Table 3 Markers in the linkage region on chromosome 4 and single marker LOD scores under a strict model where only CVID patients and one obligate carrier are treated as affected

The EU data set available for replication was ascertained for IgAD, so we then extended the cautious model in the NL1 family by assigning affected status to the four adult individuals, besides the obligate carrier, who had IgAD but not CVID. These individuals were assigned to the equivocal (0.05, 0.75, 0.75) penetrance class, just as the unaffected siblings were in the cautious model. The single-marker LOD scores for the extended model are shown in Table 4. Marker D4S423 has a peak single-marker score of 3.25 at recombination fraction θ=0, indicating that the marker alleles segregate perfectly with affected status. The four-marker LOD score peaks at 3.38 with a nearly flat curve across all four markers. The multipoint LOD score of 3.38 is again the maximum achievable for these affection status and penetrance assignments. Using FASTSLINK, we generated and analyzed 3000 unlinked replicates of the pedigree with the same locus specifications as for D4S423. The highest LOD score among the unlinked replicates is 2.67, well below the observed score of 3.25. Thus, the observed score is significant at P<0.001.39
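The step from the simulation result to the stated significance level can be made explicit. The sketch below uses the common conservative "add-one" estimator for a Monte Carlo P-value; whether this exact estimator was used here is an assumption. Only the replicate count (3000), the maximum null score (2.67), and the observed score (3.25) are taken from the text; the remaining null scores are placeholders.

```python
from typing import Sequence


def empirical_p_value(null_lods: Sequence[float], observed_lod: float) -> float:
    """Conservative Monte Carlo P-value: (k + 1) / (n + 1).

    k is the number of unlinked replicates whose LOD score is at least
    as large as the observed score; n is the number of replicates.
    """
    n_exceeding = sum(1 for z in null_lods if z >= observed_lod)
    return (n_exceeding + 1) / (len(null_lods) + 1)


# Placeholder null distribution: the true simulated scores are not available,
# so every replicate except the reported maximum is set to 0.0.
null_lods = [2.67] + [0.0] * 2999
p = empirical_p_value(null_lods, observed_lod=3.25)
```

With no unlinked replicate reaching the observed score, the estimate is 1/3001, about 0.00033, which is consistent with reporting P<0.001.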

Table 4 Markers in the linkage region on chromosome 4 and single marker LOD scores under an extension of the cautious model analyzed in Table 2

Based on the three models, we conclude that in the NL1 family, the phenotype ‘CVID or IgAD’ is linked to a region on chromosome 4q. The linkage interval extends at least from marker D4S2361 at 85.4 Mb from the top of chromosome 4 through marker D4S1572 at 104.1 Mb (Figure 2). It might extend further, if the strict model treats the unaffected siblings of the CVID-affected individuals with too much certainty. The single-marker LOD scores vary considerably because the markers differ in informativeness in the NL1 pedigree. However, the multipoint LOD scores show no significant preference for any disease gene located within the (D4S2361, D4S1572) interval. In fact, in the extended model, the peak LOD score for D4S1572 occurs at θ=0.03 (and not 0.00), suggesting that the CVID-associated gene lies above D4S1572. However, we cannot declare D4S1572 as a definitive linkage interval boundary based on any accepted syllogisms for inference in genetic linkage analysis.39

Figure 2

Haplotype analysis including the linkage region of chromosome 4. The numbers in the box indicate the length in bases of the microsatellite's allele. Alleles in brackets were inferred. The gray shading illustrates the disease-associated alleles/haplotype. The black lines indicate the position of the limiting crossovers under a strict model (Table 3) where unaffected siblings cannot carry the disease allele. Individuals appearing in italics were assigned unknown (0) disease status in the linkage analysis under the strict model (Table 3).

Supporting evidence from EU families

An important goal in genetic studies of complex diseases such as CVID is replication of any statistical finding in additional data sets. Therefore, we studied the 32 EU families using nine markers in, or near, the linkage interval of the NL1 family; five of the markers are shared by both parts of the study. We used the same penetrance function and method of analysis, but this is not a pure replication attempt as the EU families were ascertained for low IgA. Peak single-marker LOD scores are shown in Table 5. Eight of the nine markers achieve positive scores, and the remaining marker (D4S2623) achieves a neutral score of 0. Two of the markers, D4S423 and D4S1572, achieve peak scores 1.0, and these are among the perfectly segregating markers for the NL1 family. Since there is known locus heterogeneity for CVID based on the four causative genes found so far, and the peak scores occur at recombination fractions significantly above 0, we tested for locus heterogeneity within the EU family collection. Using HOMOG, we estimated that for D4S423 the LOD score under heterogeneity (HLOD) at θ=0 is 1.25, and the estimated proportion of linked families (α) is 0.48. For D4S1572, the HLOD at θ=0 is 1.03 with α=0.34. For marker D4S1572, the LOD score under heterogeneity is slightly higher than the peak (over all recombination fractions) LOD score under homogeneity.
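The heterogeneity LOD score can be sketched with the standard admixture formulation, in which a proportion α of families is linked at recombination fraction θ and the remainder are unlinked: HLOD(α, θ) = Σ log10(α·10^Z_i(θ) + 1 − α), where Z_i(θ) is the LOD score of family i. The following is a minimal sketch at a fixed θ with a simple grid search over α; it is not the HOMOG implementation, and the per-family scores are hypothetical values chosen only for illustration.

```python
import math
from typing import Sequence, Tuple


def hlod(family_lods: Sequence[float], alpha: float) -> float:
    """Admixture heterogeneity LOD at a fixed recombination fraction."""
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha))
               for z in family_lods)


def maximize_hlod(family_lods: Sequence[float],
                  steps: int = 1000) -> Tuple[float, float]:
    """Grid search over alpha in [0, 1]; returns (alpha_hat, peak HLOD)."""
    best_alpha, best_score = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for i in range(steps + 1):
        alpha = i / steps
        score = hlod(family_lods, alpha)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


# Hypothetical per-family LOD scores at theta = 0: two families support
# linkage and two contain apparent recombinants.
family_lods = [0.9, 0.7, -1.5, -0.8]
alpha_hat, peak = maximize_hlod(family_lods)
```

In this hypothetical example the summed LOD score under homogeneity is negative at θ=0, yet the HLOD is positive for an intermediate α, which illustrates how a subset of linked families can be detected even when unlinked families pull the homogeneity score down.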

Table 5 Peak total single-marker scores under homogeneity for EU families and the recombination fraction (θ) at which the peak scores occur, for nine markers in or near the linkage interval of the NL1 family

Finally, we used GENEHUNTER to estimate a full (all nine markers) LOD score under heterogeneity, which peaks at 0.96, with an estimated 32% of the families linked. The peak occurs between markers D4S423 and D4S1572. GENEHUNTER also computes a model-free NPL score, which peaks at 1.73 (P<0.03). The GENEHUNTER-reported average (over the 32 families) information content is 0.85 at the location of the peak NPL score. In sum, analysis of the EU families provides support for the chromosome 4q locus suggested by the NL1 family.

Our study design was to genotype and analyze family NL1 first, followed by targeted genotyping of the EU cohort in the region(s) where NL1 showed evidence of linkage. Even though the families were ascertained using different clinical criteria, one can combine the data sets for the markers they share, and analyze the 33 families together. As the marker allele frequencies are estimated jointly, the results may differ from those obtained by summing LOD scores for the NL1 family (Table 4) and the EU cohort (Table 5). In particular, this is one way to confirm that using uniform marker allele frequencies in the NL1 analysis does not inflate LOD scores significantly. We carried out a combined analysis for four of the shared markers and obtained peak single-marker LOD scores under homogeneity as follows: D4S1534 (score=0.76; θ=0.26), D4S423 (score=3.98; θ=0.12), D4S1572 (score=1.92; θ=0.20), and D4S427 (score=0.73; θ=0.27). Thus, the peak scores for D4S423 and D4S1572 increase substantially over their peak scores in NL1 alone (Table 4). As these are the two markers genotyped in both data sets that appear to be in the minimal linkage interval for NL1, the combined results provide additional evidence in favor of linkage to this region of chromosome 4q.
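One reason a combined peak score differs from the sum of the separate peak scores is that, under homogeneity, all families must share a single recombination fraction. The sketch below illustrates only this effect, using the textbook phase-known LOD formula and hypothetical meiosis counts; it does not model the joint re-estimation of marker allele frequencies mentioned above, and the function names are not from any linkage package.

```python
import math
from typing import List, Sequence, Tuple


def lod_curve(recombinants: int, meioses: int,
              thetas: Sequence[float]) -> List[float]:
    """Phase-known LOD score at each theta (all thetas must be > 0)."""
    return [recombinants * math.log10(t)
            + (meioses - recombinants) * math.log10(1.0 - t)
            + meioses * math.log10(2.0)
            for t in thetas]


def combined_peak(curves: Sequence[Sequence[float]],
                  thetas: Sequence[float]) -> Tuple[float, float]:
    """Sum the curves pointwise, then return (theta_hat, peak total LOD)."""
    totals = [sum(values) for values in zip(*curves)]
    best = max(range(len(thetas)), key=totals.__getitem__)
    return thetas[best], totals[best]


thetas = [i / 100 for i in range(1, 51)]           # 0.01 ... 0.50
large_family = lod_curve(0, 11, thetas)            # hypothetical counts
small_families = lod_curve(6, 20, thetas)          # hypothetical counts
theta_hat, peak = combined_peak([large_family, small_families], thetas)
sum_of_peaks = max(large_family) + max(small_families)
```

In this hypothetical case the combined curve peaks at an intermediate recombination fraction, and the combined peak cannot exceed the sum of the separate peaks, so a combined peak above the score of the large family alone still reflects positive support from the smaller families.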

Functional candidate genes in the linkage interval

Using NCBI's MapViewer (http://www.ncbi.nlm.nih.gov/mapview) and associated hyperlinks, we identified five genes that are located within the linkage interval with known functions that make them plausible candidates to be mutated in CVID. As the etiology of CVID includes abnormalities in the number of B and T cells, cytokine production, apoptosis, and other functions,43 the list should by no means be considered exhaustive. DAPP1 is a candidate because of its role in B-cell signaling and because DAPP1−/− mice have a deficiency in IgG3.44 BANK1 is a candidate because of its role in B-cell response to antigens.45 NFκB1 is a candidate because of its prominent role in regulating immune responses.46 SCYE1 is a candidate because of its role in inflammatory response and apoptosis in T cells.47 CASP6 is a candidate because it induces apoptosis in response to infection by Streptococcus pneumoniae.48 We sequenced all exons of CASP6, DAPP1, NFκB1, and SCYE1 on genomic DNA, and the cDNA of BANK1 and NFκB1 in at least one CVID patient from the NL1 family. However, no mutations were found in these genes/transcripts.

Discussion

Up until 2002, genetic linkage studies of CVID and/or IgAD concentrated on the HLA region on chromosome 6 using a variety of study designs.12, 13, 14, 15, 16, 17, 18, 19 Based on the positive results of Vořechovský et al,18 the HLA region was designated as the susceptibility locus IGAD1, and fine-mapped in a follow-up study.49 There is substantial disagreement in the cited studies regarding where in the HLA region a putative disease-causing gene is located, if there is one, and whether it is a susceptibility locus for CVID or IgAD or both.50, 51, 52

In 2003, we reported on an autosomal genome-wide linkage scan of three families with multiple cases of CVID, IgAD, and dysgammaglobulinemia, exhibiting autosomal-dominant inheritance and reported linkage of the disease phenotype to the telomeric region of chromosome 5p.20 The positive linkage evidence in this study came primarily from one large family. The 5p locus is not replicated in the EU cohort (data not shown) and clinical follow-up suggests that some members of the large 5p family have mycobacterial tuberculosis, an infection that is atypical for CVID.

Subsequent research has given strong support to the hypothesis that small numbers of families originally classified as multiplex CVID families have detectable monogenic immunodeficiencies, which are difficult to identify a priori by immunological assays alone. Genetic defects identified so far in patients with CVID include mutations in ICOS on chromosome 2q,23 CD19 on chromosome 16p,24 TNFRSF13C encoding the BAFF receptor on chromosome 22q (Warnatz K et al. XIth ESID Meeting, Versailles 2004; Abstract #B72), and TNFRSF13B encoding TACI on chromosome 17p.21, 22 With the possible exception of mutations in TNFRSF13B, which show autosomal-dominant transmission, these defects all display pure autosomal-recessive traits and heterozygous individuals appear to be healthy.

In the studies listed in the previous paragraph, positional reasoning was either not used at all or only used to assess whether marker and phenotype data in a specific family could be consistent with a mutation in a nearby gene. In contrast, for the NL1 family upon which we report in this study, we started with a positional approach. As a consequence, candidate gene sequencing can restrict attention to approximately 1% of the human genome. Thus, the discovery of a disease-associated genetic locus in multiplex CVID/IgAD families with an autosomal-dominant trait contributes to the search for disease genes causing hypogammaglobulinemia.

CVID is a complex disease. The previous identification of at least four distinct genes mutated in CVID establishes that there is considerable locus heterogeneity. A few families in these studies have nonpenetrant individuals, strongly suggesting a polygenic effect creating more complexity. The interaction between the genome of the human host and the pathogens that cause the infections in CVID adds another layer of complexity. However, genetic studies may gradually change CVID from one common diagnosis of exclusion to numerous rarer diagnoses of inclusion (eg, ICOS deficiency, CD19 deficiency, TACI deficiency, BAFFR deficiency). Since the age of onset of CVID is often in adulthood, and the diagnosis often delayed due to the insidious onset of symptoms,2 a genetic understanding of the etiology offers hope for earlier diagnosis and intervention.

The autosomal-dominant inheritance of CVID/IgAD in the largest pedigree published to date will most likely be explained by a mutation on chromosome 4q. A replication study in a collection of 32 smaller CVID/IgAD families suggests that some of these patients may also have a mutation in the same gene. It will be difficult to identify the gene solely by positional reasoning (ie genetic linkage or linkage disequilibrium) because the linkage interval is wide and other large CVID/IgAD families are hard to find. One approach we are currently pursuing is to determine by gene expression analysis which of the transcripts located within the linkage interval are differentially expressed in affected members of family NL1 when compared to healthy family members.