Standard Methods for the Management of Immunogenetic Data

  • Protocol
Immunogenetics

Part of the book series: Methods in Molecular Biology (MIMB, volume 882)

Abstract

In this chapter, we outline some basic principles for the consistent management of immunogenetic data. These include the preparation of a single master data file that can serve as the basis for all subsequent analyses, a focus on the quality and homogeneity of the data to be analyzed, the documentation of the coding systems used to represent the data, and the application of nomenclature standards specific for each immunogenetic system being evaluated. The data management principles discussed here are intended to provide a foundation for the data analysis methods detailed in Chaps. 13 and 14. The relationship between the data management and analysis methods covered in these three chapters is illustrated in Fig. 3.

The application of these data management principles is a first step toward consistent and reproducible data analyses. While it may take extra time and effort to apply them, we feel that it is better to take this approach than to assume that low data quality can be compensated for by large sample sizes.

In addition to their relevance for analytical reproducibility, it is important to consider these data management principles from an ethical perspective. The reliability of the data collected and generated as part of a research study should be as important a component of the ethical review of a research application as the security of those data. Finally, in addition to ensuring the integrity of the data from collection to publication, the application of these data management principles will provide a means to foster research integrity and to improve the potential for collaborative data sharing.

Acknowledgments

This work was supported by National Institutes of Health (NIH) grants U01AI067068 (JAH, SJM) and U19 AI067152 (PAG) awarded by the National Institute of Allergy and Infectious Diseases (NIAID) and by NIH/NIAID contract AI40076 (RMS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Allergy and Infectious Diseases or the National Institutes of Health.

Corresponding author

Correspondence to Pierre-Antoine Gourraud.

Glossary

Genetic data

1. Allele: Any of the alternative forms (or sets of forms) of the DNA sequence at a locus. These variants may occur for genes and/or genetic markers.

Example: B, HLA-DRB1*01:01:01, HLA-A*01, D6S1666 (184).

2. Diplotype: The pair of haplotypes within a given genotype. The chromosomal phase between alleles is always known.

Example: HLA (A*01 B*08 DR*17, A*26 B*27 DR*17).

HLA (A*01 B*27 DR*17, A*26 B*08 DR*17).

When analyzing data for more than one locus, diplotypic data must be distinguished from genotypic data. Because the alleles at different loci in a given genotype can be combined to make many possible haplotype pairs, a genotype must be considered to correspond to multiple diplotypes. Unfortunately, the term “genotype” is sometimes used to refer to a given pair of haplotypes, especially when familial segregation has been studied.
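
The one-genotype-to-many-diplotypes relationship can be sketched in Python. This is a minimal illustration, not a method from the chapter: the function name `possible_diplotypes` and the input layout (one unphased allele pair per locus) are assumptions made for the example.

```python
from itertools import product


def possible_diplotypes(genotype):
    """Enumerate the distinct haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of (allele, allele) tuples, one tuple per locus.
    The layout and the function name are hypothetical, chosen for illustration.
    """
    if not genotype:
        return []
    first, rest = genotype[0], genotype[1:]
    diplotypes = set()
    # Fix the orientation of the first locus and try both orientations of
    # every other locus; this visits every possible phase assignment.
    for flips in product((False, True), repeat=len(rest)):
        hap1, hap2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        # A diplotype is an unordered pair, so sort the two haplotypes
        # before storing them; homozygous loci then collapse duplicates.
        diplotypes.add(tuple(sorted((tuple(hap1), tuple(hap2)))))
    return sorted(diplotypes)


genotype = [("A*01", "A*26"), ("B*08", "B*27"), ("DR*17", "DR*17")]
for hap1, hap2 in possible_diplotypes(genotype):
    print(" ".join(hap1), "|", " ".join(hap2))
```

For the genotype used in the diplotype example above, the sketch yields the same two haplotype pairs; a genotype that is heterozygous at every one of n loci would yield 2^(n-1) pairs.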

3. Gene: The functional and physical unit of heredity. A gene consists of a DNA segment with a specific sequence. It includes information for the synthesis of mRNA molecules that direct the synthesis of proteins.

Example: ABO Glycosyltransferase gene, HLA-DRB1 gene, KIR-2DS3/S5 gene.

4. Genotype: The genetic makeup at one or more loci of an individual. It refers to a set of alleles carried by an individual (regardless of the expression of those alleles). The chromosomal phase (chromosomal identity of alleles at different loci) between alleles may not be known.

Example: HLA-A (1, 2); HLA-B (8, 44); HLA-DRB1 (03, 04).

KIR: KIR (A, A).

Microsatellites: D6S273 (134, 136); D6S273 (*(GT)19, *(GT)20).

SNP: RS345336443 (G/G).

5. Haplotype: Set of alleles of contiguous loci. They are usually co-transmitted on a parental chromosome.

Example: HLA-A*01-B*08-DRB1*03.

6. Locus: Literally “place” in Latin; the usual physical location of a gene, an individual genetic marker, or a set of genetic markers in a genome.

Example: ABO locus, HLA-DRB locus, KIR-2DS3/S5cen locus, D6S1666 microsatellite locus.

7. Phenotype: The observable expression of alleles as a physical or biochemical trait resulting from the interaction of the genome, the environment, and the experimental settings. In disease studies it may refer to the presence or a manifestation of the disease under study. Disease phenotypes may be reflected in a variety of ways as quantitative or qualitative variables.

This term may also refer to a set of alleles (expressed or not) detected by a technique. In codominant or heterozygous situations, phenotypes are noted as pairs of data; each pair is specific to a particular gene and locus.

Example: ABO system: [A].

HLA system: [HLA-A (1, 2); HLA-B (8,44); HLA-DR (3, 4)].

Phenotypic and demographic data

1. Admixture: The outcome of interbreeding between members of different populations. An admixed population is generally derived from populations in different geographic regions.

2. Collection site: The location where the sample was collected. This can be identified using latitude and longitude coordinates, or by specifying the country or nation, and city/town/village, or other locale where the collection took place.

3. Complexity: An ordinal variable that represents an estimate of the degree of admixture and population sub-structure in each population sample.

Example:

Complexity 1: a population sample collected from a single settlement or group of closely related settlements.

Complexity 2: a population sample collected from a group of separate but discrete settlements.

Complexity 3: a population sample collected in a metropolitan area or across an entire nation.

Complexity 4: an admixed population.

4. Data management methods: The approaches used in storing and processing the data in preparation for analysis. This can include the formats and programs used to store and edit the data (e.g., a specific spreadsheet program or database system), as well as any modifications that were made to the data between the generation of the data resulting from the typing assay and the inclusion of the data in the master data file. For example, if ambiguities were resolved, the approach used to resolve them should be documented in the data dictionary; if HLA allele data were truncated to a common level, or “binned” into a common sequence category (e.g., treating all alleles that encode the same peptide-binding region as the same allele), this should be documented.
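
Truncation to a common level can be illustrated with a short Python sketch. It assumes colon-delimited allele names of the form shown in the Allele entry (e.g., HLA-DRB1*01:01:01); the function name `truncate_allele` and its `fields` parameter are hypothetical, and expression suffixes such as a trailing "N" are deliberately not handled here.

```python
def truncate_allele(name, fields=2):
    """Truncate a colon-delimited HLA allele name to its first `fields` fields.

    Hypothetical helper for illustration only. Names without a "*" separator
    are returned unchanged, and a trailing expression suffix on a later field
    would be lost when that field is cut off.
    """
    if fields < 1:
        raise ValueError("fields must be at least 1")
    locus, sep, digits = name.partition("*")
    if not sep:
        return name  # nothing after the locus name to truncate
    return locus + "*" + ":".join(digits.split(":")[:fields])


alleles = ["HLA-DRB1*01:01:01", "HLA-DRB1*01:01", "HLA-A*01"]
print([truncate_allele(a, fields=2) for a in alleles])
```

Whatever level is chosen, recording it (here, the value of `fields`) in the data dictionary keeps the modification traceable.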

5. Ethnicity: A group of individuals (or populations) sharing a common language, culture, or religion, and who are assumed to share a common ancestry. Ethnicity should be distinguished from geography (e.g., “North American” is not an ethnicity), and though ethnicity is often associated with indigenous nationality (e.g., “Irish,” “Chinese”) qualifiers are often necessary to distinguish ethnicity from nationality (e.g., “Han Chinese”).

6. Family: If individuals in the study belong to discrete familial groups, a family ID is a qualitative variable identifying membership in a particular pedigree, as well as the relationship to the index case (proband).

7. Geographic region: A specific continental or subcontinental area comprising multiple nations in which the population is located, or from which the population was derived, if the population is a migrant population. For example, European Americans or European Australians would be assigned to the European region, or to a specific subregion of Europe. Conversely, North America would only pertain to Native American/Amerindian/Aleut/Eskimo populations. Populations derived from more than one region (admixed populations) can be assigned to a specific class for the type of admixture (depending on the regions of origin) or included in a single class for all admixed populations. Each region and admixed class should be defined in the data dictionary.

8. Latitude and longitude: Geographic coordinates that identify specific locations on the surface of the Earth. Latitude and longitude values should be recorded in a decimal format, with minutes and seconds expressed as fractions of each degree value. North latitudes and east longitudes should be recorded with positive values, and south latitudes and west longitudes should be recorded with negative values. For example, 35° 20 min south latitude would be recorded as −35.333, and 2° 30 min east longitude would be recorded as 2.5 or +2.5.
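
The conversion to signed decimal degrees is simple arithmetic: degrees + minutes/60 + seconds/3600, negated for south and west. A minimal Python sketch follows; the function name `to_decimal_degrees` and the single-letter hemisphere codes are assumptions made for the example.

```python
def to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    North and east are positive; south and west are negative.
    The name and argument layout are hypothetical, for illustration only.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


print(round(to_decimal_degrees(35, 20, hemisphere="S"), 3))  # 35 deg 20 min south
print(to_decimal_degrees(2, 30, hemisphere="E"))             # 2 deg 30 min east
```

Applied to the two coordinates in the entry above, the sketch gives −35.333 (after rounding to three decimals) and 2.5.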

9. Population: A group of individuals living in a specific geographic area. More specifically, a population is defined such that all pairs of individual members have the opportunity to mate, and are more likely to mate with each other than with members of other populations. A population should be documented in the data dictionary in terms of the pertinent geographic area and the approximate number of included individuals.

10. Population sample: A unique descriptor for the individuals from a given population that were included in the study. If the study involves multiple sets of individuals (samples) from the same population, each set of individuals should be given a unique name; usually it is sufficient to append the number of individuals to the end of the population name (e.g., antarctica_87, antarctica_207, antarctica_597).
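
The naming convention for population samples can be sketched in a few lines of Python. The function name `population_sample_names` and the (population, number of individuals) input layout are assumptions for illustration; the uniqueness check is an added safeguard for the case where two samples from one population have the same size.

```python
def population_sample_names(samples):
    """Build unique sample names of the form <population>_<n individuals>.

    `samples` is an iterable of (population_name, n_individuals) pairs.
    Hypothetical helper; raises if the convention would produce a duplicate,
    in which case another distinguishing suffix is needed.
    """
    names = []
    seen = set()
    for population, n_individuals in samples:
        name = f"{population}_{n_individuals}"
        if name in seen:
            raise ValueError(f"duplicate population sample name: {name}")
        seen.add(name)
        names.append(name)
    return names


print(population_sample_names(
    [("antarctica", 87), ("antarctica", 207), ("antarctica", 597)]
))
```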

11. Population substructure: A barrier to the opportunity of mating between all pairs of individuals in a population.

12. Proband: The individual under study, primarily used in family-based disease association studies.

13. Status: The status of an individual as affected or unaffected with respect to a disease phenotype, or belonging to a case or a control group.

14. Typing assay: The laboratory method(s) and associated protocols used to generate the data included in the analysis. Many of them are described in this volume. Commonly used molecular methods for HLA and KIR genotyping include sequence-specific priming (SSP), sequence-specific oligo probe (SSO or SSOP), sequence/sequencing-based typing (SBT), matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF), and reference strand conformation analysis (RSCA). Serology has been used historically for HLA phenotype data generation. When possible, a description of the assay identifying the assay manufacturer and reagent version/lot employed should be included in the data dictionary. Literature citations or references to specific protocols should also be associated with the methods used, especially if multiple distinct methods have been employed in generating the data.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Gourraud, PA., Hollenbach, J.A., Barnetche, T., Single, R.M., Mack, S.J. (2012). Standard Methods for the Management of Immunogenetic Data. In: Christiansen, F., Tait, B. (eds) Immunogenetics. Methods in Molecular Biology, vol 882. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-842-9_12

  • DOI: https://doi.org/10.1007/978-1-61779-842-9_12

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-841-2

  • Online ISBN: 978-1-61779-842-9

  • eBook Packages: Springer Protocols
