What you will learn
Next-generation sequencing (NGS) experiments produce millions of short reads per sample. Processing these raw reads and converting them into other file formats adds further information about the data, and a range of file formats is used to store and manipulate it. This chapter gives an overview of the formats commonly used in the analysis of NGS data: FASTQ, FASTA, SAM/BAM, GFF/GTF, BED, and VCF. It reviews the structure and function of each format and explains how the files can be interpreted and what information can be gained from analyzing them.
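As a small illustration of what interpreting one of these formats can look like, the sketch below reads FASTQ records and decodes their quality strings. It is not taken from the chapter. It assumes the common four-line record layout (header starting with "@", sequence, separator starting with "+", quality string) and the Sanger/Phred+33 quality encoding; the names FastqRecord and read_fastq and the example read are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: identifier, bases, and per-base Phred quality scores."""
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream.

    Assumption: qualities are ASCII-encoded with the given offset
    (33 for Sanger / Phred+33). Multi-line records are not handled.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character encodes one Phred score: ord(char) - offset.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Hypothetical in-memory example; a real file would be opened with open(path).
example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
for record in read_fastq(example):
    mean_quality = sum(record.qualities) / len(record.qualities)
    print(record.name, record.sequence, round(mean_quality, 1))
```

A generator is used so that large files are processed one record at a time instead of being loaded into memory. For production work, an established parser such as the one in Biopython is usually the safer choice.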
Acknowledgements
We thank Patricia Basurto and Carolina Castañeda of the International Laboratory for Human Genome Research (National Autonomous University of Mexico, Juriquilla campus) for reviewing this chapter.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kappelmann-Fenzl, M. (2021). NGS Data. In: Kappelmann-Fenzl, M. (eds) Next Generation Sequencing and Data Analysis. Learning Materials in Biosciences. Springer, Cham. https://doi.org/10.1007/978-3-030-62490-3_7
DOI: https://doi.org/10.1007/978-3-030-62490-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62489-7
Online ISBN: 978-3-030-62490-3
eBook Packages: Biomedical and Life Sciences (R0)