Part of the book series: Learning Materials in Biosciences (LMB)

What you will learn

Next-generation sequencing experiments produce millions of short reads per sample. Processing these raw reads and converting them into other file formats adds further information about the data, and several file formats are used to store and manipulate that information. This chapter gives an overview of the file formats commonly used in the analysis of next-generation sequencing data: FASTQ, FASTA, SAM/BAM, GFF/GTF, BED, and VCF. It reviews the structure and function of each format and explains how the files can be interpreted and what information can be gained from analyzing them.
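To make the idea of interpreting a file format concrete, the following is a minimal sketch of reading FASTQ records in Python. It is not taken from the chapter. It assumes the common four-line record layout (header, sequence, separator, quality string) and the Sanger Phred+33 quality encoding; the names `FastqRecord` and `parse_fastq` are hypothetical and chosen only for illustration.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: identifier, bases, and per-base Phred quality scores."""
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream.

    Assumption: sequences are not wrapped over several lines and the
    quality string uses the Sanger Phred+33 encoding.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        # Phred+33: subtract 33 from each character's ASCII code.
        scores = [ord(char) - 33 for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
    for record in parse_fastq(example):
        print(record.name, record.sequence, record.qualities)
```

For real data, an established parser such as the one in Biopython is usually the safer choice. The sketch only shows that each record pairs a sequence with one quality score per base.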

Acknowledgements

We thank Patricia Basurto and Carolina Castañeda of the International Laboratory for Human Genome Research (National Autonomous University of Mexico, Juriquilla campus) for reviewing this chapter.

Author information

Corresponding author

Correspondence to Melanie Kappelmann-Fenzl.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kappelmann-Fenzl, M. (2021). NGS Data. In: Kappelmann-Fenzl, M. (eds) Next Generation Sequencing and Data Analysis. Learning Materials in Biosciences. Springer, Cham. https://doi.org/10.1007/978-3-030-62490-3_7
