Abstract
This chapter introduces the application of recurrence-based techniques to biopolymers, with emphasis on the differences between analyzing strings whose information content is mainly logical (DNA) and strings whose information content is mainly chemico-physical (proteins). The unique features of Recurrence Quantification Analysis (RQA) when applied to systems in which spatial order (sequence) takes the place of time are described, highlighting the emergence of ‘time distortions’. This metaphorical term stresses the fact that a one-dimensional array of amino acid residues (a sequence), besides being formally identical to a discrete time series, is a physical object that folds in ordinary three-dimensional space. This behavior makes it possible to fully appreciate that RQA, as an analytical tool, is flexible enough to deal with complex networks in either the spatial or the temporal dimension. The comparison of DNA sequences with text strings helps to shed light on the particular nature of biological information coding, as well as on the role of RQA in bioinformatics and computational biology.
Notes
1. Satellite DNA is the main component of functional centromeres, and forms the main structural constituent of heterochromatin, i.e. the densely packed, non-expressed part of the DNA molecule. The name “satellite DNA” refers to the fact that repetitions tend to produce a different frequency of the nucleotides adenine, cytosine, guanine and thymine, and thus a density different from that of bulk DNA, so that they form a second or ‘satellite’ band when genomic DNA is separated on a density gradient [28].
References
J.P. Eckmann, S.O. Kamphorst, D. Ruelle, Recurrence plots of dynamical systems. Europhys. Lett. 4, 973–977 (1987)
C.L. Webber Jr., J.P. Zbilut, Dynamical assessment of physiological systems and states using recurrence plot strategies. J. Appl. Physiol. 76, 965–973 (1994)
N. Marwan, N. Wessel, U. Meyerfeldt, A. Schirdewan, J. Kurths, Recurrence plot based measures of complexity and its application to heart rate variability data. Phys. Rev. E 66, 026702 (2002)
N. Marwan, M.C. Romano, M. Thiel, J. Kurths, Recurrence plots for the analysis of complex systems. Phys. Rep. 438, 237–329 (2007)
D.B. Vasconcelos, S.R. Lopes, R.L. Viana, J. Kurths, Spatial recurrence plots. Phys. Rev. E 73, 056207 (2006)
A. Giuliani, R. Benigni, J.P. Zbilut, C.L. Webber Jr., P. Sirabella, A. Colosimo, Nonlinear signal analysis methods in the elucidation of protein sequence structure relationships. Chem. Rev. 102, 1471–1491 (2002)
G. Oliva, L. Di Paola, A. Giuliani, F. Pascucci, R. Setola, Assessing protein resilience via a complex network approach, in 2013 IEEE 2nd Network Science Workshop (NSW) (IEEE, 2013), pp. 131–137
L. Di Paola, M. De Ruvo, P. Paci, D. Santoni, A. Giuliani, Protein contact networks: an emerging paradigm in chemistry. Chem. Rev. 113, 1598–1613 (2013)
C.L. Webber Jr., A. Giuliani, J.P. Zbilut, A. Colosimo, Elucidating protein secondary structures using alpha carbon recurrence quantifications. Proteins Struct. Funct. Genet. 44, 292–303 (2001)
M. De Ruvo, A. Giuliani, P. Paci, D. Santoni, L. Di Paola, Shedding light on protein-ligand binding by graph theory: the topological nature of allostery. Biophys. Chem. 165–166, 21–29 (2012)
S. Vishveshwara, K. Brinda, N. Kannan, Protein structure: insights from graph theory. J. Theor. Comput. Chem. 1, 187–212 (2002)
C. Hansch, D. Hoekman, H. Gao, Comparative QSAR: toward a deeper understanding of chemico-biological interactions. Chem. Rev. 96, 1045–1075 (1996)
S. Miyazawa, R.L. Jernigan, Estimation of effective inter-residue contact energies from protein crystal structure: quasi-chemical approximation. Macromolecules 18, 534–552 (1985)
J. Kyte, R.F. Doolittle, A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132 (1982)
A. Porrello, S. Soddu, J.P. Zbilut, M. Crescenzi, A. Giuliani, Discrimination of single aminoacid mutations of the p53 protein by means of recurrence quantification analysis. Proteins Struct. Funct. Bioinf. 55, 743–755 (2004)
A. Giuliani, R. Benigni, P. Sirabella, J.P. Zbilut, A. Colosimo, Nonlinear methods in the analysis of protein sequences: a case study in rubredoxins. Biophys. J. 78, 136–149 (2000)
S. Soddu, G. Blandino, R. Scardigli, R. Martinelli, M.G. Rizzo, M. Crescenzi, A. Sacchi, Wild-type p53 induces diverse effects in 32D cells expressing different oncogenes. Mol. Cell. Biol. 16, 487–495 (1996)
T. Soussi, Y. Legros, R. Lubin, K. Ory, B. Schlichtholz, Multifactorial analysis of p53 alteration in human cancer: a review. Int. J. Cancer 57, 1–9 (1994)
H.J. Jeffrey, Chaos game representation of gene structure. Nucleic Acids Res. 18, 2163–2170 (1990)
O.C. Kulkarni, R. Vigneshwar, V.K. Jayaraman, B.D. Kulkarni, Identification of coding and noncoding sequences using local Hölder exponent formalism. Bioinformatics 21, 3818–3822 (2005)
R.N. Mantegna, S.V. Buldyrev, A.L. Goldberger, S. Havlin, C.K. Peng, M. Simons, H.E. Stanley, Linguistic features of noncoding DNA sequences. Phys. Rev. Lett. 73, 3169–3175 (1994)
E.A. Feingold, P.J. Good, M.S. Guyer, S. Kamholz, L. Liefer, K. Wetterstrand, F.S. Collins et al., The ENCODE (ENCyclopedia Of DNA Elements) project. Science 306, 636–640 (2004)
J.O. Andersson, S.G. Andersson, Pseudogenes, junk DNA, and the dynamics of Rickettsia genomes. Mol. Biol. Evol. 18(5), 829–839 (2001)
C. Frontali, E. Pizzi, Similarity in oligonucleotide usage in introns and intergenic regions contributes to long-range correlation in the Caenorhabditis elegans genome. Gene 232, 87–95 (1999)
E. Bultrini, E. Pizzi, P. Del Giudice, C. Frontali, Pentamer vocabularies characterizing introns and intron-like intergenic tracts from Caenorhabditis elegans and Drosophila melanogaster. Gene 304, 183–192 (2003)
F. Orsucci, A. Giuliani, C.L. Webber Jr., J.P. Zbilut, P. Fonagy, M. Mazza, Combinatorics and synchronization in natural semiotics. Physica A 361, 665–676 (2006)
G. Leonardi, The study of language and conversation with recurrence analysis methods. Psychol. Lang. Commun. 16, 165–183 (2012)
B. John, G.L. Miklos, Functional aspects of satellite DNA and heterochromatin. Int. Rev. Cytol. 58, 1–114 (1979)
M.A. Montemurro, Beyond the Zipf-Mandelbrot law in quantitative linguistics. Physica A 300, 567–578 (2001)
C.L. Webber Jr., J.P. Zbilut, Recurrence quantification analysis of nonlinear dynamical systems, in Tutorials in Contemporary Nonlinear Methods for the Behavioral Sciences, Chap. 2 (National Science Foundation, Washington, DC, 2005) pp. 26–94
Appendices
Appendix 1: Cryptography
Cryptography was a strategically crucial discipline during the Second World War. The decipherment of information hidden in encrypted messages (as in the Allies' cracking of the German code generated by the Enigma machine) rested on the notion that any human language, despite its apparent randomness and arbitrariness, is endowed with regularities of various kinds (e.g. the relative abundance of words of a given length, the juxtaposition of pairs of symbols, etc.), and that no masking code can obscure the code-independent features typical of the original language. These code-independent features are supposed to derive from some general invariants common to all languages, such as the so-called Zipf's law [29], which states that the frequency of occurrence of words in any sufficiently long text, written in any language, is negatively correlated with their number of letters according to a power law (Fig. 5.16).
Figure 5.16 reports (on a double-logarithmic scale) the strictly invariant relation between frequency of occurrence and word length in different book collections. Such regularities are clearly independent of the rich semantic information present in the analyzed books: the observed scaling comes from global constraints linked to the general features of human languages. The fact that these general features are largely content-independent was considered very important by many investigators involved in the analysis of DNA sequences: it allowed them to set aside the specific (and extremely heterogeneous) functions of different patches and to concentrate on the global statistical features of the billion-letter DNA text.
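The kind of tabulation behind such a plot can be sketched as follows. This is only an illustrative sketch, not the procedure used for Fig. 5.16: the function names `length_frequency` and `loglog_slope` are hypothetical, words are assumed to be runs of the letters a–z, and the scaling exponent is assumed to be estimated by an ordinary least-squares fit on log-log axes.

```python
import math
import re
from collections import Counter


def length_frequency(text):
    """Map word length -> number of word occurrences of that length.

    Assumption: a 'word' is a maximal run of the letters a-z after lowercasing.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(len(w) for w in words)
    return dict(sorted(counts.items()))


def loglog_slope(points):
    """Least-squares slope of log(y) against log(x) for (x, y) pairs.

    For an exact power law y = c * x**a the returned slope equals a.
    """
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Usage on a toy sentence; a meaningful estimate needs a long text.
table = length_frequency("Midway upon the journey of our life")
slope = loglog_slope(list(table.items()))
```

On short texts the estimate is dominated by noise, so the slope is informative only for corpora of the size the appendix refers to.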
Appendix 2: Strings from Human Languages
2.1 (A) Dante Alighieri - Inferno - I Canto (tercets 1–3) - FILTERED
Nelmezzodelcammindinostravitamiritrovaiperunaselvaoscuracheladirittaviaerasmarrita Ahiquantoadirqualeraecosaduraestaselvaselvaggiaeaspraefortechenelpensierrinovalapaura Tanteamarachepocoepiumortemapertrattardelbenchivitrovaidirodellaltrecosechivhoscorte
2.2 (B) Dante Alighieri - Inferno - I Canto (tercets 1–3) - NONFILTERED (English Translation by Henry Wadsworth Longfellow)
Midway upon the journey of our life
I found myself within a forest dark
For the straightforward pathway had been lost.
Ah me! how hard a thing it is to say
What was this forest savage, rough, and stern
Which in the very thought renews the fear.
So bitter is it, death is little more;
But of the good to treat, which there I found,
Speak will I of the other things I saw there.
2.3 (C) Dr. Seuss Poem - NONFILTERED
I do not like eggs in the file.
I do not like them in any style.
I will not take them fried or boiled.
I will not take them poached or broiled.
I will not take them soft or scrambled,
Despite an argument well-rambled.
No fan I am of the egg at hand.
Destroy that egg! Today! Today!
Today I say!
Without delay!
(A), (B) and (C) refer to the strings from spoken languages whose % Det is shown in Fig. 5.14. Notice that in all cases RQA was applied after filtering the original texts as indicated in [30]; the filtered text, however, is shown only for (A). (A) and (B) show the first three of the 45 tercets analyzed in Fig. 5.14.
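A minimal sketch of this pipeline, filtering a text and then computing the percentage of determinism on the resulting letter string, is given below. Several choices are assumptions rather than the settings of [30] or of Fig. 5.14: the filter lowercases the text and keeps only the letters a–z, two positions are taken as recurrent when they carry the same letter (embedding dimension 1, exact match), and a diagonal line needs at least `lmin = 2` points. The names `filter_text` and `percent_det` are hypothetical.

```python
import re


def filter_text(text):
    """Lowercase the text and keep only the letters a-z (assumed filter).

    Accented letters are dropped, so they should be transliterated beforehand.
    """
    return re.sub(r"[^a-z]", "", text.lower())


def percent_det(symbols, lmin=2):
    """Percentage of recurrent points lying on diagonal lines of length >= lmin.

    Points (i, j) are recurrent when symbols[i] == symbols[j]. Only diagonals
    above the main one are scanned, since the recurrence plot is symmetric.
    """
    n = len(symbols)
    recurrent = 0
    in_lines = 0
    for k in range(1, n):            # k is the offset of the diagonal
        run = 0
        for i in range(n - k):
            if symbols[i] == symbols[i + k]:
                recurrent += 1
                run += 1
            else:
                if run >= lmin:
                    in_lines += run
                run = 0
        if run >= lmin:              # close a line that reaches the border
            in_lines += run
    return 100.0 * in_lines / recurrent if recurrent else 0.0


# Usage on the opening line of (B).
filtered = filter_text("Midway upon the journey of our life")
det = percent_det(filtered)
```

The double loop costs on the order of n² comparisons, which is acceptable for strings of a few hundred or thousand symbols such as those in these appendices.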
Appendix 3: Nucleotidic Strings
3.1 Satellite DNA 1 - GenBank: BI067039.1 Homo sapiens Genomic Region Containing Hypervariable Minisatellites, mRNA Sequence
GTCCTCCGCCCCACACTTATGGGGCAGAACCCACACTTCCGGTCCTCCGCTCCACACTTATGGGGCACAGCCCACACTTCTGGTCCTCTGCCCCACACTTATGGGGCACAGTTGGGTGTTCTGCCCCACACTTATGGGGCACAGACAGCAGTTCCGGACCTCCACCCCACACTTATGGGGCAGAACCCACACTTCCGGTCCTCCGCCCCACACTTATGGGGCAGAACCCACAGTTTTGGTCCTCCGCTCCACACTTATGGGGCACAACAACCCACAGTTATGGGGCTTATGAGGTTCTGCCCCACACTTATGGGGCACAGACAGCAGTTCTGGTCCTCCGCCCCACACTTATGGGGCAGAACCCACACTTCCGGTCCTCCGCCCCACACTTATGGGGCAGAACCCACACTTCCGGTCCTCCGCTCCACACTTATGGGGCACAGCCCACACTTCTGGTCCTCTGCCCCACACTTATGGGGCACAGCTGGGGGTCCTACCCCACACTTATGGGGCAGAACCCACAGTTCCGGTCCTCCACCCCACACTTATGGGGCACAGCTGGGGATTCTGTGCCACACTTATGGGGCAGAACCCACAGTTCCGGCCCTCCGCCCCACACTTATGGGGCAGNNCNNGCNGNNCGGG
3.2 Satellite DNA 2 - GenBank: BM439581.1 Homo sapiens Genomic Region Containing Hypervariable Minisatellites, mRNA Sequence
GGCACAGCTGGGGATTCTGCCCCACACTTATGCGGCACAACCCACAGTTCTGGTCCTCTCCCCCACACTTATGGGGCACAACAACCCACAGTTATGGGGCTTATGAGGTTCTGCCCCACACTTACGGGGCACAGACAGCAGTTCCAGTCCTCCGCCCCACACTTATGGGGCAGAACCCACAATTCCGGACCTCTGCCCCACACTTACGGGGCACAGCTGGGGATTCTGCCCCACACTTATGGGGCACAACCCACAGTTCTGGTCCTCTCCCCCACACTTATGGGGCAGAACCCACACTTCCGGTCCTCCGCCCCACACTTAGGGAGCAGAACCCACACTTCCGGTCCTCCGCCCCACACTTATGGGGCACAACAACCCACAGTTATGGGGCCTATGAGGTTCTGCCCCACACTTATGGGGCACAGACAGCAGTTCCGGACCTCTGCCCCACACTTATGGGGCACAGTTGGGGGTCCTACCCCACACTTATGGGGCAGAACCCACAGTTCCGGACCTCCGCCCCACACTTATGGGGCAGAACCCACACTTCCGNACCTCTGCCCCACACTTATGGGGCACA
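Before any recurrence analysis, the repetitive character of strings like the two above can be inspected with a simple count of bases and of short overlapping words. The sketch below is only an illustration under stated assumptions: the helper names `base_composition` and `top_kmers` are hypothetical, the word length `k = 6` is an arbitrary choice, and ambiguous positions (the `N` symbols present in both sequences) are ignored.

```python
from collections import Counter


def base_composition(seq):
    """Fraction of each unambiguous base (A, C, G, T); N and other symbols are ignored."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"} if total else {}


def top_kmers(seq, k=6, n=3):
    """The n most frequent overlapping k-mers, skipping any k-mer with a non-ACGT symbol."""
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(km for km in kmers if set(km) <= set("ACGT"))
    return counts.most_common(n)


# Usage on a short fragment copied from the start of sequence 3.1.
fragment = "GTCCTCCGCCCCACACTTATGGGGCAGAACCCACACTTCCGG"
composition = base_composition(fragment)
repeats = top_kmers(fragment, k=6, n=3)
```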
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Colosimo, A., Giuliani, A. (2015). From Time to Space Recurrences in Biopolymers. In: Webber, Jr., C., Marwan, N. (eds) Recurrence Quantification Analysis. Understanding Complex Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-07155-8_5
DOI: https://doi.org/10.1007/978-3-319-07155-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07154-1
Online ISBN: 978-3-319-07155-8
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)