BazEkon - The Main Library of the Cracow University of Economics

BazEkon home page

Main menu

Author
Polushina Tatiana (University of Bergen, Norway), Sofronov Georgy (Macquarie University, Sydney, Australia)
Title
Change-Point Detection in Binary Markov DNA Sequences by Cross-Entropy Method
Source
Annals of Computer Science and Information Systems, 2014, vol. 2, s. 471 - 478, rys., tab., bibliogr. 53 poz.
Keyword
Algorytmy genetyczne, Biochemia, Algorytmy
Genetic algorithms, Biochemistry, Algorithms
Note
summ.
Abstract
A deoxyribonucleic acid (DNA) sequence can be represented as a sequence with 4 characters. If a particular property of the DNA is studied, for example, GC content, then it is possible to consider a binary sequence. In many cases, if the probabilistic properties of a segment differ from the neighbouring ones, this means that the segment can play a structural role. Therefore, DNA segmentation is given a special attention, and it is one of the most significant applications of change-point detection. Problems of this type also arise in a wide variety of areas, for example, seismology, industry (e.g, fault detection), biomedical signal processing, financial mathematics, speech and image processing. In this study, we have developed a Cross-Entropy algorithm for identifying change-points in binary sequences with first-order Markov dependence. We propose a statistical model for this problem and show effectiveness of our algorithm for synthetic and real datasets.(original abstract)
Full text
Show
Bibliography
Show
  1. Avery P. and Henderson D., "Detecting a changed segment in DNA sequences," Appl. Statist., vol. 48, no. 4, pp. 489-503, 1999, http://dx.doi.org/10.1111/1467-9876.00167
  2. Avery P. and Henderson D., "Fitting Markov chain models to discrete state series such as DNA sequences," Appl. Statist., vol. 48, no. 1, pp. 53-61, 1999, http://dx.doi.org/10.1111/1467-9876.00139
  3. Bernardi G., Structural and evolutionary genomics. Natural Selection in Genome Evolution. Amsterdam: Elsevier, 2004.
  4. Botev Z. I., Kroese D., and Taimre T., "Generalized crossentropy methods with applications to rare-event simulation and optimization," Simulation, vol. 83, no. 11, pp. 785-806, 2007, http://dx.doi.org/10.1177/0037549707087067
  5. Boys R. and Henderson D., "A Bayesian approach to DNA sequence segmentation," Biometrics, vol. 60, pp. 573-588, 2004, http://dx.doi.org/10.1111/j.0006-341X.2005.040701_1.x
  6. Braun J., Braun R., and Müller H., "Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation," Biometrika, vol. 87, no. 2, pp. 301-314, 2000, http://dx.doi.org/10.1093/biomet/87.2.301
  7. Costa A., Jones O., and Kroese D., "Convergence properties of the cross- entropy method for discrete optimization," Operations Research Letters, vol. 35, no. 5, pp. 573-580, 2007, http://dx.doi.org/10.1016/j.orl.2006.11.005
  8. Costantini M., Clay O., Auletta F., and Bernardi G., "An isochore map of human chromosomes," Genome Res., vol. 16, pp. 536-541, 2006, http://dx.doi.org/10.1101/gr.4910606
  9. Evans G. E., Sofronov G. Y., Keith J. M., and Kroese D. P., "Estimating change-points in biological sequences via the cross-entropy method," Ann. Oper. Res., vol. 189, no. 1, pp. 155-165, 2011, http://dx.doi.org/10.1007/s10479-010-0687-0
  10. Hurst L., Brunton C., and Smith N., "Small introns tend to occur in GC-rich regions in some but not all vertebrates," Trends Genet, vol. 15, pp. 437-439, 1999, http://dx.doi.org/10.1016/S0168-9525(99)01832-6
  11. Ikemura T. and Wada K., "Evident diversity of codon usage patterns of human genes with respect to chromosome banding patterns and chromosome numbers; relation between nucleotide sequence data and cytogenetic data," Nucleic Acids Res., vol. 19, pp. 4333-4339, 1991.
  12. Keith J. M., "Segmenting eukaryotic genomes with the generalized Gibbs sampler," J. Comp. Biol., vol. 13, no. 7, pp. 1369-1383, 2006, http://dx.doi.org/10.1089/cmb.2006.13.1369
  13. Keith J. M., Adams P., Stephen S., and Mattick J. S., "Delineating slowly and rapidly evolving fractions of the drosophila genome," J. Comp. Biol., vol. 15, no. 4, pp. 407-430, 2008, http://dx.doi.org/10.1089/cmb.2007.0173
  14. Keith J. M., Kroese D. P., and Bryant D., "A generalized Markov sampler," Methodology and Computing in Applied Probability, vol. 6, no. 1, pp. 29-53, 2004, http://dx.doi.org/10.1023/B:MCAP.0000012414.14405.15
  15. Keith J., Kroese D., and Sofronov G., "Adaptive independence samplers," Statistics and Computing, vol. 18, no. 4, pp. 409-420, 2008, http://dx.doi.org/10.1007/s11222-008-9070-2
  16. Krauth J., "Multiple change points and alternating segments in binary trials with dependence," in Innovations in Classification, Data Science, and Information Systems, D. Baier and K. Wernecke, Eds. Springer, Berlin, 2004, pp. 154-164, http://dx.doi.org/10.1007/3-540-26981-9_19
  17. Krauth J., "Tests for multiple change points in binary Markov sequences," in From Data and Information Analysis to Knowledge Engineering. Springer, Berlin, 2006, pp. 670-677, http://dx.doi.org/10.1007/3-540-31314-1_82
  18. López I., Gámez M., Garay J., Standovár T., and Varga Z., "Application of change-point problem to the detection of plant patches," Acta Bio-theoretica, vol. 58, pp. 51-63, 2010, http://dx.doi.org/10.1007/s10441-009-9093-x
  19. Oliver J., Bernaola-Galvan P., Carpena P., and Roman-Roldan R., "Isochore chromosome maps of eukaryotic genomes," Gene, vol. 276, pp. 47-56, 2001, http://dx.doi.org/10.1016/S0378-1119(01)00641-2
  20. Oliver J., Carpena P., Roman-Roldan R., Mata-Balaguer T., Mejias-Romero A., Hackenberg M., and Bernaola-Galvan P., "Isochore chromosome maps of the human genome," Gene, vol. 300, pp. 117-127, 2002, http://dx.doi.org/10.1016/S0378-1119(02)01034-X
  21. Oliver J., Roman-Roldan R., Perez J., and Bernaola-Galvan P., "Segment: identifying compositional domains in DNA sequences," Bioinformatics, vol. 15, pp. 974-979, 1999, http://dx.doi.org/10.1093/bioinformatics/15.12.974
  22. Pettitt A., "A non-parametric approach to the changepoint problem," Appl. Statist., vol. 28, pp. 126-135, 1979, http://dx.doi.org/10.2307/2346729
  23. Polansky A., "Detecting change-points in Markov chains," Computational Statistics and Data Analysis, vol. 51, pp. 6013-6026, 2007, http://dx.doi.org/10.1016/j.csda.2006.11.040
  24. Polushina T. and Sofronov G., "Change-point detection in biological sequences via genetic algorithm," in Proceedings of the IEEE Congress on Evolutionary Computation (CEC'2011), 2011, pp. 1966-1971, http://dx.doi.org/10.1109/CEC.2011.5949856
  25. Polushina T. V. and Sofronov G. Y., "A hybrid genetic algorithm for change-point detection in binary biomolecular sequences," in Proceedings of the IASTED International Conference on Artificial Intelligence and Applications (AIA 2013), 2013, pp. 1-8, http://dx.doi.org/10.2316/P.2013.793-026
  26. Priyadarshana M. and Sofronov G., "GAMLSS and Extended Cross-Entropy Method to Detect Multiple Change-Points in DNA Read Count Data," in Proceedings of the 28th International Workshop on Statistical Modelling, Muggeo VMR, Capursi V, Boscaino G, Lovison G (Eds.), 2013, vol.1, pp. 453-457.
  27. Priyadarshana M. and Sofronov G., "The Cross-Entropy method and multiple change-points detection in zero-inflated DNA read count data," in The 4th International Conference on Computational Methods (ICCM2012), Y. T. Gu, S. C. Saha (Eds.), 2012 , pp. 1-8.
  28. Priyadarshana M., and Sofronov G., "Breakpoint (R-package);" software available at http://cran.r-project.org/web/packages/breakpoint.
  29. Priyadarshana M., Polushina T., and Sofronov G., "A hybrid algorithm for multiple change-point detection in continuous measurements," in International Symposium on Computational Models for Life Sciences, AIP Conference Proceedings, vol. 1559, 2013, pp. 108-117, http://dx.doi.org/10.1063/1.4825002
  30. Priyadarshana M., Polushina T., and Sofronov G., "Hybrid algorithms for multiple change-point detection in biological sequences," in Signal and Image Analysis for Biomedical and Life Sciences (Advances in Experimental Medicine and Biology), Sun, C., Bednarz, T., Pham, T. D., Vallotton, P., Wang, D. (Eds.), Springer, 2014, in press.
  31. Priyadarshana M., Sofronov G., "A modified cross-entropy method for detecting change-points in the Sri-Lankan stock market," In: The IASTED International Conference on Engineering and Applied Science (EAS2012), Chen, B. M.; Khan, M. T. and Tan, K-K. (Eds.), 2012, pp. 321-326, http://dx.doi.org/10.2316/P.2012.785-041
  32. Ren L., Gao G., Zhao D., Ding M., Luo J., and Deng H., "Developmental stage related patterns of codon usage and genomic GC content: searching for evolutionary fingerprint by models of stem cell differentiation," Genome Biol., vol. 8, p. R35, 2007, http://dx.doi.org/10.1186/gb-2007-8-3-r35
  33. Rubinstein R. and Kroese D., Simulation and the Monte Carlo Method. John Wiley & Sons, 2007.
  34. Rubinstein R. and Kroese D., The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. New York: Springer-Verlag, 2004.
  35. Schwarz G., "Estimating the dimension of a model," The Annals of Statistics, vol. 6, no. 2, pp. 461-464, 1978, http://dx.doi.org/10.1214/aos/1176344136
  36. Semon M., Mouchiroud D., and Duret L., "Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance," Hum. Mol. Genet., vol. 14, pp. 421-427, 2005, http://dx.doi.org/10.1093/hmg/ddi038
  37. Sofronov G. Y., Evans G. E., Keith J. M., and Kroese D. P., "Identifying change-points in biological sequences via sequential importance sampling," Environmental Modeling and Assessment, vol. 14, no. 5, pp. 577-584, 2009, http://dx.doi.org/10.1007/s10666-008-9160-8
  38. Sofronov G., "Change-point modelling in biological sequences via the bayesian adaptive independent sampler," International Proceedings of Computer Science and Information Technology, vol. 5, pp. 122-126, 2011.
  39. Sofronov G., Polushina T., and Priyadarshana M., "Sequential changepoint detection via the cross-entropy method," in The 11th Symposium on Neural Network Applications in Electrical Engineering (NEUREL2012), B. Reljin and S. Stankovic, (Eds.), 2012, pp. 185-188, http://dx.doi.org/10.1109/NEUREL.2012.6420004
  40. Sonesson C. and Bock D., "A review and discussion of prospective statistical surveillance in public health," Journal of the Royal Statistical Society. Series A (Statistics in Society), vol. 166, no. 1, pp. 5-21, 2003, http://dx.doi.org/10.1111/1467-985X.00256
  41. Spencer C., Deloukas P., Hunt S., Mullikin J., Myers S., Silverman B., Donnelly P., Bentley D., and McVean G., "The influence of recombination on human genetic diversity," PLoS Genet, vol. 2, p. e148, 2006, http://dx.doi.org/10.1371/journal.pgen.0020148
  42. Takano-Shimizu T., "Local changes in GC/AT substitution biases and in crossover frequencies on drosophila chromosomes," Mol. Biol. Evol., vol. 18, pp. 606-619, 2001, http://dx.doi.org/10.1093/oxfordjournals.molbev.a003841
  43. The MHC Sequencing Consortium, "Complete sequence and gene map of a human major histocompatibility complex," Nature, vol. 401, no. 6756, pp. 921-923, 1999, http://dx.doi.org/10.1038/44853
  44. Thomson J. R., Kimmerer W. J., Brown L. R., Newman K. B., Mac Nally R., Bennett W. A., Feyrer F., and Fleishman E., "Bayesian change point analysis of abundance trends for pelagic fishes in the upper San Francisco estuary," Ecological Applications, vol. 20, no. 5, pp. 1431-1448, 2010, http://dx.doi.org/10.1890/09-0998.1
  45. Thurman R., Day N., Noble W., and Stamatoyannopoulos J., "Identification of higher-order functional domains in the human ENCODE regions," Genome Res., vol. 17, pp. 917-927, 2007, http://dx.doi.org/10.1101/gr.6081407
  46. Vinogradov A., "Dualism of gene GC content and CpG pattern in regard to expression in the human genome: magnitude versus breadth," Trends Genet, vol. 21, pp. 639-643, 2005, http://dx.doi.org/10.1016/j.tig.2005.09.002
  47. Vinogradov A., "Isochores and tissue-specificity," Nucleic Acids Res., vol. 31, pp. 5212-5220, 2003, http://dx.doi.org/10.1093/nar/gkg699
  48. Whittaker J. and Frühwirth-Schnatter S., "A dynamic changepoint model for detecting the onset of growth in bacteriological infections," Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 43, no. 4, pp. 625-640, 1994, http://dx.doi.org/10.2307/2986261
  49. Willams E. and Hurst L., "The proteins of linked genes evolve at similar rates," Nature, vol. 407, pp. 900-903, 2000, http://dx.doi.org/10.1038/35038066
  50. Xu H., Wei C., Lin F., and Sung W., "An HMM approach to genome-wide identification of differential histone modification sites from ChIP-seq data," Bioinformatics, vol. 24, pp. 2344-2349, 2008, http://dx.doi.org/10.1093/bioinformatics/btn402
  51. Yao Y., "Estimating the number of change-points via Schwarz criterion," Statistics and Probability Letters, vol. 6, pp. 181-189, 1988, http://dx.doi.org/10.1016/0167-7152(88)90118-6
  52. Zeitouni B., Boeva V., Janoueix-Lerosey I., O. Delattre, A. Nicolas, and E. Barillot, "SVDetect - a bioinformatic tool to identify genomic structural variations from paired-end next-generation sequencing data," Bioinformatics, vol. 26, pp. 1895-1896, 2010, http://dx.doi.org/10.1093/bioinformatics/btq293
  53. Zhang N. and Siegmund D., "A modified bayes information criterion with applications to the analysis of comparative genomic hybridization data," Biometrics, vol. 3, pp. 22-32, 2007, http://dx.doi.org/10.1111/j.1541-0420.2006.00662.x
Cited by
Show
ISSN
2300-5963
Language
eng
Share on Facebook Share on Twitter Share on Google+ Share on Pinterest Share on LinkedIn Wyślij znajomemu