Approximate Bayesian Computation (ABC) is a powerful Monte Carlo approach to posterior distribution estimation, with origins in population genetics (Tavaré et al., 1997; Beaumont et al., 2002; Marjoram et al., 2003). Proposed parameter values, θ, are used to simulate data from the model or distribution of interest. Unlike, say, Gibbs sampling, ABC requires no explicit likelihood function: instead, a distance ρ between summary statistics of the observed and simulated datasets is computed. If ρ is less than a chosen threshold ε, θ is accepted, and further parameter values are proposed using a transition kernel q(θ). For suitably chosen summary statistics, as ε → 0 the accepted values of θ form a sample from an approximation to the posterior distribution. Here, ABC is applied to a model of DNA sequence segmentation comprising segment boundary locations and a first-order hidden Markov model (HMM) of the nucleotide transition probabilities. Previous work (Allingham et al., 2007) used a Kullback-Leibler-based summary statistic and distance measure for the case in which the segment boundary locations are known, so that the nucleotide transition probabilities could be estimated. In this paper, an alternative summary statistic is proposed that enables simultaneous estimation of the posterior distributions of both the boundary locations and the transition probabilities. An oligonucleotide is a short linear sequence of nucleotides. Because the first-order HMM models pairs of adjacent nucleotides, oligonucleotides of length two ("2-mers") are considered here. The summary statistics are the moving-average-filtered occurrences of each of the 16 possible 2-mers along the DNA sequence, so the boundary locations no longer need to be known before the transition probabilities can be estimated.
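The accept/reject scheme described above can be illustrated with a minimal likelihood-free MCMC sketch in the spirit of Marjoram et al. (2003). It is not the segmentation model of this paper: the toy model (Normal data with unknown mean θ and unit variance), the sample-mean summary statistic, the absolute-difference distance, the uniform prior and the Gaussian random-walk kernel are all assumptions chosen for illustration. With a symmetric kernel and a flat prior, the Metropolis-Hastings ratio is 1 inside the prior support, so a proposal is accepted exactly when ρ < ε.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy model (assumption): n i.i.d. Normal(theta, 1) observations."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic: the sample mean (sufficient for theta in this toy model)."""
    return statistics.fmean(data)


def abc_mcmc(observed, eps, n_iter, theta0, step=0.2, prior=(-10.0, 10.0), seed=0):
    """Likelihood-free MCMC sketch with a Uniform(prior) prior and a
    symmetric Gaussian random-walk kernel q(. | theta)."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    lo, hi = prior
    theta, accepted, chain = theta0, 0, []
    for _ in range(n_iter):
        proposal = rng.gauss(theta, step)              # propose from q(. | theta)
        if lo <= proposal <= hi:                       # zero prior mass outside: reject
            simulated = simulate(proposal, len(observed), rng)
            rho = abs(summary(simulated) - s_obs)      # distance between summaries
            if rho < eps:                              # accept when rho < eps
                theta = proposal
                accepted += 1
        chain.append(theta)                            # a rejection repeats the current theta
    return chain, accepted / n_iter


if __name__ == "__main__":
    data_rng = random.Random(42)
    observed = simulate(2.0, 100, data_rng)
    chain, rate = abc_mcmc(observed, eps=0.05, n_iter=4000, theta0=summary(observed))
    kept = chain[1000:]                                # discard burn-in
    print(f"posterior mean ~ {statistics.fmean(kept):.3f}, acceptance rate {rate:.2f}")
```

Starting the chain at the observed summary (a pilot estimate) avoids a long initial stretch in which almost every proposal is rejected; a smaller ε tightens the approximation at the cost of a lower acceptance rate.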
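The 2-mer summary statistic can be sketched as follows. The text specifies only that the statistic is the moving-average-filtered occurrence of each 2-mer along the sequence; the trailing window and its width, the mean-squared-difference distance `profile_distance`, and the segmented simulator `simulate_segmented` (one first-order transition matrix per segment, switching at each boundary) are hypothetical choices made for illustration, not the paper's definitions.

```python
import random

NUCLEOTIDES = "ACGT"
TWO_MERS = [a + b for a in NUCLEOTIDES for b in NUCLEOTIDES]  # the 16 possible 2-mers


def moving_average(xs, window):
    """Trailing moving average; the first window-1 values average over fewer points."""
    out, total = [], 0.0
    for i, x in enumerate(xs):
        total += x
        if i >= window:
            total -= xs[i - window]          # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out


def two_mer_profiles(seq, window):
    """Moving-average-filtered occurrence indicator of each 2-mer along seq."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return {k: moving_average([1.0 if p == k else 0.0 for p in pairs], window)
            for k in TWO_MERS}


def profile_distance(p, q):
    """Hypothetical rho: mean squared difference over all 16 profiles and positions."""
    total, n = 0.0, 0
    for k in TWO_MERS:
        for a, b in zip(p[k], q[k]):
            total += (a - b) ** 2
            n += 1
    return total / n


def simulate_segmented(length, boundaries, matrices, rng):
    """Hypothetical simulator: a first-order Markov chain whose transition matrix
    switches at each boundary. boundaries are sorted start positions of segments
    2, 3, ...; matrices[s][x] lists P(next nucleotide | x) for segment s in ACGT order."""
    seq = [rng.choice(NUCLEOTIDES)]
    seg = 0
    for i in range(1, length):
        while seg < len(boundaries) and i >= boundaries[seg]:
            seg += 1                          # moved past a boundary
        weights = matrices[seg][seq[-1]]
        seq.append(rng.choices(NUCLEOTIDES, weights=weights)[0])
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(1)
    uniform = {x: [0.25] * 4 for x in NUCLEOTIDES}
    sticky = {x: [0.7 if y == x else 0.1 for y in NUCLEOTIDES] for x in NUCLEOTIDES}
    observed = simulate_segmented(1000, [500], [uniform, sticky], rng)
    candidate = simulate_segmented(1000, [300], [uniform, sticky], rng)
    rho = profile_distance(two_mer_profiles(observed, 50), two_mer_profiles(candidate, 50))
    print(f"rho = {rho:.5f}")
```

Because each position contributes exactly one 2-mer, the 16 filtered profiles sum to 1 at every position. Moving a boundary changes where the profiles shift level, which is why a statistic of this kind can carry information about boundary locations as well as transition probabilities.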
Proceedings of IASC 2008: Joint Meeting of the 4th World Conference of the IASC and the 6th Conference of the Asian Regional Section of the IASC on Computational Statistics & Data Analysis (Yokohama, Japan, 5-8 December 2008), pp. 60-68.