asymptoticMK: Asymptotic McDonald–Kreitman Test

By Benjamin C. Haller & Philipp W. Messer. Copyright © 2017 Philipp Messer.

See below for background and usage information. If you use this service, please cite our paper:

B.C. Haller, P.W. Messer. (2017). asymptoticMK: A web-based tool for the asymptotic McDonald–Kreitman test. G3: Genes, Genomes, Genetics 7(5), 1569–1575. doi:10.1534/g3.117.039693

Submit your data:

Form fields: d, d0, input file (tab-delimited with named columns for x, p, and p0), and the x interval to fit.

Please let us know of any issues with this service at philipp {dot} messer <at> gmail [dot] com. Thanks!


Background & usage:

This page provides an R-based implementation of the asymptotic McDonald–Kreitman test (Messer & Petrov 2013) as a web-based service; it can also be run at the command line using curl, or as a local R script (see below). The test estimates α (alpha), the fraction of substitutions in a genomic test region that were driven to fixation by positive selection. To do this, it uses the supplied data to calculate empirical values of a function α(x):

α(x) = 1 − (d0 / d) (p(x) / p0(x))

where

x  =  derived allele frequency
d0  =  substitution rate in the neutral reference region
d  =  substitution rate in the test region
p0(x)  =  polymorphism level in the neutral reference region for frequency class x
p(x)  =  polymorphism level in the test region for frequency class x.
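
As a minimal sketch of the formula above, the following Python function computes α(x) for each frequency class. The function name and the example numbers are made up for illustration; this is not the R code used by the service.

```python
def alpha_by_class(d, d0, rows):
    """Return (x, alpha(x)) pairs, with alpha(x) = 1 - (d0 / d) * (p(x) / p0(x)).

    rows is an iterable of (x, p, p0) tuples, one per frequency class.
    """
    if d == 0:
        raise ValueError("d must be nonzero")
    result = []
    for x, p, p0 in rows:
        if p0 == 0:
            # An empty neutral class leaves alpha(x) undefined.
            raise ValueError(f"p0 is zero for frequency class x={x}")
        result.append((x, 1.0 - (d0 / d) * (p / p0)))
    return result


# Made-up illustrative values, not real data.
example = alpha_by_class(d=100, d0=50, rows=[(0.25, 10, 20), (0.75, 4, 16)])
```

With these made-up inputs, the two classes give α(0.25) = 0.75 and α(0.75) = 0.875.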

It then fits an exponential function to this data, of the form:

α_fit(x) = a + b exp(−cx)

The value of this function extrapolated to x = 1 provides an estimated value for α:

α_asymptotic = α_fit(x = 1)
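
To illustrate the extrapolation, here is a small Python sketch that evaluates the exponential fit at x = 1. The function names and coefficient values are hypothetical; in practice the coefficients a, b, and c come from the fitting step.

```python
import math


def alpha_fit_exponential(x, a, b, c):
    """Evaluate the exponential model a + b * exp(-c * x)."""
    return a + b * math.exp(-c * x)


def alpha_asymptotic(a, b, c):
    """Extrapolate the exponential fit to x = 1."""
    return alpha_fit_exponential(1.0, a, b, c)


# Hypothetical coefficients, chosen only to show the call.
estimate = alpha_asymptotic(a=0.4, b=-0.5, c=10.0)
```

Note that when c is large, the exponential term is close to zero at x = 1, so the estimate approaches a.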

Although the exponential function is generally expected to provide the best fit, a linear function is also fit to the data, of the form:

α_fit(x) = a + bx

If the exponential fit fails to converge (which can happen if the data does not fit an exponential pattern), or if the linear fit is superior according to AIC, then the linear fit is reported; otherwise, the exponential fit is reported. (There are also pathological cases in which the exponential fit is superior according to AIC, but the confidence interval of its estimate of α is very wide; in that case, the linear fit is also preferred.)
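
The selection rule just described can be sketched as a small Python function. This is an illustration of the stated logic only: the function and argument names are hypothetical, and because the criterion for a "very wide" confidence interval is not specified here, it is passed in as a boolean decided by the caller.

```python
def choose_fit(exp_converged, aic_exponential, aic_linear, exp_ci_too_wide=False):
    """Return "exponential" or "linear" following the rule described above.

    A lower AIC is treated as the superior fit. exp_ci_too_wide is an
    assumption-driven flag supplied by the caller; no threshold is implied.
    """
    if not exp_converged:
        return "linear"  # the exponential fit failed to converge
    if aic_linear < aic_exponential:
        return "linear"  # the linear fit is superior according to AIC
    if exp_ci_too_wide:
        return "linear"  # pathological case: very wide confidence interval
    return "exponential"
```

For example, choose_fit(True, -20.0, -10.0) selects the exponential fit, while the same call with exp_ci_too_wide=True falls back to the linear fit.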

The individual steps required to execute the test are:

  1. determine your test region and an appropriate neutral reference region,
  2. determine the overall substitution rates for those regions,
  3. subdivide your SNP data for these regions into derived allele frequency classes (the number of classes depending upon how much data you have, such that there are no "empty" frequency classes),
  4. determine the polymorphism level within each of those frequency classes, and
  5. submit this information using the form above to obtain plots and analysis.
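
Steps 3 and 4 can be sketched in Python as follows. This sketch makes several assumptions that are not stated on this page: equal-width frequency classes, the class midpoint used as x, and the raw SNP count per class used as the polymorphism level. The function names are hypothetical.

```python
def bin_frequencies(freqs, n_bins):
    """Count derived allele frequencies (each strictly between 0 and 1)
    in n_bins equal-width classes; return (midpoints, counts)."""
    counts = [0] * n_bins
    for f in freqs:
        if not 0.0 < f < 1.0:
            raise ValueError(f"frequency {f} is outside (0, 1)")
        # Clamp the index so that rounding cannot push it past the last class.
        idx = min(int(f * n_bins), n_bins - 1)
        counts[idx] += 1
    midpoints = [(i + 0.5) / n_bins for i in range(n_bins)]
    return midpoints, counts


def has_empty_class(counts):
    """True if any frequency class contains no SNPs."""
    return any(c == 0 for c in counts)


# Made-up frequencies for illustration only.
midpoints, counts = bin_frequencies([0.05, 0.15, 0.55, 0.95], n_bins=2)
```

If has_empty_class reports an empty class for either region, one option is to reduce n_bins and bin again.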

Note that it is often advisable to trim the polymorphism data, removing the lowest and highest frequency classes. This is recommended because low-frequency polymorphisms can have a high error rate due to sequencing error, whereas high-frequency polymorphisms can be vulnerable to polarization error. This service therefore allows specification of the interval of x that will be used to fit the exponential and linear models.
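
A minimal Python sketch of this trimming is shown below. The function name is hypothetical, and treating both ends of the interval as inclusive is an assumption; this page does not say how the service handles values exactly at the cutoffs.

```python
def trim_rows(rows, xlow, xhigh):
    """Keep only (x, p, p0) rows with xlow <= x <= xhigh (inclusive by assumption)."""
    return [row for row in rows if xlow <= row[0] <= xhigh]


# Made-up rows for illustration only.
rows = [(0.05, 30, 40), (0.1, 20, 30), (0.5, 10, 20), (0.9, 4, 16), (0.95, 2, 9)]
trimmed = trim_rows(rows, xlow=0.1, xhigh=0.9)
```

Here the classes at x = 0.05 and x = 0.95 are dropped and the three middle classes are kept.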

Values for the polymorphism levels should be supplied in a tab-separated file with one row per frequency class and columns for x, p, and p0. The file's first row should contain text labels for the columns; the specific text used to label the columns is unimportant, but their order (x, p, p0) is important. A sample file can be seen here.
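
As a sketch of the file format described above, the following Python code writes and reads such a table using the standard csv module. The function names and data values are made up for illustration; an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_polymorphism_table(rows, handle):
    """Write a header row and (x, p, p0) rows as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["x", "p", "p0"])  # label text is arbitrary; column order matters
    writer.writerows(rows)


def read_polymorphism_table(handle):
    """Read (x, p, p0) rows back, skipping the header row and blank lines."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the label row
    return [(float(x), float(p), float(p0)) for x, p, p0 in (r for r in reader if r)]


# Made-up values for illustration only.
buffer = io.StringIO()
write_polymorphism_table([(0.25, 10.0, 20.0), (0.75, 4.0, 16.0)], buffer)
table_text = buffer.getvalue()
round_trip = read_polymorphism_table(io.StringIO(table_text))
```

To produce a file for upload, the same writer can be given a handle from open("polymorphisms.txt", "w", newline="") instead of the buffer.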


Command-line usage

This service can also be used in an automated fashion at the command line using the Un*x command curl. Depending upon your operating system, you may need to install curl first. To download the full HTML response for a query, use a command like:

curl -F"d=593" -F"d0=930" -F"xlow=0.1" -F"xhigh=0.9" -F"datafile=@polymorphisms.txt"
     -o "MK_full.html" http://benhaller.com/cgi-bin/R/asymptoticMK_run.html

That should all be entered as a single line at the Un*x terminal prompt. The values for d, d0, and the x cutoff interval are supplied with -F options to the curl command as shown here. The file to upload with binned values for x, p, and p0 is given with another -F option, with an @ preceding the filename. The filename for output is given with -o, and the URL for submission to this web service is supplied last. In this example, the file to upload and the output file are both in the current directory, but supplying Un*x paths should also work. The result of this command is an HTML file with the full response, including embedded plots.

Often, however, for automation of a workflow one needs the results in a more machine-readable format, without plots, explanatory text, or HTML markup. To get that, use a command like:

curl -F"d=593" -F"d0=930" -F"xlow=0.1" -F"xhigh=0.9" -F"datafile=@polymorphisms.txt"
     -F"reply=table" -o "MK_table.txt" http://benhaller.com/cgi-bin/R/asymptoticMK_run.html

Here an extra option, reply=table, has been supplied with -F to request the results as a tab-separated table of values, saved as MK_table.txt in the current directory (as requested by the -o option). The resulting file starts with a header of comment lines, each beginning with #, that specify the input values for the analysis. The remainder of the file consists of tab-separated rows, with a symbol name and then a value on each line. Values are given for the coefficients a, b, and c of the fit (with NA as the value of c if the linear fit was chosen), the asymptotic estimate α_asymptotic from the fitted function, the confidence interval around that estimate, and the original, non-asymptotic McDonald–Kreitman estimate α_original (for comparison to the asymptotic estimate). Other Un*x tools such as grep may then be used to extract the desired values from the response.
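
As an alternative to grep, a table reply of the shape described above could be parsed with a short Python sketch like the one below. The function names are hypothetical, and the sample text is invented to match the description (comment lines beginning with #, then name and value separated by a tab); the exact comment wording and symbol names in a real reply may differ. Values are kept as raw strings, since the layout of entries such as the confidence interval is not specified here.

```python
def parse_table_reply(text):
    """Split a reply into (comment lines, {symbol name: raw value string})."""
    inputs = []
    values = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            inputs.append(line.lstrip("#").strip())
            continue
        # Split on the first tab only; anything after it is kept as the value.
        name, _, value = line.partition("\t")
        values[name.strip()] = value.strip()
    return inputs, values


def to_float(value):
    """Convert a raw value to float, mapping "NA" to None."""
    return None if value == "NA" else float(value)


# Invented sample reply for illustration; not actual service output.
SAMPLE_REPLY = "# d = 593\n# d0 = 930\na\t0.25\nb\t-0.5\nc\tNA\n"
inputs, values = parse_table_reply(SAMPLE_REPLY)
```

With this invented sample, to_float(values["c"]) returns None, which corresponds to the case where the linear fit was chosen.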


Source & acknowledgements:

This service is open-source; the R code implementing it is in the asymptoticMK GitHub repository. That repository also contains an example SLiM model for producing binned polymorphism data suitable for testing this service, and an R script for local execution, if you prefer to run asymptoticMK locally rather than through this web interface.

The asymptoticMK service was implemented using FastRWeb, a package for building R-based web services. Thanks to S. Urbanek for FastRWeb, and for considerable help with getting this service up and running. Thanks also to G. Grothendieck, J. Horner, and S. Urbanek for other R packages used in asymptoticMK. Thanks to A.-N. Spiess for the R code used to obtain a confidence interval for the exponential fit. Development of this service was supported by funds from the College of Agriculture and Life Sciences at Cornell University to PWM.