This page provides additional information about this website and about how the prediction algorithms are used.
Protein sequences are input using one-letter amino acid codes.
Any non-standard symbols, spaces, or special characters are
ignored. As a sanity check, please verify your original sequence
on the output page.
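The input clean-up described above can be illustrated with a minimal Python sketch. This is not the site's actual code: the function name clean_sequence is hypothetical, and the sketch assumes that only the 20 standard amino acid letters are kept and that lowercase input is upper-cased rather than discarded.

```python
# The 20 standard one-letter amino acid codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Keep only standard amino acid letters and drop everything else.

    Assumption (not stated on this page): lowercase letters are
    upper-cased before filtering instead of being discarded.
    """
    return "".join(ch for ch in raw.upper() if ch in STANDARD_AMINO_ACIDS)


if __name__ == "__main__":
    # Spaces, digits, '*' and '-' are removed; letters are kept in order.
    print(clean_sequence("MKR pfq*CRI 12 CMR-NFS"))
```

Comparing the cleaned string with what you pasted is the same sanity check the output page lets you perform.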
For an input protein, the online program locates the zinc finger (ZF) domains in your sequence and outputs the sequence with all ZF domains highlighted. You can select the ZF domains for which you want to predict DNA-binding specificities by marking the corresponding boxes or by clicking the domains on the protein map. You can also choose between the expanded linear and the polynomial pre-trained SVM models. Note that the program assumes that all fingers bind consecutive bases, and this affects the predictions for the "overlap" bases. If you want to predict the DNA-binding specificity of a particular array of ZF domains, you may want to select just those fingers.
HMMER version 2.3.2 is used to detect ZF domains. The original bit score is shown next to each ZF domain. Please note that no bit score threshold is applied here, but the default HMMER gathering threshold for ZF domains is 17.7, so you may decide to select only confident ZF domains, that is, those with bit scores exceeding 17.7. You can check the ZF scores in the final window when generating DNA sequence logos for your protein to confirm that you are satisfied with the labelled ZFs.
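If you copy the reported bit scores into your own script, the manual selection described above could be sketched as follows. The ZFHit record and the example coordinates and scores are hypothetical and purely illustrative; only the 17.7 threshold comes from this page.

```python
from typing import List, NamedTuple

# Default HMMER gathering threshold for ZF domains quoted on this page.
GATHERING_THRESHOLD = 17.7


class ZFHit(NamedTuple):
    """Hypothetical record for one detected ZF domain."""
    start: int        # first residue of the domain (illustrative)
    end: int          # last residue of the domain (illustrative)
    bit_score: float  # HMMER bit score shown next to the domain


def confident_hits(hits: List[ZFHit],
                   threshold: float = GATHERING_THRESHOLD) -> List[ZFHit]:
    """Return only the hits whose bit score exceeds the threshold."""
    return [hit for hit in hits if hit.bit_score > threshold]


if __name__ == "__main__":
    # Made-up example values, not output of the real program.
    hits = [ZFHit(10, 32, 25.4), ZFHit(38, 60, 12.1), ZFHit(66, 88, 17.7)]
    for hit in confident_hits(hits):
        print(hit)
```

The comparison is strict, so a domain scoring exactly 17.7 is not kept; this follows the wording "exceeding 17.7" and is otherwise an assumption.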
Please note that the protein may bind to either the primary or the complementary
DNA strand: the predicted PWM is shown as a sequence logo for the primary DNA
strand and also as its reverse complement, for easier comparison with your
expectation or with a known experimental PWM.
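For readers who want to reproduce the reverse-complement view on a downloaded PWM, here is a minimal Python sketch. It assumes the PWM is held as a list of per-position dictionaries mapping each nucleotide to its probability; that in-memory layout and the function name are illustrative choices, not the site's file format.

```python
from typing import Dict, List

# Watson-Crick complement of each nucleotide.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

PWM = List[Dict[str, float]]  # one {base: probability} dict per position


def reverse_complement_pwm(pwm: PWM) -> PWM:
    """Reverse the position order and swap each base for its complement."""
    return [
        {COMPLEMENT[base]: prob for base, prob in column.items()}
        for column in reversed(pwm)
    ]


if __name__ == "__main__":
    # Made-up two-position PWM, for illustration only.
    pwm = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.6, "T": 0.2},
    ]
    for column in reverse_complement_pwm(pwm):
        print(column)
```

Applying the function twice returns the original matrix, which is a quick way to check that a conversion was done correctly.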
You may also download the PWM in a text format with nucleotide probabilities given for each position by pressing the "Download PWM" button.
We have also made available for download the database of experimental data collected from 25 individual manuscripts published between 1990 and 2005 and from the Protein Data Bank. This archive is password-protected; you can request the password by contacting us. Each line in the database represents one experiment and includes the following fields:
- source: data origin
- dna: DNA sequence
- zf: number of zinc fingers in the protein
- f1-fN: sequences of the corresponding zinc finger regions
- ex: type of example: "+" for binding, "-" for non-binding, "Kd" for an experimentally measured dissociation constant, and ">" for comparative examples, in which binding of sequence A is compared to that of the subsequently listed sequence B
Please consult the list of sources for all individual references.
If you use this program, please cite:
- Anton Persikov, Robert Osada and Mona Singh (2009) "Predicting DNA recognition by Cys2His2 zinc finger proteins". Bioinformatics, 25(1): 22-29.
- Anton Persikov and Mona Singh (2014) "De Novo Prediction of DNA-binding Specificities for Cys2His2 Zinc Finger Proteins". NAR, 42(1): 97-108. Epub 2013 Oct 3.
To give feedback or to send your comments or suggestions please email us: persikov at princeton.edu