Help Page
Here you will find additional information about this website and about how the prediction algorithms are used.
Using the online scoring form
Protein sequences are input using one-letter amino acid codes.
Non-standard symbols, spaces, and special characters are
ignored. As a sanity check, please verify that the sequence shown
on the output page matches your original sequence.
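The filtering described above can be approximated locally before you paste a sequence into the form. This is a minimal sketch, not the site's own code: the helper name `clean_sequence`, the restriction to the 20 standard amino acid letters, and the upper-casing of lower-case input are all assumptions.

```python
# Hypothetical helper: the site's exact filtering rules are not documented here.
# Assumption: only the 20 standard one-letter codes are kept, and lower-case
# input is upper-cased first.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def clean_sequence(raw: str) -> str:
    """Drop spaces, digits, and any non-standard symbols from a raw sequence."""
    return "".join(ch for ch in raw.upper() if ch in STANDARD_AA)

cleaned = clean_sequence("MKC pyc* GKA-F\n")
print(cleaned)
```

Comparing the cleaned string with the sequence echoed on the output page is a quick way to spot characters that were dropped.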
For an input protein, the online program will locate the zinc fingers in
your protein sequence and output the protein sequence with all the ZF domains
highlighted. You can select the ZF domains for which you want to predict
DNA-binding specificities by marking the corresponding boxes or clicking
the domains on the protein map. You can also choose between the expanded
linear and the polynomial pre-trained SVM model. Note that the
program assumes that all fingers bind consecutive bases, and this
affects the predictions for the "overlap" bases. If you want to predict the
DNA-binding specificity of a particular array of ZF domains, select
just those fingers.
HMMER version 2.3.2 is used to detect ZF domains. The original bit score
is shown next to each ZF domain. Please note that no bit score threshold is
applied here, but the default HMMER gathering threshold for ZF domains is
17.7, so you may choose to select only confident ZF domains, those with bit
scores exceeding 17.7. You can check the ZF scores in the final window when
generating DNA sequence logos for your protein to confirm that you are
satisfied with the labelled ZFs.
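If you copy the reported bit scores into a script, the selection rule above amounts to a simple filter. The sketch below is illustrative only: the `(label, bit score)` pairs are made-up values, and `confident_domains` is a hypothetical helper, not part of the website or of HMMER.

```python
GATHERING_THRESHOLD = 17.7  # default HMMER gathering threshold quoted above

def confident_domains(domains, threshold=GATHERING_THRESHOLD):
    """Return the (label, score) pairs whose bit score exceeds the threshold."""
    return [d for d in domains if d[1] > threshold]

# Made-up (label, bit score) pairs, as if copied by hand from an output page.
hits = [("ZF1", 25.3), ("ZF2", 12.1), ("ZF3", 17.7), ("ZF4", 31.0)]
selected = confident_domains(hits)
print(selected)
```

The comparison is strict, so a domain scoring exactly 17.7 is not selected, matching the wording "exceeding 17.7".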
Prediction Results
Please note that the protein may bind to either the primary or the complementary
DNA strand. The predicted PWM is therefore shown both as a Sequence Logo for the
primary DNA strand and as its Reverse Complement, for easier comparison with your
expectation or with a known experimental PWM.
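For readers who want to reproduce the reverse-complement view themselves, the operation reverses the order of the PWM positions and swaps A with T and C with G in each column. The sketch below assumes a PWM held as a list of per-position dictionaries; that in-memory layout and the function name are assumptions for illustration, not the site's download format.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement_pwm(pwm):
    """Reverse the position order and swap each base with its complement.

    pwm: list of dicts mapping base -> probability, one dict per position.
    """
    return [{COMPLEMENT[b]: p for b, p in col.items()} for col in reversed(pwm)]

# Made-up two-position PWM for illustration only.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
rc = reverse_complement_pwm(pwm)
print(rc)
```

Applying the function twice returns the original matrix, which is a convenient check when comparing against an experimental PWM reported on the opposite strand.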
You can also download the PWM in text format, with nucleotide probabilities
given for each position, by pressing the "Download PWM" button.
Pre-trained model files
If you would like to test our pre-trained SVM models using external programs, such as SVM_light, you can download pre-trained model files for the expanded linear and polynomial SVMs.
Experimental database download
We have also made available for download the database of experimental data collected from 25 individual manuscripts published in 1990-2005 and from the Protein Data Bank. This archive is password-protected; you can request the password by contacting us. Each line in the database represents one experiment and includes the following fields:
- source: data origin
- dna: DNA sequence
- zf: number of zinc fingers in the protein
- f1-fN: sequences of the corresponding zinc finger regions
- ex: type of example: + for binding, - for non-binding, Kd for an experimentally measured dissociation constant, and > for comparative examples, in which binding of sequence A is compared to the subsequently listed sequence B
Please consult the list of sources for all individual references.
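One way to hold such records in memory, once they have been read from the archive, is sketched below. The on-disk layout (delimiters, column order, how Kd values are stored) is not specified on this page, so the sketch models only the fields named above. The class name `Experiment`, the choice to derive `zf` from the finger list, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Example types listed in the field description above.
EXAMPLE_TYPES = {"+", "-", "Kd", ">"}

@dataclass
class Experiment:
    source: str          # data origin
    dna: str             # DNA sequence
    fingers: List[str]   # f1..fN: zinc finger region sequences
    ex: str              # type of example

    def __post_init__(self):
        if self.ex not in EXAMPLE_TYPES:
            raise ValueError(f"unknown example type: {self.ex!r}")

    @property
    def zf(self) -> int:
        """Number of zinc fingers, derived here from the finger list."""
        return len(self.fingers)

# Made-up placeholder record; not taken from the database.
record = Experiment(source="example-source", dna="GCGTGGGCG",
                    fingers=["FINGER1", "FINGER2", "FINGER3"], ex="+")
print(record.zf)
```

Validating `ex` on construction makes malformed lines fail early rather than propagate into later analysis.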
Terms of use
If you use this program, please cite:
- Anton Persikov, Robert Osada and Mona Singh (2009) "Predicting DNA recognition by Cys2His2 zinc finger proteins". Bioinformatics, 25(1): 22-29.
- Anton Persikov and Mona Singh (2014) "De Novo Prediction of DNA-binding Specificities for Cys2His2 Zinc Finger Proteins". NAR, 42(1): 97-108. Epub 2013 Oct 3.
Contact us
To give feedback or to send comments or suggestions, please email us: persikov at princeton.edu