1. Benchmark protein sequences.
2. The protein sequences were downloaded from the UniProt database (http://www.uniprot.org/) on April 6, 2010.
3. Five organisms were considered: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, and Homo sapiens.
4. To remove redundant sequences, we ran CD-HIT (http://www.bioinformatics.org/cd-hit/) with a sequence identity threshold of 100% (-c 1.00) and a word length of 5 (-n 5), using the command: cd-hit -i input_sequence -o output_sequence -c 1.00 -n 5
5. Files:
(1) SC.fas: budding yeast proteins
(2) CE.fas: nematode proteins
(3) DM.fas: fruit fly proteins
(4) MM.fas: mouse proteins
(5) HS.fas: human proteins
6. Last updated: April 6, 2010.
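
Example (illustrative only, not part of the original release): the redundancy-removal step in item 4 and a quick check of the resulting .fas files could be scripted roughly as below. This is a minimal Python sketch, assuming cd-hit is installed and on the PATH. The function names and the input file name SC_raw.fas are hypothetical; only the cd-hit options (-i, -o, -c, -n) come from the command given above.

```python
import subprocess


def build_cd_hit_command(input_fasta, output_fasta, identity=1.00, word_size=5):
    """Build the cd-hit argument list corresponding to the command in item 4."""
    return [
        "cd-hit",
        "-i", str(input_fasta),    # input FASTA file
        "-o", str(output_fasta),   # output FASTA file of representative sequences
        "-c", f"{identity:.2f}",   # sequence identity threshold (1.00 = 100%)
        "-n", str(word_size),      # word length
    ]


def run_cd_hit(input_fasta, output_fasta):
    """Run cd-hit; raises CalledProcessError if the program exits with an error."""
    subprocess.run(build_cd_hit_command(input_fasta, output_fasta), check=True)


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines (starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # SC_raw.fas is a hypothetical name for an unfiltered download.
    print(" ".join(build_cd_hit_command("SC_raw.fas", "SC.fas")))
```

After running cd-hit, count_fasta_records("SC.fas") gives the number of sequences retained for budding yeast; the same call applies to CE.fas, DM.fas, MM.fas and HS.fas.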
