Probability of Identity #2
Protein fold similarity
Web server
 

PRIDE2 Server test results

The PRIDE2 method and the corresponding server were tested using a previously published and an in-house test set. The test sets are as follows (click on the links to see the obtained results):

  1. Test set based on that published by Novotny et al. (Novotny,M., Madsen,D. and Kleywegt, G.J. (2004): Evaluation of Protein Fold Comparison Servers. Proteins, 54, 260-270).
  2. A set of newly determined structures retrieved from PDB on 29 Dec 2004 using the selection criteria below:
    • Deposited after Sep 1, 2004
    • Released after Dec 1, 2004
    • 90% homology filtered
    • For X-ray structures, resolution better than 2 Angstrom (55 structures)
    • NMR structures (18 structures)
  3. In addition, a quick comparison of the PRIDE and PRIDE2 methods is presented.

Test set based on the one published by Novotny et al.

The 'real' test consisted of submitting the full PDB structures (listed in Table 2 of the paper) to the server using default settings and, where necessary, settings optimized for best performance with the given query. A result was considered positive if the server reported a structure falling into the same CATH 'H' level as the query, excluding self hits. The classification of Table 2 in the paper was used; note that a number of the structures contain additional domains listed in CATH. The table below summarizes the results for each query and details the optimizations that were necessary to obtain positive hits.
We note a few differences between our data set and the original one used by Novotny et al.: the class 3.10.70 in the paper has changed in CATH and is denoted 3.10.50 in version 2.5.1; 1hyl was considered a misspelling, as 1hy1 is actually in class 1.10.40; 1c3u is not in class 1.10.164 but in 1.10.40.
The 'default' server settings are: H-level filtered CATHselect search, Window/Slide: 160/80, and reporting the 10 best structures for each query (segment). The reported rank refers to the hit list of the segment for which the correct hit was found. The last unit of the classification (L#) denotes the length of the domain. Optimizations are noted as differences from the default settings (and are also listed for some hits that were positive but weak with the default settings).
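
The positive-hit criterion described above can be illustrated with a minimal sketch. It assumes that a hit is given as a CATH domain name plus an underscore-separated classification (as in the table below), that the first four numbers form the 'H' level, and that a self hit is any domain from the same PDB entry as the query. The function names are hypothetical and are not part of the PRIDE2 server.

```python
def h_level(classification: str) -> str:
    """Return the first four CATH numbers (C, A, T, H) of an underscore-separated code."""
    return "_".join(classification.split("_")[:4])


def domain_length(classification: str) -> int:
    """Read the trailing L# unit, which denotes the length of the domain."""
    return int(classification.rsplit("_", 1)[1].lstrip("L"))


def is_positive_hit(query_pdb: str, query_h: str, hit_domain: str, hit_class: str) -> bool:
    """True if the hit shares the query's H level and is not a self hit.

    Assumption: a self hit is any domain whose first four characters
    match the PDB ID of the query (case-insensitive).
    """
    same_entry = hit_domain[:4].lower() == query_pdb.lower()
    return (not same_entry) and h_level(hit_class) == query_h


# Example with the 1FUR row of the table: hit 2fusB3, 1_10_40_30_1_1_2_L51
print(is_positive_hit("1FUR", "1_10_40_30", "2fusB3", "1_10_40_30_1_1_2_L51"))
print(domain_length("1_10_40_30_1_1_2_L51"))
```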

PDB ID | CATH H level | Rank of 1st correct hit | PRIDE2 of 1st correct hit | CATH name/classification of correct hit | Optimizations/results obtained
All-α structures
1RLR 1_10_40_20 - - Window/Slide: 80/40, full database search: 1st hit: 0.907 3r1rC1 1_10_40_20_1_1_1_L86
1C3U 1_10_40_30 - - Window/Slide: 80/40, 5th hit: 0.544 1i0aD3 1_10_40_30_2_3_1_L72
1AUW 1_10_40_30 3 0.430 1hy0B3 1_10_40_30_2_2_1_L72 Window/Slide: 70/30, 1st hit: 0.991 1k7wD3 1_10_40_30_2_1_1_L72
1FUR 1_10_40_30 1 0.999 2fusB3 1_10_40_30_1_1_2_L51
1HY1 1_10_40_30 1 0.429 1hy0B3 1_10_40_30_2_2_1_L72 Window/Slide: 72/36 , 1st hit: 0.728 1dcnD3 1_10_40_30_2_1_2_L72
1I0A 1_10_40_30 2 0.420 1k62B3 1_10_40_30_2_4_1_L72 Window/Slide: 72/36, full database search: 1st hit: 0.943 1k62B3 1_10_40_30_2_4_1_L72
1JSW 1_10_40_30 5 0.785 1yfm03 1_10_40_30_1_2_1_L52
1YFM 1_10_40_30 1 0.899 2fusA3 1_10_40_30_1_1_1_L52
 
1AQ6 1_10_164_10 - - Window/Slide: 76/38, full database search: 1st hit: 0.804 1qq7B2 1_10_164_10_1_2_3_L75
1ZRN 1_10_164_10 - - Window/Slide: 80/40, full database search, 2nd hit: 0.809 1jud02 1_10_164_10_1_1_1_L75
1JUD 1_10_164_10 - - Window/Slide: 76/38, full database search: 8th hit: 0.730 1zrn02 1_10_164_10_1_3_1_L75
1FEZ 1_10_164_20 - - Unique at this 'H' level (no non-self hits can be expected)
 
1B3U 1_25_10_10 4 0.503 1m5nS0 1_25_10_10_6_1_5_L485
1BK6 1_25_10_10 2 0.980 1ee5A0 1_25_10_10_3_1_4_L421
1GCJ 1_25_10_10 2 0.961 1ibrB0 1_25_10_10_6_1_1_L458
1IAL 1_25_10_10 2 0.994 1pjmB0 1_25_10_10_3_2_3_L427
1IBR 1_25_10_10 3 0.962 1o6pB0 1_25_10_10_6_1_3_L441
1QBK 1_25_10_10 Finds only itself
2BCT 1_25_10_10 2 0.950 1jpwC0 1_25_10_10_2_1_9_L502

All-β structures
1CI0 2_30_110_10 3 0.902 1g79A0 2_30_110_10_1_2_2_L199
1DNL 2_30_110_10 1 1.000 1g79A0 2_30_110_10_1_2_2_L199
1EJE 2_30_110_20 2 0.737 1i0sA0 2_30_110_20_2_1_1_L161
1I0R 2_30_110_20 1 1.000 1i0sA0 2_30_110_20_2_1_1_L161
 
1A33 2_40_100_10 1 1.000 1c5fE0 2_40_100_10_2_4_1_L174
1AWQ 2_40_100_10 1 1.000 1m9eB0 2_40_100_10_2_1_6_L162
1CYN 2_40_100_10 2 1.000 2rmcG0 2_40_100_10_2_3_1_L182
1DYW 2_40_100_10 1 1.000 1e8kA0 2_40_100_10_2_6_1_L172
1IHG 2_40_100_10 1 0.998 1iipA1 2_40_100_10_2_8_1_L184
1LOP 2_40_100_10 2 1.000 2nul00 2_40_100_10_1_2_2_L163
1QNG 2_40_100_10 2 1.000 1qnhA0 2_40_100_10_2_7_2_L169
1QOI 2_40_100_10 2 0.984 1c5fI0 2_40_100_10_2_4_4_L173
2RMC 2_40_100_10 2 1.000 1cynA0 2_40_100_10_2_2_1_L178
 
1CIY 2_100_10_10 2 0.635 1dlc03 2_100_10_10_1_1_1_L197
1DLC 2_100_10_10 2 0.912 1ji6A3 2_100_10_10_3_1_1_L199
1VMO 2_100_10_20 - - Unique at this 'H' level (no non-self hits can be expected)
1C3K 2_100_10_30 2 1.000 1c3nA0 2_100_10_30_2_1_1_L144
1JAC 2_100_10_30 1 1.000 1m26G0 2_100_10_30_1_1_1_L133
1JOT 2_100_10_30 2 0.999 1m26G0 2_100_10_30_1_1_1_L133

Mixed α-β structures
1GRJ 3_10_50_30 - - Unique at this 'H' level (no non-self hits can be expected)
1BKF 3_10_50_40 1 1.000 2fke00 3_10_50_40_1_1_1_L107
1PBK 3_10_50_40 1 0.811 1n1aB0 3_10_50_40_1_8_1_L121
1ROT 3_10_50_40 1 0.971 1rou00 3_10_50_40_1_4_1_L118
1YAT 3_10_50_40 2 0.993 1bl4B0 3_10_50_40_1_1_5_L107
 
1CFR 3_40_91_10 2 0.795 1knvB0 3_40_91_10_3_1_2_L291
1BHM 3_40_91_20 2 1.000 2bamB0 3_40_91_20_1_1_4_L210
1D2I 3_40_91_20 1 1.000 1dfmB0 3_40_91_20_2_1_2_L213
1FOK 3_40_91_30 2 0.986 2fokA3 3_40_91_30_1_1_2_L175
 
1AXC 3_70_10_10 4 0.956 1plr00 3_70_10_10_1_1_1_L258
1GE8 3_70_10_10 2 0.995 1iz4A0 3_70_10_10_2_1_3_L241
1PLQ 3_70_10_10 1 0.978 1plr00 3_70_10_10_1_1_1_L258
1B77 3_70_10_20 1 1.000 1czdC0 3_70_10_20_1_2_1_L228
1CZD 3_70_10_20 2 0.994 1b8hC0 3_70_10_20_1_1_1_L228
1DML 3_70_10_30 - - Unique at this 'H' level (no non-self hits can be expected)

Folds with few secondary structures
1B2I 2_40_20_10 2 0.894 2pk400 2_40_20_10_1_1_1_L80
1CEA 2_40_20_10 1 0.994 2pf202 2_40_20_10_1_10_1_L82
1KDU 2_40_20_10 2 0.754 1tpkC0 2_40_20_10_1_3_1_L88
1KIV 2_40_20_10 2 0.999 4kiv00 2_40_20_10_1_7_1_L79
1KRN 2_40_20_10 2 0.994 1i71A0 2_40_20_10_1_11_1_L83
1PK4 2_40_20_10 1 0.999 3kiv00 2_40_20_10_1_7_3_L79
1PML 2_40_20_10 1 0.919 1tpkC0 2_40_20_10_1_3_1_L88
5HPG 2_40_20_10 2 0.961 2pk400 2_40_20_10_1_1_1_L80

It is apparent that the group 1_10_164 performs remarkably poorly. This is partly because the size of the domains is less than half of the default window size (although this does not seem to be a serious hindrance in other groups) and partly because the structures in CATHselect at these 'H' levels are not similar enough to yield a group score high enough for an effective H-level filtered search.
It should be noted that if self hits are considered, all queries were correctly identified (if the query is in CATHselect, it is the first hit, which is why the 2nd or 3rd hit is listed in many cases). In another test, using all domain structures (instead of the full PDB files) listed in CATH 2.5.1 and belonging to the groups considered by Novotny et al., a success rate of > 95% was obtained by the H-level filtered method, and all remaining structures were found by a full CATHselect search (counting self hits as positive).


New structures

The sequences of the new structures were submitted to a BLAST search against the domain sequences in CATH 2.5.1. The significant BLAST hits (the best one for each query and domain hit) were then compared to the best PRIDE2 hit. Of the 28 different (significant) BLAST hits, 19 were reproduced (i.e. a best hit at the same 'H' level was found) using Window/Slide 160/80 and 25 using 100/50.

PDB ID | Best BLAST hit | BLAST score | BLAST E value | CATH classification of best BLAST hit | Query segment for best PRIDE2 hit | Best PRIDE2 hit | PRIDE2 of best hit
1WA5 1a2kC0 353 8e-99 3_40_50_300_10_1_8 chain A 51-151 1ibrC0_3_40_50_300_10_1_14_L169 0.838
1WA5 1bk5A0 831 0.0 1_25_10_10_3_1_2 chain B 151-251 1ee5A0_1_25_10_10_3_1_4_L421 0.888
1WCH 1a5y00 160 2e-40 3_90_190_10_2_1_5 chain A 1lqfD0_3_90_190_10_2_1_16_L287 0.776
1WQ8 1bj1V0 108 2e-25 2_10_90_10_8_1_1 chain A 0-100 1fltW0_2_10_90_10_8_1_6_L98 0.985
1WQ9 1bj1V0 107 6e-25 2_10_90_10_8_1_1 chain A -4-96 2vpfG0_2_10_90_10_8_1_5_L95 0.996
1WSW 1akq00 293 1e-80 3_40_50_360_1_2_7 chain A 1akv00_3_40_50_360_1_2_7_L147 1.000
1WT1 1fg5N0 199 5e-52 3_90_550_10_4_1_3 chain A 1lz7A0_3_90_550_10_4_2_1_L266 1.000
1WTN 132l00 281 2e-77 1_10_530_10_3_1_4 chain A 1uie00_1_10_530_10_3_1_14_L129 1.000
1XCL 1khhA0 401 e-113 3_40_50_150_23_1_1 chain A 1p1cB0_3_40_50_150_23_1_1_L193 0.908
1XCX 1aqh01 328 1e-90 3_20_20_80_13_4_2 chain A 1kbkA1_3_20_20_80_13_2_11_L402 0.863
1XCX 1bsi02 199 1e-51 2_60_40_1180_5_2_2 chain A 396-496 3cpuA2_2_60_40_1180_5_2_2_L93 1.000
1XO5 1dguA0 289 2e-79 1_10_238_10_19_1_1 chain A 101-191 1irjG0_1_10_238_10_3_38_2_L84 0.747
1XO7 1a3300 147 1e-36 2_40_100_10_2_4_1 chain B 2rmcG0_2_40_100_10_2_3_1_L182 0.990
1XPC 1a28A0 97 4e-21 1_10_565_10_4_1_1 chain A 1qktA0_1_10_565_10_3_1_8_L248 0.985
1XSA 1kt9A0 117 1e-27 3_90_79_10_5_1_2 chain A 1ktgA0_3_90_79_10_5_1_1_L137 0.767
1XVX 1d9vA1:1:2 96 7e-21 3_40_190_10_16_1_2 chain A 101-201 1d9yA1_3_40_190_10_16_2_1_L156 0.747
1XVY 1d9vA1:1:2 96 9e-21 3_40_190_10_16_1_2 chain A 101-201 1d9vA1_3_40_190_10_16_1_2_L156 0.810
1XW6 1c72A1:1:2 136 4e-33 3_40_30_10_6_13_1 chain B 1-101 1gtuD1_3_40_30_10_6_14_1_L112 0.949
1XW6 1c72A2 97 2e-21 1_20_1050_10_1_5_2 chain A 101-201 1gtuD2_1_20_1050_10_1_9_1_L105 0.846
1XWV 1a9v00 244 4e-66 2_60_40_770_1_1_1 chain A 1ktjB0_2_60_40_770_1_1_2_L129 0.999
1Y15 1ag200 190 3e-50 1_10_790_10_1_1_1 chain A 1e1wA0_1_10_790_10_1_4_3_L104 0.890
1Y1X 1hqvA0 107 1e-24 1_10_238_10_20_1_1 chain A 1hqvA0_1_10_238_10_20_1_1_L178 0.896
1Y2S 1ag200 171 2e-44 1_10_790_10_1_1_1 chain A 131-231 1qm3A0_1_10_790_10_1_4_1_L104 0.842
1Y4V 1a00A0 115 3e-27 1_10_490_10_4_1_3 chain D 1a0xD0_1_10_490_10_4_8_14_L146 0.999
1Y67 1abmA0 181 1e-46 3_90_149_10_1_1_1 chain D 1en4B0_3_90_149_10_1_5_7_L205 0.986
1YB1 1gegA0 96 7e-21 3_40_50_720_6_4_1 chain A 51-151 1gegH0_3_40_50_720_6_4_1_L255 0.782
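
The query segments listed in the table (e.g. 51-151, 101-201, 101-191) appear consistent with overlapping segments defined by the Window/Slide setting. The sketch below generates such segments under stated assumptions: each segment spans `window` residues beyond its start, segments start every `slide` residues, and the final segment is truncated at the chain end. The function `query_segments` is hypothetical; the actual segmentation used by the server may differ.

```python
def query_segments(first: int, last: int, window: int = 160, slide: int = 80):
    """List (start, end) residue ranges covering residues first..last.

    Assumptions (not confirmed by the server documentation):
    - a segment runs from `start` to `start + window`,
    - consecutive segments start `slide` residues apart,
    - the last segment is cut at the chain end.
    """
    segments = []
    start = first
    while start < last:
        end = min(start + window, last)
        segments.append((start, end))
        if end == last:
            break
        start += slide
    return segments


# Window/Slide 100/50 on a chain numbered 1..251
for seg in query_segments(1, 251, window=100, slide=50):
    print(seg)

# A domain shorter than the default window yields a single segment
print(query_segments(1, 75))
```

Under these assumptions, a 75-residue domain such as those in group 1_10_164 is covered by one segment that is much shorter than the default window of 160.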


Comparison of PRIDE and PRIDE2

Results presented here are based on all-against-all searches on the CATHselect database derived from CATH v2.5.1.

First, a table similar to Table 1. in the original PRIDE paper (Carugo O and Pongor S, Protein fold similarity estimated by a probabilistic approach based on C(alpha)-C(alpha) distance comparison, J Mol Biol 2002 Jan 25;315(4):887-98) is presented (identical representatives are omitted as there is only one domain for each 'I' level in the CATHselect database).
CATH level | Number of comparisons | PRIDE average | PRIDE standard deviation | PRIDE nearest neighbor within the same group (%) | PRIDE2 average | PRIDE2 standard deviation | PRIDE2 nearest neighbor within the same group (%)
N | 76631 | 0.95 | 0.13 | 64.76 | 0.95 | 0.10 | 65.51
S | 533445 | 0.64 | 0.27 | 86.36 | 0.68 | 0.21 | 87.12
H | 1300712 | 0.52 | 0.26 | 90.03 | 0.49 | 0.24 | 90.84
T | 4994560 | 0.41 | 0.22 | 93.94 | 0.30 | 0.20 | 91.81
A | 17878860 | 0.31 | 0.22 | 93.09 | 0.24 | 0.16 | 93.76
C | 56828709 | 0.28 | 0.21 | 97.64 | 0.20 | 0.13 | 96.50
All | 159195246 | 0.17 | 0.18 | 100.00 | 0.15 | 0.12 | 100.00
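
The "nearest neighbor within the same group (%)" column can be illustrated with a minimal sketch. It assumes a square matrix of pairwise scores from an all-against-all search and one group label per domain at the CATH level of interest; how the original analysis handled ties or restricted the candidate set is not stated here, so this is only one plausible reading. The function name is hypothetical.

```python
def nearest_neighbor_same_group(scores, groups):
    """Percentage of items whose best-scoring non-self neighbor shares their group label.

    scores: square list of lists, scores[i][j] is the similarity of items i and j.
    groups: list of group labels, one per item.
    Assumption: ties are broken by taking the first best-scoring neighbor.
    """
    n = len(groups)
    same = 0
    for i in range(n):
        best_j = max((j for j in range(n) if j != i), key=lambda j: scores[i][j])
        if groups[best_j] == groups[i]:
            same += 1
    return 100.0 * same / n


# Toy data: four domains in two groups (values are illustrative, not real PRIDE2 scores)
toy_scores = [
    [1.00, 0.90, 0.20, 0.10],
    [0.90, 1.00, 0.30, 0.20],
    [0.20, 0.30, 1.00, 0.25],
    [0.10, 0.20, 0.25, 1.00],
]
toy_groups = ["a", "a", "b", "b"]
print(nearest_neighbor_same_group(toy_scores, toy_groups))
```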


Second, a ROC (Receiver Operating Characteristics) analysis is presented for two of the most populated H-level groups in our database (chosen on the basis of the test set used in the paper by Sierk & Pearson: Sierk,M.L. and Pearson,W.R. (2004) Sensitivity and selectivity in protein structure comparison. Protein Sci., 13, 773-785.). Here again the PRIDE and PRIDE2 methods are compared.
Group used: 2_60_40_10 (1115 domains). [Figures: full ROC curve; upper part of ROC curve]
Group used: 3_40_50_720 (308 domains). [Figures: full ROC curve; upper part of ROC curve]
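
For readers unfamiliar with ROC analysis, the following is a minimal sketch of how such a curve can be computed from a ranked list of scores. It assumes each compared domain has a similarity score and a flag stating whether it belongs to the H-level group under study; tied scores are not treated specially. The function names are hypothetical and this is not the procedure used to produce the curves above.

```python
def roc_points(scores, is_member):
    """Return (false positive rate, true positive rate) points.

    The threshold is swept from the highest score to the lowest.
    Assumption: ties are not grouped, each item moves the curve by one step.
    """
    ranked = sorted(zip(scores, is_member), key=lambda p: p[0], reverse=True)
    positives = sum(1 for _, member in ranked if member)
    negatives = len(ranked) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one member and one non-member")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, member in ranked:
        if member:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (illustrative values only)
pts = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
print(pts)
print(auc(pts))
```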


Both types of analysis demonstrate that the use of the Kolmogorov-Smirnov test (i.e. unbinned distributions) yields a slight but noticeable improvement in performance.
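
As an illustration of what comparing unbinned distributions means, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between two empirical cumulative distribution functions, directly from the raw values without building histograms. Which distance distributions PRIDE2 compares and how the statistic is converted into the reported score are not described on this page, so the function `ks_statistic` is only a generic sketch and not the PRIDE2 implementation.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs
    of the two samples, evaluated at every observed value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past all values <= x
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Toy samples (illustrative values, e.g. distances in Angstrom)
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A value of D close to 0 indicates nearly identical distributions, while a value of 1 indicates samples that do not overlap at all.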