
Protein Classification Benchmark Collection

Examples and tips for use

1) For a simple comparison, use the tables accessible via the view data commands. We will use the example of the 3PGK protein dataset (record 29).

1.1 A summary of the results is available in the Results section, as a table containing the mean ROC AUC values for all the tested classification algorithms and similarity measures.
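If you want to compute a comparable number for your own scores, the Python sketch below shows one way to do it. It assumes one ROC AUC per classification task and a plain mean over tasks as the summary value. It uses the pairwise (Mann-Whitney) definition of AUC, where a higher score should indicate the positive class. The function names `roc_auc` and `mean_auc` are hypothetical and are not part of the scripts distributed at this site.

```python
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) ROC AUC.

    labels: 1 = positive, 0 = negative. A higher score should mean "more
    likely positive". Tied positive/negative pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(tasks: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Average the per-task ROC AUC values (assumed summary statistic)."""
    aucs: List[float] = [roc_auc(scores, labels) for scores, labels in tasks]
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    task_a = ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])  # perfect ranking
    task_b = ([0.4, 0.7, 0.6, 0.1], [1, 0, 1, 0])  # half of the pairs ordered correctly
    print(roc_auc(*task_a))            # 1.0
    print(roc_auc(*task_b))            # 0.5
    print(mean_auc([task_a, task_b]))  # 0.75
```

The pairwise loop is quadratic in the number of test items. That is adequate for small datasets such as 3PGK, but a rank-based computation scales better for large ones.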

1.2 Viewing detailed results.

When needed, detailed results are available through the form in the Results section.

Choose a method and a similarity measure, then click View to see the results in a web layout. You can also select more than one similarity measure and more than one method.

You can get a similar view for other methods using the view data command of the chosen method.

1.3 Viewing results for the same similarity/distance measure and different algorithms.

Go to Results and choose any method (e.g. SVM). Choose any distance matrix and click the View button.

2) If you are interested in methods development, a good start is to reproduce one set of results given in the database. For example, 3PGK protein (Record X) is a small dataset that can be used for this purpose. You can download its data (sequences/structures, a distance matrix and a cast matrix) and run the 1NN script available at this site or the programs described in the methods section of the record.

The SCOP40mini dataset was created for developers interested in protein domain comparison algorithms. Naturally, results should also be tested on larger datasets.
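The following Python sketch gives a rough picture of what a 1NN classifier does with a distance matrix: each test protein is scored by its nearest training neighbours. It is not the 1NN script distributed at this site. The scoring rule (distance to the nearest negative minus distance to the nearest positive), the in-memory list-of-lists matrix, and the names `nn_score` and `nn_classify` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def nn_score(dist: Matrix, i: int,
             pos_train: Sequence[int], neg_train: Sequence[int]) -> float:
    """Hypothetical 1NN score for item i: distance to the nearest negative
    training item minus distance to the nearest positive one.
    Positive values mean i lies closer to the positive class."""
    d_pos = min(dist[i][j] for j in pos_train)
    d_neg = min(dist[i][j] for j in neg_train)
    return d_neg - d_pos


def nn_classify(dist: Matrix, test: Sequence[int],
                pos_train: Sequence[int],
                neg_train: Sequence[int]) -> List[Tuple[float, int]]:
    """Return (score, predicted label) per test item. The label is 1 when the
    nearest positive training item is at least as close as the nearest negative."""
    results = []
    for i in test:
        score = nn_score(dist, i, pos_train, neg_train)
        results.append((score, 1 if score >= 0 else 0))
    return results


if __name__ == "__main__":
    # Toy symmetric distance matrix over four items (0 = positive training,
    # 1 = negative training, 2 and 3 = test items).
    dist = [
        [0.0, 1.0, 0.25, 0.75],
        [1.0, 0.0, 0.75, 0.5],
        [0.25, 0.75, 0.0, 0.5],
        [0.75, 0.5, 0.5, 0.0],
    ]
    print(nn_classify(dist, test=[2, 3], pos_train=[0], neg_train=[1]))
    # [(0.5, 1), (-0.25, 0)]
```

The scores, rather than only the predicted labels, are what an ROC AUC calculation needs. If a matrix holds similarities (larger = closer) instead of distances, convert it or replace `min` with `max` and flip the sign of the score.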

3) If you are interested in developing new similarity/distance measures, you can produce a distance matrix according to the description given under Data formats, and then perform the calculations either with the 1NN script available at this site or with the programs described in the methods section of the chosen record. You can then compare your results with those in the summary table described in 1.1.

For example, if you want to develop a new similarity/distance measure for protein sequences, you can use 3PGK protein (record 29). Download 3PGK_PROTEIN.fasta, calculate your experimental similarity/distance measure as a distance matrix in the format described under Data formats (see, e.g., 3PGK_29_BLAST.dmx), and then use the 1NN script available at this site or the programs described in the methods section of the record.
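The workflow in this example can be sketched in Python as follows. The distance used here is Euclidean distance between dipeptide (2-mer) composition vectors. It is only a stand-in for "your experimental measure". The tab-separated output (a header row of IDs, then one row per protein) is a hypothetical layout. The real .dmx format is the one described under Data formats, so compare your output with a file such as 3PGK_29_BLAST.dmx before using it. All function names are illustrative.

```python
import io
import math
import sys
from collections import Counter
from typing import Dict, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into {id: sequence}; the ID is the first word of the header."""
    parts: Dict[str, List[str]] = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            parts[current] = []
        elif current is not None:
            parts[current].append(line.upper())
    return {name: "".join(chunks) for name, chunks in parts.items()}


def kmer_profile(seq: str, k: int = 2) -> Dict[str, float]:
    """Relative frequencies of overlapping k-mers (dipeptides for k = 2)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1
    return {kmer: c / total for kmer, c in counts.items()}


def kmer_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two k-mer profiles (0.0 for identical profiles)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(x, 0.0) - b.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(seqs: Dict[str, str],
                    k: int = 2) -> Tuple[List[str], List[List[float]]]:
    """Symmetric all-against-all distance matrix, in the order of the FASTA file."""
    ids = list(seqs)
    profiles = [kmer_profile(seqs[name], k) for name in ids]
    n = len(ids)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = kmer_distance(profiles[i], profiles[j])
    return ids, dist


def write_matrix(ids: List[str], dist: List[List[float]], handle: TextIO) -> None:
    """Hypothetical tab-separated layout: header row of IDs, then one row per protein."""
    handle.write("\t" + "\t".join(ids) + "\n")
    for name, row in zip(ids, dist):
        handle.write(name + "\t" + "\t".join(f"{d:.6f}" for d in row) + "\n")


if __name__ == "__main__":
    # Tiny in-memory example; p2 is split over two lines but equals p1.
    demo = io.StringIO(">p1 toy\nMKVLA\n>p2\nMKV\nLA\n>p3\nGGGGG\n")
    ids, dist = distance_matrix(read_fasta(demo))
    write_matrix(ids, dist, sys.stdout)
```

To process the benchmark file itself, pass a handle from `open("3PGK_PROTEIN.fasta")` to `read_fasta` and write the matrix to a file instead of standard output.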

4) If you are interested in developing a new algorithm, you can start programming and testing it on a small dataset such as 3PGK protein, using the cast matrix 3PGK_29.cast and, e.g., the Smith-Waterman distance matrix 3PGK_PROTEIN_SW.dmx. Once the algorithm is ready, you can choose a larger dataset closer to your actual application. You can then compare the results with those given in the table view described under "Viewing results for the same similarity/distance measure and different algorithms".
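A new algorithm needs the per-task training and test splits encoded in the cast matrix. The Python sketch below reads a cast table into one `Task` per classification task and calls a pluggable scoring function on each. The following details are assumptions and should be checked against the Data formats page:

* the file layout (whitespace-separated, a header row of task names, then a protein ID and one integer code per task);
* the code convention 1 = positive training, 2 = negative training, 3 = positive test, 4 = negative test.

`read_cast`, `run_tasks` and the scoring-function signature are hypothetical.

```python
import io
from dataclasses import dataclass, field
from typing import Callable, Dict, List, TextIO, Tuple

# Assumed cast-matrix code convention; verify against the Data formats page.
CODE_TO_ROLE: Dict[int, Tuple[str, int]] = {
    1: ("train", 1),  # positive training example
    2: ("train", 0),  # negative training example
    3: ("test", 1),   # positive test example
    4: ("test", 0),   # negative test example
}

# Hypothetical interface for a new algorithm:
# (training IDs, training labels, test IDs) -> one score per test ID.
ScoreFn = Callable[[List[str], List[int], List[str]], List[float]]


@dataclass
class Task:
    name: str
    train: List[str] = field(default_factory=list)
    train_labels: List[int] = field(default_factory=list)
    test: List[str] = field(default_factory=list)
    test_labels: List[int] = field(default_factory=list)


def read_cast(handle: TextIO) -> List[Task]:
    """Parse a whitespace-separated cast table (assumed layout): a header row
    naming the tasks, then one row per protein with its ID and one code per task.
    Codes not listed in CODE_TO_ROLE are treated as "not used in this task"."""
    rows = [line.split() for line in handle if line.strip()]
    header, body = rows[0], rows[1:]
    n_tasks = len(body[0]) - 1
    # Take the last n_tasks header fields, so a leading "ID" column label is optional.
    tasks = [Task(name) for name in header[-n_tasks:]]
    for row in body:
        protein, codes = row[0], row[1:]
        for task, code in zip(tasks, codes):
            role = CODE_TO_ROLE.get(int(code))
            if role is None:
                continue
            split, label = role
            if split == "train":
                task.train.append(protein)
                task.train_labels.append(label)
            else:
                task.test.append(protein)
                task.test_labels.append(label)
    return tasks


def run_tasks(tasks: List[Task],
              score_fn: ScoreFn) -> List[Tuple[List[float], List[int]]]:
    """Run the algorithm on every task; returns (test scores, test labels) per task."""
    return [(score_fn(t.train, t.train_labels, t.test), t.test_labels) for t in tasks]


if __name__ == "__main__":
    cast_text = "ID task1 task2\na 1 3\nb 2 4\nc 3 1\nd 4 2\n"
    tasks = read_cast(io.StringIO(cast_text))

    def constant_scorer(train, train_labels, test):
        # Placeholder for a new algorithm: gives every test item the same score.
        return [0.0 for _ in test]

    for task, (scores, labels) in zip(tasks, run_tasks(tasks, constant_scorer)):
        print(task.name, task.test, scores, labels)
```

Replace the constant scorer with your own algorithm. Each returned (scores, labels) pair can then be turned into one ROC AUC value per task.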

©2006 ICGEBNet