
         :-) :-) :-) CASBench (-: (-: (-: 
    A benchmarking set of proteins with catalytic 
  and allosteric sites annotated in their structures
	        Dataset version: 1.0
	  Release date: February 22, 2018

-----------------
Citing CASBench
-----------------

If you find CASBench useful, please cite:
A. Zlobin, D. Suplatov, K. Kopylov, V. Švedas (2018) Acta Naturae, in press

-----------------
On-line resources
-----------------

Please visit the CASBench web-site for more info: https://biokinet.belozersky.msu.ru/casbench

You can browse CASBench on-line using the interactive analysis tools at https://biokinet.belozersky.msu.ru/casbenchbrowse
Interactivity is implemented in HTML5, which is supported natively by web browsers, so neither plugins nor Java are required.

-----------------
Dataset structure
-----------------

One CASBench entry corresponds to one protein, which can be represented by more than one PDB structure.
For each PDB structure the following information is provided in a corresponding folder [PDBX]/ (where [PDBX] is an actual PDB code):

[PDBX]/[PDBX].pdb             - a 3D structure file of a biological unit
[PDBX]/CATALYTIC_SITES.txt    - a text file with annotation of the catalytic site(s)
[PDBX]/ALLOSTERIC_SITES.txt   - a text file with annotation of the allosteric site(s)
[PDBX]/CATALYTIC_LIGANDS.txt  - a text file with names of crystallographic ligands bound to the catalytic site(s)
[PDBX]/ALLOSTERIC_LIGANDS.txt - a text file with names of crystallographic ligands bound to the allosteric site(s)
[PDBX]/[PDBX].pse             - a PyMOL session file (binary PSE format) with annotation of catalytic and allosteric sites in the protein's structure

alignments/                   - a folder with multiple sequence alignments, provided once per CASBench entry (next to the [PDBX]/ folders)
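To check that a downloaded entry is complete, the per-structure layout listed above can be verified with a short script. The Python sketch below relies only on the folder and file names given in this README; the function name missing_files and the rule that every subfolder other than alignments/ is a [PDBX]/ folder named after its PDB code are illustrative assumptions, not part of CASBench.

```python
from pathlib import Path

# File names expected in every [PDBX]/ folder, taken from the list above.
# "{pdb}" is replaced with the folder's PDB code.
PER_PDB_FILES = (
    "{pdb}.pdb",
    "CATALYTIC_SITES.txt",
    "ALLOSTERIC_SITES.txt",
    "CATALYTIC_LIGANDS.txt",
    "ALLOSTERIC_LIGANDS.txt",
    "{pdb}.pse",
)


def missing_files(entry_dir):
    """Return {pdb_code: [missing file names]} for one CASBench entry.

    Assumption: every subfolder except alignments/ is a [PDBX]/ folder
    whose name is the PDB code. An empty dict means nothing is missing.
    """
    entry = Path(entry_dir)
    report = {}
    for sub in sorted(p for p in entry.iterdir() if p.is_dir()):
        if sub.name == "alignments":
            continue  # entry-level folder, not a PDB structure
        pdb = sub.name
        expected = [name.format(pdb=pdb) for name in PER_PDB_FILES]
        missing = [name for name in expected if not (sub / name).is_file()]
        if missing:
            report[pdb] = missing
    return report
```

For example, missing_files("cas0001") would return an empty dict for a complete entry.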

NB! 
Please note that for multichain proteins the annotation is provided for structurally unique binding sites only, 
i.e., duplicate annotations of structurally identical sites, which were formed when restoring a full-size biological 
unit from PDB, are removed to avoid redundancy. Example: if a protein's biological unit is a homohexamer, but PDB 
contains only chains A, B, and C, then the full-size protein will be reconstructed from the available subunits by 
applying the BIOMT transformation (i.e., A→D, B→E, C→F). Annotation will be provided only for the binding sites which 
are primarily hosted by the structurally unique chains A, B, and C (i.e., even when the amino acid sequence is the 
same, the exact location of the corresponding atoms can be slightly different in A, B, and C), and the annotation 
for binding sites hosted primarily by chains D, E, and F, which are structurally identical to A, B, and C, will be omitted.
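The BIOMT operation mentioned above maps every atom of a deposited chain to its copy as x' = R·x + t, where the rotation R and translation t come from the three BIOMT1..BIOMT3 rows of one REMARK 350 operator in a PDB file. CASBench already ships reconstructed biological units, so the Python sketch below only illustrates the transformation; it is not the procedure the authors used to build the dataset. The helper names are hypothetical, and parse_biomt assumes the file describes a single biomolecule (operator numbers restart in each BIOMOLECULE block).

```python
def parse_biomt(pdb_lines):
    """Group REMARK 350 BIOMT lines into {operator_number: [row1, row2, row3]}.

    Each row is (r1, r2, r3, t). Assumes a single BIOMOLECULE block.
    """
    ops = {}
    for line in pdb_lines:
        fields = line.split()
        if (len(fields) >= 8 and fields[0] == "REMARK" and fields[1] == "350"
                and fields[2].startswith("BIOMT")):
            op = int(fields[3])
            ops.setdefault(op, []).append(tuple(float(v) for v in fields[4:8]))
    return ops


def apply_biomt(rows, coords):
    """Apply one BIOMT operator to a list of (x, y, z) coordinates.

    For row i: x'_i = r_i1*x + r_i2*y + r_i3*z + t_i.
    """
    return [
        tuple(r1 * x + r2 * y + r3 * z + t for r1, r2, r3, t in rows)
        for x, y, z in coords
    ]
```

Applying operator 1 (usually the identity) reproduces chain A; applying another operator yields the symmetry copy, e.g. chain D in the homohexamer example.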

NB!
A multiple sequence alignment of the protein family was built by the Mustguseal method, as described in the
CASBench paper. Please note that for multichain proteins the sequence alignment is provided only for chains with
a unique amino acid sequence.
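The alignments are distributed as FASTA files. A minimal Python reader is sketched below; read_fasta_msa is a hypothetical helper, it assumes sequences may be wrapped over several lines, and it makes no assumption about the header format of the alignment. It also checks that all aligned sequences have the same length, as expected for a multiple alignment.

```python
def read_fasta_msa(path):
    """Read an aligned FASTA file into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Start of a new record: store the previous one first.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))

    # In a multiple alignment every row must have the same length.
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError(f"unequal aligned sequence lengths: {sorted(lengths)}")
    return records
```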

-----------------
Example
-----------------
The hierarchy of files in the CAS0001 entry directory:

cas0001/
cas0001/1owb
cas0001/1owb/CATALYTIC_SITES.txt
cas0001/1owb/CATALYTIC_LIGANDS.txt
cas0001/1owb/1owb.pse
cas0001/1owb/ALLOSTERIC_SITES.txt
cas0001/1owb/1owb.pdb
cas0001/1owb/ALLOSTERIC_LIGANDS.txt
cas0001/4jae
cas0001/4jae/CATALYTIC_SITES.txt
cas0001/4jae/4jae.pse
cas0001/4jae/CATALYTIC_LIGANDS.txt
cas0001/4jae/ALLOSTERIC_SITES.txt
cas0001/4jae/4jae.pdb
cas0001/4jae/ALLOSTERIC_LIGANDS.txt
... and so on for the other PDB structures of this entry
cas0001/alignments
cas0001/alignments/FAMILY_cas0001_1nxg_A_MSA.fasta
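The alignment file name appears to encode the entry, a PDB code, and a chain identifier. The Python sketch below splits such names into these parts; the pattern FAMILY_<entry>_<pdb>_<chain>_MSA.fasta is inferred from this single example and is an assumption, so names that do not match return None. The names MSA_NAME and parse_msa_name are hypothetical.

```python
import re

# Pattern inferred from the one example file name in this README;
# other entries may deviate from it.
MSA_NAME = re.compile(
    r"^FAMILY_(?P<entry>cas\d+)_(?P<pdb>[0-9A-Za-z]{4})_(?P<chain>[A-Za-z0-9]+)_MSA\.fasta$"
)


def parse_msa_name(filename):
    """Return {'entry', 'pdb', 'chain'} for an alignment file name, or None."""
    match = MSA_NAME.match(filename)
    return match.groupdict() if match else None
```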

-----------------

Please address your inquiries to:
Dmitry Suplatov d.a.suplatov@belozersky.msu.ru

https://biokinet.belozersky.msu.ru/casbench

-----------------
EOF