CS23D2.0 - I/O formats

		Chemical Shift to 3D Structure 2.0




Input File Format

CS23D2.0 accepts and processes backbone and side chain 1H, 13C and 15N chemical shift data in almost
any combination (HA only, HN only, HA+HN only, HA+HN+side chain H, CA only, CA+CB only, CA+CO only,
HA+CA+CB, HN+CA+CB, HN+15N only, HN+15N+CA, HN+15N+CA+CB, etc.). This allows CS23D2.0 to
handle everything from small peptides (where typically only 1H shifts are measured) to large proteins
(where only 15N or 13C shifts might be available).

The input file must include sequence data and chemical shift data, either in BMRB format (NMR-STAR
2.1, 2.1.1 or 3.1, or NEF) or in SHIFTY format. The minimum sequence length is 3 residues and the
maximum is 1000 residues.

FASTA Sequence Format

The sequence can be given with or without a FASTA header line beginning with ">".

Example #1:

>CS23:D|PDBID|CHAIN|SEQUENCE
LTLNANPLDATQSEDVVCPVFGTPRTCQIHGRSRELAK

Example #2:

LTLNANPLDATQSEDVVCPVFGTPRTCQIHGRSRELAK
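The two examples differ only in the optional header line. The following is a minimal Python sketch of how such a sequence could be read and checked against the 3 to 1000 residue limits stated above. The function name parse_sequence is hypothetical (it is not part of CS23D2.0), and it assumes the sequence uses one-letter residue codes only.

```python
def parse_sequence(text: str) -> str:
    """Return the one-letter sequence from FASTA text with an optional '>' header."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    # Example #1 has a '>' header line; Example #2 has none.
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    seq = "".join(lines).upper()
    if not seq.isalpha():
        raise ValueError("sequence must contain only one-letter residue codes")
    # Length limits from the input rules: 3 to 1000 residues.
    if not 3 <= len(seq) <= 1000:
        raise ValueError(f"sequence length {len(seq)} is outside 3-1000")
    return seq


with_header = ">CS23:D|PDBID|CHAIN|SEQUENCE\nLTLNANPLDATQSEDVVCPVFGTPRTCQIHGRSRELAK"
print(len(parse_sequence(with_header)))  # 38
```

Both example inputs yield the same 38-residue sequence, because the header line is simply discarded.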

BMRB Format

CS23D2.0 supports BMRB NMR-STAR 2.1 and 3.1 files as well as the NMR Exchange Format (NEF).
Examples of allowable BMRB files (with and without different headers) are shown below:

Example #1: This is an example of a generic BMRB file extracted from the BMRB. The entire
file is ~500 lines, and only a portion is shown here. The header is not important for CS23D2.0
data processing; only the chemical shift list (at the bottom of the file) is used. CS23D2.0
ignores most (if not all) of the header text.


data_548

#######################
#  Entry information  #
#######################

save_entry_information
   _Saveframe_category      entry_information

   _Entry_title            
;
Sequence-Specific 1H NMR Assignment and Secondary Structure of Neuropeptide Y in
 Aqueous Solution
;

   loop_
      _Author_ordinal
      _Author_family_name
      _Author_given_name
      _Author_middle_initials
      _Author_family_title

      1 Saudek Vladimir .  . 
      2 Pelton John     T. . 

   stop_

   _BMRB_accession_number   548
   _BMRB_flat_file_name     bmr548.str
   _Entry_type              revision
   _Submission_date         1995-07-31
   _Accession_date          1996-04-12
   _Entry_origination       BMRB
   _NMR_STAR_version        2.1
   _Experimental_method     NMR

ETC.
ETC.

   loop_
      _Atom_shift_assign_ID
      _Residue_seq_code
      _Residue_label
      _Atom_name
      _Atom_type
      _Chem_shift_value
      _Chem_shift_value_error
      _Chem_shift_ambiguity_code

        1  1 TYR   HA    H 4.53 . 1 
        2  1 TYR   HB2   H 3.05 . 2 
        3  1 TYR   HB3   H 3.28 . 2 
        4  1 TYR   HD1   H 7.28 . 1 
        5  1 TYR   HD2   H 7.28 . 1 
        6  1 TYR   HE1   H 6.93 . 1 
        7  1 TYR   HE2   H 6.93 . 1 
        8  2 PRO   HA    H 4.59 . 1 
        9  2 PRO   HB2   H 2.01 . 2 
       10  2 PRO   HB3   H 2.39 . 2 
       11  2 PRO   HG2   H 1.48 . 1 
       12  2 PRO   HG3   H 1.48 . 1 
       13  2 PRO   HD2   H 3.38 . 2 
       14  2 PRO   HD3   H 3.74 . 2 
       15  3 SER   H     H 8.42 . 1 
       16  3 SER   HA    H 4.38 . 1 
       17  3 SER   HB2   H 3.83 . 1 
       18  3 SER   HB3   H 3.83 . 1
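Because only the chemical shift list matters, a reader can skip everything until it reaches the loop that carries the shift values. The following is a minimal sketch of that idea, assuming NMR-STAR 2.1 tag names (it looks for _Chem_shift_value); read_shift_loop is a hypothetical helper, not CS23D2.0's actual parser.

```python
def read_shift_loop(text: str):
    """Return (tags, rows) for the first loop_ whose tags include _Chem_shift_value."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].strip().lower() == "loop_":
            tags = []
            i += 1
            # Tag names follow loop_ one per line, each starting with '_'.
            while i < len(lines) and lines[i].strip().startswith("_"):
                tags.append(lines[i].strip())
                i += 1
            if any(tag.lower() == "_chem_shift_value" for tag in tags):
                rows = []
                # Data rows run until stop_ or the end of the text.
                while i < len(lines) and lines[i].strip().lower() != "stop_":
                    fields = lines[i].split()
                    if fields and not fields[0].startswith("#"):
                        rows.append(fields)
                    i += 1
                return tags, rows
        i += 1
    raise ValueError("no chemical shift loop found")
```

Header loops such as the author list are passed over because their tags do not include a chemical shift value.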

Example #2: This is an example of a slightly shortened BMRB format where only the assigned
chemical shift section of the BMRB file is provided.


	##############################
	#  assigned chemical shifts  #
	##############################



save_assigned_chem_shift_list_1
   _Saveframe_category               assigned_chemical_shifts


   loop_
      _Software_label

      $NMRPipe 

   stop_

   loop_
      _Sample_label

      $sample_1 
      $sample_2 

   stop_

   _Sample_conditions_label         $sample_conditions_1
   _Chem_shift_reference_set_label  $chemical_shift_reference_1
   _Mol_system_component_name        entity_1

   loop_
      _Atom_shift_assign_ID
      _Residue_author_seq_code
      _Residue_seq_code
      _Residue_label
      _Atom_name
      _Atom_type
      _Chem_shift_value
      _Chem_shift_value_error
      _Chem_shift_ambiguity_code

        1  1  1 GLY HA2  H   4.44 0.0300 2 
        2  1  1 GLY HA3  H   3.72 0.0300 2 
        3  1  1 GLY CA   C  44.81 0.4000 1 
        4  2  2 SER H    H   8.70 0.0300 1 
        5  2  2 SER N    N 121.24 0.4000 1 
        6  4  4 MET HA   H   4.30 0.0300 1 
        7  4  4 MET HB2  H   2.11 0.0300 2 
        8  4  4 MET HB3  H   1.94 0.0300 2 
        9  4  4 MET HG2  H   2.30 0.0300 2 
       10  4  4 MET HG3  H   2.30 0.0300 2 
       11  4  4 MET C    C 172.22 0.4000 1 
       12  4  4 MET CA   C  55.62 0.4000 1 
       13  4  4 MET CB   C  29.60 0.4000 1

Example #3: This is an example of the simplest BMRB format that CS23D2.0 accepts. Only the
chemical shift list is provided with no preceding data tags. The number of columns in this
example is 9.


        1  1  1 GLY HA2  H   4.44 0.0300 2 
        2  1  1 GLY HA3  H   3.72 0.0300 2 
        3  1  1 GLY CA   C  44.81 0.4000 1 
        4  2  2 SER H    H   8.70 0.0300 1 
        5  2  2 SER N    N 121.24 0.4000 1 
        6  4  4 MET HA   H   4.30 0.0300 1 
        7  4  4 MET HB2  H   2.11 0.0300 2 
        8  4  4 MET HB3  H   1.94 0.0300 2 
        9  4  4 MET HG2  H   2.30 0.0300 2 
       10  4  4 MET HG3  H   2.30 0.0300 2 
       11  4  4 MET C    C 172.22 0.4000 1 
       12  4  4 MET CA   C  55.62 0.4000 1
       13  4  4 MET CB   C  29.60 0.4000 1
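With no tags to name the columns, the layout has to be inferred from the row itself. The following is a minimal sketch, assuming the 9-column order shown here (ID, author sequence code, sequence code, residue, atom, atom type, shift, error, ambiguity) and the 8-column order of Example #4 below, which omits the author sequence code. parse_bare_row is a hypothetical name.

```python
def parse_bare_row(line: str) -> dict:
    """Parse one headerless shift row with 9 columns or 8 columns."""
    f = line.split()
    if len(f) == 9:
        # ID, author seq code, seq code, residue, atom, type, shift, error, ambiguity
        _, _, seq, res, atom, atype, val, err, amb = f
    elif len(f) == 8:
        # Same layout without the author sequence code column.
        _, seq, res, atom, atype, val, err, amb = f
    else:
        raise ValueError(f"expected 8 or 9 columns, got {len(f)}")
    return {"seq": int(seq), "residue": res, "atom": atom, "type": atype,
            "shift": float(val), "error": err, "ambiguity": amb}
```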

Example #4: This is another simplified BMRB format that CS23D2.0 accepts. This example has 8
data columns, which is the minimum number of columns CS23D2.0 accepts. If no data are available
for the chemical shift error or ambiguity code, these values can be replaced by a period (as in
this example).


loop_
      _Atom_shift_assign_ID
      _Residue_seq_code
      _Residue_label
      _Atom_name
      _Atom_type
      _Chem_shift_value
      _Chem_shift_value_error
      _Chem_shift_ambiguity_code

        1  1 GLY HA2  H   4.44 . . 
        2  1 GLY HA3  H   3.72 . . 
        3  1 GLY CA   C  44.81 . . 
        4  2 SER H    H   8.70 . . 
        5  2 SER N    N 121.24 . . 
        6  4 MET HA   H   4.30 . . 
        7  4 MET HB2  H   2.11 . . 
        8  4 MET HB3  H   1.94 . . 
        9  4 MET HG2  H   2.30 . . 
       10  4 MET HG3  H   2.30 . . 
       11  4 MET C    C 172.22 . . 
       12  4 MET CA   C  55.62 . . 
       13  4 MET CB   C  29.60 . .
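The period is a placeholder, not a number, so it needs to be mapped to "missing" before any arithmetic is done. A minimal sketch, assuming the 8-column layout above; optional_float is a hypothetical helper name.

```python
from typing import Optional


def optional_float(token: str) -> Optional[float]:
    """Return None for the '.' placeholder used when no value was reported."""
    return None if token == "." else float(token)


# An 8-column row from Example #4: error and ambiguity are both '.'.
fields = "4  2 SER H    H   8.70 . .".split()
shift = float(fields[5])
error = optional_float(fields[6])
ambiguity = optional_float(fields[7])
```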

Example #5: Here is another example of an acceptable BMRB format. In this case the tags of the
assignment loop are written in upper case (instead of the usual mixed case). The number of data
columns is 9, even though the author sequence code and residue sequence code columns hold the
same values.


loop_
      _ATOM_SHIFT_ASSIGN_ID
      _RESIDUE_AUTHOR_SEQ_CODE
      _RESIDUE_SEQ_CODE
      _RESIDUE_LABEL
      _ATOM_NAME
      _ATOM_TYPE
      _CHEM_SHIFT_VALUE
      _CHEM_SHIFT_VALUE_ERROR
      _CHEM_SHIFT_AMBIGUITY_CODE

        1  1  1 GLY HA2  H   4.44 0.0300 . 
        2  1  1 GLY HA3  H   3.72 0.0300 . 
        3  1  1 GLY CA   C  44.81 0.4000 . 
        4  2  2 SER H    H   8.70 0.0300 . 
        5  2  2 SER N    N 121.24 0.4000 . 
        6  4  4 MET HA   H   4.30 0.0300 . 
        7  4  4 MET HB2  H   2.11 0.0300 . 
        8  4  4 MET HB3  H   1.94 0.0300 . 
        9  4  4 MET HG2  H   2.30 0.0300 . 
       10  4  4 MET HG3  H   2.30 0.0300 . 
       11  4  4 MET C    C 172.22 0.4000 . 
       12  4  4 MET CA   C  55.62 0.4000 . 
       13  4  4 MET CB   C  29.60 0.4000 .
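Accepting upper-case tags amounts to comparing tag names case-insensitively. A minimal sketch of such a lookup, assuming the tag order of Example #5; column_index is a hypothetical name.

```python
def column_index(tags: list, name: str) -> int:
    """Find a tag's column position, ignoring upper/lower case differences."""
    lowered = [tag.strip().lower() for tag in tags]
    return lowered.index(name.lower())


tags = ["_ATOM_SHIFT_ASSIGN_ID", "_RESIDUE_AUTHOR_SEQ_CODE", "_RESIDUE_SEQ_CODE",
        "_RESIDUE_LABEL", "_ATOM_NAME", "_ATOM_TYPE", "_CHEM_SHIFT_VALUE",
        "_CHEM_SHIFT_VALUE_ERROR", "_CHEM_SHIFT_AMBIGUITY_CODE"]
row = "1  1  1 GLY HA2  H   4.44 0.0300 .".split()
print(row[column_index(tags, "_Chem_shift_value")])  # 4.44
```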

Example #6: In this example the data are presented in a tab-delimited format rather than
following the usual 3-character spacing found in most BMRB files. Comments have also been
added between the loop tags and the data rows. This format (and modest variations of it) is
also accepted by CS23D2.0.


loop_
      _ATOM_CHEM_SHIFT.ID
      _ATOM_CHEM_SHIFT.COMP_INDEX_ID
      _ATOM_CHEM_SHIFT.COMP_ID
      _ATOM_CHEM_SHIFT.ATOM_ID
      _ATOM_CHEM_SHIFT.ATOM_TYPE
      _ATOM_CHEM_SHIFT.VAL
      _ATOM_CHEM_SHIFT.VAL_ERR
      _ATOM_CHEM_SHIFT.AMBIGUITY_CODE
      _ATOM_CHEM_SHIFT.OCCUPANCY
#
# some comments placed here
# more comments
#
1  	1  	GLY 	HA2  	H   	4.44     0.0300     2 
2  	1  	GLY 	HA3  	H   	3.72     0.0300     2 
3  	1  	GLY 	CA   	C  	44.81    0.4000     1 
4  	2  	SER 	H    	H   	8.70     0.0300     1 
5  	2  	SER 	N    	N 	121.24   0.4000     1 
6  	4  	MET 	HA   	H   	4.30     0.0300     1 
7  	4  	MET 	HB2  	H   	2.11     0.0300     2 
8  	4  	MET 	HB3  	H   	1.94     0.0300     2 
9  	4  	MET 	HG2  	H   	2.30     0.0300     2 
10  	4  	MET 	HG3  	H   	2.30     0.0300     2 
11  	4  	MET 	C    	C 	172.22   0.4000     1 
12  	4  	MET 	CA   	C  	55.62    0.4000     1 
13  	4  	MET 	CB   	C  	29.60    0.4000     1
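Tolerating tabs and comment lines mostly comes down to how rows are split and filtered. A minimal sketch, assuming that '#' starts a comment line and that tag lines begin with '_'; data_rows is a hypothetical name. Python's str.split() with no argument splits on any run of spaces or tabs.

```python
def data_rows(lines):
    """Yield whitespace-split data rows, skipping blanks, comments and STAR syntax."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or '#' comment
        if stripped.startswith("_") or stripped.lower() in ("loop_", "stop_"):
            continue  # loop tags and keywords, not data
        # split() with no argument treats tabs, single spaces and
        # column-aligned padding the same way.
        yield stripped.split()
```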

Example #7: In this example the data are presented in a single-space-delimited format rather
than following the usual 3-character spacing found in most BMRB files. Comments have also
been added between the loop tags and the data rows. This format (and modest variations of it)
is also accepted by CS23D2.0.


loop_
      _ATOM_CHEM_SHIFT.ID
      _ATOM_CHEM_SHIFT.COMP_INDEX_ID
      _ATOM_CHEM_SHIFT.COMP_ID
      _ATOM_CHEM_SHIFT.ATOM_ID
      _ATOM_CHEM_SHIFT.ATOM_TYPE
      _ATOM_CHEM_SHIFT.VAL
      _ATOM_CHEM_SHIFT.VAL_ERR
      _ATOM_CHEM_SHIFT.VAL_ERROR
      _ATOM_CHEM_SHIFT.AMBIGUITY_CODE
      _ATOM_CHEM_SHIFT.OCCUPANCY
      _ATOM_CHEM_SHIFT.DETAILS
#
# some comments placed here
# more comments

1 1 1 GLY HA2 H 4.44 0.03 2. 
2 1 1 GLY HA3 H 3.72 0.03 2. 
3 1 1 GLY CA C 44.81 0.4 1. 
4 2 2 SER H H 8.70 0.03 1. 
5 2 2 SER N N 121.24 0.4 1. 
6 4 4 MET HA H 4.30 0.03 2. 
7 4 4 MET HB2 H 2.11 0.03 2. 
8 4 4 MET HB3 H 1.94 0.03 2. 
9 4 4 MET HG2 H 2.30 0.03 2. 
10 4 4 MET HG3 H 2.30 0.03 1. 
11 4 4 MET C C 172.22 0.4 1. 
12 4 4 MET CA C 55.62 0.4 1.
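In this example the ambiguity codes are written as "1." and "2." rather than "1" and "2". A minimal sketch of reading such a code, assuming the ambiguity code is the ninth value in each row as shown here; ambiguity_code is a hypothetical name.

```python
def ambiguity_code(token: str):
    """Parse an ambiguity code written as '1', '2.' or the '.' placeholder."""
    if token == ".":
        return None
    # float() accepts both '2' and '2.'; the code itself is an integer.
    return int(float(token))


row = "1 1 1 GLY HA2 H 4.44 0.03 2.".split()
code = ambiguity_code(row[8])
```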

Example #8: This is an example of a generic BMRB file in the newer NMR-STAR 3.1 format, extracted
from the BMRB.


LOOP_
        _ATOM_CHEM_SHIFT.ID 
        _ATOM_CHEM_SHIFT.ASSEMBLY_ATOM_ID 
        _ATOM_CHEM_SHIFT.ENTITY_ASSEMBLY_ID 
        _ATOM_CHEM_SHIFT.ENTITY_ID 
        _ATOM_CHEM_SHIFT.COMP_INDEX_ID 
        _ATOM_CHEM_SHIFT.SEQ_ID 
        _ATOM_CHEM_SHIFT.COMP_ID 
        _ATOM_CHEM_SHIFT.ATOM_ID 
        _ATOM_CHEM_SHIFT.ATOM_TYPE 
        _ATOM_CHEM_SHIFT.ATOM_ISOTOPE_NUMBER 
        _ATOM_CHEM_SHIFT.VAL 
        _ATOM_CHEM_SHIFT.VAL_ERR 
        _ATOM_CHEM_SHIFT.ASSIGN_FIG_OF_MERIT 
        _ATOM_CHEM_SHIFT.AMBIGUITY_CODE 
        _ATOM_CHEM_SHIFT.OCCUPANCY 
        _ATOM_CHEM_SHIFT.RESONANCE_ID 
        _ATOM_CHEM_SHIFT.AUTH_ENTITY_ASSEMBLY_ID 
        _ATOM_CHEM_SHIFT.AUTH_SEQ_ID 
        _ATOM_CHEM_SHIFT.AUTH_COMP_ID 
        _ATOM_CHEM_SHIFT.AUTH_ATOM_ID 
        _ATOM_CHEM_SHIFT.DETAILS 
        _ATOM_CHEM_SHIFT.ENTRY_ID 
        _ATOM_CHEM_SHIFT.ASSIGNED_CHEM_SHIFT_LIST_ID 
          1   .   1   1    1    1   LYS   HA     H    1     4.133   0.000   .   1   .   .   .    1   K   HA     .   16747   1   
          2   .   1   1    1    1   LYS   HB2    H    1     1.685   0.000   .   2   .   .   .    1   K   HB     .   16747   1   
          3   .   1   1    1    1   LYS   HB3    H    1     1.685   0.000   .   2   .   .   .    1   K   HB     .   16747   1   
          4   .   1   1    1    1   LYS   HD2    H    1     1.435   0.000   .   2   .   .   .    1   K   HD2    .   16747   1   
          5   .   1   1    1    1   LYS   HD3    H    1     1.401   0.000   .   2   .   .   .    1   K   HD3    .   16747   1   
          6   .   1   1    1    1   LYS   HE2    H    1     2.830   0.000   .   2   .   .   .    1   K   HE     .   16747   1   
          7   .   1   1    1    1   LYS   HE3    H    1     2.830   0.000   .   2   .   .   .    1   K   HE     .   16747   1   
          8   .   1   1    1    1   LYS   HG2    H    1     1.334   0.000   .   2   .   .   .    1   K   HG     .   16747   1   
          9   .   1   1    1    1   LYS   HG3    H    1     1.334   0.000   .   2   .   .   .    1   K   HG     .   16747   1   
         10   .   1   1    1    1   LYS   CA     C   13    51.650   0.000   .   1   .   .   .    1   K   CA     .   16747   1   
         11   .   1   1    1    1   LYS   CB     C   13    29.270   0.000   .   1   .   .   .    1   K   CB     .   16747   1   
         12   .   1   1    1    1   LYS   CD     C   13    26.130   0.000   .   1   .   .   .    1   K   CD     .   16747   1   
         13   .   1   1    1    1   LYS   CE     C   13    39.360   0.000   .   1   .   .   .    1   K   CE     .   16747   1
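The newer format carries many more columns, but only a handful correspond to the fields of the 2.1 examples. A minimal sketch of picking them by tag name, assuming _Atom_chem_shift.Comp_index_ID (rather than Seq_ID) as the residue number; STAR3_FIELDS and star3_row are hypothetical names.

```python
# Hypothetical mapping from NMR-STAR 3.x loop tags (lower-cased) to the
# fields used in the NMR-STAR 2.1 examples above.
STAR3_FIELDS = {
    "_atom_chem_shift.comp_index_id": "seq",
    "_atom_chem_shift.comp_id": "residue",
    "_atom_chem_shift.atom_id": "atom",
    "_atom_chem_shift.atom_type": "type",
    "_atom_chem_shift.val": "shift",
}


def star3_row(tags: list, fields: list) -> dict:
    """Pick the needed values from one NMR-STAR 3.x data row by tag name."""
    position = {tag.strip().lower(): i for i, tag in enumerate(tags)}
    record = {key: fields[position[tag]] for tag, key in STAR3_FIELDS.items()}
    record["seq"] = int(record["seq"])
    record["shift"] = float(record["shift"])
    return record
```

Looking columns up by tag name, rather than by fixed position, means the unused columns (isotope number, entry ID and so on) do not need to be handled at all.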

Example #9: This is an example of a chemical shift list in NMR Exchange Format (NEF).


save_chemical_shift_list_1
   _nef_chemical_shift_list.sf_category                nef_chemical_shift_list
   _nef_chemical_shift_list.sf_framecode               chemical_shift_list_1
   _nef_chemical_shift_list.atom_chemical_shift_units  ppm

   loop_
      _nef_chemical_shift.chain_code
      _nef_chemical_shift.sequence_code
      _nef_chemical_shift.residue_type
      _nef_chemical_shift.atom_name
      _nef_chemical_shift.value
      _nef_chemical_shift.value_uncertainty

     A   10   HIS   C      175.19    0.4
     A   10   HIS   CA     56.002    0.4
     A   10   HIS   CB     30.634    0.4
     A   10   HIS   CD2    119.578   0.4
     A   10   HIS   HA     4.687     0.02
     A   10   HIS   HBX    3.106     0.02
     A   10   HIS   HBY    3.201     0.02
     A   10   HIS   HD2    7.067     0.02
     A   11   MET   C      175.775   0.4
     A   11   MET   CA     55.347    0.4
     A   11   MET   CB     34.981    0.4
     A   11   MET   CG     32.805    0.4
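Unlike the BMRB loops, this NEF loop has no atom type column. A minimal sketch of reading one row, assuming (as holds for the H, C and N atoms in this example) that the element is the first letter of the atom name; nef_row is a hypothetical name.

```python
def nef_row(line: str) -> dict:
    """Parse a NEF shift row: chain, sequence code, residue, atom, value, uncertainty."""
    chain, seq, res, atom, value, unc = line.split()
    # NEF gives no atom type column; assume the element is the first
    # letter of the atom name, which holds for the H, C and N atoms here.
    element = atom[0]
    if element not in ("H", "C", "N"):
        raise ValueError(f"unexpected atom name {atom}")
    return {"chain": chain, "seq": int(seq), "residue": res, "atom": atom,
            "type": element, "shift": float(value), "error": float(unc)}
```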

SHIFTY Format

SHIFTY is a simplified chemical shift data entry format developed in the Sykes Lab in 1991 and is
one of the more common “alternate” formats for chemical shift information. Examples of allowable
SHIFTY formats are shown below. Any combination of shifts may be listed, in any order, as long as
the columns are labeled with a header. The header (first line) is essential; it can be aligned with
the column positions or written as a single-spaced row. At minimum, a SHIFTY file must have 3
columns: a residue number column, a single-letter residue name column and a chemical shift
column. Unmeasured or undetectable chemical shifts can be entered as 0.00, - or *.

Example 1: A SHIFTY file with HA, HN, N15, CA, CB and CO shifts.


# AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557 
6 V 4.5204 8.4684 123.4184 61.4330 34.6444 173.0311 
7 T 4.9002 8.2696 119.8067 62.2487 70.0431 174.1138 
8 I 4.1698 8.8360 129.2597 61.8793 37.2884 176.4472 
9 T 4.4136 8.2868 115.9694 60.8221 70.1452 174.6432 
10 A 4.2796 8.0655 127.7723 50.9885 19.0033 176.6414 
11 P 4.3562 0.0000 0.0000 65.5591 31.2252 177.2392 
12 N 4.8824 7.8942 112.1161 52.5902 39.2484 177.0207 
13 G 3.7309 7.5941 106.4993 46.8305 0.0000 174.5358 
14 L 4.6853 9.7859 121.2612 53.1092 41.6631 175.3041 
15 D 4.6986 7.0435 114.6080 52.0224 40.8042 177.3864 
16 T 4.0677 7.8732 114.9997 67.0623 68.7506 177.2631 
17 R 3.9316 8.0671 119.4180 60.4646 30.5755 177.9282 
18 P 4.2658 0.0000 0.0000 65.3875 30.9009 178.6357 
19 A 4.0015 8.5778 121.5522 55.2170 18.1581 179.5463 
20 A 4.0493 7.9442 119.6336 55.1010 18.1309 179.7605 
21 Q 4.0158 7.9651 115.7440 58.4227 28.2881 178.1323 
22 F 4.1284 8.6923 121.2872 61.8092 39.3486 177.1596 
23 V 4.0272 8.4435 118.5810 65.9995 31.2267 178.5363 
24 K 3.9445 7.8277 117.7576 58.7971 31.7623 178.6483
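Since the header names the columns, a reader can map each value to its shift type and treat 0.00, - and * as missing. A minimal sketch, assuming the first header token is the number-column marker and every other header token names a column; parse_shifty is a hypothetical name, not CS23D2.0's own reader.

```python
def parse_shifty(text: str) -> list:
    """Parse SHIFTY text into one dict per residue; missing shifts become None."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    # Header: first token marks the number column, the rest name the columns.
    header = lines[0].split()[1:]  # e.g. ['AA', 'HA', 'HN', 'N15', 'CA', 'CB', 'CO']
    records = []
    for line in lines[1:]:
        num, *values = line.split()
        rec = {"num": int(num)}
        for name, token in zip(header, values):
            if name == "AA":
                rec["AA"] = token  # single-letter residue name
            elif token in ("-", "*") or float(token) == 0.0:
                rec[name] = None   # unmeasured or undetectable shift
            else:
                rec[name] = float(token)
        records.append(rec)
    return records
```

For the proline rows above (residues 11 and 18), HN and N15 come back as None rather than 0.0, so they cannot be mistaken for measured values.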

Example 2: Here only HA, HN and N15 shifts are given. In this case the header is aligned with the
columns, although alignment is not necessary. This is the minimal information format: at least
one of the HA, CA, CO or CB shifts must be provided.


# AA  HA     HN      N15 
1 M 4.6128 8.3509 128.1401  
2 F 5.1658 9.1754 128.0914  
3 Q 5.0880 7.8251 122.4598  
4 Q 4.6980 8.4214 119.1251  
5 E 5.1262 8.3247 122.6401  
6 V 4.5204 8.4684 123.4184  
7 T 4.9002 8.2696 119.8067
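The minimal requirements can be checked from the header alone. A minimal sketch, assuming the residue name column is always labeled AA and comes second, as in every example on this page; check_shifty_header is a hypothetical name.

```python
REQUIRED_ANY = ("HA", "CA", "CO", "CB")


def check_shifty_header(header_line: str) -> list:
    """Return the shift column names, or raise if the minimal SHIFTY rules fail."""
    tokens = header_line.split()
    # Number marker, 'AA' and at least one shift column: 3 columns minimum.
    if len(tokens) < 3 or tokens[1] != "AA":
        raise ValueError("header needs a number column, 'AA' and at least one shift")
    shifts = tokens[2:]
    if not any(name in shifts for name in REQUIRED_ANY):
        raise ValueError("at least one of HA, CA, CO or CB must be provided")
    return shifts
```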

Example 3: The # sign that begins the SHIFTY header line can also be replaced by “NUM”, “>” or
“#NUM”. Any of the following headers is acceptable:


# AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557 
or

NUM AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557 
or

> AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557 
or

#NUM AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557
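The four variants differ only in the first header token. A minimal sketch of recognizing them, assuming an exact (case-sensitive) match on the marker; shifty_columns is a hypothetical name.

```python
ACCEPTED_MARKERS = ("#NUM", "NUM", "#", ">")


def shifty_columns(header_line: str):
    """Return the column names after the number marker, or None if not a header."""
    tokens = header_line.split()
    if not tokens or tokens[0] not in ACCEPTED_MARKERS:
        return None
    return tokens[1:]
```

A data line such as "1 M 4.6128 ..." starts with a residue number, so it returns None and is not mistaken for a header.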

Output File Format

When the calculation is finished, an email is sent to the address entered in the email box. If
CS23D2.0 successfully generated a structure, the email includes a link to the final PDB structure
and a link for viewing the 3D structure of the protein. It also reports statistics on the quality of
the structure. The "Link to results page" summarizes the contents of the email and can be
bookmarked for future visits. The following is an example of an email sent after CS23D2.0
successfully generated a structure:

Your CS23D2.0 structure prediction is complete.    
Link to PDB structure: http://busby1.cs.ualberta.ca/CS23D2.0/tmp/1203980205.pdb
Link to View Structure: http://busby1.cs.ualberta.ca/cgi-bin/CS23D/show_struct.cgi?id=1203980205    
Link to results page: http://busby1.cs.ualberta.ca/cgi-bin/GenMR/Results.cgi?dir=/usr/scratch/prion/GENMR/tmp&Input=1203980205&email=peter.tang.lai@gmail.com

                                    Before optimization   After optimization   Expected  
CS23D2.0 energy                     -18.84                -33.75  
Mean chemical shift correlation     0.740                  0.742  
Torsion angles     
   #res in phi/psi core             79                     79                  72 (90%)     
   #res in phi/psi allowed          1                      1                   6 (  7%)     
   #res in phi/psi generous         0                      0                   1 (  1%)     
   #res in phi/psi disallowed       0                      0                   0 (  0%)
   #res in omega allowed            80                     80                  80 (99%)     
   #res in omega disallowed         1                      1                   1 (  1%)    
   
Final structure reliability: Good    

Mean chemical shift correlation   
   0.75 - 1.00 = High   
   0.65 - 0.75 = Good   
   0.55 - 0.65 = Moderate   
   0.00 - 0.55 = Poor
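The rating bands can be expressed as a simple threshold function. This is a minimal sketch; because the listed ranges share their endpoints (0.75, 0.65, 0.55), it assumes a boundary value belongs to the higher band. correlation_rating is a hypothetical name.

```python
def correlation_rating(r: float) -> str:
    """Map a mean chemical shift correlation to the rating bands listed above."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("correlation expected in [0, 1]")
    # Shared endpoints are assigned to the higher band (an assumption).
    if r >= 0.75:
        return "High"
    if r >= 0.65:
        return "Good"
    if r >= 0.55:
        return "Moderate"
    return "Poor"
```

By these bands, the example email's correlation of 0.742 after optimization falls in the Good range.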