Usage
Usage of the library for output consists of first creating a hierarchy of Python objects that together describe the system, and then dumping that hierarchy to an mmCIF or BinaryCIF file.
For complete worked examples, see the ModBase example or the ligands example.
The top level of the hierarchy is the modelcif.System object. All other objects are referenced from a System object (either directly or via another object that is referenced by the System).
System architecture
The architecture of the system is described with a number of classes:
modelcif.Entity describes each unique sequence (used in the target model, in one or more templates, or both).
modelcif.AsymUnit describes each asymmetric unit (chain) in the target model. For example, a homodimer would consist of two asymmetric units, both pointing to the same entity, while a heterodimer contains two entities.
Similarly, modelcif.Template describes a chain used as a template.
modelcif.Assembly groups asymmetric units, or parts of them. Assemblies are used to describe which parts of the system were modeled.
A variety of classes in the modelcif.alignment module can be used to describe alignments between the target and one or more templates.
Modeling protocol
modelcif.protocol.Protocol objects describe how models were generated from the input data. A protocol can consist of multiple steps, such as template search, alignment, modeling, and model selection. These objects also describe what was used as input and what was generated as output by each step, as one or more modelcif.data.Data objects.
Model coordinates
modelcif.model.Model objects give the actual coordinates of the final generated models. Each model points to the Assembly that describes what was modeled. Quality scores can also be assigned to each model (see the modelcif.qa_metric module) or to individual residues or pairs of residues.
Models can also be grouped together for any purpose using the modelcif.model.ModelGroup class.
Metadata
Metadata can also be added to the system, such as:
modelcif.System.citations: publication(s) that describe this modeling or the methods used in it.
modelcif.Software: software packages used at any stage in the modeling.
modelcif.System.grants: funding support for the modeling.
modelcif.reference.TemplateReference or modelcif.reference.TargetReference: information on a template structure, or a target sequence.
Residue numbering
The library keeps track of several numbering schemes to reflect the reality of the data used in modeling:
Internal numbering. Residues are always numbered sequentially starting at 1 in an Entity. All references to residues or residue ranges in the library use this numbering.
Author-provided numbering. If a different numbering scheme is used by the authors, for example to correspond to the numbering of the original sequence that is modeled, this can be given as an author-provided numbering for one or more asymmetric units. See the auth_seq_id_map parameter to AsymUnit. (The mapping between author-provided and internal numbering is given in the pdbx_poly_seq_scheme table in the mmCIF file.)
Output
Once the hierarchy of classes is complete, it can be freely inspected or modified. All the classes are simple lightweight Python objects, generally with the relevant data available as member variables.
The complete hierarchy can be written out to an mmCIF or BinaryCIF file using the modelcif.dumper.write() function.
Input
Hierarchies of classes can also be read from mmCIF or BinaryCIF files. This is done using the modelcif.reader.read() function, which returns a list of modelcif.System objects.
Format conversion
The library can convert a ModelCIF file between mmCIF and BinaryCIF formats by reading the file in one format and then writing it in the other. See the convert_bcif example.
Conversion from legacy PDB format to mmCIF or BinaryCIF is not generally possible because PDB format has no defined standard for including information about modeling protocols, alignments, and so on. This extra information must be deduced from other sources, for example custom PDB REMARK records or separate files, and provided to the library. For reference, a script that uses the library to convert ModBase models from PDB format to mmCIF can be seen here.
Validation
The library is designed to generate files that are consistent with the PDBx and ModelCIF dictionaries by construction. However, the library can also be used to validate ModelCIF (or other mmCIF/BinaryCIF) files if desired; see the validator example.