Usage

Usage of the library for output consists of first creating a hierarchy of Python objects that together describe the system, and then dumping that hierarchy to an mmCIF or BinaryCIF file.

For complete worked examples, see the ModBase example or the ligands example.

The top level of the hierarchy is the modelcif.System. All other objects are referenced from a System object (either directly or via another object that is referenced by the System).

System architecture

The architecture of the system is described with a number of classes:

  • modelcif.Entity describes each unique sequence (used in the target model, in one or more templates, or both).

  • modelcif.AsymUnit describes each asymmetric unit (chain) in the target model. For example, a homodimer would consist of two asymmetric units, both pointing to the same entity, while a heterodimer contains two entities.

  • Similarly, modelcif.Template describes a chain used as a template.

  • modelcif.Assembly groups asymmetric units, or parts of them. Assemblies are used to describe which parts of the system were modeled.

  • A variety of classes in the modelcif.alignment module can be used to describe alignments between the target and one or more templates.

Modeling protocol

modelcif.protocol.Protocol objects describe how models were generated from the input data. A protocol can consist of multiple steps, such as template search, alignment, modeling, and model selection. These objects also describe what was used as input and what was generated on output by each step, as one or more modelcif.data.Data objects.

Model coordinates

modelcif.model.Model objects give the actual coordinates of the final generated models. These point to the Assembly of what was modeled. Quality scores can also be assigned to each model (see the modelcif.qa_metric module) or to individual residues or pairs of residues.

Models can also be grouped together for any purpose using the modelcif.model.ModelGroup class.

Metadata

Metadata can also be added to the system, such as

Residue numbering

The library keeps track of several numbering schemes to reflect the reality of the data used in modeling:

  • Internal numbering. Residues are always numbered sequentially starting at 1 in an Entity. All references to residues or residue ranges in the library use this numbering.

  • Author-provided numbering. If a different numbering scheme is used by the authors, for example to correspond to the numbering of the original sequence that is modeled, this can be given as an author-provided numbering for one or more asymmetric units. See the auth_seq_id_map parameter to AsymUnit. (The mapping between author-provided and internal numbering is given in the pdbx_poly_seq_scheme table in the mmCIF file.)

Output

Once the hierarchy of classes is complete, it can be freely inspected or modified. All the classes are simple lightweight Python objects, generally with the relevant data available as member variables.

The complete hierarchy can be written out to an mmCIF or BinaryCIF file using the modelcif.dumper.write() function.

Input

Hierarchies of classes can also be read from mmCIF or BinaryCIF files. This is done using the modelcif.reader.read() function, which returns a list of modelcif.System objects.

Format conversion

The library can be employed to easily convert a ModelCIF file between mmCIF and BinaryCIF format by simply reading in one format and then writing in another. See the convert_bcif example.

Conversion from legacy PDB format to mmCIF or BinaryCIF is not generally possible because PDB format has no defined standard for including information about modeling protocols, alignments, and so on. This extra information must be deduced from other sources, for example custom PDB REMARK records or separate files, and provided to the library. For reference, a script that uses the library to convert ModBase models from PDB format to mmCIF can be seen here.

Validation

The library is designed to generate files that are consistent with the PDBx and ModelCIF dictionaries by construction. However, the library can also be used to validate ModelCIF (or other mmCIF/BinaryCIF files) if desired; see the validator example.