The modelcif Python module

class modelcif.System(title=None, id='model', database=None, model_details=None)

Top-level class representing a complete modeled system.

Parameters:
  • title (str) – Longer text description of the system.

  • id (str) – Unique identifier for this system in the mmCIF file.

  • database (Database) – If this system is part of an official database (e.g. SwissModel, ModBase), details of the database identifiers.

  • model_details (str) – Detailed description of the system, like an abstract.

The system contains a number of simple flat lists of various objects, for example alignments. After constructing objects, they should usually be added to these lists so that a hierarchy of classes is formed, which is ultimately written out to mmCIF/BinaryCIF. When a file is read, these lists are populated in the resulting System object.

Most objects do not need to be explicitly added to the system, since they are referenced by other objects. For example, Template objects are not usually added to the system because they are added to alignments, which in turn are added to the system. If, however, an “orphan” Template is desired (not part of an alignment), the system does maintain an appropriate list (System.templates in this case) to which it can be added.
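The following minimal sketch illustrates why orphan templates need their own list: a writer that discovers templates only by walking alignments would never see them. The Template, Alignment and System classes here are hypothetical stand-ins written for this example (not the modelcif classes), and all_templates() is an assumed traversal, not the library's actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity semantics: each object is distinct
class Template:
    """Hypothetical stand-in for modelcif.Template."""
    name: str


@dataclass(eq=False)
class Alignment:
    """Hypothetical stand-in for an alignment that references templates."""
    templates: List[Template] = field(default_factory=list)


@dataclass(eq=False)
class System:
    """Hypothetical stand-in for modelcif.System with two flat lists."""
    alignments: List[Alignment] = field(default_factory=list)
    templates: List[Template] = field(default_factory=list)  # orphans only


def all_templates(system: System) -> List[Template]:
    """Collect templates reachable via alignments, then orphans, once each."""
    seen = set()
    result: List[Template] = []

    def add(t: Template) -> None:
        if id(t) not in seen:
            seen.add(id(t))
            result.append(t)

    for aln in system.alignments:
        for t in aln.templates:
            add(t)
    for t in system.templates:  # without this list, orphans would be lost
        add(t)
    return result


t1, t2, orphan = Template("t1"), Template("t2"), Template("orphan")
s = System(alignments=[Alignment([t1, t2]), Alignment([t1])],
           templates=[orphan])
print([t.name for t in all_templates(s)])  # ['t1', 't2', 'orphan']
```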

alignments

All modeling alignments. See modelcif.alignment.

authors

List of all authors of this system, as a list of strings (last name followed by initials, e.g. “Smith AJ”). When writing out a file, if this list is empty, all authors from the first citation (see citations and ihm.Citation) are used instead.

citations

List of all citations. By convention the first citation describes the system itself. See ihm.Citation.

comments

List of plain text comments. These will be added to the top of the mmCIF file.

grants

List of all grants that supported this work. See ihm.Grant.

model_groups

All groups of models. See ModelGroup.

protocols

All modeling protocols. See Protocol.

repositories

Any additional files with extra data about this system. See modelcif.associated.Repository.
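The fallback rule for System.authors (use the authors of the first citation when the list is empty) can be sketched as follows. Citation here is a hypothetical stand-in carrying only the fields this example needs (the real ihm.Citation has more), and authors_to_write() is a hypothetical helper; returning an empty list when there are also no citations is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Citation:
    """Hypothetical stand-in for ihm.Citation (title and authors only)."""
    title: str
    authors: List[str] = field(default_factory=list)


def authors_to_write(authors: List[str], citations: List[Citation]) -> List[str]:
    """Hypothetical helper: explicit authors, else first citation's authors."""
    if authors:
        return list(authors)
    if citations:
        # By convention the first citation describes the system itself
        return list(citations[0].authors)
    return []  # assumption: nothing to write when both are empty


cites = [Citation("System paper", ["Smith AJ", "Jones B"]),
         Citation("Method paper", ["Doe J"])]
print(authors_to_write([], cites))           # ['Smith AJ', 'Jones B']
print(authors_to_write(["Brown C"], cites))  # ['Brown C']
```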

class modelcif.Database(id, code)

Information about a System that is part of an official database.

If a System is part of an official database (e.g. SwissModel, ModBase), this class contains details of the database identifiers. It should be passed to the System constructor.

Parameters:
  • id (str) – Abbreviated name of the database (e.g. PDB)

  • code (str) – Identifier from the database (e.g. 1abc)

class modelcif.Software(name, classification, description, location, type='program', version=None, citation=None)

Software used as part of the modeling protocol.

Parameters:
  • name (str) – The name of the software.

  • classification (str) – The major function of the software, for example ‘model building’, ‘sample preparation’, ‘data collection’.

  • description (str) – A longer text description of the software.

  • location (str) – Place where the software can be found (e.g. URL).

  • type (str) – Type of software (program/package/library/other).

  • version (str) – The version used.

  • citation (ihm.Citation) – Publication describing the software.

Generally these objects are added to groups (see SoftwareGroup), which can then be used to describe the software used in various parts of the modeling. Software objects can also be used anywhere a SoftwareGroup is accepted, in which case each acts as if it were a group containing only that single member.

See also System.software.
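The rule that a lone Software object behaves like a one-member SoftwareGroup amounts to a small normalization step. This is a sketch under assumptions: Software, SoftwareGroup and as_group() below are hypothetical stand-ins, not modelcif code, and the software names are placeholders.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Software:
    """Hypothetical stand-in for modelcif.Software (name only)."""
    name: str


class SoftwareGroup(list):
    """Hypothetical stand-in: like modelcif.SoftwareGroup, a plain list."""


def as_group(obj: Union[Software, SoftwareGroup]) -> SoftwareGroup:
    """Hypothetical helper: wrap a lone Software in a one-member group."""
    if isinstance(obj, SoftwareGroup):
        return obj  # already a group; use as-is
    return SoftwareGroup([obj])


builder = Software("my-builder")
print(len(as_group(builder)))    # 1
group = SoftwareGroup([Software("my-aligner"), builder])
print(as_group(group) is group)  # True
```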

class modelcif.SoftwareGroup(elements=(), parameters=None)

A number of Software and/or SoftwareWithParameters objects that are grouped together.

This class can be used to group together multiple Software objects if multiple pieces of software were used together to generate a single alignment (see modelcif.alignment.AlignmentMode), to run a modeling step (see modelcif.protocol.Step), or to calculate a model quality score (see modelcif.qa_metric). It behaves like a regular Python list.

SoftwareWithParameters allows including both a piece of software, and the parameters with which it was used, in the group.

Parameters:
  • elements (sequence) – Initial set of Software and/or SoftwareWithParameters objects.

class modelcif.SoftwareWithParameters(software, parameters=None)

A piece of software and the parameters with which it was used.

See SoftwareGroup.

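Parameters:
  • software (Software) – The software that was used.

  • parameters (sequence of SoftwareParameter) – The parameters with which the software was used.

class modelcif.SoftwareParameter(name, value, description=None)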

A single parameter given to software used in modeling.

See SoftwareWithParameters, SoftwareGroup.

Parameters:
  • name (str) – A short name for this parameter.

  • value (int, float, str, bool, list of int, or list of float) – Parameter value.

  • description (str) – A longer description of the parameter.

class modelcif.Entity(sequence, alphabet=<class 'ihm.LPeptideAlphabet'>, description=None, details=None, source=None, references=[])

Represent a unique molecular sequence.

This can be used both for template sequences (in which case the Entity is used in a Template object) and for target (model) sequences (in which case it is used in an AsymUnit object).

(Note that template sequence Entity objects are not written out to the entity, entity_poly etc. tables in the mmCIF/BinaryCIF file by default. Instead, sequence information is captured in template-specific categories.)

Parameters:
  • sequence (sequence) – The primary sequence, as a sequence of ihm.ChemComp objects, and/or codes looked up in alphabet. See ihm.Entity for examples.

  • alphabet (ihm.Alphabet) – The mapping from code to chemical components to use (it is not necessary to instantiate this class).

  • description (str) – A short text name for the sequence.

  • details (str) – Longer text describing the sequence.

  • source (ihm.source.Source) – The method by which the sample for this entity was produced.

  • references (sequence of reference.TargetReference objects) – For a target (model) sequence, information about this entity stored in external databases (for example the sequence in UniProt). For references to structure databases for templates, see Template instead.

See ihm.Entity for more information.

branch_descriptors

String descriptors of branched chemical structure. These generally only make sense for oligosaccharide entities, and should be a list of BranchDescriptor objects.

branch_links

Any links between components in a branched entity. This is a list of BranchLink objects.

property formula_weight

Formula weight (dalton). This is calculated automatically from that of the chemical components.

is_branched()

Return True iff this entity is branched (generally an oligosaccharide).

is_polymeric()

Return True iff this entity represents a polymer, such as an amino acid sequence or DNA/RNA chain (and not a ligand or water).

residue(seq_id)

Get a Residue at the given sequence position.

property seq_id_range

Sequence range
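Entity.formula_weight is described only as being calculated from the chemical components. The sketch below assumes the usual chemistry convention: sum the component weights and, for a polymer, subtract one water (18.015 Da) per linkage, returning None if any component weight is unknown. formula_weight() here is a hypothetical function, not the library's implementation, and the amino acid weights are approximate.

```python
from typing import Optional, Sequence

WATER_WEIGHT = 18.015  # dalton; one water lost per linkage (assumption)


def formula_weight(comp_weights: Sequence[Optional[float]],
                   polymeric: bool = True) -> Optional[float]:
    """Hypothetical: sum component weights, minus water per polymer linkage."""
    if any(w is None for w in comp_weights):
        return None  # an unknown component makes the total unknown
    total = sum(comp_weights)
    if polymeric and len(comp_weights) > 1:
        total -= (len(comp_weights) - 1) * WATER_WEIGHT
    return total


# Approximate free-amino-acid weights of glycine and alanine
print(round(formula_weight([75.067, 89.093]), 3))  # 146.145
```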

class modelcif.AsymUnit(entity, details=None, auth_seq_id_map=0, id=None, strand_id=None, orig_auth_seq_id_map=None)

An asymmetric unit, i.e. a unique instance of an Entity that was modeled.

Note that this class should not be used to describe crystal waters; for that, see WaterAsymUnit.

Parameters:
  • entity (Entity) – The unique sequence of this asymmetric unit.

  • details (str) – Longer text description of this unit.

  • auth_seq_id_map – Mapping from internal 1-based consecutive residue numbering (seq_id) to PDB “author-provided” numbering (auth_seq_id plus an optional ins_code). This can be either an int offset, in which case auth_seq_id = seq_id + auth_seq_id_map with no insertion codes, or a mapping type (dict, list, tuple), in which case either auth_seq_id = auth_seq_id_map[seq_id] with no insertion codes, or auth_seq_id, ins_code = auth_seq_id_map[seq_id]; that is, the output of the mapping is either the author-provided number, or a 2-element tuple containing that number and an insertion code. (Note that if a list or tuple is used for the mapping, its first element does not correspond to the first residue and is never used, since seq_id can never be zero.) If the mapping is not specified, or a given seq_id is not in the mapping, the default is auth_seq_id == seq_id with no insertion code.

  • id (str) – User-specified ID (usually a string of one or more upper-case letters, e.g. A, B, C, AA). If not specified, IDs are automatically assigned alphabetically.

  • strand_id (str) – PDB or “author-provided” strand/chain ID. If not specified, it will be the same as the regular ID.

  • orig_auth_seq_id_map – Mapping from internal 1-based consecutive residue numbering (seq_id) to original “author-provided” numbering. This differs from auth_seq_id_map as the original numbering need not follow any defined scheme, while auth_seq_id_map must follow certain PDB-defined rules. This can be any mapping type (dict, list, tuple) in which case orig_auth_seq_id = orig_auth_seq_id_map[seq_id]. If the mapping is None (the default), or a given seq_id cannot be found in the mapping, orig_auth_seq_id = auth_seq_id. This mapping is only used in the various scheme tables, such as pdbx_poly_seq_scheme.

See System.asym_units.
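The lookup rules for auth_seq_id_map and orig_auth_seq_id_map can be sketched in plain Python. resolve_auth_seq_id() and resolve_orig_auth_seq_id() are hypothetical helpers that follow the rules stated in the parameter descriptions; treating an out-of-range list index like a missing dict key is an assumption about what "not in the mapping" means for lists.

```python
def resolve_auth_seq_id(seq_id, auth_seq_id_map=0):
    """Hypothetical: return (auth_seq_id, ins_code) for a 1-based seq_id."""
    if isinstance(auth_seq_id_map, int):
        # Integer offset: no insertion codes
        return seq_id + auth_seq_id_map, None
    try:
        value = auth_seq_id_map[seq_id]
    except (KeyError, IndexError):
        # seq_id not in the mapping: auth_seq_id == seq_id
        return seq_id, None
    if isinstance(value, tuple):
        # 2-element tuple: (author-provided number, insertion code)
        return value[0], value[1]
    return value, None


def resolve_orig_auth_seq_id(seq_id, auth_seq_id_map=0,
                             orig_auth_seq_id_map=None):
    """Hypothetical: original numbering, falling back to auth_seq_id."""
    if orig_auth_seq_id_map is not None:
        try:
            return orig_auth_seq_id_map[seq_id]
        except (KeyError, IndexError):
            pass
    return resolve_auth_seq_id(seq_id, auth_seq_id_map)[0]


print(resolve_auth_seq_id(3, 10))                   # (13, None)
print(resolve_auth_seq_id(2, {1: 5, 2: (5, "A")}))  # (5, 'A')
print(resolve_auth_seq_id(1, [None, 100, 101]))     # (100, None)
print(resolve_auth_seq_id(4, {1: 5}))               # (4, None)
print(resolve_orig_auth_seq_id(2, 10, {1: "X1"}))   # 12
```

Note how the list form wastes index 0, as the parameter description warns.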

num_map

For branched entities read from files, mapping from provisional to final internal numbering (seq_id), or None if no mapping is necessary. See ihm.model.Model.add_atom().

residue(seq_id)

Get a Residue at the given sequence position.

segment(gapped_sequence, seq_id_begin, seq_id_end)

Get an object representing the alignment of part of this sequence.

Parameters:
  • gapped_sequence (str) – Sequence of the segment, including gaps.

  • seq_id_begin (int) – Start of the segment.

  • seq_id_end (int) – End of the segment.
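The three segment() arguments constrain each other: the number of non-gap characters in gapped_sequence should equal seq_id_end - seq_id_begin + 1. The check below is a sketch; segment_is_consistent() is a hypothetical helper, '-' as the gap character and one character per residue are assumptions, and modelcif may or may not perform such a check itself.

```python
def segment_is_consistent(gapped_sequence: str, seq_id_begin: int,
                          seq_id_end: int, gap: str = "-") -> bool:
    """Hypothetical: does the ungapped length match the residue range?"""
    if seq_id_end < seq_id_begin:
        return False
    n_residues = sum(1 for c in gapped_sequence if c != gap)
    return n_residues == seq_id_end - seq_id_begin + 1


print(segment_is_consistent("--ACG", 1, 3))  # True: 3 residues for 1-3
print(segment_is_consistent("A-CG", 1, 5))   # False: 3 residues, range of 5
```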

property seq_id_range

Sequence range

property sequence

Primary sequence

property strand_id

PDB or author-provided strand/chain ID

class modelcif.NonPolymerFromTemplate(template, explicit, details=None, auth_seq_id_map=0, id=None, strand_id=None)

A non-polymer (e.g. ligand) in the model that is modeled from a non-polymer template.

These objects act just like AsymUnit objects and should be added to an Assembly.

To represent a non-polymer that is modeled without a template, just use a regular AsymUnit.

Parameters:
  • template (Template) – The non-polymer template used to model this non-polymer.

  • explicit (bool) – True if the conformation of the template was allowed to change during modeling (e.g. bond relaxation, flexible fitting), or False if the template was treated as a single rigid body.

For the other parameters, see AsymUnit.

class modelcif.Residue(seq_id, entity=None, asym=None)

A single residue in an entity or asymmetric unit. Usually these objects are created by calling Entity.residue() or AsymUnit.residue().

atom(atom_id)

Get an Atom in this residue with the given name.

property auth_seq_id

Author-provided seq_id; only makes sense for asymmetric units

property comp

Chemical component (residue type)

property ins_code

Insertion code; only makes sense for asymmetric units

class modelcif.Assembly(elements=(), name=None, description=None)[source]

A collection of parts of the system that were modeled together.

Parameters:
  • elements (sequence) – Initial set of parts of the system.

  • name (str) – Short text name of this assembly.

  • description (str) – Longer text that describes this assembly.

This is implemented as a simple list of asymmetric units (or parts of them), i.e. a list of AsymUnit and/or AsymUnitRange objects. An Assembly is typically passed to the modelcif.model.Model constructor.

Note that the ModelCIF dictionary has deprecated the corresponding ma_struct_assembly category, so any name or description of the assembly will not be written to the mmCIF file. The ModelCIF dictionary requires that all models have the same composition.
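Because an Assembly is just a list of whole asymmetric units and/or ranges of them, working out which residues it covers is a simple walk over the list. Asym, AsymRange and residues_covered() below are hypothetical stand-ins for AsymUnit, AsymUnitRange and a helper; they are not modelcif code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class Asym:
    """Hypothetical stand-in for an AsymUnit: an ID and a residue count."""
    id: str
    length: int


@dataclass(frozen=True)
class AsymRange:
    """Hypothetical stand-in for an AsymUnitRange (inclusive, 1-based)."""
    asym: Asym
    seq_id_begin: int
    seq_id_end: int


def residues_covered(
        assembly: List[Union[Asym, AsymRange]]) -> Dict[str, Set[int]]:
    """Return, per chain ID, the set of seq_ids included in the assembly."""
    covered: Dict[str, Set[int]] = {}
    for part in assembly:
        if isinstance(part, AsymRange):
            asym, begin, end = part.asym, part.seq_id_begin, part.seq_id_end
        else:  # a whole asymmetric unit covers residues 1..length
            asym, begin, end = part, 1, part.length
        covered.setdefault(asym.id, set()).update(range(begin, end + 1))
    return covered


a, b = Asym("A", 10), Asym("B", 8)
result = residues_covered([a, AsymRange(b, 4, 7)])
print(len(result["A"]), sorted(result["B"]))  # 10 [4, 5, 6, 7]
```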

parent = None

Assembly that is the immediate parent in a hierarchy, or None

class modelcif.AsymUnitRange(asym, seq_id_begin, seq_id_end)

Part of an asymmetric unit. Usually these objects are created from an AsymUnit, e.g. to get a range covering residues 4 through 7 in asym use:

    asym = modelcif.AsymUnit(entity)
    rng = asym(4, 7)

class modelcif.Transformation(rot_matrix, tr_vector)

Rotation and translation applied to an object.

These objects are generally used to record the transformation that was applied to a Template to generate the starting structure used in modeling.

Parameters:
  • rot_matrix – Rotation matrix (as a 3x3 array of floats) that places the object in its final position.

  • tr_vector – Translation vector (as a 3-element float list) that places the object in its final position.
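To make the roles of rot_matrix and tr_vector concrete, here is a sketch that applies them to a single coordinate. It assumes the convention x' = R·x + t (rotate first, then translate) with the matrix stored row-major as nested lists; the parameter text above does not spell out the order, so treat that as an assumption. apply_transformation() is a hypothetical helper.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def apply_transformation(rot_matrix: Matrix, tr_vector: Vector,
                         point: Vector) -> List[float]:
    """Hypothetical: return rot_matrix @ point + tr_vector (assumed order)."""
    return [sum(rot_matrix[i][j] * point[j] for j in range(3)) + tr_vector[i]
            for i in range(3)]


# 90-degree rotation about z, then a shift of +10 along x
rot = [[0.0, -1.0, 0.0],
       [1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0]]
shift = [10.0, 0.0, 0.0]
print(apply_transformation(rot, shift, [1.0, 0.0, 0.0]))  # [10.0, 1.0, 0.0]
```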

class modelcif.TemplateSegment(template, gapped_sequence, seq_id_begin, seq_id_end)

An aligned part of a template (see modelcif.alignment.Pair).

Usually these objects are created from a Template using Template.segment(), e.g. to get a segment covering residues 1 through 3 in tmpl use:

    tmpl = modelcif.Template(entity, ...)
    seg = tmpl.segment('--ACG', 1, 3)

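In the segment example above, the two leading gaps occupy alignment columns but consume no template residues, so residues 1 through 3 line up with the last three columns. The hypothetical helper column_to_seq_id() below makes that column-to-residue mapping explicit; '-' as the gap character and one character per residue are assumptions.

```python
from typing import List, Optional


def column_to_seq_id(gapped_sequence: str, seq_id_begin: int,
                     gap: str = "-") -> List[Optional[int]]:
    """Hypothetical: residue number for each alignment column, None for gaps."""
    result: List[Optional[int]] = []
    next_id = seq_id_begin
    for c in gapped_sequence:
        if c == gap:
            result.append(None)  # gap: no residue consumed
        else:
            result.append(next_id)
            next_id += 1
    return result


print(column_to_seq_id("--ACG", 1))  # [None, None, 1, 2, 3]
```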
class modelcif.Template(entity, asym_id, model_num, transformation, name=None, references=[], strand_id=None, entity_id=None)

A single chain that was used as a template structure for modeling.

After creating a polymer template, use segment() to denote the part of its sequence used in any modeling alignments (see modelcif.alignment.Pair).

Non-polymer templates do not have alignments, and should instead be passed to one or more NonPolymerFromTemplate objects.

Template objects can also be used as inputs or outputs in modeling protocol steps; see modelcif.protocol.Step.

Parameters:
  • entity (Entity) – The sequence of the chain.

  • asym_id (str) – The asym or chain ID in the template structure.

  • model_num (int) – The model number of the template structure.

  • transformation (Transformation) – Rotation and translation applied to the original template structure to get the starting model used in modeling.

  • name (str) – A short name for this template.

  • references (list of modelcif.reference.TemplateReference objects) – A list of pointers to reference databases (such as PDB) from which the template structure was taken.

  • strand_id (str) – PDB or “author-provided” strand/chain ID. If not specified, it will be the same as the regular asym_id.

  • entity_id (str) – If known, the ID of the entity for this template in its own mmCIF file.

segment(gapped_sequence, seq_id_begin, seq_id_end)

Get an object representing the alignment of part of this sequence.

Parameters:
  • gapped_sequence (str) – Sequence of the segment, including gaps.

  • seq_id_begin (int) – Start of the segment.

  • seq_id_end (int) – End of the segment.

property seq_id_range

Sequence range

property strand_id

PDB or author-provided strand/chain ID

class modelcif.ReferenceDatabase(name, url, version=None, release_date=None)

A reference database used in the modeling. This is typically a sequence database used for template search, alignments, etc. These objects are passed as input or output to modelcif.protocol.Step. See also modelcif.data.Data for more details.

Compare with modelcif.reference.TargetReference, which pertains to just the modeled sequence itself; this class describes multiple sequences.

Parameters:
  • name (str) – Name of the database.

  • url (str) – Location of the database.

  • version (str) – Version of the database.

  • release_date (datetime.date) – Release date of the specified version.