The modelcif.reference Python module

Classes for linking back to a sequence or structure database.

class modelcif.reference.TargetReference(code, accession, align_begin=None, align_end=None, isoform=None, ncbi_taxonomy_id=None, organism_scientific=None, sequence_version_date=None, sequence_crc64=None, sequence=None, details=None)

Point to the sequence of a target modelcif.Entity in a sequence database. Typically a subclass such as UniProt is used. To use a custom database, make a new subclass and provide a docstring that describes the database, e.g.:

class CustomRef(TargetReference):
    "my custom database"

Compare with modelcif.ReferenceDatabase, which describes multiple sequences used in template searches or alignment construction; this class relates to just the modeled sequence itself.

See also alignments, which describes the correspondence between the database and entity sequences.

Parameters:
  • code (str) – The name of the sequence in the database.

  • accession (str) – The database accession.

  • align_begin (int) – Beginning index of the sequence in the database. Deprecated; use alignments instead.

  • align_end (int) – Ending index of the sequence in the database. Deprecated; use alignments instead.

  • isoform (str) – Sequence isoform, if applicable.

  • ncbi_taxonomy_id (str) – Taxonomy identifier provided by NCBI.

  • organism_scientific (str) – Scientific name of the organism.

  • sequence_version_date (datetime.date or datetime.datetime) – Versioning date, e.g. for UniProtKB sequences this is usually the date of last modification from the DT line of an entry.

  • sequence_crc64 (str) – The CRC64 sum of the original database sequence.

  • sequence (str) – The complete database sequence, as a string of one-letter codes. If omitted, it defaults to the canonical sequence of the associated Entity.

  • details (str) – Longer text describing the sequence.
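As a rough illustration of preparing a value for sequence_version_date, the sketch below extracts a date from UniProtKB-style DT lines using only the standard library. The sample lines, the helper name parse_sequence_version_date, and the choice of the "sequence version" DT line are assumptions made for this example; they are not part of modelcif.

```python
import datetime
import re

# Illustrative DT lines in the UniProtKB flat-file layout (made-up values).
DT_LINES = [
    "DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.",
    "DT   23-JAN-2007, sequence version 2.",
    "DT   02-OCT-2024, entry version 150.",
]

# Explicit month table so parsing does not depend on the system locale.
_MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

_DT_PATTERN = re.compile(
    r"^DT\s+(\d{2})-([A-Z]{3})-(\d{4}), sequence version \d+\.")


def parse_sequence_version_date(dt_lines):
    """Hypothetical helper: date on the 'sequence version' DT line, or None."""
    for line in dt_lines:
        match = _DT_PATTERN.match(line)
        if match:
            day, month, year = match.groups()
            return datetime.date(int(year), _MONTHS[month], int(day))
    return None


version_date = parse_sequence_version_date(DT_LINES)
print(version_date)
```

The resulting datetime.date could then be supplied as the sequence_version_date argument described above.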

alignments

All alignments between the reference and entity sequences, as Alignment objects. If none are provided, a simple 1:1 alignment is assumed.

property other_details

More information about a custom reference type. By default it is the first line of the docstring.
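The "first line of the docstring" behavior can be pictured with a small stand-in. ReferenceSketch below is a hypothetical class written for this example, not modelcif's implementation; it only mirrors the documented default so the effect of a multi-line docstring is easier to see.

```python
class ReferenceSketch:
    """Hypothetical stand-in for a reference base class (not modelcif code)."""

    @property
    def other_details(self):
        # Use the first line of the concrete class's own docstring, if any.
        doc = type(self).__doc__
        return doc.strip().split("\n")[0] if doc else None


class CustomRef(ReferenceSketch):
    """my custom database

    Longer notes placed here are not part of the first line.
    """


details = CustomRef().other_details
print(details)
```

Under this sketch, keeping the first docstring line short and descriptive is what matters, since any later lines are ignored.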

class modelcif.reference.UniProt(code, accession, align_begin=None, align_end=None, isoform=None, ncbi_taxonomy_id=None, organism_scientific=None, sequence_version_date=None, sequence_crc64=None, sequence=None, details=None)

Point to the sequence of a modelcif.Entity in UniProt.

These objects are typically passed to the modelcif.Entity constructor for target sequences (for templates, see TemplateReference).

See TargetReference for a description of the parameters.

class modelcif.reference.Alignment(db_begin=1, db_end=None, entity_begin=1, entity_end=None, seq_dif=[])

A sequence range that aligns between the database and the entity. It describes part of the sequence in the sequence database (here, a TargetReference) and the matching part of the Entity sequence. The two ranges must be the same length and have the same primary sequence; any differences must be described with SeqDif objects.

Parameters:
  • db_begin (int) – The first residue in the database sequence that is used (defaults to 1, the start of the sequence).

  • db_end (int) – The last residue in the database sequence that is used (or None, the default, to use the entire sequence).

  • entity_begin (int) – The first residue in the Entity sequence that is taken from the reference (defaults to 1, the start of the entity sequence).

  • entity_end (int) – The last residue in the Entity sequence that is taken from the reference (or None, the default, to use the entire sequence).

  • seq_dif (sequence of SeqDif objects) – Single-point mutations made to the sequence.
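Before building an Alignment, it can help to check that the two ranges really are the same length and to locate the positions that would need a SeqDif. The sketch below does this on plain one-letter strings. The helper find_differences and both sequences are made up for illustration, and the indices are assumed to be 1-based and inclusive, in line with the default of 1 for db_begin and entity_begin.

```python
def find_differences(db_seq, entity_seq, db_begin=1, db_end=None,
                     entity_begin=1, entity_end=None):
    """Hypothetical helper: list (seq_id, db_letter, entity_letter) mismatches.

    Indices are assumed 1-based and inclusive; an end of None means
    'up to the end of that sequence'.
    """
    if db_end is None:
        db_end = len(db_seq)
    if entity_end is None:
        entity_end = len(entity_seq)
    db_range = db_seq[db_begin - 1:db_end]
    entity_range = entity_seq[entity_begin - 1:entity_end]
    if len(db_range) != len(entity_range):
        raise ValueError("aligned ranges must be the same length")
    # seq_id is reported as an index into the entity sequence.
    return [(entity_begin + offset, db_letter, entity_letter)
            for offset, (db_letter, entity_letter)
            in enumerate(zip(db_range, entity_range))
            if db_letter != entity_letter]


# Made-up sequences: the entity covers database residues 3 to 9 and
# carries one substitution (I to L).
db_seq = "MKTAYIAKQR"
entity_seq = "TAYLAKQ"
diffs = find_differences(db_seq, entity_seq, db_begin=3, db_end=9)
print(diffs)
```

Each tuple returned here corresponds to one position where the entity departs from the database sequence, which is the information a SeqDif is meant to record.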

class modelcif.reference.SeqDif(seq_id, db_monomer, monomer, details=None)

Annotate a sequence difference between a reference and entity sequence. See Alignment.

Parameters:
  • seq_id (int) – The residue index in the entity sequence.

  • db_monomer (ihm.ChemComp) – The monomer type (as a ChemComp object) in the reference sequence.

  • monomer (ihm.ChemComp) – The monomer type (as a ChemComp object) in the entity sequence.

  • details (str) – Descriptive text for the sequence difference.

class modelcif.reference.TemplateReference(accession, db_version_date=None)

Point to the structure of a modelcif.Template in a structure database.

These objects are typically passed to the modelcif.Template constructor for template sequences (for target sequences, see TargetReference).

Typically a subclass such as PDB is used. To use a custom database, make a new subclass and provide a docstring that describes the database, e.g.:

class CustomRef(TemplateReference):
    "my custom database"

Parameters:
  • accession (str) – The database accession.

  • db_version_date (datetime.date or datetime.datetime) – Versioning date, e.g. for PDB entries this is usually the value of _pdbx_audit_revision_history.revision_date.
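A value for db_version_date can be built from revision_date strings with the standard library, as in the sketch below. The date strings are invented for illustration, and picking the most recent revision is an assumption of this example; the text above does not say which revision to use when an entry lists several.

```python
import datetime

# Invented revision_date values in the yyyy-mm-dd form used by mmCIF files.
revision_dates = ["2010-08-18", "2011-07-13", "2023-09-06"]

# Assumption for this sketch: take the most recent revision.
db_version_date = max(
    datetime.date.fromisoformat(text) for text in revision_dates)
print(db_version_date)
```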

property other_details

More information about a custom reference type. By default it is the first line of the docstring.

class modelcif.reference.PDB(accession, db_version_date=None)

Point to the structure of a modelcif.Template in PDB.

These objects are typically passed to the modelcif.Template constructor.

See TemplateReference for a description of the parameters.

class modelcif.reference.AlphaFoldDB(accession, db_version_date=None)

Point to the structure of a modelcif.Template in AlphaFold DB.

These objects are typically passed to the modelcif.Template constructor.

See TemplateReference for a description of the parameters.

class modelcif.reference.PubChem(accession, db_version_date=None)

Point to the structure of a modelcif.Template in PubChem.

These objects are typically passed to the modelcif.Template constructor.

See TemplateReference for a description of the parameters. Use the PubChem CID as the accession code.