proteinnetpy.parser

Parse ProteinNet files

fetch_record(record_id, path)

Retrieve a record from a ProteinNet file

Fetch a specific record from a ProteinNet file by its ID.

Parameters:
  • record_id (str) – ID of the record to fetch.

  • path (str) – Path to a ProteinNet file to search in.

Returns:

Record with the given ID, if found, otherwise None.

Return type:

ProteinNetRecord or None
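
Because fetch_record returns None when the ID is absent, callers usually need an explicit check. The sketch below wraps that check in a hypothetical helper, require_record, which is not part of proteinnetpy. Its fetch argument is also an assumption, added only so the helper can be tried with a stand-in function. By default it calls fetch_record as documented above.

```python
def require_record(record_id, path, fetch=None):
    """Return the record with `record_id`, raising KeyError if it is absent.

    `require_record` and its `fetch` argument are hypothetical; only
    `fetch_record(record_id, path)` comes from the documentation above.
    """
    if fetch is None:
        # Imported lazily so the helper can be exercised with a stand-in.
        from proteinnetpy.parser import fetch_record as fetch

    record = fetch(record_id, path)
    if record is None:
        # fetch_record signals "not found" by returning None.
        raise KeyError(f"{record_id!r} not found in {path!r}")
    return record
```

With the library installed, require_record(some_id, some_path) either returns the record or fails loudly, instead of passing None further along.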

record_parser(path, max_len=inf, excluded_fields=None, normalise_angles=False, profiles=None)

Parse records from a ProteinNet file.

Generator yielding records from a ProteinNet file as record.ProteinNetRecord objects. It supports some minimal manipulation of records before a data.ProteinNetDataset is created, although most filtering is expected to happen at the dataset stage. End users rarely need to call this function directly, because the data.ProteinNetDataset class handles loading from files and passes keyword arguments through to the parser.

Parameters:
  • path (str) – Path to ProteinNet file

  • max_len (int) – Maximum sequence length of records to keep

  • excluded_fields (list) – Fields to exclude from the output record, reducing memory usage if only specific data is required. You cannot exclude id or primary.

  • normalise_angles (bool) – Scale backbone/chiral angles to the range -1 to 1 rather than -pi to pi

  • profiles (function) – Function returning positional profiles for a given record, or None if profiles are not available for that record. These profiles can be any additional data associated with positions in a protein, for example the output of a protein language model or surface accessibility data. Downstream code places few expectations on them, and they are rarely used.
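
As an illustration of the profiles parameter described above, the following is a minimal sketch of such a function. Everything in it is hypothetical: the lookup table, the record IDs, the values, and the choice of a plain list as the profile format. It also assumes the record exposes its ID and sequence as record.id and record.primary. A SimpleNamespace stands in for a real record so that the sketch runs without the library.

```python
from types import SimpleNamespace

# Hypothetical per-position data keyed by record ID, e.g. precomputed
# language-model scores. The IDs and values are made up.
PROFILE_TABLE = {
    "example_1": [0.1, 0.7, 0.3],
}


def lookup_profiles(record):
    """Return per-position profiles for `record`, or None if unavailable."""
    # Assumes the record has `id` and `primary` attributes.
    values = PROFILE_TABLE.get(record.id)
    if values is None or len(values) != len(record.primary):
        # No data, or data that does not line up with the sequence.
        return None
    return values


# Stand-ins for parsed records, used only to exercise the function.
known = SimpleNamespace(id="example_1", primary="MKV")
unknown = SimpleNamespace(id="example_2", primary="MKV")
```

A function of this shape could then be passed as profiles=lookup_profiles. The format that downstream code expects from the returned data is not specified here.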

Yields:

ProteinNetRecord – Records parsed and processed from the input ProteinNet file.
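
Since record_parser is a generator, records can be consumed lazily, so only as many records are parsed as the caller asks for. The sketch below collects the IDs of the first few records. The helper first_ids and its parser argument are hypothetical and exist only so the example can be run with a stand-in generator. It also assumes each record exposes its ID as record.id. Extra keyword arguments are forwarded to the parser unchanged.

```python
from itertools import islice


def first_ids(path, n=5, parser=None, **parser_kwargs):
    """Return the IDs of the first `n` records yielded by the parser.

    `first_ids` and its `parser` argument are hypothetical; by default
    the documented `record_parser` is used.
    """
    if parser is None:
        # Imported lazily so the helper can be exercised with a stand-in.
        from proteinnetpy.parser import record_parser as parser

    # islice stops pulling from the generator after n records.
    records = islice(parser(path, **parser_kwargs), n)
    return [record.id for record in records]
```

For example, first_ids(some_path, n=10, max_len=500) would pass max_len through to the parser and stop after ten records.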