proteinnetpy.mutation
Module containing functions for mutating ProteinNetRecords and feeding the resulting data into further computations (e.g. TensorFlow). These functions are fairly specific, so they may often be better used as inspiration for building your own solutions.
- class ProteinNetMutator(mutator, per_position=False, include=('wt',), weights=(0, 1, 1), encoding=None, **kwargs)
Bases: LabeledFunction
Map function generating mutated records.
Apply a mutator function to a ProteinNet record and return the mutated sequence. This is a LabeledFunction that can be used to generate a TensorFlow Dataset. The setup is fairly specific to a particular downstream model design, so it will often be more useful as a base for an alternative implementation.
- Returns take the form (elements in square brackets are only output for some configurations):
([wt_seq], mut_seq, [phi, psi, chi1]), label, [weights]
- wildtype
Outputs the wildtype sequence as well as the mutant sequence.
- Type:
bool
- phi
Outputs Phi backbone angles.
- Type:
bool
- psi
Outputs Psi backbone angles.
- Type:
bool
- chi
Outputs rotamer angles.
- Type:
bool
- mutator
Mutator function taking a ProteinNetRecord and returning the sampled variants and their deleteriousness. The return format depends on per_position. If per_position=False, it must return a tuple of the mutated sequence index array and whether the sequence is deleterious (1/0). If per_position=True, it must return a tuple of mutant_seq, deleterious_inds and neutral_inds arrays.
- Type:
function
- kwargs
Keyword arguments passed to the mutator function.
- Type:
dict
- encoding
Encoding that maps alphabetically encoded integer indices to a new scheme.
- Type:
dict
- weights
List of float weights for WT, Deleterious and Neutral variants when mutating per position.
- Type:
list
- func
Function applied when the instance is called. It applies the mutator either to the whole sequence or per position, as determined by the initialisation parameters.
- Type:
function
- output_shapes, output_types
Tuple of output shapes and types (see data.LabeledFunction for details).
- Type:
tuple
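To illustrate how the weights triple could be applied when mutating per position, the sketch below expands the deleterious and neutral position arrays returned by a per-position mutator into per-position labels and weights. The helper name `expand_per_position`, the convention that label 1 marks a deleterious position (0 for neutral or wildtype), and the dtypes are assumptions for illustration, not the class's actual internals.

```python
import numpy as np

def expand_per_position(seq_len, deleterious_inds, neutral_inds, weights=(0, 1, 1)):
    """Hypothetical helper: turn per-position mutator output into labels and weights.

    Assumes label 1 marks a deleterious variant and 0 marks neutral or wildtype,
    and that `weights` is ordered (WT, deleterious, neutral).
    """
    wt_weight, del_weight, neut_weight = weights
    # Cast to integer index arrays so empty inputs are still valid indices
    deleterious_inds = np.asarray(deleterious_inds, dtype=np.intp)
    neutral_inds = np.asarray(neutral_inds, dtype=np.intp)

    labels = np.zeros(seq_len, dtype=np.int32)
    sample_weights = np.full(seq_len, wt_weight, dtype=np.float32)
    # Deleterious positions get label 1 and the deleterious weight
    labels[deleterious_inds] = 1
    sample_weights[deleterious_inds] = del_weight
    # Neutral positions keep label 0 but get the neutral weight
    sample_weights[neutral_inds] = neut_weight
    return labels, sample_weights

labels, w = expand_per_position(6, [1], [3, 4])
print(labels.tolist())  # [0, 1, 0, 0, 0, 0]
print(w.tolist())       # [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
```

With the default weights (0, 1, 1), unmutated positions receive zero weight, so a weighted loss would only be driven by the mutated positions.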
- per_position_mutator(record, max_deleterious=2, max_neutral=4, max_deleterious_freq=0.01, min_neutral_freq=0.1)
Generate mutated sequences from ProteinNetRecords with labels identifying deleterious and neutral mutations.
Generate mutated sequences from ProteinNetRecords with labels identifying where deleterious and neutral mutations have been made. Will always generate at least one variant.
- Parameters:
record (ProteinNetRecord) – Record to mutate.
max_deleterious (int) – Maximum number of deleterious variants to make.
max_neutral (int) – Maximum number of neutral variants to make.
max_deleterious_freq (float) – Maximum MSA frequency for a variant to be considered deleterious.
min_neutral_freq (float) – Minimum MSA frequency for a variant to be considered neutral.
- Returns:
Tuple of the format (seq, deleterious, neutral). The first entry is the mutated sequence, the second a list of positions with deleterious variants and the third a list of positions with neutral variants.
- Return type:
tuple
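A minimal sketch of the per-position logic described above is given below. It is not the library implementation: to stay self-contained it takes a (20, N) MSA frequency matrix and an integer WT sequence directly instead of a ProteinNetRecord, and the choices of sampling the variant counts uniformly, keeping deleterious and neutral positions distinct, and forcing one variant when both counts come out as zero are assumptions. The function name is hypothetical.

```python
import numpy as np

def per_position_mutator_sketch(pssm, wt_seq, max_deleterious=2, max_neutral=4,
                                max_deleterious_freq=0.01, min_neutral_freq=0.1, rng=None):
    """Hypothetical sketch returning (seq, deleterious_positions, neutral_positions).

    Takes a (20, N) MSA frequency matrix and an integer WT sequence directly,
    instead of a ProteinNetRecord, to stay self-contained.
    """
    rng = np.random.default_rng() if rng is None else rng
    wt_seq = np.asarray(wt_seq)
    seq = wt_seq.copy()
    # True where an (aa, position) pair is a substitution rather than the WT residue
    not_wt = np.arange(pssm.shape[0])[:, None] != wt_seq[None, :]
    used = np.zeros(len(wt_seq), dtype=bool)

    def pick(allowed, count):
        # Choose up to `count` unused positions with at least one allowed substitution
        candidates = np.flatnonzero(allowed.any(axis=0) & ~used)
        chosen = rng.permutation(candidates)[:count]
        for pos in chosen:
            seq[pos] = rng.choice(np.flatnonzero(allowed[:, pos]))
            used[pos] = True
        return np.sort(chosen)

    n_del = int(rng.integers(0, max_deleterious + 1))
    n_neut = int(rng.integers(0, max_neutral + 1))
    if n_del + n_neut == 0:
        # Assumption: request one variant of a random type so at least one is made
        if max_deleterious > 0 and (max_neutral == 0 or rng.random() < 0.5):
            n_del = 1
        else:
            n_neut = 1

    deleterious = pick((pssm <= max_deleterious_freq) & not_wt, n_del)
    neutral = pick((pssm >= min_neutral_freq) & not_wt, n_neut)
    return seq, deleterious, neutral
```

The "at least one variant" guarantee in this sketch only holds when the frequency matrix contains at least one qualifying substitution.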
- sample_deleterious(num, pssm, wt_seq, max_freq=0.025, mask=None)
Sample deleterious mutations from an MSA frequency matrix.
Randomly choose a selection of deleterious variants from an MSA frequency matrix.
- Parameters:
num (int) – Number of mutations to make.
pssm (float ndarray (20, N)) – MSA frequency matrix to determine neutral and deleterious variants.
wt_seq (int ndarray (N,)) – WT sequence of the protein (as int indices corresponding to the MSA matrix rows).
max_freq (float) – Maximum frequency considered deleterious.
mask (int array_like) – Array of positions not to mutate.
- Returns:
NumPy array of the chosen position indices and an array of the alternate amino acid at each position (as MSA row indices).
- Return type:
tuple
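The sampling described above can be sketched as follows. This is a hypothetical re-implementation, not the library code: it assumes positions are drawn without replacement, that the WT residue is never chosen as the alternate, and that fewer than `num` positions are returned when not enough candidates exist. The `rng` parameter and the function name are additions for illustration.

```python
import numpy as np

def sample_deleterious_sketch(num, pssm, wt_seq, max_freq=0.025, mask=None, rng=None):
    """Hypothetical sketch: pick up to `num` positions and a rare alternate residue for each."""
    rng = np.random.default_rng() if rng is None else rng
    wt_seq = np.asarray(wt_seq)
    # Candidate (aa, position) pairs: rare in the MSA and different from the WT residue
    allowed = pssm <= max_freq
    allowed[wt_seq, np.arange(len(wt_seq))] = False
    if mask is not None:
        # Masked positions are never mutated
        allowed[:, np.asarray(mask, dtype=int)] = False
    candidates = np.flatnonzero(allowed.any(axis=0))
    # Positions without replacement, capped at the number of available candidates
    positions = rng.permutation(candidates)[:num]
    alternates = np.array([rng.choice(np.flatnonzero(allowed[:, p])) for p in positions],
                          dtype=int)
    return positions, alternates
```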
- sample_neutral(num, pssm, wt_seq, min_freq=0.025, mask=None)
Sample neutral mutations from an MSA frequency matrix.
Randomly choose a selection of neutral variants from an MSA frequency matrix.
- Parameters:
num (int) – Number of mutations to make.
pssm (float ndarray (20, N)) – MSA frequency matrix to determine neutral and deleterious variants.
wt_seq (int ndarray (N,)) – WT sequence of the protein (as int indices corresponding to the MSA matrix rows).
min_freq (float) – Minimum frequency considered neutral.
mask (int array_like) – Array of positions not to mutate.
- Returns:
NumPy array of the chosen position indices and an array of the alternate amino acid at each position (as MSA row indices).
- Return type:
tuple
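Neutral sampling can be sketched as the mirror image of the deleterious case, with the frequency threshold inverted so that only substitutions common in the MSA qualify. As before, this is a hypothetical sketch rather than the library code; sampling positions without replacement, excluding the WT residue and the extra `rng` parameter are assumptions.

```python
import numpy as np

def sample_neutral_sketch(num, pssm, wt_seq, min_freq=0.025, mask=None, rng=None):
    """Hypothetical sketch: pick up to `num` positions and a common alternate residue for each."""
    rng = np.random.default_rng() if rng is None else rng
    wt_seq = np.asarray(wt_seq)
    # Candidate (aa, position) pairs: common in the MSA and different from the WT residue
    allowed = pssm >= min_freq
    allowed[wt_seq, np.arange(len(wt_seq))] = False
    if mask is not None:
        allowed[:, np.asarray(mask, dtype=int)] = False
    candidates = np.flatnonzero(allowed.any(axis=0))
    positions = rng.permutation(candidates)[:num]
    alternates = np.array([rng.choice(np.flatnonzero(allowed[:, p])) for p in positions],
                          dtype=int)
    return positions, alternates
```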
- sequence_mutator(record, p_deleterious=0.5, max_mutations=3, max_deleterious=0.01, min_neutral=0.1)
Generate mutated sequences from a ProteinNetRecord with a few deleterious or neutral variants.
Generate mutated sequences from a ProteinNetRecord with a few deleterious and/or neutral variants. First randomly choose whether to generate a deleterious or a neutral sequence, then sample some variants of the corresponding type based on the record's MSA frequencies.
- Parameters:
record (ProteinNetRecord) – Record to mutate.
p_deleterious (float) – Probability of returning a deleterious set of variants.
max_mutations (int) – Maximum number of mutations to make.
max_deleterious (float) – Maximum MSA frequency for a variant to be considered deleterious.
min_neutral (float) – Minimum MSA frequency for a variant to be considered neutral.
- Returns:
Tuple of the format (seq, deleterious). The first entry is the mutated amino acid sequence, encoded with integer indices, and the second is 1 if the sequence is deleterious and 0 if neutral.
- Return type:
tuple
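The two-step procedure above (choose a variant type, then sample variants of that type) might look like the sketch below. It is hypothetical: it takes the MSA frequency matrix and integer WT sequence directly rather than a ProteinNetRecord, and it assumes the number of mutations is drawn uniformly from 1 to max_mutations and that mutated positions are distinct.

```python
import numpy as np

def sequence_mutator_sketch(pssm, wt_seq, p_deleterious=0.5, max_mutations=3,
                            max_deleterious=0.01, min_neutral=0.1, rng=None):
    """Hypothetical sketch returning (mutated_seq, label) with label 1 = deleterious."""
    rng = np.random.default_rng() if rng is None else rng
    wt_seq = np.asarray(wt_seq)
    # Step 1: decide whether this sequence gets deleterious or neutral variants
    deleterious = rng.random() < p_deleterious
    # Step 2: candidate (aa, position) pairs of the chosen type, excluding the WT residue
    allowed = (pssm <= max_deleterious) if deleterious else (pssm >= min_neutral)
    allowed[wt_seq, np.arange(len(wt_seq))] = False
    candidates = np.flatnonzero(allowed.any(axis=0))
    # Assumption: between 1 and max_mutations distinct positions are mutated
    num = min(int(rng.integers(1, max_mutations + 1)), len(candidates))
    seq = wt_seq.copy()
    for pos in rng.permutation(candidates)[:num]:
        seq[pos] = rng.choice(np.flatnonzero(allowed[:, pos]))
    return seq, int(deleterious)
```

Because every substitution in a sequence is drawn from a single variant type, the whole sequence can be given one label, which matches the (seq, deleterious) return format described above.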