pyani.tetra module
Code to implement the TETRA average nucleotide identity method.
Provides functions for calculation of TETRA as described in:
Richter M, Rossello-Mora R (2009) Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci USA 106: 19126-19131. doi:10.1073/pnas.0906412106.
and
Teeling et al. (2004) Application of tetranucleotide frequencies for the assignment of genomic fragments. Env. Microbiol. 6(9): 938-947. doi:10.1111/j.1462-2920.2004.00624.x
pyani.tetra.calculate_correlations(tetra_z: Dict[str, Dict[str, float]]) → pandas.core.frame.DataFrame
Return dataframe of Pearson correlation coefficients.
Parameters: tetra_z – dict, Z-scores, keyed by sequence ID
Calculates the Pearson correlation coefficient between the tetranucleotide Z-scores of each pair of sequences. This is done longhand here, which is fast enough, but for robustness we might want to do something else… (TODO).
Note that this method reports a correlation coefficient, rather than a percentage identity.
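As an illustration of the longhand calculation described above, the following is a minimal sketch, not pyani's own code. The function names `pearson_sketch` and `correlation_matrix_sketch` are hypothetical, and the sketch assumes every sequence has Z-scores for the same set of tetranucleotides. It returns a nested dict rather than a dataframe so that it runs with the standard library alone.

```python
import math
from typing import Dict, List


def pearson_sketch(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length lists, computed longhand."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        # Assumption: a constant vector has no defined correlation
        return float("nan")
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix_sketch(
    tetra_z: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations of Z-scores, keyed by sequence ID."""
    ids = sorted(tetra_z)
    matrix = {a: {b: 1.0 for b in ids} for a in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Compare only tetranucleotides present for both sequences
            shared = sorted(set(tetra_z[a]) & set(tetra_z[b]))
            r = pearson_sketch(
                [tetra_z[a][k] for k in shared],
                [tetra_z[b][k] for k in shared],
            )
            matrix[a][b] = matrix[b][a] = r
    return matrix
```

Passing the nested dict to `pandas.DataFrame(matrix)` would give a square dataframe indexed by sequence ID on both axes. Values lie between -1 and 1, which is why the result is read as a correlation and not as a percentage identity.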
pyani.tetra.calculate_tetra_zscore(filename: pathlib.Path) → Dict[str, float]
Return TETRA Z-score for the sequence in the passed file.
Parameters: filename – path to sequence file
Calculates mono-, di-, tri- and tetranucleotide frequencies for each sequence, on each strand, and follows Teeling et al. (2004) in calculating a Z-score for each observed tetranucleotide frequency. Each Z-score depends on the mono-, di- and trinucleotide frequencies for that input sequence.
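The Z-score calculation can be sketched as follows. This is an illustrative sketch and not pyani's implementation: the function names are hypothetical, the input is a plain sequence string instead of a file, and the expected count and variance are assumed to take the maximal-order Markov form usually attributed to Teeling et al. (2004), in which only the di- and trinucleotide counts appear explicitly. Returning 0.0 when the variance is zero is also an assumption of this sketch.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count overlapping k-mers across all the given sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def tetra_zscores_sketch(seq: str) -> Dict[str, float]:
    """Z-score for each of the 256 unambiguous tetranucleotides."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))  # count on both strands
    di, tri, tet = (count_kmers(strands, k) for k in (2, 3, 4))
    zscores = {}
    for kmer in map("".join, product("ACGT", repeat=4)):
        mid = di[kmer[1:3]]                       # N(n2n3)
        left, right = tri[kmer[:3]], tri[kmer[1:]]  # N(n1n2n3), N(n2n3n4)
        if mid == 0:
            zscores[kmer] = 0.0
            continue
        expected = left * right / mid
        variance = expected * ((mid - left) * (mid - right)) / mid ** 2
        if variance > 0:
            zscores[kmer] = (tet[kmer] - expected) / math.sqrt(variance)
        else:
            zscores[kmer] = 0.0  # assumption: undefined Z-score reported as 0
    return zscores
```

Because counts are taken on both strands, a tetranucleotide and its reverse complement receive the same Z-score in this sketch. Tetranucleotides containing ambiguity symbols such as N are counted but never reported, since only the 256 A/C/G/T combinations are enumerated.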