pyani.fastani module
Code to implement the fastANI average nucleotide identity method.
-
class pyani.fastani.ComparisonResult(reference, query, ani, matches, fragments)
Bases: tuple
- ani – Alias for field number 2
- fragments – Alias for field number 4
- matches – Alias for field number 3
- query – Alias for field number 1
- reference – Alias for field number 0
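The field numbers listed above are what a standard named tuple would give. The following is a minimal sketch, assuming the class behaves like a `collections.namedtuple`; the stand-in class and the sample values are invented for illustration and are not taken from pyani.

```python
from collections import namedtuple

# Stand-in with the same field order as pyani.fastani.ComparisonResult
ComparisonResult = namedtuple(
    "ComparisonResult", "reference query ani matches fragments"
)

# Hypothetical values, for illustration only
result = ComparisonResult("ref.fna", "query.fna", 97.5, 1200, 1300)

# Each field is reachable by name or by its field number
print(result.reference, result[0])
print(result.ani, result[2])
print(result.fragments, result[4])
```

Access by name is usually easier to read than access by index, and it keeps working if the field order ever changes.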
-
-
exception pyani.fastani.PyaniFastANIException
Bases: pyani.PyaniException
Exception raised when there is a problem with fastANI.
-
pyani.fastani.construct_fastani_cmdline(query: pathlib.Path, ref: pathlib.Path, outdir: pathlib.Path = PosixPath('.'), fastani_exe: pathlib.Path = PosixPath('fastANI'), fragLen: int = 3000, kmerSize: int = 16, minFraction: float = 0.2) → str
Return a fastANI command line, as a string, for a single query/reference comparison.
Parameters:
- query – path to query file
- ref – path to reference file
- outdir – path to output directory
- fastani_exe – path to fastANI executable
- fragLen – fragment length to use
- kmerSize – kmer size to use
- minFraction – minimum portion of the genomes that must match to trust ANI
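The parameters above map naturally onto fastANI's command-line options. The following is a rough sketch of how such a string might be assembled, not pyani's own implementation: the function name `sketch_fastani_cmdline` and the output file naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def sketch_fastani_cmdline(
    query: Path,
    ref: Path,
    outdir: Path = Path("."),
    fastani_exe: Path = Path("fastANI"),
    fragLen: int = 3000,
    kmerSize: int = 16,
    minFraction: float = 0.2,
) -> str:
    """Build one fastANI command line (illustrative, not pyani's own code)."""
    # Assumed output naming scheme: <query stem>_vs_<ref stem>.fastani
    outfile = outdir / f"{query.stem}_vs_{ref.stem}.fastani"
    return (
        f"{fastani_exe} -q {query} -r {ref} -o {outfile} "
        f"--fragLen {fragLen} -k {kmerSize} --minFraction {minFraction}"
    )


cmd = sketch_fastani_cmdline(Path("A.fna"), Path("B.fna"), Path("out"))
print(cmd)
```

Returning a plain string keeps the command easy to log or hand to a scheduler; paths containing spaces would need quoting, which this sketch does not attempt.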
-
pyani.fastani.generate_fastani_commands(filenames: List[pathlib.Path], outdir: pathlib.Path = PosixPath('.'), fastani_exe: pathlib.Path = PosixPath('fastANI'), fragLen: int = 3000, kmerSize: int = 16, minFraction: float = 0.2) → List[str]
Return list of fastANI command lines.
Parameters:
- filenames – a list of paths to input FASTA files
- outdir – path to output directory
- fastani_exe – location of the fastANI binary
- fragLen – fragment length to use
- kmerSize – kmer size to use
- minFraction – minimum portion of the genomes that must match to trust ANI
Loop over all FASTA files generating fastANI command lines for each pairwise comparison.
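The pairwise loop can be sketched as follows. This is an illustration under stated assumptions rather than pyani's code: it assumes every ordered (query, reference) pair is compared, including each file against itself, and the helper name `sketch_pairwise_commands` is hypothetical. Only the `-q` and `-r` options are shown.

```python
from itertools import product
from pathlib import Path
from typing import List


def sketch_pairwise_commands(
    filenames: List[Path], fastani_exe: str = "fastANI"
) -> List[str]:
    """Return one command string per ordered (query, reference) pair."""
    commands = []
    # Sort for a reproducible order, then visit every ordered pair.
    # Assumption: self-comparisons are included.
    for query, ref in product(sorted(filenames), repeat=2):
        commands.append(f"{fastani_exe} -q {query} -r {ref}")
    return commands


files = [Path("B.fna"), Path("A.fna"), Path("C.fna")]
commands = sketch_pairwise_commands(files)
print(len(commands))  # 9 ordered pairs for 3 input files
print(commands[0])
```

With n input files this produces n × n commands, so the number of fastANI runs grows quadratically with the number of genomes.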
-
pyani.fastani.generate_fastani_jobs(filenames: List[pathlib.Path], outdir: pathlib.Path = PosixPath('.'), fastani_exe: pathlib.Path = PosixPath('fastANI'), fragLen: int = 3000, kmerSize: int = 16, minFraction: float = 0.2, jobprefix: str = 'fastANI')
Return list of Jobs describing fastANI command lines.
Parameters:
- filenames – a list of paths to input FASTA files
- outdir – path to output directory
- fastani_exe – location of the fastANI binary
- fragLen – fragment length to use
- kmerSize – kmer size to use
- minFraction – minimum portion of the genomes that must match to trust ANI
- jobprefix – prefix for the names of the generated jobs
Loop over all FASTA files, generating Jobs describing fastANI command lines for each pairwise comparison.
-
pyani.fastani.get_version(fastani_exe: pathlib.Path = PosixPath('fastANI')) → str
Return FastANI package version as a string.
Parameters:
- fastani_exe – path to FastANI executable
We expect fastANI to return a string on STDOUT as
$ ./fastANI -v
version 1.32
and we concatenate this with the OS name.
The following circumstances are explicitly reported as strings:
- no executable at passed path
- non-executable file at passed path (this includes cases where the user doesn’t have execute permissions on the file)
- no version info returned
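The three reported circumstances can be told apart with standard-library calls. The sketch below is an assumption-laden illustration, not pyani's implementation: the function name `sketch_get_version`, the wording of the returned messages, and the `<OS>_<version>` format are all invented here. It also searches STDERR as well as STDOUT, as a precaution.

```python
import platform
import re
import shutil
import subprocess
from pathlib import Path


def sketch_get_version(fastani_exe: Path = Path("fastANI")) -> str:
    """Report a version string, or a description of what went wrong."""
    exe_path = shutil.which(str(fastani_exe))
    if exe_path is None:
        # shutil.which only returns executable files, so a plain file
        # at the path means it exists but cannot be executed
        if Path(fastani_exe).is_file():
            return f"{fastani_exe} exists but is not executable"
        return f"{fastani_exe} is not found in $PATH"

    proc = subprocess.run(
        [exe_path, "-v"], capture_output=True, text=True, check=False
    )
    # Look for a line such as "version 1.32" in either output stream
    match = re.search(r"version\s+(\S+)", proc.stdout + proc.stderr)
    if match is None:
        return f"{fastani_exe} exists at {exe_path} but returned no version info"
    # Assumed format: OS name and version joined by an underscore
    return f"{platform.system()}_{match.group(1)}"


missing = Path("no_such_dir") / "fastANI"
print(sketch_get_version(missing))
```

Returning a descriptive string in every case, rather than raising, means the caller can record the result without extra error handling.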
-
pyani.fastani.parse_fastani_file(filename: pathlib.Path) → pyani.fastani.ComparisonResult
Return (ref genome, query genome, ANI estimate, orthologous matches, sequence fragments) tuple.
Parameters:
- filename – Path, path to the input file
Extracts the ANI estimate, the number of orthologous matches, and the number of sequence fragments considered from the fastANI output file.
We assume that all fastANI comparisons are pairwise: one query and one reference file. The fastANI file should therefore contain a single line.
fastANI can produce multi-line output if a list of query/reference files is given to it.
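The single-line assumption can be made explicit when parsing. The following is a minimal sketch, assuming the output line is tab-separated in the order query, reference, ANI, matches, fragments. The function name `sketch_parse_fastani_file`, the stand-in `ComparisonResult`, the use of `ValueError`, and the sample values are illustrative choices, not pyani's own.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

# Stand-in with the same field order as pyani.fastani.ComparisonResult
ComparisonResult = namedtuple(
    "ComparisonResult", "reference query ani matches fragments"
)


def sketch_parse_fastani_file(filename: Path) -> ComparisonResult:
    """Parse a fastANI output file that holds exactly one result line."""
    text = Path(filename).read_text()
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(
            f"expected exactly one result line in {filename}, found {len(lines)}"
        )
    # Assumed column order: query, reference, ANI, matches, fragments
    query, ref, ani, matches, fragments = lines[0].split("\t")
    # The tuple stores the reference first and the query second
    return ComparisonResult(ref, query, float(ani), int(matches), int(fragments))


with tempfile.TemporaryDirectory() as tmpdir:
    outfile = Path(tmpdir) / "example.fastani"
    # Hypothetical result line, for illustration only
    outfile.write_text("query.fna\tref.fna\t97.5\t1200\t1300\n")
    result = sketch_parse_fastani_file(outfile)

print(result)
```

Note that the first two columns swap places on the way into the tuple, which is an easy detail to get wrong.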
-
pyani.fastani.process_files(outdir: pathlib.Path, org_lengths: Dict[KT, VT]) → pyani.pyani_tools.ANIResults
Return tuple of fastANI results for files in passed directory.
Parameters:
- outdir – Path, path to the directory containing output files
- org_lengths – dictionary of total sequence lengths, keyed by sequence
Returns the following pandas dataframes in an ANIResults object; query sequences are rows, reference sequences are columns:
- alignment_lengths - asymmetrical: total length of alignment
- percentage_identity - asymmetrical: percentage identity of alignment
- alignment_coverage - asymmetrical: coverage of query and reference
- similarity_errors - asymmetrical: count of similarity errors
May raise a ZeroDivisionError if one or more fastANI runs failed, or a very distant sequence was included in the analysis.
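To illustrate the row/column orientation and why the tables are asymmetrical, here is a small sketch that uses nested dictionaries in place of a pandas dataframe. The helper name `sketch_identity_table`, the stand-in `ComparisonResult`, and the sample values are hypothetical; this is not how `process_files` is implemented.

```python
from collections import namedtuple

# Stand-in with the same field order as pyani.fastani.ComparisonResult
ComparisonResult = namedtuple(
    "ComparisonResult", "reference query ani matches fragments"
)

# Hypothetical results for the two directions of one genome pair
results = [
    ComparisonResult("A", "B", 97.5, 1200, 1300),
    ComparisonResult("B", "A", 97.1, 1180, 1290),
]


def sketch_identity_table(results):
    """Return table[query][reference] = ANI, with None where no result exists."""
    names = sorted({r.query for r in results} | {r.reference for r in results})
    table = {q: {r: None for r in names} for q in names}
    for res in results:
        # Rows are query sequences, columns are reference sequences
        table[res.query][res.reference] = res.ani
    return table


table = sketch_identity_table(results)
print(table["B"]["A"])  # query B against reference A: 97.5
print(table["A"]["B"])  # query A against reference B: 97.1
```

Because each direction is a separate fastANI run, the two cells for a genome pair need not agree, and a cell stays empty when its run produced no result.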