pyani.fastani module

Code to implement the fastANI average nucleotide identity method.

class pyani.fastani.ComparisonResult(reference, query, ani, matches, fragments)

Bases: tuple

ani: Alias for field number 2

fragments: Alias for field number 4

matches: Alias for field number 3

query: Alias for field number 1

reference: Alias for field number 0
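The field order above can be illustrated with a stand-in namedtuple. This is only a sketch that mirrors the documented fields; it is not imported from pyani, and the genome names and numbers are made-up values.

```python
from collections import namedtuple

# Stand-in with the same field order as documented above (not the pyani class)
ComparisonResult = namedtuple(
    "ComparisonResult", ["reference", "query", "ani", "matches", "fragments"]
)

# Hypothetical values, for illustration only
result = ComparisonResult("genome_A", "genome_B", 98.5, 1200, 1500)

# Each field is reachable by name or by its field number
assert result.reference == result[0]
assert result.ani == result[2]
assert result.fragments == result[4]

# Being a tuple, the result unpacks in field order
reference, query, ani, matches, fragments = result
```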

exception pyani.fastani.PyaniFastANIException

Bases: pyani.PyaniException

Exception raised when there is a problem with fastANI

pyani.fastani.construct_fastani_cmdline(query: pathlib.Path, ref: pathlib.Path, outdir: pathlib.Path = PosixPath('.'), fastani_exe: pathlib.Path = PosixPath('fastANI'), fragLen: int = 3000, kmerSize: int = 16, minFraction: float = 0.2) → str

Return a single fastANI command line as a string.

Parameters:
query – path to query file
ref – path to reference file
outdir – path to output directory
fastani_exe – path to fastANI executable
fragLen – fragment length to use
kmerSize – kmer size to use
minFraction – minimum portion of the genomes that must match to trust ANI
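As a rough illustration of how these parameters could map onto a command line, here is a minimal sketch. The function name `sketch_fastani_cmdline` and the output file naming scheme are assumptions made for this example. The flags (`-q`, `-r`, `-o`, `--fragLen`, `-k`, `--minFraction`) are those of recent fastANI releases; check `fastANI -h` for your installed version.

```python
from pathlib import Path


def sketch_fastani_cmdline(
    query: Path,
    ref: Path,
    outdir: Path = Path("."),
    fastani_exe: Path = Path("fastANI"),
    fragLen: int = 3000,
    kmerSize: int = 16,
    minFraction: float = 0.2,
) -> str:
    """Build a fastANI command line for one query/reference pair (sketch)."""
    # Assumption: one output file per comparison, named from the two input stems
    outfile = outdir / f"{query.stem}_vs_{ref.stem}.fastani"
    return (
        f"{fastani_exe} -q {query} -r {ref} -o {outfile} "
        f"--fragLen {fragLen} -k {kmerSize} --minFraction {minFraction}"
    )


# Hypothetical input paths, for illustration only
cmd = sketch_fastani_cmdline(Path("genomes/A.fna"), Path("genomes/B.fna"), Path("out"))
```

Returning a plain string keeps the command easy to log or hand to a scheduler, at the cost of needing care with paths that contain spaces.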

pyani.fastani.generate_fastani_commands(filenames: List[pathlib.Path], outdir: pathlib.Path = PosixPath('.'), fastani_exe: pathlib.Path = PosixPath('fastANI'), fragLen: int = 3000, kmerSize: int = 16, minFraction: float = 0.2) → List[str]

Return list of fastANI command lines.

Parameters:
filenames – a list of paths to input FASTA files
outdir – path to output directory
fastani_exe – location of the fastANI binary
fragLen – fragment length to use
kmerSize – kmer size to use
minFraction – minimum portion of the genomes that must match to trust ANI

Loop over all FASTA files generating fastANI command lines for each pairwise comparison.
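The pairwise loop can be sketched as follows. Because the results are reported asymmetrically (query as rows, reference as columns), this sketch assumes that each ordered pair is compared, so A vs B and B vs A are separate comparisons. Whether the module also includes self-comparisons is not stated here, so they are left out. The function name and file names are hypothetical.

```python
from itertools import permutations
from pathlib import Path
from typing import List, Tuple


def sketch_pairwise_comparisons(filenames: List[Path]) -> List[Tuple[Path, Path]]:
    """Return every ordered (query, reference) pair of distinct input files."""
    # permutations(..., 2) yields both (A, B) and (B, A), but never (A, A)
    return list(permutations(filenames, 2))


# Hypothetical input files, for illustration only
files = [Path("A.fna"), Path("B.fna"), Path("C.fna")]
pairs = sketch_pairwise_comparisons(files)
```

With n input files this gives n * (n - 1) comparisons, so the number of command lines grows quadratically with the number of genomes.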

pyani.fastani.generate_fastani_jobs(filenames: List[pathlib.Path], outdir: pathlib.Path = PosixPath('.'), fastani_exe: pathlib.Path = PosixPath('fastANI'), fragLen: int = 3000, kmerSize: int = 16, minFraction: float = 0.2, jobprefix: str = 'fastANI')

Return list of Jobs describing fastANI command lines.

Parameters:
filenames – a list of paths to input FASTA files
outdir – path to output directory
fastani_exe – location of the fastANI binary
fragLen – fragment length to use
kmerSize – kmer size to use
minFraction – minimum portion of the genomes that must match to trust ANI
jobprefix – string prefix for the names of the generated jobs

Loop over all FASTA files, generating Jobs describing fastANI command lines for each pairwise comparison.
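The idea of wrapping each command line in a named job can be sketched as below. The `SketchJob` class is a hypothetical stand-in, not pyani's own Job type, and the naming scheme (prefix plus a zero-padded counter) is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchJob:
    """Hypothetical stand-in for a scheduler job: a name plus a command line."""

    name: str
    command: str


def sketch_jobs(commands: List[str], jobprefix: str = "fastANI") -> List[SketchJob]:
    """Wrap each command line in a job named from the prefix and its index."""
    # Assumed naming scheme: <prefix>_<six-digit index>
    return [
        SketchJob(name=f"{jobprefix}_{idx:06d}", command=cmd)
        for idx, cmd in enumerate(commands)
    ]


# Hypothetical command lines, for illustration only
jobs = sketch_jobs(
    [
        "fastANI -q A.fna -r B.fna -o A_vs_B.fastani",
        "fastANI -q B.fna -r A.fna -o B_vs_A.fastani",
    ]
)
```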

pyani.fastani.get_version(fastani_exe: pathlib.Path = PosixPath('fastANI')) → str

Return the fastANI version as a string.

Parameters:	fastani_exe – path to fastANI executable

We expect fastANI to report its version on STDOUT in the following form:

$ ./fastANI -v
version 1.32

The version number is then concatenated with the OS name.

The following circumstances are explicitly reported as strings:

no executable at passed path
non-executable file at passed path (this includes cases where the user doesn’t have execute permissions on the file)
no version info returned
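The three reported circumstances can be sketched as below. This is an illustration under stated assumptions, not the module's implementation: the function names, the wording of the returned messages, and the `<OS>_<version>` format are all hypothetical. The sketch searches both STDOUT and STDERR for the version text, since the stream used may differ between fastANI builds.

```python
import platform
import re
import shutil
import subprocess
from pathlib import Path


def parse_version_output(text: str) -> str:
    """Extract '1.32' from text such as 'version 1.32'; empty string if absent."""
    match = re.search(r"version\s+(\d+(?:\.\d+)*)", text)
    return match.group(1) if match else ""


def sketch_get_version(fastani_exe: Path = Path("fastANI")) -> str:
    """Return an OS-tagged version string, or a message describing the problem."""
    exe = shutil.which(str(fastani_exe))
    if exe is None:
        # which() fails both for missing files and for files we cannot execute
        if Path(fastani_exe).is_file():
            return f"non-executable file at {fastani_exe}"
        return f"no executable at {fastani_exe}"
    result = subprocess.run([exe, "-v"], capture_output=True, text=True, check=False)
    version = parse_version_output(result.stdout + result.stderr)
    if not version:
        return f"no version info returned by {fastani_exe}"
    # Assumed output format: OS name, underscore, version number
    return f"{platform.system()}_{version}"
```

Returning messages rather than raising lets a caller record the problem alongside other run metadata.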

pyani.fastani.parse_fastani_file(filename: pathlib.Path) → pyani.fastani.ComparisonResult

Return (ref genome, query genome, ANI estimate, orthologous matches, sequence fragments) tuple.

Parameters:	filename – Path, path to the input file

Extracts the ANI estimate, the number of orthologous matches, and the number of sequence fragments considered from the fastANI output file.

We assume that all fastANI comparisons are pairwise: one query and one reference file. The fastANI file should contain a single line.

Note that fastANI can produce multi-line output if it is given a list of query/reference files.
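A minimal sketch of parsing such a single-line file follows. It assumes fastANI's tab-separated output columns are, in order, query, reference, ANI estimate, matched fragments, and total query fragments. The function name, the stand-in namedtuple, the sample values, and the use of `ValueError` for multi-line files are choices made for this example only.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

# Stand-in with the documented field order (not imported from pyani)
ComparisonResult = namedtuple(
    "ComparisonResult", ["reference", "query", "ani", "matches", "fragments"]
)


def sketch_parse_fastani_file(filename: Path) -> ComparisonResult:
    """Parse a fastANI output file that holds exactly one comparison."""
    lines = [line for line in filename.read_text().splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected one result line in {filename}, found {len(lines)}")
    # Assumed column order: query, reference, ANI, matches, fragments
    query, reference, ani, matches, fragments = lines[0].split("\t")
    # The result tuple lists the reference first, unlike the file
    return ComparisonResult(reference, query, float(ani), int(matches), int(fragments))


# Write a made-up result line to a temporary file and parse it back
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "B_vs_A.fastani"
    path.write_text("B.fna\tA.fna\t98.5\t1200\t1500\n")
    result = sketch_parse_fastani_file(path)
```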

pyani.fastani.process_files(outdir: pathlib.Path, org_lengths: Dict[KT, VT]) → pyani.pyani_tools.ANIResults

Return fastANI results for the files in the passed directory as an ANIResults object.

Parameters:
outdir – Path, path to the directory containing output files
org_lengths – dictionary of total sequence lengths, keyed by sequence

Returns the following pandas dataframes in an ANIResults object; query sequences are rows, reference sequences are columns:

alignment_lengths - asymmetrical: total length of alignment
percentage_identity - asymmetrical: percentage identity of alignment
alignment_coverage - asymmetrical: coverage of query and reference
similarity_errors - asymmetrical: count of similarity errors

May raise a ZeroDivisionError if one or more fastANI runs failed, or if a very distant sequence was included in the analysis.