pyani.fastani module

Code to implement the fastANI average nucleotide identity method.

class pyani.fastani.ComparisonResult(reference, query, ani, matches, fragments)[source]

Bases: tuple

ani

Alias for field number 2

fragments

Alias for field number 4

matches

Alias for field number 3

query

Alias for field number 1

reference

Alias for field number 0
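
Because ComparisonResult is a tuple subclass with named fields, each value can be read either by name or by position. The sketch below uses a stand-in namedtuple that only mirrors the documented field order; the sample values are hypothetical.

```python
from collections import namedtuple

# Stand-in mirroring the documented field order; the real class is
# pyani.fastani.ComparisonResult.
ComparisonResult = namedtuple(
    "ComparisonResult", "reference query ani matches fragments"
)

# Hypothetical values, for illustration only.
result = ComparisonResult("ref.fna", "query.fna", 98.5, 1200, 1300)

# Fields are reachable by name or by index (ani is field number 2).
assert result.ani == result[2]

# Tuple unpacking follows the same order as the field numbers.
reference, query, ani, matches, fragments = result
```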

exception pyani.fastani.PyaniFastANIException[source]

Bases: pyani.PyaniException

Exception raised when there is a problem with fastANI

pyani.fastani.construct_fastani_cmdline(query: pathlib.Path, ref: pathlib.Path, outdir: pathlib.Path = PosixPath('.'), fastani_exe: pathlib.Path = PosixPath('fastANI'), fragLen: int = 3000, kmerSize: int = 16, minFraction: float = 0.2) → str[source]

Return a single fastANI command line as a string.

Parameters:
  • query – path to query file
  • ref – path to reference file
  • outdir – path to output directory
  • fastani_exe – path to fastANI executable
  • fragLen – fragment length to use
  • kmerSize – kmer size to use
  • minFraction – minimum portion of the genomes that must match to trust ANI
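
As a rough illustration of how these parameters could map onto a fastANI invocation, the sketch below assembles one command string. The function name sketch_fastani_cmdline and the output file naming scheme are assumptions made for this example, not pyani's actual behaviour; the flags used (-q, -r, -o, -k, --fragLen, --minFraction) are fastANI's own command-line options.

```python
from pathlib import Path


def sketch_fastani_cmdline(
    query: Path,
    ref: Path,
    outdir: Path = Path("."),
    fastani_exe: Path = Path("fastANI"),
    fragLen: int = 3000,
    kmerSize: int = 16,
    minFraction: float = 0.2,
) -> str:
    """Build one fastANI command line (illustrative stand-in, not pyani's code)."""
    # Assumed naming scheme for the output file: <query stem>_vs_<ref stem>.fastani
    outfile = outdir / f"{query.stem}_vs_{ref.stem}.fastani"
    return (
        f"{fastani_exe} -q {query} -r {ref} -o {outfile} "
        f"--fragLen {fragLen} -k {kmerSize} --minFraction {minFraction}"
    )


print(sketch_fastani_cmdline(Path("genomes/A.fna"), Path("genomes/B.fna"), Path("out")))
```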

pyani.fastani.generate_fastani_commands(filenames: List[pathlib.Path], outdir: pathlib.Path = PosixPath('.'), fastani_exe: pathlib.Path = PosixPath('fastANI'), fragLen: int = 3000, kmerSize: int = 16, minFraction: float = 0.2) → List[str][source]

Return list of fastANI command lines.

Parameters:
  • filenames – a list of paths to input FASTA files
  • outdir – path to output directory
  • fastani_exe – location of the fastANI binary
  • fragLen – fragment length to use
  • kmerSize – kmer size to use
  • minFraction – minimum portion of the genomes that must match to trust ANI

Loop over all FASTA files generating fastANI command lines for each pairwise comparison.
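
The pairwise loop can be sketched with itertools.permutations, which yields each ordered (query, reference) pair of distinct files. Whether pyani compares in both directions, or also includes self-comparisons, is an assumption here; the helper name ordered_pairs is hypothetical.

```python
from itertools import permutations
from pathlib import Path
from typing import List, Tuple


def ordered_pairs(filenames: List[Path]) -> List[Tuple[Path, Path]]:
    """Return every (query, reference) pair of distinct files, in both directions."""
    return list(permutations(filenames, 2))


files = [Path("A.fna"), Path("B.fna"), Path("C.fna")]
for query, ref in ordered_pairs(files):
    # One command line would be generated per pair.
    print(f"{query.name} vs {ref.name}")
```

With n input files this produces n * (n - 1) comparisons, so the number of command lines grows quadratically with the number of genomes.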

pyani.fastani.generate_fastani_jobs(filenames: List[pathlib.Path], outdir: pathlib.Path = PosixPath('.'), fastani_exe: pathlib.Path = PosixPath('fastANI'), fragLen: int = 3000, kmerSize: int = 16, minFraction: float = 0.2, jobprefix: str = 'fastANI')[source]

Return list of Jobs describing fastANI command lines.

Parameters:
  • filenames – a list of paths to input FASTA files
  • outdir – path to output directory
  • fastani_exe – location of the fastANI binary
  • fragLen – fragment length to use
  • kmerSize – kmer size to use
  • minFraction – minimum portion of the genomes that must match to trust ANI
  • jobprefix – prefix for the names of the generated jobs

Loop over all FASTA files, generating Jobs describing fastANI command lines for each pairwise comparison.
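
To show how jobprefix might be used, the sketch below wraps command lines in simple named job records. SketchJob is a hypothetical stand-in for pyani's Job class, and the zero-padded naming scheme is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchJob:
    """Hypothetical stand-in for a job object: a name plus a command line."""

    name: str
    command: str


def wrap_commands(cmdlines: List[str], jobprefix: str = "fastANI") -> List[SketchJob]:
    """Pair each command line with a unique, prefixed job name."""
    # Assumed naming scheme: prefix plus a zero-padded running index.
    return [
        SketchJob(f"{jobprefix}_{idx:06d}", cmd)
        for idx, cmd in enumerate(cmdlines)
    ]


jobs = wrap_commands(["fastANI -q A.fna -r B.fna -o A_vs_B.fastani"])
```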

pyani.fastani.get_version(fastani_exe: pathlib.Path = PosixPath('fastANI')) → str[source]

Return fastANI package version as a string.

Parameters: fastani_exe – path to fastANI executable

We expect fastANI to return a string on STDOUT as

$ ./fastANI -v
version 1.32

We concatenate this version string with the OS name.

The following circumstances are explicitly reported as strings:

  • no executable at passed path
  • non-executable file at passed path (this includes cases where the user doesn’t have execute permissions on the file)
  • no version info returned
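
The version check described above can be sketched as follows. The function names, the wording of the returned messages, and the OS/version separator are all assumptions for illustration; the sketch also searches both STDOUT and STDERR, in case a given fastANI build writes its version to either stream.

```python
import platform
import re
import shutil
import subprocess
from pathlib import Path


def parse_version_text(text: str) -> str:
    """Extract the version number from text like 'version 1.32'; '' if absent."""
    match = re.search(r"version\s+([0-9.]+)", text)
    return match.group(1) if match else ""


def describe_version(fastani_exe: Path = Path("fastANI")) -> str:
    """Return '<OS>_<version>' or a string describing what went wrong."""
    located = shutil.which(str(fastani_exe))
    if located is None:
        # shutil.which returns None for missing and for non-executable files.
        if Path(fastani_exe).is_file():
            return f"{fastani_exe} exists but is not executable"
        return f"no executable found at {fastani_exe}"
    result = subprocess.run(
        [located, "-v"], capture_output=True, text=True, check=False
    )
    version = parse_version_text(result.stdout + result.stderr)
    if not version:
        return f"{fastani_exe} returned no version information"
    return f"{platform.system()}_{version}"


print(parse_version_text("version 1.32\n"))
```

Returning a descriptive string instead of raising keeps the caller's reporting code simple, at the cost of the caller having to distinguish a real version from an error message.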

pyani.fastani.parse_fastani_file(filename: pathlib.Path) → pyani.fastani.ComparisonResult[source]

Return (ref genome, query genome, ANI estimate, orthologous matches, sequence fragments) tuple.

Parameters: filename – Path, path to the input file

Extracts the ANI estimate, the number of orthologous matches, and the number of sequence fragments considered from the fastANI output file.

We assume that all fastANI comparisons are pairwise: one query and one reference file. The fastANI file should contain a single line.

fastANI can produce multi-line output if a list of query/reference files is given to it.
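
A minimal sketch of the single-line parse is given below. It assumes fastANI's tab-separated output order (query, reference, ANI, matched fragments, total fragments) and uses a stand-in namedtuple; the helper name parse_fastani_text and the choice of ValueError are hypothetical, not pyani's implementation.

```python
from collections import namedtuple

# Stand-in mirroring the documented field order of ComparisonResult.
ComparisonResult = namedtuple(
    "ComparisonResult", "reference query ani matches fragments"
)


def parse_fastani_text(text: str) -> ComparisonResult:
    """Parse the contents of a single-comparison fastANI output file."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one result line, found {len(lines)}")
    # fastANI writes the query first; ComparisonResult lists the reference first.
    query, ref, ani, matches, fragments = lines[0].split("\t")
    return ComparisonResult(ref, query, float(ani), int(matches), int(fragments))


# Hypothetical file contents, for illustration only.
example = "query.fna\tref.fna\t97.5\t1200\t1300\n"
result = parse_fastani_text(example)
```

Rejecting multi-line input explicitly makes a violated pairwise assumption visible, rather than silently using only the first line.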

pyani.fastani.process_files(outdir: pathlib.Path, org_lengths: Dict[KT, VT]) → pyani.pyani_tools.ANIResults[source]

Return an ANIResults object holding fastANI results for the files in the passed directory.

Parameters:
  • outdir – Path, path to the directory containing output files
  • org_lengths – dictionary of total sequence lengths, keyed by sequence

Returns the following pandas dataframes in an ANIResults object; query sequences are rows, reference sequences are columns:

  • alignment_lengths - asymmetrical: total length of alignment
  • percentage_identity - asymmetrical: percentage identity of alignment
  • alignment_coverage - asymmetrical: coverage of query and reference
  • similarity_errors - asymmetrical: count of similarity errors

May raise a ZeroDivisionError if one or more fastANI runs failed, or if a very distant sequence was included in the analysis.