pyani.scripts.subcommands.subcmd_anib module

Provides the anib subcommand for pyani.

pyani.scripts.subcommands.subcmd_anib.fragment_fasta_file(inpath: pathlib.Path, outdir: pathlib.Path, fragsize: int) → Tuple[pathlib.Path, str]

Return path to fragmented sequence file and JSON of fragment lengths.

Parameters:
  • inpath – Path to genome file
  • outdir – Path to directory to hold fragmented files
  • fragsize – size of genome fragments, in nucleotides

Returns a tuple of (path, json) where path is the path to the fragment file and json is a JSON-ified dictionary of fragment lengths, keyed by fragment sequence ID.
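
The return contract described above can be illustrated with a minimal sketch. This is not pyani's implementation: the function name fragment_fasta_sketch, the output file naming, and the fragment ID format are all assumptions made for illustration, and the FASTA parsing is deliberately simplistic.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def fragment_fasta_sketch(inpath: Path, outdir: Path, fragsize: int) -> Tuple[Path, str]:
    """Write consecutive fragments of each sequence; return (path, JSON of lengths)."""
    # Read the FASTA file into {sequence ID: sequence} with a very simple parser
    records: Dict[str, str] = {}
    seqid = None
    for line in inpath.read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            seqid = line[1:].split()[0]
            records[seqid] = ""
        elif line and seqid is not None:
            records[seqid] += line

    outdir.mkdir(parents=True, exist_ok=True)
    outpath = outdir / f"{inpath.stem}-fragments.fasta"  # hypothetical naming scheme
    fraglens: Dict[str, int] = {}
    count = 0
    with outpath.open("w") as ofh:
        for sequence in records.values():
            # Step through the sequence in fragsize-sized consecutive windows;
            # the final fragment of each sequence may be shorter than fragsize
            for start in range(0, len(sequence), fragsize):
                count += 1
                fragment = sequence[start:start + fragsize]
                fragid = f"frag{count:05d}"  # hypothetical fragment ID format
                fraglens[fragid] = len(fragment)
                ofh.write(f">{fragid}\n{fragment}\n")
    return outpath, json.dumps(fraglens)


# Usage: fragment a 2400 nt toy genome into 1020 nt sections
with tempfile.TemporaryDirectory() as tmp:
    genome = Path(tmp) / "genome.fasta"
    genome.write_text(">contig1\n" + "ACGT" * 600 + "\n")
    fragpath, fraglens_json = fragment_fasta_sketch(genome, Path(tmp) / "fragments", 1020)
    fraglens = json.loads(fraglens_json)
    print(fragpath.name, fraglens)
```

Returning the lengths as a JSON string rather than a dictionary keeps the result easy to store alongside the fragment file path.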

pyani.scripts.subcommands.subcmd_anib.generate_joblist(comparisons: List[T], existingfiles: List[T], fragfiles: List[T], fraglens: List[T], args: argparse.Namespace) → NotImplementedError

Return list of ComparisonJobs.

Parameters:
  • comparisons – list of (Genome, Genome) tuples for which comparisons are needed
  • existingfiles – list of pre-existing BLASTN+ outputs
  • fragfiles
  • fraglens
  • args – Namespace, command-line arguments

pyani.scripts.subcommands.subcmd_anib.subcmd_anib(args: argparse.Namespace) → None

Perform ANIb on all genome files in an input directory.

Parameters:
  • args – Namespace, command-line arguments

Finds ANI by the ANIb method, as described in Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, et al. (2007) DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Micr 57: 81-91. doi:10.1099/ijs.0.64483-0.

All FASTA format files (selected by suffix) in the input directory are fragmented into consecutive sections (1020nt by default), and a BLAST+ database is constructed from each whole-genome input. The BLAST+ blastn tool is then used to query each set of fragments against each BLAST+ database in turn.
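
The database construction and all-against-all querying described above can be sketched as a set of command lines. This is an illustration rather than the commands pyani actually issues: the helper name build_anib_commands, the directory layout, and the file naming are hypothetical, self-comparisons are assumed to be skipped, and only basic makeblastdb and blastn options are shown.

```python
from itertools import permutations
from pathlib import Path
from typing import List, Tuple


def build_anib_commands(
    genomes: List[Path], fragdir: Path, dbdir: Path, outdir: Path
) -> Tuple[List[List[str]], List[List[str]]]:
    """Return (makeblastdb commands, blastn commands) as argument lists."""
    # One nucleotide BLAST+ database per whole-genome input
    db_cmds = [
        ["makeblastdb", "-dbtype", "nucl", "-in", str(g), "-out", str(dbdir / g.stem)]
        for g in genomes
    ]
    # Each genome's fragments are queried against every other genome's database;
    # permutations() yields both (A, B) and (B, A) because ANIb is directional
    blast_cmds = []
    for query, subject in permutations(genomes, 2):
        blast_cmds.append([
            "blastn",
            "-query", str(fragdir / f"{query.stem}-fragments.fasta"),  # hypothetical name
            "-db", str(dbdir / subject.stem),
            "-out", str(outdir / f"{query.stem}_vs_{subject.stem}.tab"),
            "-outfmt", "6",  # tabular output
        ])
    return db_cmds, blast_cmds


# Usage: three genomes give three databases and six directional comparisons
genomes = [Path("genomes/A.fna"), Path("genomes/B.fna"), Path("genomes/C.fna")]
db_cmds, blast_cmds = build_anib_commands(
    genomes, Path("fragments"), Path("blastdbs"), Path("output")
)
for cmd in blast_cmds:
    print(" ".join(cmd))
```

Building argument lists rather than shell strings means each command could be passed directly to subprocess.run or handed to a scheduler without quoting concerns.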

For each query, the BLAST+ .tab output is parsed to obtain alignment length, identity, and similarity error count. Alignments that fall below a threshold are excluded from the calculation (this introduces systematic bias with respect to ANIm). The results are processed to calculate the ANI percentages, coverage, and similarity error.
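
The parsing and filtering step above might look roughly like the following sketch. Several things here are assumptions rather than pyani's actual behaviour: the function name summarise_anib, the five-column tabular layout (qseqid, sseqid, length, mismatch, gaps), the use of only the first hit per fragment, and the default thresholds, which follow the 30% identity and 70% alignable-length cutoffs reported by Goris et al. (2007).

```python
import csv
import io
from typing import Dict, Tuple


def summarise_anib(
    tab_text: str,
    fraglens: Dict[str, int],
    min_identity: float = 0.3,  # assumed cutoff, as a fraction of fragment length
    min_coverage: float = 0.7,  # assumed cutoff, as a fraction of fragment length
) -> Tuple[float, float, int]:
    """Return (ANI percentage, query coverage, similarity errors)."""
    # Keep only the first reported hit for each fragment
    best: Dict[str, Tuple[int, int, int]] = {}
    for row in csv.reader(io.StringIO(tab_text), delimiter="\t"):
        if not row:
            continue
        qseqid, _sseqid, length, mismatch, gaps = row[:5]
        if qseqid not in best:
            best[qseqid] = (int(length), int(mismatch), int(gaps))

    aligned = identical = errors = 0
    for qseqid, (length, mismatch, gaps) in best.items():
        fraglen = fraglens[qseqid]
        aln = length - gaps   # aligned bases, excluding gap positions
        ids = aln - mismatch  # identical bases
        # Skip alignments that do not clear both thresholds
        if ids / fraglen <= min_identity or aln / fraglen <= min_coverage:
            continue
        aligned += aln
        identical += ids
        errors += mismatch + gaps

    total = sum(fraglens.values())
    ani = 100 * identical / aligned if aligned else 0.0
    coverage = aligned / total if total else 0.0
    return ani, coverage, errors


# Usage: the second fragment aligns over too little of its length and is dropped
fraglens = {"frag00001": 1020, "frag00002": 1020, "frag00003": 360}
tab_text = (
    "frag00001\tgenomeB\t1000\t10\t0\n"
    "frag00002\tgenomeB\t400\t20\t0\n"
    "frag00003\tgenomeB\t360\t0\t0\n"
)
ani, coverage, errors = summarise_anib(tab_text, fraglens)
print(round(ani, 2), round(coverage, 3), errors)
```

Because fragments that fail the filter contribute nothing to the totals, a divergent genome pair can report a high identity over a small covered fraction, which is why coverage is reported alongside the ANI percentage.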

The calculated values are stored in the local SQLite3 database.
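
As a rough illustration of persisting such results, the sketch below writes one comparison to an SQLite3 database using Python's standard sqlite3 module. The table name, column names, and the store_comparison helper are hypothetical and do not reflect pyani's actual database schema or access layer.

```python
import sqlite3


def store_comparison(
    conn: sqlite3.Connection,
    query: str,
    subject: str,
    identity: float,
    coverage: float,
    sim_errors: int,
) -> None:
    """Insert or update one directional comparison in a hypothetical table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS comparisons (
            query TEXT NOT NULL,
            subject TEXT NOT NULL,
            identity REAL,
            coverage REAL,
            sim_errors INTEGER,
            PRIMARY KEY (query, subject)
        )
        """
    )
    # Re-running a comparison replaces the earlier row for the same pair
    conn.execute(
        "INSERT OR REPLACE INTO comparisons VALUES (?, ?, ?, ?, ?)",
        (query, subject, identity, coverage, sim_errors),
    )
    conn.commit()


# Usage: an in-memory database stands in for the local database file
conn = sqlite3.connect(":memory:")
store_comparison(conn, "genomeA", "genomeB", 99.26, 0.567, 10)
rows = conn.execute("SELECT * FROM comparisons").fetchall()
print(rows)
```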