Requirements

The pyani package requires several other programs, packages and tools for use and development. Many of these are installed automatically alongside pyani, but some packages and tools must be installed separately.

This page describes requirements for pyani and how/why they are used.

Tip

For more information about installation of specific packages, please see the Installation Guide page.

Python3

pyani is written in Python3, the current version of the Python language. The legacy version, Python2, reached end-of-life in January 2020 and is no longer maintained. pyani uses many features that are specific to Python3, and will not run on Python2.
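
As a minimal sketch (not part of pyani itself), the interpreter version can be checked before installation. The helper name is_python3 is hypothetical and used only for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple describes Python 3 or later."""
    return version_info[0] >= 3


if not is_python3():
    # Stop early with a readable message rather than a later syntax error
    raise SystemExit(
        "pyani requires Python3; found Python %d.%d" % tuple(sys.version_info[:2])
    )
```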

NCBI-BLAST+

To carry out ANIb (average nucleotide identity using BLAST) analysis, genome sequences are compared using the BLAST+ tool, provided by NCBI. The BLAST+ tool is the current, maintained version, and is completely rewritten with respect to the legacy BLAST package (see below).

MUMmer v3.23

To carry out ANIm (average nucleotide identity using MUMmer) analysis, genome sequences are compared using the nucmer tool, part of the MUMmer package. Currently, pyani uses an older version of MUMmer for this analysis, pinned at version 3.23. pyani has not yet been tested with MUMmer 4.x.

Legacy NCBI-BLAST

An alternative implementation of ANIb (average nucleotide identity using BLAST), included for compatibility checks with other ANI calculation software, is provided in pyani through the legacy script average_nucleotide_identity.py. We do not recommend using the legacy aniblastall analysis, and NCBI do not recommend using the legacy NCBI-BLAST tool. However, the legacy software can still be downloaded and installed by the curious, and by those who wish to test legacy compatibility.

fastANI v1.32

To carry out fastANI (average nucleotide identity using fastANI) analysis, genome sequences are compared using the fastANI tool.
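
The third-party tools described above must be installed separately, so it can be useful to confirm that they are visible on your PATH. The following is a minimal sketch, not part of pyani: the function name find_tools is hypothetical, and the executable names (blastn, makeblastdb, nucmer, blastall, formatdb, fastANI) are the names these packages usually install, which may differ on your system. It does not check tool versions.

```python
import shutil

# Assumed executable names for each tool; adjust to match your installation.
TOOLS = {
    "NCBI-BLAST+ (ANIb)": ["blastn", "makeblastdb"],
    "MUMmer (ANIm)": ["nucmer"],
    "Legacy NCBI-BLAST (ANIblastall)": ["blastall", "formatdb"],
    "fastANI": ["fastANI"],
}


def find_tools(tools=TOOLS):
    """Map each executable name to its full path, or None if not on PATH."""
    return {exe: shutil.which(exe) for exes in tools.values() for exe in exes}


if __name__ == "__main__":
    for exe, path in find_tools().items():
        print(f"{exe:12s} {path or 'not found on PATH'}")
```

Only the tools for the analyses you intend to run need to be present. For example, a missing blastall matters only if you plan to use the legacy ANIblastall method.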

SQLite3

The output generated by pyani analyses is stored in a local database, provided by SQLite3, for rapid querying and recovery. This allows for persistent storage of results without the need to keep the original alignment files, and for incremental addition of new analyses. SQLite support is included with Python through the standard library's sqlite3 module, so a separate installation is not normally required.
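
To confirm that SQLite support is available in your Python installation, a short check such as the sketch below may help. It uses only the standard library sqlite3 module and an in-memory database. The demo table is purely illustrative and is not pyani's database schema.

```python
import sqlite3

# Version of the SQLite library that Python was built against
print("SQLite library version:", sqlite3.sqlite_version)

# An in-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, label TEXT)")

# Rows can be added incrementally and queried again later
conn.execute("INSERT INTO demo (label) VALUES (?)", ("first",))
conn.execute("INSERT INTO demo (label) VALUES (?)", ("second",))
rows = conn.execute("SELECT id, label FROM demo ORDER BY id").fetchall()
conn.close()
```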

Open Grid Scheduler

When running on a cluster, pyani currently schedules jobs using the Sun Grid Engine/Open Grid Engine/Open Grid Scheduler syntax. Your cluster will require a compatible scheduler for pyani to distribute jobs appropriately.

Python Packages

pyani relies on functionality provided by a number of additional Python packages, and we gratefully acknowledge their contribution:

  • Biopython: for working with biological data formats
  • Matplotlib: for graphical output
  • NetworkX: for graph calculations and representation
  • Numpy: for matrix calculations
  • OpenPyXL: for Microsoft Excel output compatibility
  • Pandas: for dataframe operations
  • Pillow: for graphics manipulation and rendering
  • SciPy: for scientific computing operations
  • Seaborn: for graphical output
  • SQLAlchemy: (pinned at v1.2.18 for compatibility reasons) for interaction with SQLite3
  • tqdm: provides progress bars for user interaction
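
Although these packages are normally installed alongside pyani, the sketch below shows one way to check that each can be found by the current interpreter. It is not part of pyani, and the function name missing_packages is hypothetical. Note that some import names differ from the package names listed above (Biopython is imported as Bio, and Pillow as PIL). The check does not verify package versions, such as the SQLAlchemy pin.

```python
from importlib.util import find_spec

# Package name (as listed above) mapped to the module name used on import
IMPORT_NAMES = {
    "Biopython": "Bio",
    "Matplotlib": "matplotlib",
    "NetworkX": "networkx",
    "Numpy": "numpy",
    "OpenPyXL": "openpyxl",
    "Pandas": "pandas",
    "Pillow": "PIL",
    "SciPy": "scipy",
    "Seaborn": "seaborn",
    "SQLAlchemy": "sqlalchemy",
    "tqdm": "tqdm",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return a sorted list of package names whose module cannot be found."""
    return sorted(
        pkg for pkg, module in import_names.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```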

Development

We rely on a number of additional packages to aid pyani development. If you set up a development environment as recommended in Contributing to pyani, the following Python packages will be installed, or are expected to be present:

  • bandit: to check for security issues in the codebase
  • black: to enforce consistent, opinionated code formatting
  • codecov: to generate code coverage output for the codecov.io service
  • coverage: to generate code coverage output for local inspection
  • doc8: to check documentation (reStructuredText) style
  • flake8: for code linting
  • jinja2: for output/docfile templating
  • pre-commit: for checking code style and quality prior to git commit
  • pylint: for code linting
  • pytest: to manage and run automated testing
  • pytest-cov: to integrate pytest with codecov and coverage
  • pytest-ordering: to ensure pytest test ordering
  • sphinx: to generate documentation
  • sphinx-rtd-theme: to provide local ReadTheDocs style formatting