Requirements
The pyani package requires several other programs, packages and tools to run and develop. Many of these are automatically installed alongside pyani, but some packages and tools must be installed separately.
This page describes requirements for pyani and how/why they are used.
Tip
For more information about installation of specific packages, please see the Installation Guide page.
Python3
pyani is written in Python, and the modern version of Python is Python3. The legacy version, Python2, has not been maintained since 2020. pyani uses many features of Python3 and will not run on Python2.
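As a minimal sketch of the requirement above, the interpreter version can be checked before attempting to install or run pyani. The helper name is_supported_python is hypothetical and not part of pyani, and the minimum of (3,) only reflects the statement that Python3 is required; pyani may need a more recent minor version than this check enforces.

```python
import sys


def is_supported_python(version_info=sys.version_info, minimum=(3,)):
    """Return True if version_info is at least `minimum`.

    Hypothetical helper for illustration; `minimum` is an assumption
    based only on the statement that pyani needs Python3.
    """
    return tuple(version_info[: len(minimum)]) >= tuple(minimum)


if not is_supported_python():
    print("This interpreter is Python2; pyani will not run on it.")
```

Calling is_supported_python((2, 7, 18)) returns False, while any Python3 version tuple returns True.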
NCBI-BLAST+
To carry out ANIb (average nucleotide identity using BLAST) analysis, genome sequences are compared using the BLAST+ tool, provided by NCBI. The BLAST+ tool is the current, maintained version, and is completely rewritten with respect to the legacy BLAST package (see below).
MUMmer v3.23
To carry out ANIm (average nucleotide identity using MUMmer) analysis, genome sequences are compared using the nucmer tool, part of the MUMmer package. Currently, pyani uses an older version of MUMmer for this analysis, pinned at version 3.23. pyani has not yet been tested with MUMmer 4.x.
Legacy NCBI-BLAST
An alternative implementation of ANIb (average nucleotide identity using BLAST), included for compatibility checks with other ANI calculation software, is provided in pyani through the legacy script average_nucleotide_identity.py. Use of the legacy aniblastall analysis is not recommended, and NCBI do not recommend use of the legacy NCBI-BLAST tool. However, the legacy software can still be downloaded and installed by the curious and by those who wish to test legacy compatibility.
fastANI v1.32
To carry out fastANI (average nucleotide identity using fastANI) analysis, genome sequences are compared using the fastANI tool.
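Because BLAST+, MUMmer and fastANI are installed separately from pyani, it can help to confirm that their executables are discoverable before running an analysis. The sketch below uses only the Python standard library. The executable names blastn, nucmer and fastANI are assumptions about how these tools appear on PATH, find_tools is a hypothetical helper rather than a pyani function, and the check does not verify the pinned versions.

```python
import shutil

# Assumed executable names: blastn (BLAST+), nucmer (MUMmer), fastANI.
DEFAULT_TOOLS = ("blastn", "nucmer", "fastANI")


def find_tools(names=DEFAULT_TOOLS):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

A value of None for any tool indicates that it still needs to be installed, or that its location is not on PATH.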
SQLite3
The output generated by pyani analyses is stored in a local database, provided by SQLite3, for rapid querying and recovery. This allows for persistent storage of results without the need to keep the original alignment files, and for incremental addition of new analyses. SQLite is installed with Python.
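The following sketch illustrates the kind of store-and-query workflow that SQLite3 supports, using the sqlite3 module from the Python standard library. The comparisons table, its columns and the example values are hypothetical and are not pyani's actual database schema or results.

```python
import sqlite3

# Report the SQLite library version bundled with this Python installation.
print(sqlite3.sqlite_version)


def demo_store(db_path=":memory:"):
    """Store one made-up comparison and read it back.

    The table layout is hypothetical and for illustration only.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS comparisons "
            "(query TEXT, subject TEXT, identity REAL)"
        )
        conn.execute(
            "INSERT INTO comparisons VALUES (?, ?, ?)",
            ("genome_A", "genome_B", 0.97),
        )
        conn.commit()
        return conn.execute(
            "SELECT query, subject, identity FROM comparisons"
        ).fetchall()
    finally:
        conn.close()
```

Passing a file path instead of ":memory:" keeps the rows on disk, so later runs can add to the same table.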
Open Grid Scheduler
When running on a cluster, pyani currently schedules jobs using the Sun Grid Engine/Open Grid Engine/Open Grid Scheduler syntax. Your cluster will require a compatible scheduler for pyani to distribute jobs appropriately.
Python Packages
pyani relies on functionality provided by a number of additional Python packages, and we gratefully acknowledge their contribution:
- Biopython: for working with biological data formats
- Matplotlib: for graphical output
- NetworkX: for graph calculations and representation
- Numpy: for matrix calculations
- OpenPyXL: for Microsoft Excel output compatibility
- Pandas: for dataframe operations
- Pillow: for graphics manipulation and rendering
- SciPy: for scientific computing operations
- Seaborn: for graphical output
- SQLAlchemy: (pinned at v1.2.18 for compatibility reasons) for interaction with SQLite3
- tqdm: provides progress bars for user interaction
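To see which of the packages listed above are available in the current environment, a short check with importlib can be used. This is a sketch only. The import names (for example Bio for Biopython and PIL for Pillow) are the names these packages are commonly imported under, missing_modules is a hypothetical helper, and the check does not confirm pinned versions such as SQLAlchemy v1.2.18.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above.
IMPORT_NAMES = (
    "Bio", "matplotlib", "networkx", "numpy", "openpyxl", "pandas",
    "PIL", "scipy", "seaborn", "sqlalchemy", "tqdm",
)


def missing_modules(names=IMPORT_NAMES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


print("Missing:", missing_modules() or "none")
```

An empty list means every listed module can be located. It does not guarantee that the installed versions are compatible with pyani.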
Development
We rely on a number of additional packages to aid pyani development, and if you set up a development environment as recommended in Contributing to pyani, then the following Python packages will be installed or expected to be present:
- bandit: to check for security issues in the codebase
- black: to enforce consistent, opinionated code formatting
- codecov: to generate code coverage output for the codecov.io service
- coverage: to generate code coverage output for local inspection
- doc8: to check documentation (reStructuredText) style
- flake8: for code linting
- jinja2: for output/docfile templating
- pre-commit: for checking code style and quality prior to git commit
- pylint: for code linting
- pytest: to manage and run automated testing
- pytest-cov: to integrate pytest with codecov and coverage
- pytest-ordering: to ensure pytest test ordering
- sphinx: to generate documentation
- sphinx-rtd-theme: to provide local ReadTheDocs style formatting