pyani.pyani_orm module

Module providing useful functions for manipulating pyani’s SQLite3 db.

This SQLAlchemy-based ORM replaces the previous SQL-based module.

class pyani.pyani_orm.BlastDB(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes the relationship between a genome, a run, its source BLAST database and the query fragments.

Each genome and run combination can be assigned a single BLAST database for the comparisons.

fragpath – path to fragmented genome (query in ANIb)
dbpath – path to source genome database (subject in ANIb)
fragsizes – JSONified dict of fragment sizes
dbcmd – command used to generate database
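The fragsizes field holds a dict serialised as JSON. A minimal sketch of that round trip with the standard library json module, assuming the dict maps fragment IDs to integer lengths (the IDs and sizes below are invented). JSON object keys are always strings, so a dict with non-string keys would not come back unchanged.

```python
import json

# Invented fragment IDs and lengths (in bases), for illustration only
fragment_sizes = {"frag00001": 1020, "frag00002": 1020, "frag00003": 467}

# Serialise the dict to a JSON string, as it might be stored in the fragsizes column
fragsizes = json.dumps(fragment_sizes)

# Parse the string back into a dict when reading the column
restored = json.loads(fragsizes)
print(sum(restored.values()))  # 2507: total fragmented query length
```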

blastdb_id

dbcmd

dbpath

fragpath

fragsizes

genome

genome_id

run

run_id

class pyani.pyani_orm.Comparison(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes a single pairwise comparison between two genomes.

aln_length

comparison_id

cov_query

cov_subject

fragsize

identity

kmersize

maxmatch

minmatch

program

query

query_id

runs

sim_errs

subject

subject_id

version

class pyani.pyani_orm.Genome(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes an input genome for a pyani run.

genome_id – primary key
genome_hash – MD5 hash of input genome file (in path)
path – path to FASTA genome file
length – length of genome (total bases)
description – genome description
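The documentation only states that genome_hash is an MD5 hash of the input genome file. A minimal sketch, assuming the hash is the hex digest of the file's raw bytes; md5_of_file is a hypothetical helper, and pyani's own hashing code may read the file differently.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, blocksize: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Reading in blocks keeps memory use flat for large genome files
        for block in iter(lambda: handle.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()


# Usage: hash a small throwaway FASTA file
with tempfile.TemporaryDirectory() as tmpdir:
    fasta = Path(tmpdir) / "example.fna"
    fasta.write_bytes(b">seq1\nACGT\n")
    print(md5_of_file(fasta))
```

Because the hash depends only on file content, the same genome stored under two different paths yields the same digest.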

blastdbs

description

genome_hash

genome_id

labels

length

path

query_comparisons

runs

subject_comparisons

class pyani.pyani_orm.Label(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes the relationship between a genome, a run and a genome label.

Each genome and run combination can be assigned a single label.

class_label

genome

genome_id

label

label_id

run

run_id

class pyani.pyani_orm.LabelTuple

Bases: tuple

Label and class for each file.

class_label: Alias for field number 1

label: Alias for field number 0
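LabelTuple behaves like a named tuple whose field 0 is label and field 1 is class_label. The sketch below builds an equivalent with collections.namedtuple to show attribute, positional and unpacking access; it is a stand-in rather than an import of pyani's class, and the values are illustrative.

```python
from collections import namedtuple

# Stand-in with the documented field order: label is field 0, class_label is field 1
LabelTuple = namedtuple("LabelTuple", ["label", "class_label"])

entry = LabelTuple(label="E. coli K-12", class_label="Escherichia")
print(entry.label)  # attribute access
print(entry[1])  # positional access, same as entry.class_label
label, class_label = entry  # unpacking follows the field order
```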

exception pyani.pyani_orm.PyaniORMException

Bases: pyani.PyaniException

Exception raised when ORM or database interaction fails.

class pyani.pyani_orm.Run(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes a single pyani run.

blastdbs

cmdline

comparisons

date

df_alnlength

df_coverage

df_hadamard

df_identity

df_simerrors

genomes

labels

method

name

run_id

status

pyani.pyani_orm.add_run(session, method, cmdline, date, status, name)

Create a new Run and add it to the session.

Parameters:
session – live SQLAlchemy session of pyani database
method – string describing analysis run type
cmdline – string describing pyani command-line for run
date – datetime object describing analysis start time
status – string describing status of analysis
name – string; name given to the analysis run

Creates a new Run object with the passed parameters and returns it.

pyani.pyani_orm.add_run_genomes(session, run, indir: pathlib.Path, classpath: pathlib.Path, labelpath: pathlib.Path, **kwargs) → List[T]

Add genomes for a run to the database.

Parameters:
session – live SQLAlchemy session of pyani database
run – Run object describing the parent pyani run
indir – path to the directory containing genomes
classpath – path to the file containing class information for each genome
labelpath – path to the file containing label information for each genome

This function expects a single directory (indir) containing all FASTA files for a run, and optional paths to plain text files that contain information on class and label strings for each genome.

If the genome already exists in the database, then a Genome object is recovered from the database. Otherwise, a new Genome object is created. All Genome objects will be associated with the passed Run object.

Session changes are committed as a single transaction once all genomes and labels have been added to the database without error.
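The get-or-create behaviour and all-or-nothing commit described above can be illustrated with the standard library sqlite3 module. This is an analogy, not pyani's code: pyani works through SQLAlchemy sessions, and the genomes table and get_or_create_genome helper below are hypothetical.

```python
import sqlite3

# Hypothetical, simplified table: not pyani's actual schema
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genomes ("
    "genome_id INTEGER PRIMARY KEY, genome_hash TEXT UNIQUE, path TEXT)"
)


def get_or_create_genome(conn: sqlite3.Connection, genome_hash: str, path: str) -> int:
    """Return the ID of the genome with this hash, inserting a new row if absent."""
    row = conn.execute(
        "SELECT genome_id FROM genomes WHERE genome_hash = ?", (genome_hash,)
    ).fetchone()
    if row is not None:
        return row[0]  # genome already recorded: reuse the existing row
    cursor = conn.execute(
        "INSERT INTO genomes (genome_hash, path) VALUES (?, ?)", (genome_hash, path)
    )
    return cursor.lastrowid


# Placeholder hashes; the third entry has the same content as the first
inputs = [("hash_a", "a.fna"), ("hash_b", "b.fna"), ("hash_a", "a_copy.fna")]

# "with conn" commits once when the block succeeds and rolls back on an exception
with conn:
    ids = [get_or_create_genome(conn, h, p) for h, p in inputs]
print(ids)  # [1, 2, 1]: the repeated hash reuses genome 1

# A failure part-way through a batch leaves the table as it was
try:
    with conn:
        get_or_create_genome(conn, "hash_c", "c.fna")
        raise RuntimeError("simulated failure while adding labels")
except RuntimeError:
    pass
print(conn.execute("SELECT COUNT(*) FROM genomes").fetchone()[0])  # 2
```

Grouping every insert under one transaction means a bad label file cannot leave a run half-populated.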

pyani.pyani_orm.create_db(dbpath: pathlib.Path) → None

Create an empty pyani SQLite3 database at the passed path.

Parameters:	dbpath – path to pyani database

pyani.pyani_orm.filter_existing_comparisons(session, run, comparisons, program, version, fragsize: Optional[int] = None, maxmatch: Optional[bool] = False, kmersize: Optional[int] = None, minmatch: Optional[float] = None) → List[T]

Filter a list of (Genome, Genome) comparisons for those not already in the session database.

Parameters:
session – live SQLAlchemy session of pyani database
run – Run object describing the parent pyani run
comparisons – list of (Genome, Genome) query vs subject comparisons
program – program used for comparison
version – version of program used for comparison
fragsize – fragment size for BLAST databases
maxmatch – maxmatch used with nucmer comparison

For each (Genome, Genome) pair in comparisons, check whether that comparison already exists in the database and, if so, associate it with the passed run. Otherwise, add the pair to the returned list of comparisons that still need to be run.
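The filtering step can be sketched in plain Python. This is a simplified illustration, not pyani's implementation: it uses integer genome IDs instead of Genome objects, a set of existing keys instead of database queries, and a key built from the six fields listed for get_comparison_dict (kmersize and minmatch are left out). filter_pairs is a hypothetical name, and the program and version strings are placeholders.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Key layout: (query_id, subject_id, program, version, fragsize, maxmatch)
Key = Tuple[int, int, str, str, Optional[int], Optional[bool]]


def filter_pairs(
    pairs: Iterable[Tuple[int, int]],
    existing: Set[Key],
    program: str,
    version: str,
    fragsize: Optional[int] = None,
    maxmatch: Optional[bool] = False,
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Split (query, subject) pairs into already-computed ones and ones still to run."""
    reused, to_run = [], []
    for query_id, subject_id in pairs:
        key = (query_id, subject_id, program, version, fragsize, maxmatch)
        if key in existing:
            reused.append((query_id, subject_id))  # would be linked to the run
        else:
            to_run.append((query_id, subject_id))
    return reused, to_run


existing = {(1, 2, "nucmer", "3.1", None, False)}
reused, to_run = filter_pairs([(1, 2), (2, 1), (1, 3)], existing, "nucmer", "3.1")
print(to_run)  # [(2, 1), (1, 3)]
```

Query and subject order is part of the key, so (2, 1) is treated as a separate comparison from (1, 2).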

pyani.pyani_orm.get_comparison_dict(session: Any) → Dict[Tuple, Any]

Return a dictionary of comparisons in the session database.

Parameters:	session – live SQLAlchemy session of pyani database

Returns Comparison objects keyed by a (_.query_id, _.subject_id, _.program, _.version, _.fragsize, _.maxmatch) tuple.
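A minimal sketch of building a dictionary with that key layout, using a hypothetical ComparisonRecord dataclass in place of the ORM Comparison class. The identity values, program names and versions are invented; the point is that the same genome pair compared with different programs or settings gets distinct keys.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class ComparisonRecord:
    """Hypothetical stand-in for a Comparison row: key fields plus identity."""

    query_id: int
    subject_id: int
    program: str
    version: str
    fragsize: Optional[int]
    maxmatch: Optional[bool]
    identity: float


def comparison_dict(records: Iterable[ComparisonRecord]) -> Dict[Tuple, ComparisonRecord]:
    """Key each record by the documented six-field tuple."""
    return {
        (_.query_id, _.subject_id, _.program, _.version, _.fragsize, _.maxmatch): _
        for _ in records
    }


records = [
    ComparisonRecord(1, 2, "nucmer", "3.1", None, False, 0.98),
    ComparisonRecord(1, 2, "blastn", "2.10.0", 1020, None, 0.97),
]
lookup = comparison_dict(records)
print(lookup[(1, 2, "nucmer", "3.1", None, False)].identity)  # 0.98
```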

pyani.pyani_orm.get_matrix_classes_for_run(session: Any, run_id: int) → Dict[str, List[T]]

Return dictionary of genome classes, keyed by row/column ID.

Parameters:
session – live SQLAlchemy session
run_id – the Run.run_id value for matrices

The class labels should be valid for identity, coverage and other complete matrix results accessed via the .df_* attributes of a run.

Class labels are keyed by the genome ID converted to a string, for compatibility with matplotlib.

pyani.pyani_orm.get_matrix_labels_for_run(session: Any, run_id: int) → Dict[KT, VT]

Return dictionary of genome labels, keyed by row/column ID.

Parameters:
session – live SQLAlchemy session
run_id – the Run.run_id value for matrices

The labels should be valid for identity, coverage and other complete matrix results accessed via the .df_* attributes of a run.

Labels are keyed by the genome ID converted to a string, for compatibility with matplotlib.
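Because the keys are strings, row or column IDs held as integers need converting before lookup, for example when building tick labels for a plot. A minimal sketch with an invented labels dict and matrix order; that the matrix index holds integer genome IDs is an assumption here.

```python
# Invented example of a labels dict: string genome IDs mapped to labels
labels = {"1": "Strain A", "2": "Strain B", "3": "Strain C"}

# Assume the matrix rows/columns are indexed by integer genome IDs
matrix_ids = [3, 1, 2, 4]

# Convert each ID to a string before lookup; fall back to the ID itself if unlabelled
tick_labels = [labels.get(str(gid), str(gid)) for gid in matrix_ids]
print(tick_labels)  # ['Strain C', 'Strain A', 'Strain B', '4']
```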

pyani.pyani_orm.get_session(dbpath: pathlib.Path) → Any

Connect to an existing pyani SQLite3 database and return a session.

Parameters:	dbpath – path to pyani database

pyani.pyani_orm.update_comparison_matrices(session, run) → None

Update the Run table with summary matrices for the analysis.

Parameters:
session – active pyanidb session via ORM
run – Run ORM object for the current ANIm run
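Run exposes its summary results as df_* matrices. A rough sketch of how pairwise results could be assembled into square genome-by-genome matrices, using plain dicts so it stays self-contained. Several things here are assumptions rather than documented behaviour: build_matrix and hadamard are hypothetical helpers, self-comparisons are given a fixed diagonal value, each result fills only its own (query, subject) cell, and df_hadamard is taken, as its name suggests, to be the element-wise product of identity and coverage. The numbers are invented.

```python
from typing import Dict, List, Tuple

Matrix = Dict[int, Dict[int, float]]


def build_matrix(
    genome_ids: List[int], results: List[Tuple[int, int, float]], diagonal: float
) -> Matrix:
    """Fill a square genome-by-genome matrix from (query_id, subject_id, value) results.

    Cells with no result stay NaN; each result fills only its own (query, subject) cell.
    """
    matrix = {q: {s: float("nan") for s in genome_ids} for q in genome_ids}
    for gid in genome_ids:
        matrix[gid][gid] = diagonal  # assumed value for self-comparisons
    for query_id, subject_id, value in results:
        matrix[query_id][subject_id] = value
    return matrix


def hadamard(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise (Hadamard) product of two matrices sharing the same keys."""
    return {q: {s: a[q][s] * b[q][s] for s in a[q]} for q in a}


# Invented results for two genomes
ids = [1, 2]
identity = build_matrix(ids, [(1, 2, 0.98), (2, 1, 0.97)], diagonal=1.0)
coverage = build_matrix(ids, [(1, 2, 0.50), (2, 1, 0.25)], diagonal=1.0)
product = hadamard(identity, coverage)
print(product[1][2], product[2][1])
```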