pyani.pyani_orm module

Module providing useful functions for manipulating pyani’s SQLite3 db.

This SQLAlchemy-based ORM replaces the previous SQL-based module.

class pyani.pyani_orm.BlastDB(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes relationship between genome, run, source BLAST database and query fragments.

Each genome and run combination can be assigned a single BLAST database for the comparisons.

  • fragpath – path to fragmented genome (query in ANIb)
  • dbpath – path to source genome database (subject in ANIb)
  • fragsizes – JSONified dict of fragment sizes
  • dbcmd – command used to generate database
blastdb_id
dbcmd
dbpath
fragpath
fragsizes
genome
genome_id
run
run_id
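The fragsizes attribute holds a dict serialised as JSON text. A minimal sketch of that round trip with the standard json module follows; the fragment IDs and sizes are invented for illustration, and the exact structure pyani stores in the column is an assumption.

```python
import json

# Hypothetical fragment IDs mapped to fragment lengths (illustrative values only).
fragsizes = {"frag00001": 1020, "frag00002": 1020, "frag00003": 487}

# Serialise the dict to a JSON string, as could be stored in BlastDB.fragsizes.
fragsizes_json = json.dumps(fragsizes)

# Recover the dict when the column is read back.
restored = json.loads(fragsizes_json)
print(restored["frag00003"])
```

JSON object keys are always strings, so a dict with integer keys would come back with str keys after json.loads; string fragment IDs avoid that mismatch.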
class pyani.pyani_orm.Comparison(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes a single pairwise comparison between two genomes.

aln_length
comparison_id
cov_query
cov_subject
fragsize
identity
kmersize
maxmatch
minmatch
program
query
query_id
runs
sim_errs
subject
subject_id
version
class pyani.pyani_orm.Genome(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes an input genome for a pyani run.

  • genome_id
    primary key
  • genome_hash
    MD5 hash of input genome file (in path)
  • path
    path to FASTA genome file
  • length
    length of genome (total bases)
  • description
    genome description
blastdbs
description
genome_hash
genome_id
labels
length
path
query_comparisons
runs
subject_comparisons
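The genome_hash and length fields can be illustrated with a short, dependency-free sketch. The helper genome_hash_and_length is hypothetical: it assumes the MD5 digest is taken over the raw file bytes and that length counts sequence characters on non-header lines, which may differ in detail from how pyani computes these values.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def genome_hash_and_length(path: Path) -> Tuple[str, int]:
    """Hypothetical helper: MD5 of the file bytes and total bases in a FASTA file."""
    data = path.read_bytes()
    # MD5 over the raw file contents, as described for Genome.genome_hash.
    md5 = hashlib.md5(data).hexdigest()
    # Count sequence characters, skipping '>' header lines and surrounding whitespace.
    length = 0
    for line in data.decode("utf-8").splitlines():
        if not line.startswith(">"):
            length += len(line.strip())
    return md5, length


with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "genome.fna"
    fasta.write_bytes(b">contig1\nACGT\nAC\n>contig2\nGGG\n")
    digest, total = genome_hash_and_length(fasta)
    print(total)  # 9
```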
class pyani.pyani_orm.Label(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes relationship between genome, run and genome label.

Each genome and run combination can be assigned a single label.

class_label
genome
genome_id
label
label_id
run
run_id
class pyani.pyani_orm.LabelTuple

Bases: tuple

Label and class for each file.

class_label

Alias for field number 1

label

Alias for field number 0
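LabelTuple behaves like a named tuple whose field 0 is label and field 1 is class_label. The sketch below builds an equivalent stand-in with collections.namedtuple to show that attribute and index access agree; the example label strings are invented.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented LabelTuple:
# field 0 is ``label``, field 1 is ``class_label``.
LabelTuple = namedtuple("LabelTuple", ["label", "class_label"])

entry = LabelTuple(label="E. coli K-12", class_label="Escherichia")
print(entry.label, entry[0])        # both give the label
print(entry.class_label, entry[1])  # both give the class label
```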

exception pyani.pyani_orm.PyaniORMException

Bases: pyani.PyaniException

Exception raised when ORM or database interaction fails.

class pyani.pyani_orm.Run(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Describes a single pyani run.

blastdbs
cmdline
comparisons
date
df_alnlength
df_coverage
df_hadamard
df_identity
df_simerrors
genomes
labels
method
name
run_id
status
pyani.pyani_orm.add_run(session, method, cmdline, date, status, name)

Create a new Run and add it to the session.

Parameters:
  • session – live SQLAlchemy session of pyani database
  • method – string describing analysis run type
  • cmdline – string describing pyani command-line for run
  • date – datetime object describing analysis start time
  • status – string describing status of analysis
  • name – string naming the analysis run

Creates a new Run object with the passed parameters, and returns it.

pyani.pyani_orm.add_run_genomes(session, run, indir: pathlib.Path, classpath: pathlib.Path, labelpath: pathlib.Path, **kwargs) → List[T]

Add genomes for a run to the database.

Parameters:
  • session – live SQLAlchemy session of pyani database
  • run – Run object describing the parent pyani run
  • indir – path to the directory containing genomes
  • classpath – path to the file containing class information for each genome
  • labelpath – path to the file containing label information for each genome

This function expects a single directory (indir) containing all FASTA files for a run, and optional paths to plain text files that contain information on class and label strings for each genome.

If the genome already exists in the database, then a Genome object is recovered from the database. Otherwise, a new Genome object is created. All Genome objects will be associated with the passed Run object.

The session changes are committed once all genomes and labels are added to the database without error, as a single transaction.
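The get-or-create and single-transaction behaviour described above can be illustrated with the standard-library sqlite3 module instead of SQLAlchemy. This is a simplified sketch, not pyani's implementation: the table layout, the hypothetical add_genomes helper, and the assumption that existing genomes are matched on their hash are all illustrative.

```python
import sqlite3
from typing import Iterable, List, Tuple


def add_genomes(conn: sqlite3.Connection, genomes: Iterable[Tuple[str, str]]) -> List[int]:
    """Hypothetical helper: get-or-create genomes by hash, all in one transaction."""
    ids = []
    # 'with conn' commits if the block succeeds and rolls back if anything raises.
    with conn:
        for genome_hash, path in genomes:
            row = conn.execute(
                "SELECT genome_id FROM genomes WHERE genome_hash = ?", (genome_hash,)
            ).fetchone()
            if row is not None:
                ids.append(row[0])  # genome already known: reuse its ID
            else:
                cur = conn.execute(
                    "INSERT INTO genomes (genome_hash, path) VALUES (?, ?)",
                    (genome_hash, path),
                )
                ids.append(cur.lastrowid)  # new genome created
    return ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genomes (genome_id INTEGER PRIMARY KEY, "
    "genome_hash TEXT UNIQUE NOT NULL, path TEXT NOT NULL)"
)
first = add_genomes(conn, [("aaa", "g1.fna"), ("bbb", "g2.fna")])
second = add_genomes(conn, [("bbb", "g2_copy.fna"), ("ccc", "g3.fna")])

try:
    # The None hash violates NOT NULL, so the whole batch (including "ddd") is rolled back.
    add_genomes(conn, [("ddd", "g4.fna"), (None, "bad.fna")])
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM genomes").fetchone()[0]
print(first, second, count)
```

Because the failing batch raises inside the with conn: block, the ddd row inserted just before the error is discarded, so either every genome in a batch is stored or none is.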

pyani.pyani_orm.create_db(dbpath: pathlib.Path) → None

Create an empty pyani SQLite3 database at the passed path.

Parameters: dbpath – path to pyani database
pyani.pyani_orm.filter_existing_comparisons(session, run, comparisons, program, version, fragsize: Optional[int] = None, maxmatch: Optional[bool] = False, kmersize: Optional[int] = None, minmatch: Optional[float] = None) → List[T]

Filter list of (Genome, Genome) comparisons for those not in the session db.

Parameters:
  • session – live SQLAlchemy session of pyani database
  • run – Run object describing parent pyani run
  • comparisons – list of (Genome, Genome) query vs subject comparisons
  • program – program used for comparison
  • version – version of program for comparison
  • fragsize – fragment size for BLAST databases
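  • maxmatch – maxmatch setting used with nucmer comparison
  • kmersize – k-mer size used for the comparison, where applicable
  • minmatch – minimum match value used for the comparison, where applicable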

When passed a list of (Genome, Genome) comparisons as comparisons, check whether the comparison exists in the database and, if so, associate it with the passed run. If not, then add the (Genome, Genome) pair to a list for returning as the comparisons that still need to be run.
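The filtering step can be sketched with plain Python containers. In this hypothetical filter_pairs helper, genomes are represented by integer IDs, stored comparisons by placeholder strings, and run.comparisons by a list; the key follows the tuple documented for get_comparison_dict, and kmersize and minmatch are left out for brevity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical key layout: (query_id, subject_id, program, version, fragsize, maxmatch)
Key = Tuple[int, int, str, str, Optional[int], Optional[bool]]


def filter_pairs(
    existing: Dict[Key, str],
    run_comparisons: List[str],
    pairs: List[Tuple[int, int]],
    program: str,
    version: str,
    fragsize: Optional[int] = None,
    maxmatch: Optional[bool] = False,
) -> List[Tuple[int, int]]:
    """Return pairs with no stored result; attach stored results to the run."""
    todo = []
    for query_id, subject_id in pairs:
        key = (query_id, subject_id, program, version, fragsize, maxmatch)
        if key in existing:
            run_comparisons.append(existing[key])  # reuse the stored comparison
        else:
            todo.append((query_id, subject_id))  # still needs to be run
    return todo


existing = {(1, 2, "nucmer", "4.0", None, False): "comparison-1"}
run_comparisons: List[str] = []
todo = filter_pairs(existing, run_comparisons, [(1, 2), (1, 3)], "nucmer", "4.0")
print(todo)  # [(1, 3)]
```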

pyani.pyani_orm.get_comparison_dict(session: Any) → Dict[Tuple, Any]

Return a dictionary of comparisons in the session database.

Parameters: session – live SQLAlchemy session of pyani database

Returns a dictionary of Comparison objects, keyed by a (_.query_id, _.subject_id, _.program, _.version, _.fragsize, _.maxmatch) tuple.
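A sketch of how such a dictionary can be assembled, assuming the records have already been fetched from the session. ComparisonRecord and comparison_dict are hypothetical stand-ins for the Comparison ORM class and the session query; only the key fields and identity are modelled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class ComparisonRecord:
    """Stand-in for a Comparison row; only the key fields and identity are kept."""

    query_id: int
    subject_id: int
    program: str
    version: str
    fragsize: Optional[int]
    maxmatch: Optional[bool]
    identity: float


def comparison_dict(records: Iterable[ComparisonRecord]) -> Dict[Tuple, ComparisonRecord]:
    # Key each record by the documented tuple of fields.
    return {
        (c.query_id, c.subject_id, c.program, c.version, c.fragsize, c.maxmatch): c
        for c in records
    }


records = [
    ComparisonRecord(1, 2, "nucmer", "4.0", None, False, 0.987),
    ComparisonRecord(1, 2, "blastn", "2.9.0", 1020, None, 0.985),
]
lookup = comparison_dict(records)
print(lookup[(1, 2, "nucmer", "4.0", None, False)].identity)  # 0.987
```

Because program, version and fragsize are part of the key, the same genome pair can hold one entry per method or setting.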

pyani.pyani_orm.get_matrix_classes_for_run(session: Any, run_id: int) → Dict[str, List[T]]

Return dictionary of genome classes, keyed by row/column ID.

Parameters:
  • session – live SQLAlchemy session
  • run_id – the Run.run_id value for matrices

The class labels should be valid for identity, coverage and other complete matrix results accessed via the .df_* attributes of a run.

Classes are returned keyed by the string of the genome ID, for compatibility with matplotlib.

pyani.pyani_orm.get_matrix_labels_for_run(session: Any, run_id: int) → Dict[KT, VT]

Return dictionary of genome labels, keyed by row/column ID.

Parameters:
  • session – live SQLAlchemy session
  • run_id – the Run.run_id value for matrices

The labels should be valid for identity, coverage and other complete matrix results accessed via the .df_* attributes of a run.

Labels are returned keyed by the string of the genome ID, for compatibility with matplotlib.
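The string keys matter because matrix row and column names are strings. The sketch below, with invented (genome_id, label) rows and an assumed string-valued column order, shows the str(genome_id) keying and a lookup that produces tick labels for a plot.

```python
from typing import Dict, List, Tuple

# Hypothetical (genome_id, label) rows, as might be read for one run.
rows: List[Tuple[int, str]] = [(1, "E. coli K-12"), (2, "S. enterica LT2")]

# Key by str(genome_id) so keys match string row/column names in a matrix.
labels: Dict[str, str] = {str(genome_id): label for genome_id, label in rows}

# Map matrix column IDs (strings) to display labels, e.g. for axis tick labels.
columns = ["2", "1"]
tick_labels = [labels.get(col, col) for col in columns]
print(tick_labels)  # ['S. enterica LT2', 'E. coli K-12']
```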

pyani.pyani_orm.get_session(dbpath: pathlib.Path) → Any

Connect to an existing pyani SQLite3 database and return a session.

Parameters: dbpath – path to pyani database
pyani.pyani_orm.update_comparison_matrices(session, run) → None

Update the Run table with summary matrices for the analysis.

Parameters:
  • session – active pyanidb session via ORM
  • run – Run ORM object for the current ANIm run
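One way to assemble summary matrices from pairwise results, sketched with nested dicts keyed by str(genome_id) to avoid third-party dependencies. The df_* names on Run suggest DataFrame-valued attributes, which this sketch does not reproduce. The helper build_matrices is hypothetical, and its filling rules are assumptions: self-comparisons are 1.0, missing pairs are 0.0, identity is mirrored across the diagonal, coverage is stored per direction, and the Hadamard matrix is the element-wise product of identity and coverage.

```python
from typing import Dict, List, Tuple

Matrix = Dict[str, Dict[str, float]]


def build_matrices(
    genome_ids: List[int],
    comparisons: List[Tuple[int, int, float, float, float]],
) -> Tuple[Matrix, Matrix, Matrix]:
    """Hypothetical helper returning identity, coverage and Hadamard matrices.

    Each comparison is (query_id, subject_id, identity, cov_query, cov_subject).
    """
    ids = [str(g) for g in genome_ids]
    # Assumption: self-comparisons score 1.0; pairs without results score 0.0.
    identity = {r: {c: 1.0 if r == c else 0.0 for c in ids} for r in ids}
    coverage = {r: {c: 1.0 if r == c else 0.0 for c in ids} for r in ids}
    for query, subject, ident, cov_q, cov_s in comparisons:
        q, s = str(query), str(subject)
        # Assumption: identity is mirrored; coverage is recorded per direction.
        identity[q][s] = identity[s][q] = ident
        coverage[q][s] = cov_q
        coverage[s][q] = cov_s
    # Hadamard matrix: element-wise product of identity and coverage.
    hadamard = {r: {c: identity[r][c] * coverage[r][c] for c in ids} for r in ids}
    return identity, coverage, hadamard


identity, coverage, hadamard = build_matrices([1, 2], [(1, 2, 0.98, 0.9, 0.8)])
print(hadamard["1"]["2"], hadamard["2"]["1"])
```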