pyani.pyani_orm module
Module providing useful functions for manipulating pyani's SQLite3 database.
This SQLAlchemy-based ORM replaces the previous SQL-based module.
class pyani.pyani_orm.BlastDB(**kwargs)
Bases: sqlalchemy.orm.decl_api.Base
Describes the relationship between genome, run, source BLAST database and query fragments.
Each genome and run combination can be assigned a single BLAST database for the comparisons.
- fragpath – path to fragmented genome (query in ANIb)
- dbpath – path to source genome database (subject in ANIb)
- fragsizes – JSONified dict of fragment sizes
- dbcmd – command used to generate database
Attributes:
- blastdb_id
- dbcmd
- dbpath
- fragpath
- fragsizes
- genome
- genome_id
- run
- run_id
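The fragsizes field is described as a JSONified dict of fragment sizes. A minimal sketch of that round trip using only the standard library is shown here; the fragment identifiers and lengths are hypothetical, and the exact dictionary layout pyani stores is an assumption.

```python
import json

# Hypothetical fragment sizes: fragment identifier -> length in bases.
fragment_lengths = {"frag00001": 1020, "frag00002": 1020, "frag00003": 437}

# Serialise to a string suitable for a text column such as fragsizes.
fragsizes_json = json.dumps(fragment_lengths)

# Recover the dictionary when the value is read back.
recovered = json.loads(fragsizes_json)
```

Storing the sizes as a single JSON string keeps them alongside the database paths in one row, at the cost of having to decode the string before use.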
class pyani.pyani_orm.Comparison(**kwargs)
Bases: sqlalchemy.orm.decl_api.Base
Describes a single pairwise comparison between two genomes.
Attributes:
- aln_length
- comparison_id
- cov_query
- cov_subject
- fragsize
- identity
- kmersize
- maxmatch
- minmatch
- program
- query
- query_id
- runs
- sim_errs
- subject
- subject_id
- version
class pyani.pyani_orm.Genome(**kwargs)
Bases: sqlalchemy.orm.decl_api.Base
Describes an input genome for a pyani run.
- genome_id – primary key
- genome_hash – MD5 hash of input genome file (in path)
- path – path to FASTA genome file
- length – length of genome (total bases)
- description – genome description
Attributes:
- blastdbs
- description
- genome_hash
- genome_id
- labels
- length
- path
- query_comparisons
- runs
- subject_comparisons
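The genome_hash field holds the MD5 hash of the input genome file. The sketch below shows one way such a hash could be computed with the standard library; md5_of_file is a hypothetical helper, not a pyani function, and the FASTA content is invented for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large genomes need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Write a tiny, made-up FASTA file and hash it.
with tempfile.TemporaryDirectory() as tmpdir:
    fasta = Path(tmpdir) / "example.fasta"
    fasta.write_bytes(b">seq1\nACGTACGT\n")
    genome_hash = md5_of_file(fasta)
```

A content hash of this kind lets two files with identical bytes be recognised as the same genome regardless of their file names.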
class pyani.pyani_orm.Label(**kwargs)
Bases: sqlalchemy.orm.decl_api.Base
Describes the relationship between genome, run and genome label.
Each genome and run combination can be assigned a single label.
Attributes:
- class_label
- genome
- genome_id
- label
- label_id
- run
- run_id
class pyani.pyani_orm.LabelTuple
Bases: tuple
Label and Class for each file.
- class_label – alias for field number 1
- label – alias for field number 0
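The field aliases above are characteristic of a named tuple. An equivalent definition might look like the following sketch; the str annotations and the example values are assumptions, since only the field names and positions are documented.

```python
from typing import NamedTuple


class LabelTuple(NamedTuple):
    """Label and class for a single genome file (illustrative re-definition)."""

    label: str        # field number 0
    class_label: str  # field number 1


# Hypothetical values for one genome.
entry = LabelTuple(label="E. coli K-12", class_label="Escherichia")

# Fields can be read by name or by position.
by_name = (entry.label, entry.class_label)
by_index = (entry[0], entry[1])
```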
exception pyani.pyani_orm.PyaniORMException
Bases: pyani.PyaniException
Exception raised when ORM or database interaction fails.
class pyani.pyani_orm.Run(**kwargs)
Bases: sqlalchemy.orm.decl_api.Base
Describes a single pyani run.
Attributes:
- blastdbs
- cmdline
- comparisons
- date
- df_alnlength
- df_coverage
- df_hadamard
- df_identity
- df_simerrors
- genomes
- labels
- method
- name
- run_id
- status
pyani.pyani_orm.add_run(session, method, cmdline, date, status, name)
Create a new Run and add it to the session.
Parameters:
- session – live SQLAlchemy session of pyani database
- method – string describing analysis run type
- cmdline – string describing pyani command-line for run
- date – datetime object describing analysis start time
- status – string describing status of analysis
- name – string giving the name of the analysis run
Creates a new Run object with the passed parameters, and returns it.
pyani.pyani_orm.add_run_genomes(session, run, indir: pathlib.Path, classpath: pathlib.Path, labelpath: pathlib.Path, **kwargs) → List[T]
Add genomes for a run to the database.
Parameters:
- session – live SQLAlchemy session of pyani database
- run – Run object describing the parent pyani run
- indir – path to the directory containing genomes
- classpath – path to the file containing class information for each genome
- labelpath – path to the file containing label information for each genome
This function expects a single directory (indir) containing all FASTA files for a run, and optional paths to plain text files that contain information on class and label strings for each genome.
If the genome already exists in the database, then a Genome object is recovered from the database. Otherwise, a new Genome object is created. All Genome objects will be associated with the passed Run object.
The session changes are committed once all genomes and labels are added to the database without error, as a single transaction.
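The all-or-nothing commit described above can be illustrated with the standard-library sqlite3 module. This is a sketch of the general pattern only, not pyani's SQLAlchemy code; the genomes table, its columns and the add_all helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per genome, with a unique content hash.
conn.execute("CREATE TABLE genomes (genome_hash TEXT UNIQUE, path TEXT)")


def add_all(connection, rows):
    """Insert every row, committing only if all inserts succeed."""
    try:
        # The connection context manager commits on success and
        # rolls back the whole transaction if an exception is raised.
        with connection:
            connection.executemany("INSERT INTO genomes VALUES (?, ?)", rows)
    except sqlite3.IntegrityError:
        return False
    return True


ok = add_all(conn, [("aaa", "g1.fasta"), ("bbb", "g2.fasta")])
# The second batch repeats hash "aaa", so none of its rows are kept.
failed = add_all(conn, [("ccc", "g3.fasta"), ("aaa", "dup.fasta")])
count = conn.execute("SELECT COUNT(*) FROM genomes").fetchone()[0]
```

Treating the batch as one transaction means a failure partway through leaves the database as it was before the call, rather than holding a partial set of genomes.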
pyani.pyani_orm.create_db(dbpath: pathlib.Path) → None
Create an empty pyani SQLite3 database at the passed path.
Parameters: dbpath – path to pyani database
pyani.pyani_orm.filter_existing_comparisons(session, run, comparisons, program, version, fragsize: Optional[int] = None, maxmatch: Optional[bool] = False, kmersize: Optional[int] = None, minmatch: Optional[float] = None) → List[T]
Filter list of (Genome, Genome) comparisons for those not in the session db.
Parameters:
- session – live SQLAlchemy session of pyani database
- run – Run object describing parent pyani run
- comparisons – list of (Genome, Genome) query vs subject comparisons
- program – program used for comparison
- version – version of program for comparison
- fragsize – fragment size for BLAST databases
- maxmatch – maxmatch used with nucmer comparison
When passed a list of (Genome, Genome) comparisons as comparisons, check whether the comparison exists in the database and, if so, associate it with the passed run. If not, then add the (Genome, Genome) pair to a list for returning as the comparisons that still need to be run.
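The filtering logic described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: split_comparisons is a hypothetical helper, integer genome IDs stand in for Genome objects, strings stand in for Comparison objects, and the lookup key follows the tuple shape documented for get_comparison_dict.

```python
from typing import Dict, List, Optional, Tuple


def split_comparisons(
    wanted: List[Tuple[int, int]],
    existing: Dict[Tuple, str],
    program: str,
    version: str,
    fragsize: Optional[int] = None,
    maxmatch: bool = False,
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Separate stored comparisons from those that still need to be run."""
    reuse, to_run = [], []
    for query_id, subject_id in wanted:
        key = (query_id, subject_id, program, version, fragsize, maxmatch)
        if key in existing:
            # Already in the database: reuse it (pyani links it to the run).
            reuse.append(existing[key])
        else:
            # Not stored yet: report it back as work still to do.
            to_run.append((query_id, subject_id))
    return reuse, to_run


# Hypothetical stored result for genome 1 vs genome 2.
existing = {(1, 2, "nucmer", "3.1", None, False): "comparison-1"}
reuse, to_run = split_comparisons([(1, 2), (2, 1)], existing, "nucmer", "3.1")
```

Because the key includes the program, version and parameters, a stored result is only reused when it was produced under the same settings as the current run.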
pyani.pyani_orm.get_comparison_dict(session: Any) → Dict[Tuple, Any]
Return a dictionary of comparisons in the session database.
Parameters: session – live SQLAlchemy session of pyani database
Returns: Comparison objects, keyed by (_.query_id, _.subject_id, _.program, _.version, _.fragsize, _.maxmatch) tuple
pyani.pyani_orm.get_matrix_classes_for_run(session: Any, run_id: int) → Dict[str, List[T]]
Return dictionary of genome classes, keyed by row/column ID.
Parameters:
- session – live SQLAlchemy session
- run_id – the Run.run_id value for matrices
The class labels should be valid for identity, coverage and other complete matrix results accessed via the .df_* attributes of a run.
Class labels are returned keyed by the string of the genome ID, for compatibility with matplotlib.
pyani.pyani_orm.get_matrix_labels_for_run(session: Any, run_id: int) → Dict[KT, VT]
Return dictionary of genome labels, keyed by row/column ID.
Parameters:
- session – live SQLAlchemy session
- run_id – the Run.run_id value for matrices
The labels should be valid for identity, coverage and other complete matrix results accessed via the .df_* attributes of a run.
Labels are returned keyed by the string of the genome ID, for compatibility with matplotlib.
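As an illustration of the string-keyed mapping described above, the sketch below builds such a dictionary from hypothetical (genome ID, label) pairs and uses it to look up labels for integer matrix indices. The data and variable names are assumptions; the real function reads its labels from the database.

```python
# Hypothetical (genome_id, label) pairs for one run.
genome_labels = [(1, "E. coli K-12"), (2, "S. enterica LT2")]

# Key each label by the string form of the genome ID.
labels_by_id = {str(genome_id): label for genome_id, label in genome_labels}

# Matrix rows/columns indexed by integer genome ID can then be relabelled
# by converting each index to a string before the lookup.
row_ids = [1, 2]
tick_labels = [labels_by_id[str(idx)] for idx in row_ids]
```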