pyani.scripts.subcommands.subcmd_download module

Provides the download subcommand for pyani.

class pyani.scripts.subcommands.subcmd_download.Skipped

Bases: tuple

Convenience struct for holding information about skipped genomes.

accession

Alias for field number 1

dltype

Alias for field number 5

organism

Alias for field number 2

strain

Alias for field number 3

taxon_id

Alias for field number 0

url

Alias for field number 4
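
Skipped behaves like a collections.namedtuple, so each field can be read by name or by the index listed above. The sketch below rebuilds an equivalent type from those documented field numbers to show both kinds of access. The definition is a hypothetical reconstruction, not pyani's source, and every value is a placeholder.

```python
from collections import namedtuple

# Hypothetical reconstruction using the field numbers documented above.
Skipped = namedtuple(
    "Skipped", ["taxon_id", "accession", "organism", "strain", "url", "dltype"]
)

# All values below are placeholders for illustration.
skip = Skipped(
    taxon_id="12345",
    accession="GCF_000000000.1",
    organism="Example organism",
    strain="strain X",
    url="https://example.org/example_genomic.fna.gz",
    dltype="genome",
)

# Named and positional access reach the same field.
print(skip.accession == skip[1], skip._fields.index("dltype"))  # True 5
```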

pyani.scripts.subcommands.subcmd_download.configure_entrez(args: argparse.Namespace) → Optional[str]

Configure Entrez email, return API key.

Parameters: args – Namespace, command-line arguments

Returns the API key, or None if no API key is found.
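
A minimal sketch of the configure-then-return pattern described above. It assumes the Entrez client exposes writable email and api_key attributes, as Biopython's Bio.Entrez module does. A stand-in object is passed instead so the example runs without Biopython. The attribute name args.email and the read_key callable (playing the role of parse_api_key) are assumptions, not pyani's confirmed interface.

```python
from argparse import Namespace
from types import SimpleNamespace
from typing import Callable, Optional


def configure_entrez(
    args: Namespace, entrez, read_key: Callable[[Namespace], Optional[str]]
) -> Optional[str]:
    """Set the Entrez contact email and, if available, the API key.

    Sketch only: ``entrez`` stands in for Biopython's ``Bio.Entrez`` module,
    ``args.email`` is an assumed attribute, and ``read_key`` is a hypothetical
    callable playing the role of parse_api_key().
    """
    entrez.email = args.email  # NCBI asks E-utilities clients to identify themselves
    api_key = read_key(args)
    if api_key is not None:
        entrez.api_key = api_key  # a key raises NCBI's request-rate limit
    return api_key


# Usage with a stand-in object in place of Bio.Entrez:
fake_entrez = SimpleNamespace(email=None, api_key=None)
key = configure_entrez(
    Namespace(email="me@example.org"), fake_entrez, lambda args: "KEY123"
)
print(fake_entrez.email, fake_entrez.api_key, key)  # me@example.org KEY123 KEY123
```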

pyani.scripts.subcommands.subcmd_download.dl_info_to_str(esummary, uid_class) → str

Return descriptive string for passed download data.

Parameters:
  • esummary
  • uid_class
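
One plausible shape for the descriptive string is sketched below. It assumes esummary is a mapping carrying NCBI Assembly document-summary keys (Taxid, AssemblyAccession, AssemblyName) and that uid_class exposes organism and strain attributes. Both are assumptions, and the exact lines pyani emits may differ.

```python
from types import SimpleNamespace


def dl_info_to_str(esummary, uid_class) -> str:
    """Build a multi-line, human-readable summary of one assembly download.

    Sketch only: assumes ``esummary`` is a mapping with NCBI Assembly
    DocSum keys and ``uid_class`` exposes ``organism`` and ``strain``.
    """
    lines = [
        f"Taxon ID: {esummary['Taxid']}",
        f"Accession: {esummary['AssemblyAccession']}",
        f"Assembly name: {esummary['AssemblyName']}",
        f"Organism: {uid_class.organism}",
        f"Strain: {uid_class.strain}",
    ]
    # Indent continuation lines so the block reads well in a log.
    return "\n\t".join(lines)


# Placeholder inputs for illustration.
summary = {
    "Taxid": "12345",
    "AssemblyAccession": "GCF_000000000.1",
    "AssemblyName": "ExampleAsm1",
}
cls = SimpleNamespace(organism="Example organism", strain="strain X")
text = dl_info_to_str(summary, cls)
print(text)
```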

pyani.scripts.subcommands.subcmd_download.download_data(args: argparse.Namespace, api_key: Optional[str], asm_dict: Dict[str, List[T]]) → Tuple[List[T], List[T], List[T]]

Download the accessions indicated in the passed dictionary.

Parameters:
  • args – Namespace of command-line arguments
  • api_key – str, API key for NCBI downloads
  • asm_dict – dictionary of assembly UIDs to download, keyed by taxID

Returns lists of information about downloaded genome classes and labels, and a list of skipped downloads (as Skipped objects).
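
The collect-and-continue loop described above can be sketched as follows. A failed download is recorded as a Skipped entry rather than aborting the batch. The signature is simplified: fetch is a hypothetical callable that replaces the real download step, and Skipped is redefined locally from its documented field order so the block runs on its own.

```python
from collections import namedtuple
from typing import Callable, Dict, List, Tuple

# Local stand-in with the field order documented for Skipped.
Skipped = namedtuple(
    "Skipped", ["taxon_id", "accession", "organism", "strain", "url", "dltype"]
)


def download_data(
    asm_dict: Dict[str, List[str]],
    fetch: Callable[[str, str], Tuple[str, str]],
) -> Tuple[List[str], List[str], List[Skipped]]:
    """Attempt every assembly UID, collecting class/label lines and failures.

    Sketch only: ``fetch(tid, uid)`` is a hypothetical callable that returns
    a (class line, label line) pair or raises OSError when a download fails.
    """
    classes: List[str] = []
    labels: List[str] = []
    skipped: List[Skipped] = []
    for tid, uids in asm_dict.items():
        for uid in uids:
            try:
                class_line, label_line = fetch(tid, uid)
            except OSError:
                # The UID stands in for the accession; other fields are unknown here.
                skipped.append(Skipped(tid, uid, None, None, None, "genome"))
                continue
            classes.append(class_line)
            labels.append(label_line)
    return classes, labels, skipped


def fake_fetch(tid: str, uid: str) -> Tuple[str, str]:
    """Hypothetical downloader that fails for the UID 'bad'."""
    if uid == "bad":
        raise OSError("simulated download failure")
    return f"{uid}\tclass_{tid}", f"{uid}\tlabel_{uid}"


classes, labels, skipped = download_data({"100": ["a", "bad"], "200": ["b"]}, fake_fetch)
print(len(classes), len(labels), len(skipped))  # 2 2 1
```

Catching the failure per UID means one unreachable file costs a single genome, not the whole run.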

pyani.scripts.subcommands.subcmd_download.download_genome(args: argparse.Namespace, filestem: str, tid: str, uid: str, uid_class)

Download single genome data to output directory.

Parameters:
  • args – Namespace, command-line arguments
  • filestem – str, output filestem
  • tid – str, taxonID
  • uid – str, assembly UID
  • uid_class
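
Downloading a genome first requires its location. The sketch below derives a URL from NCBI's public genomes FTP layout (accession digits split into three-digit directories). It assumes filestem is the assembly directory name '<accession>_<assembly name>'. This layout comes from NCBI's FTP site, not from pyani's code, and genome_url and fetch_genome are hypothetical helper names.

```python
from pathlib import Path
from urllib.request import urlretrieve


def genome_url(filestem: str, suffix: str = "_genomic.fna.gz") -> str:
    """Build an NCBI FTP URL for a genome file from its filestem.

    Sketch only: assumes ``filestem`` is the FTP directory name
    '<accession>_<assembly name>', e.g. 'GCF_000005845.2_ASM584v2'.
    """
    prefix, rest = filestem.split("_", 1)  # 'GCF', '000005845.2_ASM584v2'
    digits = rest.split(".", 1)[0]  # '000005845'
    parts = [digits[i:i + 3] for i in range(0, 9, 3)]  # ['000', '005', '845']
    base = "https://ftp.ncbi.nlm.nih.gov/genomes/all"
    return f"{base}/{prefix}/{'/'.join(parts)}/{filestem}/{filestem}{suffix}"


def fetch_genome(filestem: str, outdir: Path) -> Path:
    """Download one genome file into ``outdir`` (requires network access)."""
    url = genome_url(filestem)
    dest = Path(outdir) / url.rsplit("/", 1)[1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, dest)
    return dest


print(genome_url("GCF_000005845.2_ASM584v2"))
# https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz
```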

pyani.scripts.subcommands.subcmd_download.extract_genomes(args: argparse.Namespace, dlstatus: pyani.download.DLStatus, esummary) → None

Extract genome files in passed dlstatus.

Parameters:
  • args – Namespace of command-line arguments
  • dlstatus
  • esummary
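
NCBI distributes genome sequence files gzip-compressed, so extraction most likely means decompressing them. Below is a minimal sketch of that step using only the standard library. It assumes each download ends in '.gz' and is written out beside the original. extract_gzip is a hypothetical helper, not part of pyani.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def extract_gzip(gz_path: Path) -> Path:
    """Decompress 'name.fna.gz' to 'name.fna' beside it and return the new path.

    Sketch only: assumes downloads are gzip-compressed with a '.gz' suffix.
    """
    out_path = gz_path.with_suffix("")  # strips only the final '.gz'
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # Stream in chunks so large genomes are not read into memory at once.
        shutil.copyfileobj(src, dst)
    return out_path


# Usage: round-trip a tiny placeholder FASTA file.
with tempfile.TemporaryDirectory() as tmp:
    gz = Path(tmp) / "example_genomic.fna.gz"
    with gzip.open(gz, "wb") as fh:
        fh.write(b">seq1\nACGT\n")
    fna = extract_gzip(gz)
    content = fna.read_bytes()
print(fna.name, content)  # example_genomic.fna b'>seq1\nACGT\n'
```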

pyani.scripts.subcommands.subcmd_download.get_tax_asm_dict(args: argparse.Namespace) → Dict[str, List[T]]

Return dictionary of assembly UIDs to download, keyed by taxID.

Parameters: args – Namespace of command-line arguments
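
The shape of the returned dictionary can be sketched as follows. It assumes the taxon IDs arrive as a comma-separated string. search is a hypothetical lookup returning assembly UIDs for one taxon; in practice that might wrap an Entrez search of the assembly database with a term such as 'txid12345[Organism:exp]'. A fake index keeps the example offline.

```python
from typing import Callable, Dict, List


def get_tax_asm_dict(
    taxon_arg: str, search: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Map each requested taxon ID to the assembly UIDs found for it.

    Sketch only: ``taxon_arg`` is assumed to be a comma-separated list of
    NCBI taxon IDs, and ``search`` is a hypothetical per-taxon UID lookup.
    """
    asm_dict: Dict[str, List[str]] = {}
    for tid in (t.strip() for t in taxon_arg.split(",")):
        if tid:  # tolerate stray commas such as "123,,456"
            asm_dict[tid] = search(tid)
    return asm_dict


# Placeholder index standing in for a real NCBI query.
fake_index = {"123": ["9001", "9002"], "456": []}
result = get_tax_asm_dict("123, 456", lambda tid: fake_index.get(tid, []))
print(result)  # {'123': ['9001', '9002'], '456': []}
```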

pyani.scripts.subcommands.subcmd_download.hash_genomes(args: argparse.Namespace, dlstatus: pyani.download.DLStatus, filestem: str, uid_class) → Tuple[str, str]

Hash genome files in passed dlstatus.

Parameters:
  • args – Namespace of command-line arguments
  • dlstatus
  • filestem – str, filestem for output
  • uid_class
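
hash_genomes returns a pair of strings. A common way to fingerprint a downloaded file is a chunked MD5 digest, the same checksum NCBI publishes in each assembly directory's md5checksums.txt. The sketch below assumes that approach. The helper name md5_file and the (filename, digest) pairing are hypothetical and may not match what pyani actually returns.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def md5_file(path: Path, blocksize: int = 65536) -> Tuple[str, str]:
    """Return (filename, MD5 hex digest) for ``path``, reading in chunks.

    Sketch only: MD5 and the (name, digest) return pairing are assumptions.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: fh.read(blocksize), b""):
            digest.update(block)
    return path.name, digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.fna"
    path.write_bytes(b"abc")
    name, digest = md5_file(path)
print(name, digest)  # example.fna 900150983cd24fb0d6963f7d28e17f72
```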

pyani.scripts.subcommands.subcmd_download.parse_api_key(args: argparse.Namespace) → Optional[str]

Returns NCBI API key if present, None otherwise.

Parameters: args – Namespace of command-line arguments

Checks for key in args.api_keypath.
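
A minimal sketch of the key lookup described above. It assumes the file at args.api_keypath holds only the key (possibly with a trailing newline) and that '~' should expand to the home directory. The key value used in the usage example is a placeholder.

```python
import tempfile
from argparse import Namespace
from pathlib import Path
from typing import Optional


def parse_api_key(args: Namespace) -> Optional[str]:
    """Return the API key stored at ``args.api_keypath``, or None.

    Sketch only: assumes the file holds just the key, possibly followed by
    a newline, and that '~' in the path should expand to the home directory.
    """
    if not getattr(args, "api_keypath", None):
        return None
    keypath = Path(args.api_keypath).expanduser()
    if not keypath.is_file():
        return None
    key = keypath.read_text().strip()
    return key or None  # treat an empty file as "no key"


with tempfile.TemporaryDirectory() as tmp:
    keyfile = Path(tmp) / "api_key"
    keyfile.write_text("placeholder-key\n")
    found = parse_api_key(Namespace(api_keypath=str(keyfile)))
missing = parse_api_key(Namespace(api_keypath=None))
print(found, missing)  # placeholder-key None
```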

pyani.scripts.subcommands.subcmd_download.subcmd_download(args: argparse.Namespace) → int

Download assembled genomes in subtree of passed NCBI taxon ID.

Parameters: args – Namespace, command-line arguments