Commands and Usage
ViralQC provides three main commands through the command-line interface (vqc).
get-nextclade-datasets
Downloads and configures Nextclade datasets locally.
Important
This command must be run at least once before using run.
Usage
vqc get-nextclade-datasets --cores 2
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | String | | Directory where datasets will be stored |
| --cores | Integer | | Number of threads/cores to use |
| | Boolean | | Show snakemake logs |
Output Structure
datasets/
├── nextclade_data/
│ ├── denv1/
│ ├── denv2/
│ └── ...
├── external_datasets/
│ └── zikav/
└── external_datasets_minimizers.json
get-blast-database
Creates a local BLAST database containing all viral genomes from NCBI RefSeq.
Important
This command must be run at least once before using run.
Usage
vqc get-blast-database
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | String | | Directory where the BLAST database will be stored |
| --release-date | String | | Filter sequences by release date (YYYY-MM-DD). Only sequences released on or before this date will be included |
| --cores | Integer | | Number of threads/cores to use |
| | Boolean | | Show snakemake logs |
Release Date Filtering
The --release-date parameter allows you to create a reproducible BLAST database by filtering sequences based on their NCBI release date:
# Create database with all sequences released up to June 15, 2023
vqc get-blast-database --release-date 2023-06-15
Behavior:
When --release-date is provided:
Only sequences with release_date <= specified_date are included
The specified date is used as the database version identifier
When not provided:
All available RefSeq sequences are included
Current date is used as the database version identifier
This is useful for:
Reproducibility: Recreate the same database at different points in time
Auditing: Track which sequences were available at a specific date
Comparative studies: Analyze how results change with database updates
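The filtering rule described above can be illustrated with a minimal sketch. This is not ViralQC's own implementation: the function name, the record layout, and the accession values are hypothetical, and only the comparison `release_date <= specified_date` comes from the text.

```python
from datetime import date


def filter_by_release_date(records, cutoff):
    """Keep records released on or before the cutoff (YYYY-MM-DD)."""
    cutoff_date = date.fromisoformat(cutoff)
    return [
        rec for rec in records
        if date.fromisoformat(rec["release_date"]) <= cutoff_date
    ]


# Hypothetical records; the accessions are placeholders, not real RefSeq IDs.
records = [
    {"accession": "SEQ_A", "release_date": "2021-03-02"},
    {"accession": "SEQ_B", "release_date": "2023-06-15"},
    {"accession": "SEQ_C", "release_date": "2024-01-10"},
]

kept = filter_by_release_date(records, "2023-06-15")
print([rec["accession"] for rec in kept])
```

A sequence released exactly on the cutoff date is kept, which matches the "on or before" wording of the parameter description.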
Database Version
The database version is recorded in the blast.tsv metadata file:
Format: ncbi-refseq-virus_YYYY-MM-DD
Uses the --release-date value if provided, otherwise the current date
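As a rough illustration of the naming rule, the identifier could be derived as follows. The helper `database_version` and its `today` argument are hypothetical and exist only for this sketch; only the `ncbi-refseq-virus_YYYY-MM-DD` format and the fallback to the current date come from the text.

```python
from datetime import date


def database_version(release_date=None, today=None):
    """Build the version identifier from a release date or the current date."""
    if release_date:
        day = date.fromisoformat(release_date)  # validates YYYY-MM-DD
    else:
        day = today or date.today()  # `today` is injectable for testing
    return f"ncbi-refseq-virus_{day.isoformat()}"


print(database_version("2023-06-15"))
```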
Output Structure
datasets/
├── blast.fasta # Reference sequences
├── blast.fasta.ndb # BLAST database files
├── blast.fasta.nhr
├── blast.fasta.nin
├── blast.fasta.nsq
├── blast.tsv # Metadata with version info
└── blast_gff/ # GFF3 files for generic analysis
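Since both setup commands must have been run before run, a small pre-flight check can catch a missing dataset early. The sketch below is not part of ViralQC. It assumes the data lives in a directory named datasets and checks only the entries shown in the two output structures above; the function name is hypothetical.

```python
from pathlib import Path

# Entries taken from the two "Output Structure" listings above.
EXPECTED_PATHS = [
    "nextclade_data",
    "external_datasets",
    "external_datasets_minimizers.json",
    "blast.fasta",
    "blast.tsv",
]


def missing_dataset_paths(datasets_dir="datasets"):
    """Return the expected entries that are absent from datasets_dir."""
    root = Path(datasets_dir)
    return [name for name in EXPECTED_PATHS if not (root / name).exists()]


missing = missing_dataset_paths("datasets")
if missing:
    print("Missing:", ", ".join(missing))
```

An empty list means every expected entry is present. Anything else points to a setup command that still needs to be run.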
run
Main analysis command. Identifies viruses, performs quality control, and extracts target regions.
Usage
vqc run --input my_sequences.fasta
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| --input | String | Path to the input FASTA file |
Output Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --output-dir | String | | Working directory. Results will be stored in an |
| --output-file | String | | Results file ( |
Dataset Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | String | | Path to Nextclade datasets directory |
Nextclade Sort Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | Float | | Minimum score for valid match |
| | Integer | | Minimum hits for valid match |
BLAST Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | String | | Path to BLAST database |
| | String | | Path to BLAST metadata |
| --blast-pident | Integer | | Minimum percent identity (0-100) |
| | Float | | Maximum E-value |
| | Integer | | Minimum query coverage (0-100) |
| --blast-task | String | | BLAST task type |
BLAST Task Types
The --blast-task parameter controls the BLAST algorithm sensitivity:
| Task | Description | Use Case |
|---|---|---|
| megablast | Highly similar sequences (default) | Fast, same species |
| dc-megablast | Discontiguous megablast | Cross-species, more sensitive |
| blastn | Traditional BLASTN | More distant sequences |
| | Short sequences | Sequences < 50 bp |
Examples:
# Default (megablast) - fast, for similar sequences
vqc run --input seqs.fasta
# More sensitive search for distant viruses
vqc run --input seqs.fasta --blast-task dc-megablast
# Traditional BLASTN for divergent sequences
vqc run --input seqs.fasta --blast-task blastn
System Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --cores | Integer | | Number of threads/cores |
| | Boolean | | Show snakemake logs |
Complete Example
vqc run \
  --input samples.fasta \
  --output-dir results \
  --output-file report.tsv \
  --blast-pident 75 \
  --blast-task dc-megablast \
  --cores 8
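When the command is launched from a script, it can help to build the argument list programmatically. The sketch below is only an illustration: `build_run_command` is a hypothetical helper, and it uses only the flags that appear in the example above. The 0-100 range check mirrors the description of the percent-identity parameter.

```python
import shlex


def build_run_command(input_fasta, output_dir=None, output_file=None,
                      blast_pident=None, blast_task=None, cores=None):
    """Assemble a `vqc run` argument list from the given options."""
    if blast_pident is not None and not 0 <= blast_pident <= 100:
        raise ValueError("blast_pident must be between 0 and 100")
    cmd = ["vqc", "run", "--input", str(input_fasta)]
    optional = {
        "--output-dir": output_dir,
        "--output-file": output_file,
        "--blast-pident": blast_pident,
        "--blast-task": blast_task,
        "--cores": cores,
    }
    for flag, value in optional.items():
        if value is not None:  # options left unset are omitted entirely
            cmd += [flag, str(value)]
    return cmd


cmd = build_run_command(
    "samples.fasta",
    output_dir="results",
    output_file="report.tsv",
    blast_pident=75,
    blast_task="dc-megablast",
    cores=8,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with unusual file names.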
Analysis Workflow
Nextclade Sort: Maps sequences to local datasets
BLAST Analysis: Identifies unmapped sequences
Nextclade Run: Quality control analysis
Post-processing: Combines and scores results
Region Extraction: Extracts target regions based on quality
API
Because ViralQC is designed to integrate with viral genomic databases, its analysis module can be embedded in the code of other applications.
This can be done by importing the RunAnalysis class from the viralqc.core.run_analysis module. Its run method executes the quality analysis of a viral genome and takes as a parameter the path to the FASTA file containing the sequences to be analyzed. The other parameters are optional.
Usage
from viralqc.core.run_analysis import RunAnalysis
input_file = "seqs.fasta"
output_directory = "results"
output_file = "results.json"
run_analysis = RunAnalysis()
snakemake_response = run_analysis.run(
    sequences_fasta=input_file,
    output_dir=output_directory,
    output_file=output_file
)
Or, for more control, set the optional parameters explicitly:
from viralqc.core.run_analysis import RunAnalysis
input_file = "seqs.fasta"
output_directory = "results"
output_file = "results.json"
run_analysis = RunAnalysis()
snakemake_response = run_analysis.run(
    sequences_fasta=input_file,
    output_dir=output_directory,
    output_file=output_file,
    cores=2,
    datasets_local_path="datasets",
    nextclade_sort_min_score=0.1,
    nextclade_sort_min_hits=10,
    blast_database="datasets/blast.fasta",
    blast_database_metadata="datasets/blast.tsv",
    blast_identity_threshold=0,
    blast_evalue=0.01,
    blast_qcov=0,
    blast_task="blastn"
)
To check the results:
if snakemake_response.status == 200:
    results_data = snakemake_response.get_results()
    for seq_result in results_data:
        virus = seq_result.get("virus")
        quality = seq_result.get("genomeQuality")
        coverage = seq_result.get("coverage")
        print(virus, quality, coverage)
else:
    raise Exception(snakemake_response.format_log())
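Once the per-sequence results are available, a common next step is to group them by quality. The sketch below assumes, as the loop above suggests, that the results can be iterated as one dictionary per sequence with the keys virus and genomeQuality. The helper name and the sample values are hypothetical placeholders, not real ViralQC output.

```python
from collections import defaultdict


def group_by_quality(results):
    """Map each genomeQuality value to the list of viruses that received it."""
    grouped = defaultdict(list)
    for seq_result in results:
        grouped[seq_result.get("genomeQuality")].append(seq_result.get("virus"))
    return dict(grouped)


# Placeholder data shaped like the fields read in the loop above.
sample_results = [
    {"virus": "virus-a", "genomeQuality": "quality-1", "coverage": 0.98},
    {"virus": "virus-b", "genomeQuality": "quality-2", "coverage": 0.61},
    {"virus": "virus-c", "genomeQuality": "quality-1", "coverage": 0.95},
]

print(group_by_quality(sample_results))
```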
Attributes and Methods
The run method returns a SnakemakeResponse object that has the following attributes:
| Attribute | Type | Description |
|---|---|---|
| run_id | str | Execution ID |
| status | RunStatus | Execution status, which can be 200 (success) or 500 (failure) |
| log_path | str | Path to the log file |
| results_path | str | Path to the results file |
And the following methods:
| Method | Description |
|---|---|
| format_log() | Returns the log file content formatted |
| get_results() | Returns the results file content in dictionary format |
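The attributes and methods listed above can be wrapped in a small helper that either returns the results or raises with the formatted log. This is a sketch under stated assumptions: `results_or_raise` is a hypothetical helper, and `StubResponse` is a stand-in written for this example so that it runs without ViralQC installed. It mimics only status, get_results() and format_log(), and it models the status as a plain integer rather than the real RunStatus type.

```python
class StubResponse:
    """Stand-in for SnakemakeResponse, for illustration only."""

    def __init__(self, status, results=None, log=""):
        self.status = status
        self._results = results or []
        self._log = log

    def get_results(self):
        return self._results

    def format_log(self):
        return self._log


def results_or_raise(response):
    """Return the results on success (200), otherwise raise with the log."""
    if response.status == 200:
        return response.get_results()
    raise RuntimeError(response.format_log())


ok = StubResponse(200, results=[{"virus": "virus-a"}])
print(results_or_raise(ok))

failed = StubResponse(500, log="workflow failed")
try:
    results_or_raise(failed)
except RuntimeError as err:
    print("Run failed:", err)
```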