ViralQC
Installation
Prerequisites
Installation via pip
Step 1: Install Dependencies
Step 2: Install ViralQC
Step 3: Verify Installation
Installation from Source Code
Step 1: Clone the Repository
Step 2: Create Conda Environment
Step 3: Install ViralQC
Step 4: Verify Installation
Docker
Step 1: Clone the Repository
Step 2: Build the Docker Image
Step 3: Verify Installation
Step 4: Run ViralQC
Dataset Configuration
datasets.yml File Structure
Nextclade Datasets
GitHub Datasets
Configuration Parameters
Scoring Logic
Why a Custom System?
Quality Metrics
1. missingDataQuality - Missing Data Quality
2. privateMutationsQuality - Private Mutations Quality
3. mixedSitesQuality - Mixed Sites Quality
4. snpClustersQuality - SNP Clusters Quality
5. frameShiftsQuality - Frameshifts Quality
6. stopCodonsQuality - Stop Codons Quality
genomeQuality Calculation
Target Region Extraction
Commands and Usage
get-nextclade-datasets
Usage
Parameters
Output Structure
get-blast-database
Usage
Parameters
Release Date Filtering
Database Version
Output Structure
run
Usage
Required Parameters
Output Parameters
Dataset Parameters
Nextclade Sort Parameters
BLAST Parameters
BLAST Task Types
System Parameters
Complete Example
Analysis Workflow
API
Usage
Attributes and Methods
Preparing NCBI Submissions
Grouping by Virus (virus)
Supported Viruses
SARS-CoV-2
Dengue
Influenza
Norovirus
Custom Viruses
Grouping by Sample (sample)
Metadata CSV
Input Columns
Required Columns
Optional Columns — Standard Viruses (Dengue, Influenza, Norovirus, SARS-CoV-2)
Optional Columns — Custom Viruses
Output Format
FASTA Headers and Annotations
Batch Splitting
Python API
Installation
Constructor
Methods
run_virus(virus="all", virus_name=None)
run_sample(samples=["all"])
Return Value
Examples
Process all viruses found in the results file
Process only specific sample IDs
Process a custom virus with segment splitting
How to Add New Datasets
Adding a Nextclade Dataset
Step 1: Identify the Dataset
Step 2: Edit datasets.yml
Step 3: Obtain Taxonomic Information
Step 4: Test
Adding a GitHub Dataset
Step 1: Prepare Repository
Step 2: Add to datasets.yml
Step 3: Test
Output Structure
Main File: results.tsv (or .csv, .json)
1. Sequence Identification
2. Quality Metrics (ViralQC A-D System)
3. Quality Metrics (Nextclade)
4. Coverage and Regions
5. Nucleotide Mutations
6. Amino Acid Mutations
7. Private Mutations (Detailed)
Target Regions Files
sequences_target_regions.bed
sequences_target_regions.fasta
Annotation Files
gff_files/
gff_files/per_sample/
tbl_files/
tbl_files/per_sample/
Practical Examples
Example 1: Basic Dengue Analysis
Example 2: Influenza (Multiple Segments)
Example 3: Metagenomic Analysis
Example 4: Reproducible Database
Example 5: JSON Output
Example 6: Sensitive BLAST Search
Troubleshooting
Problem 1: vqc command not found
Problem 2: Error downloading datasets
Problem 3: BLAST database not found
Problem 4: No sequences mapped
Problem 5: Memory errors
Problem 6: Permission denied
Problem 7: GitHub datasets not downloaded
Getting Help
Developer Guide
Dataset Management
get_github_dataset.py
jsonl_to_gff.py
get_minimizer_index.py
Analysis Pipeline
format_nextclade_sort.py
blast_wrapper.py
reorder_cds.py
post_process_nextclade.py
extract_target_regions.py