ViralQC
Installation
Prerequisites
Installation via pip
Step 1: Install Dependencies
Step 2: Install ViralQC
Step 3: Verify Installation
Installation from Source Code
Step 1: Clone the Repository
Step 2: Create Conda Environment
Step 3: Install ViralQC
Step 4: Verify Installation
Dataset Configuration
datasets.yml File Structure
Nextclade Datasets
GitHub Datasets
Configuration Parameters
Scoring Logic
Why a Custom System?
Quality Metrics
1. missingDataQuality - Missing Data Quality
2. privateMutationsQuality - Private Mutations Quality
3. mixedSitesQuality - Mixed Sites Quality
4. snpClustersQuality - SNP Clusters Quality
5. frameShiftsQuality - Frameshifts Quality
6. stopCodonsQuality - Stop Codons Quality
genomeQuality Calculation
Target Region Extraction
Commands and Usage
get-nextclade-datasets
Usage
Parameters
Output Structure
get-blast-database
Usage
Parameters
Release Date Filtering
Database Version
Output Structure
run
Usage
Required Parameters
Output Parameters
Dataset Parameters
Nextclade Sort Parameters
BLAST Parameters
BLAST Task Types
System Parameters
Complete Example
Analysis Workflow
API
Usage
Attributes and Methods
How to Add New Datasets
Adding a Nextclade Dataset
Step 1: Identify the Dataset
Step 2: Edit datasets.yml
Step 3: Obtain Taxonomic Information
Step 4: Test
Adding a GitHub Dataset
Step 1: Prepare Repository
Step 2: Add to datasets.yml
Step 3: Test
Output Structure
Main File: results.tsv (or .csv, .json)
1. Sequence Identification
2. Quality Metrics (ViralQC A-D System)
3. Quality Metrics (Nextclade)
4. Coverage and Regions
5. Nucleotide Mutations
6. Amino Acid Mutations
7. Private Mutations (Detailed)
Target Regions Files
sequences_target_regions.bed
sequences_target_regions.fasta
Practical Examples
Example 1: Basic Dengue Analysis
Example 2: Influenza (Multiple Segments)
Example 3: Metagenomic Analysis
Example 4: Reproducible Database
Example 5: JSON Output
Example 6: Sensitive BLAST Search
Troubleshooting
Problem 1: vqc command not found
Problem 2: Error downloading datasets
Problem 3: BLAST database not found
Problem 4: No sequences mapped
Problem 5: Memory errors
Problem 6: Permission denied
Problem 7: GitHub datasets not downloaded
Getting Help
Developer Guide
Dataset Management
get_github_dataset.py
jsonl_to_gff.py
get_minimizer_index.py
Analysis Pipeline
format_nextclade_sort.py
blast_wrapper.py
reorder_cds.py
post_process_nextclade.py
extract_target_regions.py