Output Structure
When you run the `run-from-fasta` command, ViralQC creates the following output:
output/
├── identified_datasets/
│ ├── datasets_selected.tsv
│ ├── viruses.tsv
│ ├── viruses.external_datasets.tsv
│ ├── unmapped_sequences.txt
│ └── <virus>/sequences.fa
├── blast_results/
│ ├── unmapped_sequences.blast.tsv
│ └── blast_viruses.list
├── nextclade_results/
│ ├── <virus>.nextclade.tsv
│ └── <accession>.generic.nextclade.tsv
├── gff_files/
│ ├── <virus>.nextclade.gff
│ └── <accession>.generic.nextclade.gff
├── logs/
│ ├── nextclade_sort.log
│ ├── blast.log
│ └── ...
├── results.tsv
├── sequences_target_regions.bed
└── sequences_target_regions.fasta
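To collect the Nextclade tables programmatically, a small helper can separate dataset-based results (`<virus>.nextclade.tsv`) from generic ones (`<accession>.generic.nextclade.tsv`). This is a minimal sketch that relies only on the file-name patterns shown in the tree above. The helper names `classify_nextclade_tables` and `find_nextclade_tables` are hypothetical and not part of ViralQC.

```python
from pathlib import Path

# File-name patterns taken from the output tree; the order of checks matters
# because every generic file name also ends with ".nextclade.tsv".
GENERIC_SUFFIX = ".generic.nextclade.tsv"
DATASET_SUFFIX = ".nextclade.tsv"


def classify_nextclade_tables(paths):
    """Split paths into {virus: path} and {accession: path} dictionaries."""
    by_virus, generic = {}, {}
    for path in map(Path, paths):
        name = path.name
        if name.endswith(GENERIC_SUFFIX):
            generic[name[: -len(GENERIC_SUFFIX)]] = path
        elif name.endswith(DATASET_SUFFIX):
            by_virus[name[: -len(DATASET_SUFFIX)]] = path
    return by_virus, generic


def find_nextclade_tables(output_dir):
    """Scan <output_dir>/nextclade_results/ and classify the tables found."""
    results_dir = Path(output_dir) / "nextclade_results"
    return classify_nextclade_tables(sorted(results_dir.glob("*" + DATASET_SUFFIX)))
```

For example, `by_virus, generic = find_nextclade_tables("output")` returns two dictionaries keyed by the `<virus>` and `<accession>` parts of the file names. Files that match neither pattern are ignored.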
Main File: results.tsv (or .csv, .json)
This file consolidates the results of all analyses. Its columns are grouped as follows:
1. Sequence Identification
| Column | Type | Description |
|---|---|---|
| | String | Sequence name in the input FASTA file |
| | String | Identified virus name |
| | Integer | Virus taxonomic ID in NCBI Taxonomy |
| | String | Viral species name |
| | Integer | Species taxonomic ID |
| | String | Genomic segment (e.g., “HA”, “NA”, “Unsegmented”) |
| | String | Reference genome accession in NCBI |
| | String | Identifier of the dataset used |
| | String | Dataset version/tag |
| | String | Phylogenetic clade (when available) |
3. Quality Metrics (Nextclade)
| Column | Type | Description |
|---|---|---|
| | Float | Nextclade overall quality score |
| | String | Nextclade quality status (good, mediocre, bad) |
| | Integer | Total private mutations (Nextclade) |
| | Float | Private mutations score (Nextclade) |
| | String | Private mutations status (Nextclade) |
| | Float | Missing data score (Nextclade) |
| | String | Missing data status (Nextclade) |
| | Integer | Total mixed sites (Nextclade) |
| | Float | Mixed sites score (Nextclade) |
| | String | Mixed sites status (Nextclade) |
| | Integer | Total clustered SNPs (Nextclade) |
| | Float | SNP clusters score (Nextclade) |
| | String | SNP clusters status (Nextclade) |
| | Integer | Total frameshifts (Nextclade) |
| | Float | Frameshifts score (Nextclade) |
| | String | Frameshifts status (Nextclade) |
| | Integer | Total stop codons (Nextclade) |
| | Float | Stop codons score (Nextclade) |
| | String | Stop codons status (Nextclade) |
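A common first check is how many sequences fall into each quality status. The sketch below reads `results.tsv` with Python's `csv` module and tallies the values of one status column. Because the exact column header is not given here, the caller passes it in as `status_column`; the helper name `count_qc_status` is hypothetical.

```python
import csv
from collections import Counter


def count_qc_status(lines, status_column):
    """Count rows of a TSV table per value of a quality-status column.

    `lines` is any iterable of TSV lines (e.g. an open file handle).
    `status_column` is the header of the status column; this sketch does
    not assume its exact name. Empty cells are counted under "missing".
    """
    reader = csv.DictReader(lines, delimiter="\t")
    # Accessing fieldnames reads the header row.
    if reader.fieldnames is None or status_column not in reader.fieldnames:
        raise KeyError(f"column {status_column!r} not found in header")
    counts = Counter()
    for row in reader:
        status = (row.get(status_column) or "").strip()
        counts[status or "missing"] += 1
    return counts
```

Usage: open `output/results.tsv` with `newline=""` and pass the handle together with the header of the overall-status column, e.g. `count_qc_status(handle, "<status column>")`.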
4. Coverage and Regions
| Column | Type | Description |
|---|---|---|
| | Float | Genome coverage (0.0 to 1.0) |
| | String | Coverage of each CDS (format: “gene1: 0.98, gene2: 1.0”) |
| | String | Coverage of target regions defined in |
| | String | Coverage of target gene defined in |
| | String | List of target regions (separated by `\|`) |
| | String | Main target gene name |
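In the TSV/CSV output, the CDS coverage and target-region cells are plain strings that need splitting before use. The sketch below assumes the cells follow exactly the formats given in the table above: `name: fraction` pairs separated by commas, and region names separated by `|`. The helper names and example values are hypothetical.

```python
def parse_cds_coverage(value):
    """Parse a cell such as "gene1: 0.98, gene2: 1.0" into {gene: fraction}."""
    coverage = {}
    if not value or not value.strip():
        return coverage
    for item in value.split(","):
        gene, sep, fraction = item.partition(":")
        if not sep:
            raise ValueError(f"unexpected CDS coverage entry: {item!r}")
        # float() tolerates the surrounding spaces in " 0.98".
        coverage[gene.strip()] = float(fraction)
    return coverage


def parse_target_regions(value):
    """Split a '|'-separated list such as "C|prM|E" into region names."""
    return [region.strip() for region in value.split("|") if region.strip()]
```

With JSON output this step should be unnecessary for the columns that are already emitted as arrays of objects.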
5. Nucleotide Mutations
| Column | Type | Description |
|---|---|---|
| | Integer | Total nucleotide substitutions |
| | Integer | Total nucleotide deletions |
| | Integer | Total nucleotide insertions |
| | Integer | Total frameshift mutations |
| | Integer | Total missing nucleotides (N’s or gaps) |
| | Integer | Total non-ACGTN characters |
| | String | List of substitutions (format: `gene:pos:ref>alt`) |
| | String | List of deletions |
| | String | List of insertions |
| | String | List of frameshifts |
| | Float | Alignment score |
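Substitution entries written as `gene:pos:ref>alt` can be turned into structured records for filtering by gene or position. This is a minimal sketch: the separator between entries is not specified above, so a comma is assumed by default and can be overridden. `Substitution` and `parse_substitutions` are hypothetical names.

```python
from typing import List, NamedTuple


class Substitution(NamedTuple):
    gene: str
    pos: int
    ref: str
    alt: str


def parse_substitutions(value: str, sep: str = ",") -> List[Substitution]:
    """Parse entries of the form gene:pos:ref>alt (entry separator assumed)."""
    subs = []
    for entry in value.split(sep):
        entry = entry.strip()
        if not entry:
            continue
        # Split from the right so only the last two ':' are treated as field
        # separators; a malformed entry raises ValueError on unpacking.
        gene, pos, change = entry.rsplit(":", 2)
        ref, arrow, alt = change.partition(">")
        if not arrow:
            raise ValueError(f"missing '>' in substitution: {entry!r}")
        subs.append(Substitution(gene, int(pos), ref, alt))
    return subs
```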
6. Amino Acid Mutations
| Column | Type | Description |
|---|---|---|
| | Integer | Total amino acid substitutions |
| | Integer | Total amino acid deletions |
| | Integer | Total amino acid insertions |
| | Integer | Total unknown amino acids |
| | String | List of amino acid substitutions |
| | String | List of amino acid deletions |
| | String | List of amino acid insertions |
7. Private Mutations (Detailed)
| Column | Type | Description |
|---|---|---|
| | Integer | Total private substitutions |
| | Integer | Total known/cataloged private mutations |
| | Integer | Total uncataloged private mutations |
| | Integer | Total reversions (mutations that revert to the ancestral reference) |
Note on output formats:
- TSV/CSV: all columns are strings or numeric values.
- JSON: columns such as `cdsCoverage`, `cdsCoverageQuality`, and `targetRegionsCoverage` are formatted as arrays of objects for easier programmatic parsing.
Target Regions Files
sequences_target_regions.bed
seq1 94 2419 C,prM,E
seq2 0 10735 genome
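Each example line has four fields: sequence name, start, end, and comma-separated target region names. Assuming the file follows the standard BED convention (0-based start, end-exclusive), which the `0` start of the whole-genome line is consistent with, the region length is `end - start` and the region can be cut from the original sequence with a plain Python slice. BED files are normally tab-separated; this sketch splits on any whitespace so it also accepts the example as printed here. `TargetRegion`, `read_target_regions`, and `extract_region` are hypothetical names.

```python
from typing import List, NamedTuple


class TargetRegion(NamedTuple):
    seq_name: str
    start: int          # 0-based start (BED convention, assumed)
    end: int            # end-exclusive (BED convention, assumed)
    regions: List[str]  # e.g. ["C", "prM", "E"] or ["genome"]

    @property
    def length(self) -> int:
        return self.end - self.start


def read_target_regions(lines) -> List[TargetRegion]:
    """Parse BED-like lines: name, start, end, comma-separated region names."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and standard BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        regions = fields[3].split(",") if len(fields) > 3 else []
        records.append(TargetRegion(name, start, end, regions))
    return records


def extract_region(sequence: str, record: TargetRegion) -> str:
    """With 0-based, end-exclusive coordinates a slice is exact."""
    return sequence[record.start:record.end]
```

Under these assumptions, the `seq1` line above describes a 2325-nt region (2419 - 94) covering C, prM, and E.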
sequences_target_regions.fasta
Sequences extracted from the target regions that meet the quality criteria.
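The extracted sequences can be loaded with any FASTA reader. The minimal parser below avoids extra dependencies; since the header format of this file is not described here, it keys each record by its full header line (without `>`). `read_fasta` is a hypothetical helper, and a later record with a duplicate header overwrites an earlier one.

```python
def read_fasta(lines):
    """Return {header: sequence} from FASTA-formatted lines."""
    records = {}
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Usage: `with open("output/sequences_target_regions.fasta") as handle: regions = read_fasta(handle)`.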