Scoring Logic

ViralQC implements its own quality scoring system that is more flexible than Nextclade’s standard system.

Why a Custom System?

While Nextclade uses three categories (good, mediocre, bad) based on fixed scores, ViralQC offers four levels (A, B, C, D):

  • A - Sequences meeting all quality criteria

  • B - Sequences suitable for most analyses

  • C - Use with caution

  • D - Use not recommended

Quality Metrics

ViralQC calculates scores for 6 metrics:

1. missingDataQuality - Missing Data Quality

Based on genome coverage:

Coverage

Score

≥ 90%

A

≥ 75%

B

≥ 50%

C

< 50%

D

2. privateMutationsQuality - Private Mutations Quality

Based on the number of private mutations relative to a configurable threshold:

Private Mutations

Score

≤ threshold

A

≤ threshold × 1.05

B

≤ threshold × 1.10

C

> threshold × 1.10

D

Default threshold: 10 mutations (configurable per virus via private_mutation_total_threshold)

3. mixedSitesQuality - Mixed Sites Quality

Mixed Sites

Score

0

A

1

B

2

C

≥ 3

D

4. snpClustersQuality - SNP Clusters Quality

SNPs Clustered

Score

0

A

1

B

2

C

≥ 3

D

5. frameShiftsQuality - Frameshifts Quality

Frameshifts

Score

0

A

1

B

2

C

≥ 3

D

6. stopCodonsQuality - Stop Codons Quality

Stop Codons

Score

0

A

1

B

2

C

≥ 3

D

genomeQuality Calculation

The overall genome score combines all 6 metrics:

  1. Conversion: A=4, B=3, C=2, D=1 points

  2. Normalization: score = (total / 24) × 24

  3. Classification:

Score

genomeQuality

= 24

A

≥ 18

B

≥ 12

C

< 12

D

Target Region Extraction

ViralQC extracts sequences based on quality hierarchy:

        graph TD
    A[Analyze Sequence] --> B{genomeQuality = A or B?}
    B -->|Yes| C[Extract COMPLETE GENOME]
    B -->|No| D{targetRegionsQuality = A or B?}
    D -->|Yes| E[Extract TARGET REGIONS]
    D -->|No| F{targetGeneQuality = A or B?}
    F -->|Yes| G[Extract TARGET GENE]
    F -->|No| H[DO NOT EXTRACT]