Structural Variants the Short Read Missed

Zetobit Bioinformatics Insight Series · Variant Calling · May 2026

Research Explainer

Structural Variants the Short Read Missed:
Long-Read SV Calling with Sniffles2, SVIM, and PBSV

A practitioner's guide to the algorithms, benchmarks, and clinical implications of calling insertions, deletions, inversions, and translocations from PacBio HiFi and Oxford Nanopore reads.

Zetobit LLC · Lexington, KY

For two decades, short-read sequencing built its reputation on a single trick: piling hundreds of millions of 150-base-pair reads onto a reference genome and counting the discordances. For single-nucleotide variants (SNVs) and small indels, the trick works beautifully. For structural variants (SVs), short reads have always been the wrong tool. SVs are the insertions, deletions, inversions, duplications, and translocations longer than 50 base pairs, and they account for more of the genomic difference between any two humans, measured in base pairs, than all SNVs combined. Roughly 70 percent of the structural variants in a human genome are systematically missed by short-read callers. The ones that are called are often misplaced, misclassified, or detected only because they happened to fall in a region kind enough to lack repeats. Long-read sequencing has not just incrementally improved this picture; it has rewritten it. A modern PacBio HiFi or Oxford Nanopore run, paired with a long-read-native SV caller such as Sniffles2, SVIM, or PBSV, will recover roughly two to three times as many structural variants as the best short-read pipeline. It will also do so with breakpoint precision that finally makes structural variation a clinically actionable category.

[Figure 1: SV detection sensitivity by variant size. Recall (sensitivity, 0–100%) of a short-read caller (Manta / DELLY) and a long-read caller (Sniffles2 / SVIM) across SV size classes of 50–100 bp, 100–300 bp, 300 bp–1 kb, 1–10 kb, 10–100 kb, and >100 kb in a simulated 30× human genome benchmark. A shaded band marks the short-read blind zone (repeat-mediated SVs > 300 bp).]
Figure 1. Recall by SV size class. Short-read callers maintain reasonable sensitivity only for SVs under ~300 bp. Beyond that size, breakpoints flanked by transposable elements or segmental duplications collapse the signal. Long-read callers maintain >90% recall across all size classes up to 100 kb. They decline only for the largest events, where supporting reads must anchor on both distant breakpoint flanks.

Why Short Reads Fail at Structural Variants

The failure mode is geometric. A 150 bp Illumina read cannot span anything longer than itself, so structural variants are detected indirectly through one of four signals:

- Discordant read pairs: the two mates of a paired-end fragment map farther apart than expected or in an unexpected orientation.
- Split reads: a single read maps partly to two distant locations.
- Abnormal read depth: coverage rises or falls in a way consistent with a duplication or deletion.
- Local assembly: reads around a candidate breakpoint are reassembled to reconstruct it.

Every one of these signals depends on the SV breakpoints being anchored in unique, mappable sequence. The short-read signal collapses the moment a breakpoint falls inside a transposable element, a segmental duplication, a tandem repeat array, or a low-complexity region. That describes the majority of biologically interesting breakpoints, because many SVs arise precisely from recombination or repair errors between repetitive sequences.
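
The discordant-pair signal can be made concrete with a minimal Python sketch of how a short-read caller might label a single read pair. It assumes a forward-reverse (FR) Illumina library whose insert sizes are summarized by a mean and standard deviation. The `ReadPair` type, the `classify_pair` function, and the 4-standard-deviation cutoff are hypothetical illustrations, not the logic of Manta, DELLY, or any specific caller.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """One paired-end fragment, reduced to the fields this sketch needs (hypothetical type)."""
    chrom1: str
    pos1: int          # leftmost mapped position of mate 1
    reverse1: bool     # True if mate 1 aligned to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  max_z: float = 4.0) -> str:
    """Label a pair 'concordant' or by the SV signal it suggests.

    Assumes an FR library: the leftmost mate is expected on the forward
    strand and its partner on the reverse strand.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"          # translocation signal
    # Order mates left to right so the orientation checks do not depend on mate order.
    left, right = sorted([(pair.pos1, pair.reverse1), (pair.pos2, pair.reverse2)])
    if left[1] == right[1]:
        return "same-strand"               # inversion signal
    if left[1] and not right[1]:
        return "everted"                   # RF orientation: tandem duplication signal
    # Distance between mate start positions, used here as an approximate insert size.
    distance = right[0] - left[0]
    z = (distance - mean_insert) / sd_insert
    if z > max_z:
        return "too-far"                   # deletion signal
    if z < -max_z:
        return "too-close"                 # insertion signal
    return "concordant"
```

Every branch assumes each mate has a single confident placement. When both mates fall in repeat copies, the positions passed to this function are themselves unreliable, which is the failure mode this section describes.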

The consequences compound. Discordant pairs become ambiguous because both mates map to multiple repeat copies. Split reads disappear because the breakpoint-spanning fragment cannot be uniquely placed. Depth-based callers cannot distinguish a true tandem duplication from a mapping artifact in a high-copy region. The result is a sensitivity ceiling that short-read pipelines have failed to break through for fifteen years, no matter how clever the algorithm. Benchmarks from the Genome in a Bottle consortium and from the Human Genome Structural Variation Consortium consistently show short-read recall in the 30–45% range for the full SV spectrum of a single human genome, with precision dropping sharply for variants larger than a few kilobases.[1]

The size distribution problem

A typical human genome contains roughly 25,000 structural variants relative to the reference. The size distribution is heavily skewed toward the short end: most SVs are insertions or deletions between 50 bp and 1 kb. The medical impact, however, is skewed toward the long end, where single events can ablate entire exons, regulatory elements, or genes. Short-read callers do reasonably well on the most numerous SVs and terribly on the most consequential ones. Long reads remove this trade-off because they span the breakpoints directly:

- A deletion is a gap in the alignment.
- An insertion is a block of read sequence with no reference counterpart. It appears as an insertion operation in the alignment, or as a soft-clipped read end when it is too long to align through.
- An inversion is a read segment that aligns to the opposite strand.
- A translocation is a read with alignment anchors on two different chromosomes.

No statistical inference from indirect signals is required.
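
The alignment-level view can be sketched in a few lines of Python. This minimal example walks a SAM-style CIGAR string and reports indel operations of at least 50 bp, along with long soft clips. It assumes a single linear alignment given as a 0-based reference start plus the CIGAR string. The function name and the `CLIP` label are hypothetical. Inversions and translocations are not visible in a single CIGAR string. They show up as split alignments (a primary record plus supplementary records, listed in the SA tag), which a real caller must also inspect.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def sv_signatures_from_cigar(ref_start: int, cigar: str, min_size: int = 50):
    """Return (type, ref_position, length) for each large CIGAR event.

    D operations are reference bases absent from the read (deletions).
    I operations are read bases absent from the reference (insertions).
    Long S operations are unaligned read ends, which can mark an insertion
    too long to align through or a breakpoint.
    """
    ref_pos = ref_start
    signatures = []
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "D" and length >= min_size:
            signatures.append(("DEL", ref_pos, length))
        elif op == "I" and length >= min_size:
            signatures.append(("INS", ref_pos, length))
        elif op == "S" and length >= min_size:
            signatures.append(("CLIP", ref_pos, length))
        # M, D, N, =, X consume reference bases; I, S, H, P do not.
        if op in "MDN=X":
            ref_pos += length
    return signatures
```

For example, `sv_signatures_from_cigar(1000, "500M120D300M80I400M")` reports a 120 bp deletion at reference position 1500 and an 80 bp insertion at 1920.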

~25,000 SVs per human genome
70% missed by short reads
15–25 kb HiFi read length

The Three Callers That Define the Field

Three long-read SV callers have reached production maturity and account for the overwhelming majority of long-read SV calls being made today: Sniffles2, the successor to the original Sniffles, optimized for both PacBio HiFi and Oxford Nanopore data; SVIM, a graph-based caller with strong performance on noisy long reads; and PBSV, PacBio's official caller designed to integrate with the SMRT Link analysis suite. Each takes a different algorithmic approach, and the differences matter for choosing the right tool for a given dataset.

Sniffles2: signature clustering at scale

Sniffles2 builds an internal representation of every alignment signature in the input BAM file (soft-clipped read ends, large insertions within alignments, large deletions, split alignments) and clusters these signatures by genomic position and SV type. A cluster of sufficient size and quality becomes a candidate SV call. The version 2 release introduced two transformative features:

- Population-scale joint calling through a Sniffles-specific intermediate format (SNF).
- Substantial improvements in the handling of mosaic and low-frequency variants.

Sniffles2 can process more than ten thousand samples through a single merge step, making it the dominant caller for cohort and population studies. Its default parameters are tuned for germline diploid genomes at 20–30× coverage, but parameter overrides for somatic, mosaic, or low-coverage applications are well documented.[2]
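
The signature-clustering idea can be illustrated with a deliberately simplified Python sketch. It assumes signatures from one chromosome have already been extracted as (type, position, length, read name) tuples. It then chains same-type signatures that lie within a fixed distance of one another. `cluster_signatures`, `max_distance`, and `min_support` are hypothetical names and values. This is not Sniffles2's actual implementation, which is considerably more elaborate.

```python
from collections import defaultdict


def cluster_signatures(signatures, max_distance=500, min_support=3):
    """Group single-chromosome SV signatures into candidate calls.

    signatures: iterable of (sv_type, position, length, read_name) tuples.
    Returns (sv_type, median position, median length, read support) tuples.
    """
    by_type = defaultdict(list)
    for sv_type, pos, length, read in signatures:
        by_type[sv_type].append((pos, length, read))

    calls = []
    for sv_type, sigs in by_type.items():
        sigs.sort()  # order by position (ties broken by length, then read name)
        clusters = [[sigs[0]]]
        for sig in sigs[1:]:
            # Chain onto the current cluster if close to its last member.
            if sig[0] - clusters[-1][-1][0] <= max_distance:
                clusters[-1].append(sig)
            else:
                clusters.append([sig])
        for cluster in clusters:
            # Count distinct reads so one read cannot vote several times.
            support = len({read for _, _, read in cluster})
            if support >= min_support:
                positions = sorted(pos for pos, _, _ in cluster)
                lengths = sorted(size for _, size, _ in cluster)
                mid = len(cluster) // 2  # upper median for even-sized clusters
                calls.append((sv_type, positions[mid], lengths[mid], support))
    return sorted(calls, key=lambda call: call[1])
```

Requiring support from distinct read names, rather than counting raw signatures, keeps a single read with several nearby indels from inflating the support for a call.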

SVIM: graph-based candidate refinement

SVIM (Structural Variant Identification using Mapped long reads) takes a more elaborate multi-stage approach:

1. It collects raw signatures from the alignments.
2. It clusters signatures that stem from the same event.
3. It combines clusters from different genomic regions. Multi-part events such as interspersed duplications and translocations are therefore reconstructed and classified as single calls rather than reported as unrelated fragments.

Recent versions add a genotyping stage that uses the reads spanning each SV region. SVIM tends to outperform Sniffles2 on noisy ONT data and on complex rearrangements, but it is slower and has lighter cohort-scale tooling. For clinical interpretation of a small number of probands, where breakpoint precision matters more than throughput, SVIM is often the better choice.[3]

PBSV: the PacBio-native caller

PBSV is PacBio's reference implementation. It is distributed through Bioconda as the pbsv package and integrated with PacBio's SMRT Link platform. It uses a two-step model:

1. A discover step condenses each sample's alignments into a signature file (.svsig.gz).
2. A call step calls and genotypes SVs from one or more signature files, jointly across samples when several are supplied.

PBSV is conservative by default: it tends to call fewer SVs than Sniffles2, with slightly higher precision. It is the de facto choice for clinical PacBio HiFi pipelines because of its tight integration with PacBio's officially supported analysis stack. It does not support ONT data; for ONT-only or hybrid workflows, Sniffles2, SVIM, or CuteSV are the practical options.[4]

Calling a structural variant from a long read is no longer an inference from indirect signals. It is a direct observation of the rearrangement in a single molecule.

Table 1 — Long-read SV callers at a glance

| Caller | Algorithm | HiFi | ONT | Joint calling | Best for |
|---|---|---|---|---|---|
| Sniffles2 | Signature clustering | Yes | Yes | Yes (SNF format, scales to ≥10k samples) | Population and cohort studies; mosaic detection |
| SVIM | Graph-based clustering + refinement | Yes | Yes | Limited (per-sample primary) | Complex rearrangements; noisy ONT data |
| PBSV | Discover + call two-step | Yes | No | Yes (svsig merging) | Clinical PacBio HiFi; SMRT Link integration |
| CuteSV | Signature clustering + clustering refinement | Yes | Yes | Limited | High-sensitivity discovery; ONT ultra-long |
| Delly-LR (experimental) | Hybrid SR/LR signature integration | Yes | Yes | Yes | Hybrid short + long read workflows |

The Canonical Long-Read SV Calling Pipeline

The technical workflow from raw reads to a filtered SV callset has converged on a relatively stable canonical form across PacBio HiFi and Oxford Nanopore platforms. Five stages dominate, and each has well-established tooling.

[Figure 2: Canonical long-read SV calling pipeline (HiFi or ONT; single-sample or cohort).
Step 1, Alignment: minimap2 / pbmm2 / NGMLR.
Step 2, QC + filtering: samtools, mosdepth.
Step 3, SV calling: Sniffles2 / SVIM / PBSV / CuteSV.
Step 4, Merging + genotyping: SURVIVOR / Jasmine.
Step 5, Annotation + QC: AnnotSV / Truvari.]
Figure 2. Canonical long-read structural variant calling pipeline. Each step has established tooling. Quality control is not optional in clinical or production settings; specifically, the callset must be benchmarked against a truth set such as GIAB HG002 with Truvari.

Step 1: Alignment

The aligner choice matters more than the SV caller choice in many cases. minimap2 with the map-hifi or map-ont preset is the default for nearly all long-read SV work. It is fast and accurate, and all three major callers consume its alignments natively. pbmm2 is PacBio's minimap2 wrapper with HiFi-tuned parameters and is the aligner PacBio recommends upstream of PBSV. NGMLR was the original aligner that Sniffles was developed against. It remains worth using when breakpoint sensitivity around tandem repeats is critical: its SV-aware convex gap-cost scoring produces slightly cleaner alignments around breakpoints, at the cost of substantially longer runtime. For most production pipelines today, minimap2 is the right default.

Step 2: QC and filtering

Coverage uniformity is the single best predictor of SV callset quality. Running mosdepth over 1 kb windows will reveal coverage anomalies. Drops over centromeres and acrocentric short arms are expected. Unexpected drops elsewhere signal library or alignment issues that will translate into false-positive deletions. Mean coverage should reach at least 20× for confident germline SV calling, with 30× preferred for HiFi and 40–50× preferred for ONT to compensate for higher base error rates. Filtering should remove unmapped reads and secondary alignments. It should retain supplementary alignments, which carry the split-read evidence for inversions and translocations. Reads below a minimum mapping quality are also excluded, typically MAPQ ≥ 20 for SV-supporting reads, though Sniffles2 will use lower-MAPQ reads as additional supporting evidence.
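
Both QC rules in this step are easy to express in code. The Python sketch below makes two simplifying assumptions. First, each alignment is reduced to its SAM FLAG and MAPQ values. Second, coverage is reduced to (chrom, start, end, mean depth) rows, such as those mosdepth writes for fixed-size windows. The sketch drops unmapped and secondary alignments while keeping supplementary ones. It also flags windows whose depth falls below half, or rises above 1.5 times, the genome-wide median. The function names and both ratio thresholds are hypothetical. In practice, expected low-coverage regions such as centromeres would be masked before flagging.

```python
import statistics

# SAM FLAG bits, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800  # deliberately not filtered: split-read evidence


def keep_alignment(flag: int, mapq: int, min_mapq: int = 20) -> bool:
    """Keep primary and supplementary alignments that meet the MAPQ floor."""
    if flag & (FLAG_UNMAPPED | FLAG_SECONDARY):
        return False
    return mapq >= min_mapq


def flag_windows(windows, low=0.5, high=1.5):
    """Flag windows whose depth departs from the genome-wide median.

    windows: list of (chrom, start, end, mean_depth) tuples.
    low/high: hypothetical cutoffs expressed as fractions of the median.
    """
    median = statistics.median(row[3] for row in windows)
    flagged = []
    for chrom, start, end, depth in windows:
        if depth < low * median:
            flagged.append((chrom, start, end, "low"))
        elif depth > high * median:
            flagged.append((chrom, start, end, "high"))
    return flagged
```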

Step 3: SV calling

This is the step everyone focuses on and the step that, in practice, matters least once the preceding two are done correctly. Sniffles2 with default parameters on a well-aligned 30× HiFi BAM will recover >95% of SVs that any caller in the field is capable of finding. The differences between callers show up at the margins — at low coverage, on noisy reads, in complex rearrangements, and in the precision of breakpoint coordinates — and those margins matter most in clinical interpretation. Running two callers and intersecting (or unioning) their outputs is increasingly standard for clinical workflows.
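
A two-caller consensus reduces to deciding when two calls describe the same event. The minimal Python sketch below assumes each call is a (type, chromosome, position, length) tuple. Two calls match when they share type and chromosome, lie within 500 bp of each other, and have lengths within 70% of each other. These thresholds are illustrative assumptions, loosely in the spirit of the reference-distance and size-similarity criteria that comparison tools such as Truvari use. `same_sv` and `consensus` are hypothetical names. The sketch applies to length-bearing calls such as deletions and insertions, not to translocations.

```python
def same_sv(a, b, max_dist=500, min_size_ratio=0.7):
    """Decide whether two (sv_type, chrom, pos, length) calls describe one SV."""
    if a[0] != b[0] or a[1] != b[1]:
        return False
    if abs(a[2] - b[2]) > max_dist:
        return False
    small, large = sorted((a[3], b[3]))
    return small / large >= min_size_ratio


def consensus(calls_a, calls_b, **match_args):
    """Return (intersection, union) of two callsets.

    intersection: calls from calls_a confirmed by some call in calls_b.
    union: all of calls_a plus the calls_b entries that matched nothing in calls_a.
    """
    intersection = [a for a in calls_a
                    if any(same_sv(a, b, **match_args) for b in calls_b)]
    b_only = [b for b in calls_b
              if not any(same_sv(a, b, **match_args) for a in calls_a)]
    return intersection, calls_a + b_only
```

The pairwise comparison is quadratic, which is acceptable for a sketch. Production tools index calls by position instead.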

Step 4: Merging and genotyping

For cohort analyses, per-sample callsets must be merged into a unified VCF in which each SV is represented once and each sample carries a genotype at every site. SURVIVOR and Jasmine are the two dominant merging tools. Jasmine uses a more sophisticated graph-based merging strategy that handles breakpoint imprecision better, while SURVIVOR remains popular for its speed and simplicity. Sniffles2's native joint-calling mode through SNF files replaces this external merging step and is the recommended path for new cohort projects.

Step 5: Annotation and QC

A raw SV VCF is, on its own, biologically uninterpretable. AnnotSV overlays several layers of information onto each call, transforming a coordinate list into a ranked clinical report:

- Gene models
- ClinVar pathogenicity
- Regulatory elements
- Population allele frequencies, from databases such as gnomAD-SV
- Dosage sensitivity scores

Truvari benchmarks the callset against a truth set, most commonly the GIAB HG002 SV truth set. It produces the precision and recall metrics that anchor the callset's quality.[5] No clinical or production callset should ship without a Truvari report against an appropriate truth set.

Benchmarks: What "Good" Looks Like

The Genome in a Bottle (GIAB) HG002 sample is the de facto benchmark for human long-read SV calling. Its v0.6 SV truth set, derived from a combination of PacBio HiFi, ONT ultra-long, and Bionano optical mapping data, contains roughly 12,700 high-confidence SVs and has become the reference against which every new caller release is measured. The leaderboard has been remarkably stable: Sniffles2, SVIM, PBSV, and CuteSV all cluster between 92% and 96% F1 score on the HG002 high-confidence regions at 30× HiFi coverage, with single-percentage-point differences depending on parameter settings.[6]

Table 2 — Representative performance on GIAB HG002 (30× HiFi, high-confidence regions)

| Caller | Precision | Recall | F1 | Runtime (CPU-hours) | Peak RAM |
|---|---|---|---|---|---|
| Sniffles2 | 0.95 | 0.94 | 0.945 | ~2.5 | ~8 GB |
| SVIM | 0.93 | 0.95 | 0.940 | ~4 | ~12 GB |
| PBSV | 0.96 | 0.92 | 0.940 | ~3 | ~10 GB |
| CuteSV | 0.94 | 0.95 | 0.945 | ~2 | ~6 GB |
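
The F1 column in Table 2 can be checked directly from the other two columns. F1 is the harmonic mean of precision P and recall R, that is, 2PR / (P + R). This short Python check recomputes it from the table's values.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed in Table 2.
table2 = {
    "Sniffles2": (0.95, 0.94),
    "SVIM": (0.93, 0.95),
    "PBSV": (0.96, 0.92),
    "CuteSV": (0.94, 0.95),
}

for caller, (p, r) in table2.items():
    print(f"{caller}: F1 = {f1(p, r):.3f}")
```

Rounded to three decimals, the check reproduces the listed values: 0.945 for Sniffles2 and CuteSV, and 0.940 for SVIM and PBSV.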

The much more telling benchmark is what happens outside the high-confidence regions. The GIAB high-confidence intervals exclude several kinds of sequence:

- Centromeres
- The acrocentric short arms
- Segmental duplications
- Most of the recombination-prone regions

These are the regions where short-read callers fail most spectacularly, and where most clinically interesting SVs occur. Performance in these regions, including the challenging medically relevant genes that GIAB benchmarks separately, is several percentage points lower across all callers. There, the gap between long-read and short-read callers exceeds 50 percentage points of F1 score. This is where long-read SV calling earns its clinical case.

Clinical Implications: What Long-Read SV Calling Unlocks

The clinical genomics community spent two decades building infrastructure — variant databases, ACMG guidelines, reporting pipelines — around the SNV/small-indel model of disease genomics. Structural variants were acknowledged as important but treated as a secondary category, called by separate tools, interpreted with separate workflows, and reported with substantially higher uncertainty. Long-read SV calling is dissolving this distinction. When a single sequencing run produces SNVs, small indels, large indels, inversions, duplications, translocations, and tandem repeat expansions in a unified workflow with comparable confidence across all categories, the case for a unified clinical variant pipeline becomes overwhelming.

Three application areas are seeing the most immediate impact. In constitutional rare disease, undiagnosed cases negative on exome and short-read genome sequencing are being re-analyzed with long-read whole-genome sequencing, with reported diagnostic yields of 10–15% on the residual cohort — almost entirely from SVs that the short-read pipeline missed. In repeat expansion disorders, ONT and HiFi can directly measure the size of CAG, CGG, GAA, and CTG repeat tracts that confound short-read genotyping (Huntington's disease, fragile X, Friedreich ataxia, myotonic dystrophy) — a use case where short reads are not just less sensitive but mechanistically incapable of measuring the relevant feature. In cancer genomics, complex rearrangements that drive tumor biology — chromothripsis events, fusion genes with breakpoints in repetitive elements, focal amplifications — are detectable with breakpoint precision that short-read tumor sequencing has never achieved.[7]
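
Repeat sizing is straightforward once a single read spans the whole tract. The naive Python sketch below counts the longest run of uninterrupted copies of a repeat unit such as CAG, assuming an error-free read sequence. `longest_repeat_run` is a hypothetical helper. Dedicated tandem-repeat genotypers go further: they tolerate sequencing errors and interruptions, combine evidence across many reads, and separate the two alleles. This sketch attempts none of that.

```python
def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of consecutive, uninterrupted copies of `unit`.

    Every starting offset is scanned, so the run is found in any phase.
    Any non-matching copy (e.g. CAA inside a CAG tract) ends the current run.
    """
    seq, unit = sequence.upper(), unit.upper()
    k = len(unit)
    best = 0
    for offset in range(k):
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best
```

Consider a read with 42 uninterrupted CAG copies, a CAA interruption, and six further CAG copies. For that read, the function returns 42.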

When a single sequencing run resolves SNVs, indels, SVs, and repeat expansions with comparable confidence, the case for a unified clinical pipeline becomes overwhelming.

Operational Considerations for Production Pipelines

Moving long-read SV calling from a research notebook to a CAP/CLIA-compliant clinical pipeline requires attention to several factors that the academic benchmark literature tends to gloss over.

Truth-set scope. The GIAB HG002 truth set is excellent but covers approximately 89% of the genome at high confidence. Validation against samples and regions outside this scope requires one of three things: careful curation, orthogonal confirmation (Bionano optical mapping, targeted long-range PCR), or acceptance of higher uncertainty bounds. A clinical pipeline's validation plan should explicitly enumerate which genomic regions are within the high-confidence validation envelope and which are reported with reduced confidence.
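
Enumerating the validation envelope can be as simple as tagging each call by whether it falls inside the high-confidence intervals. The Python sketch below rests on two assumptions. First, the intervals arrive as merged, non-overlapping BED-style (chrom, start, end) rows, such as the high-confidence BED distributed with a truth set. Second, each call carries (chrom, start, end) as its first three fields. `load_intervals`, `in_envelope`, and `partition_calls` are hypothetical helpers. Requiring a call to lie entirely within one interval is an assumed policy.

```python
import bisect


def load_intervals(rows):
    """Index (chrom, start, end) rows by chromosome as parallel sorted lists.

    Assumes intervals on a chromosome do not overlap (as in a merged BED file).
    """
    index = {}
    for chrom, start, end in sorted(rows):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index


def in_envelope(index, chrom, start, end):
    """True if the half-open span [start, end) lies inside one interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last interval whose start is <= the call's start.
    i = bisect.bisect_right(starts, start) - 1
    return i >= 0 and end <= ends[i]


def partition_calls(calls, index):
    """Split calls into in-envelope and reduced-confidence lists."""
    inside, outside = [], []
    for call in calls:
        target = inside if in_envelope(index, *call[:3]) else outside
        target.append(call)
    return inside, outside
```

Calls in the second list are the ones a validation plan would report with reduced confidence or route to orthogonal confirmation.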

Reproducibility across chemistries. SV callsets are sensitive to the underlying sequencing chemistry: HiFi reads from a Revio run will not produce numerically identical callsets to HiFi reads from a Sequel IIe run, even on the same sample, because subtle differences in error profiles and read-length distributions propagate into the alignment and clustering steps. Production pipelines should be locked to a specific instrument, chemistry, and software version combination, with explicit re-validation when any of these change.

Cohort consistency. For cohort studies and reference panels, joint calling through Sniffles2's SNF mechanism (or its equivalents) is strongly preferred over per-sample calling followed by merging. Independent per-sample calls produce inconsistent breakpoint coordinates for what is biologically the same variant, and merging tools, however sophisticated, cannot fully recover the resolution lost in per-sample calling.

Compute infrastructure. Long-read SV calling itself is computationally modest — Sniffles2 will process a 30× human BAM in under 3 hours on 16 cores — but the alignment step that precedes it is not. minimap2 on a 30× HiFi run requires roughly 8–12 CPU-hours and 30+ GB of RAM. Production deployment on AWS or equivalent cloud infrastructure typically uses memory-optimized instances (r5/r6i families) for alignment and standard compute instances for calling. Spot pricing makes this tractable at scale; reserved instances are appropriate for steady-state clinical workloads.

What Comes Next

Two developments are likely to define the next chapter of long-read SV calling. The first is pangenome-aware SV calling. Rather than calling SVs as differences from a single linear reference, callers will increasingly operate against a graph-structured pangenome; the Human Pangenome Reference Consortium's draft reference is the leading example.[8] In such a graph, common SVs are encoded as alternative paths, and a "call" becomes a path-selection problem. Early implementations such as vg, minigraph-cactus, and PanGenie are showing substantial improvements in both sensitivity and computational efficiency. The gains are largest in the medically actionable regions that have historically been most refractory to linear-reference approaches.

The second is the integration of methylation calling into the SV workflow. Both PacBio HiFi and Oxford Nanopore reads carry direct methylation information without requiring bisulfite conversion, and recent caller releases are beginning to report methylation status across SV breakpoints. This dual readout from a single assay is a genuinely new capability for three areas:

- Imprinting disorders.
- Repeat expansions with methylation-sensitive pathogenicity, with fragile X as the canonical example.
- Cancer biology, where structural variants alter the methylation landscape of regulatory elements.

Conclusion

The gap between what short-read sequencing can see and what is actually present in a genome has been the single largest unsolved problem in clinical genomics for the better part of two decades. Long-read sequencing closes that gap, and long-read SV callers — Sniffles2, SVIM, PBSV, and their fast-evolving cohort — are the production tooling that turns the closed gap into actionable variant reports. The technology is mature, the benchmarks are stable, the operational patterns are well-understood, and the clinical use cases are accumulating faster than the literature can document them. For bioinformatics teams building the next generation of clinical NGS pipelines, the question is no longer whether long-read SV calling belongs in the workflow. The question is which caller, which truth set, and which validation envelope — and those are exactly the questions a well-scoped bioinformatics engagement is built to answer.

References

  1. Mahmoud M. et al. (2019). Structural variant calling: the long and the short of it. Genome Biology, 20, 246. doi:10.1186/s13059-019-1828-7
  2. Smolka M. et al. (2024). Detection of mosaic and population-level structural variants with Sniffles2. Nature Biotechnology, 42, 1571–1580. doi:10.1038/s41587-023-02024-y
  3. Heller D. & Vingron M. (2019). SVIM: structural variant identification using mapped long reads. Bioinformatics, 35(17), 2907–2915. doi:10.1093/bioinformatics/btz041
  4. Pacific Biosciences. PBSV documentation and source. github.com/PacificBiosciences/pbsv
  5. English A.C. et al. (2022). Truvari: refined structural variant comparison preserves allelic diversity. Genome Biology, 23, 271. doi:10.1186/s13059-022-02840-6
  6. Zook J.M. et al. (2020). A robust benchmark for detection of germline large deletions and insertions. Nature Biotechnology, 38, 1347–1355. doi:10.1038/s41587-020-0538-8
  7. Mastrorosa F.K. et al. (2023). Applications of long-read sequencing to Mendelian genetics. Genome Medicine, 15, 42. doi:10.1186/s13073-023-01194-3
  8. Liao W.-W. et al. (2023). A draft human pangenome reference. Nature, 617, 312–324. doi:10.1038/s41586-023-05896-x