Why Reduced Representation Sequencing Beats WGS for Cannabis Cultivar Identity

Chad Ternes

Whole genome sequencing gets treated as the gold standard for cannabis genetics work, and in certain contexts it genuinely is. But for cultivar identity, relatedness, and IP documentation, it's the wrong tool for the job. Not because it lacks power, but because it has far more than you need, at a cost that makes genomic documentation inaccessible to the breeders who need it most. Reduced representation sequencing was designed specifically for these questions, and the data it produces is more than sufficient to answer them rigorously.

Whole genome sequencing is a powerful approach. The cannabis genome is large, roughly 800 megabases (about 800 million base pairs) depending on the variety, and WGS attempts to read all of it. That breadth is genuinely necessary for certain applications: assembling a reference genome from scratch, detecting structural variants, characterizing novel variation in an uncharacterized population. When you need the whole picture, WGS is the right call.

But for strain identification and genetic relatedness? Using WGS is like using a cannon to kill a fly.

For these questions, what we need are markers: specific, informative positions in the genome that vary between individuals and can be compared consistently across samples. When the goal is establishing whether two plants are the same cultivar, whether one is derived from another, or whether we can characterize a genetic relationship, markers are the currency. And for that, we don't need the whole genome.

This is exactly what reduced representation sequencing was designed to do. Approaches like RADseq use restriction enzymes to cut the genome at specific, reproducible locations and sequence only those fragments. This generates tens of thousands of SNP markers per individual.
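To make the "specific, reproducible locations" idea concrete, here is a minimal in silico sketch of a restriction digest. It is an illustration only: the enzyme (EcoRI, recognition site GAATTC, cutting after the first G) is my assumption, since no particular enzyme is named here, and the 20-base sequence and the function names `find_cut_sites` and `digest` are made up for the example. Real RADseq workflows add size selection, adapters, and sequencing on top of this step.

```python
def find_cut_sites(sequence, site="GAATTC", cut_offset=1):
    """Return 0-based cut positions for every occurrence of the recognition site.

    site and cut_offset default to EcoRI (G^AATTC); this choice is an
    assumption made for illustration, not something the article specifies.
    """
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        # Move one base forward so overlapping sites would also be found.
        start = sequence.find(site, start + 1)
    return positions


def digest(sequence, site="GAATTC", cut_offset=1):
    """Split the sequence into fragments at each cut position."""
    cuts = find_cut_sites(sequence, site, cut_offset)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy sequence (hypothetical, not real cannabis DNA).
toy = "AAGAATTCCCGGGAATTCTT"
print(find_cut_sites(toy))
print(digest(toy))
```

The point of the sketch is the determinism: the same sequence and the same enzyme always yield the same cut positions, which is why the same set of loci can be recovered and compared across samples.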

SNPs, or single nucleotide polymorphisms, are positions where individuals differ by a single base pair. They are stable, heritable, and distributed across the genome, residing in both genic and intergenic regions. That last part matters. Intergenic SNPs, those sitting outside of coding regions, are often much closer to neutral. On average they experience weaker selection than sites in protein-coding sequence, making them especially stable identity markers. Restriction enzymes cut wherever their recognition sequence occurs, with no intentional bias toward coding regions and no preference for functional versus non-functional sequence. The result is a highly reproducible sample of loci distributed broadly across the entire genome.

For identity and relatedness applications, tens of thousands of SNPs generated this way are not a compromise on rigor. They are more than sufficient. This is the same type of high-density SNP data that underpins forensic and kinship applications.
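As a rough illustration of how SNP genotypes turn into an identity or relatedness call, the sketch below computes a simple identity-by-state (IBS) similarity between two samples. Everything in it is assumed for the example: genotypes are coded as alternate-allele counts (0, 1, or 2) with `None` for a missing call, the function name `ibs_similarity` is hypothetical, and the eight-SNP genotype lists are invented. Production pipelines work with tens of thousands of filtered SNPs and account for genotyping error, so treat this as a sketch of the idea rather than a description of any specific method.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Mean identity-by-state over SNPs genotyped in both samples.

    Each genotype is an alternate-allele count (0, 1, or 2) or None if missing.
    Per SNP the score is 1.0 (both alleles shared), 0.5 (one shared), or 0.0.
    """
    shared = [
        (a, b)
        for a, b in zip(genotypes_a, genotypes_b)
        if a is not None and b is not None
    ]
    if not shared:
        raise ValueError("no SNPs genotyped in both samples")
    return sum(1 - abs(a - b) / 2 for a, b in shared) / len(shared)


# Invented toy genotypes at eight SNPs (real data would have far more).
mother_plant = [0, 1, 2, 1, 0, 2, None, 1]
suspected_cut = [0, 1, 2, 1, 0, 2, 1, 1]
unrelated = [2, 1, 0, 0, 1, 2, 0, None]

print(ibs_similarity(mother_plant, suspected_cut))
print(ibs_similarity(mother_plant, unrelated))
```

In this toy data the suspected cut matches the mother plant at every shared SNP, while the unrelated sample shares far fewer alleles. With real sequencing data, clones typically score very close to, but not exactly, 1.0 because of genotyping error, so any threshold for calling two samples "the same cultivar" is a judgment that has to be set against measured error rates.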

That matters beyond the lab. When an NDA gets violated, a licensing agreement gets ignored, or a proprietary cut walks out the door, the question in a legal context is the same one we're asking scientifically: can you prove this plant is yours? A DNA fingerprint built on SNP data is exactly the kind of evidence that documentation-backed IP protection rests on.

Reduced representation sequencing also comes with a lower price tag. WGS can reach hundreds to thousands of dollars per sample. Reduced representation brings that cost down to a level that makes genomic documentation genuinely accessible, not just for large commercial operations, but for the independent breeder who deserves the same tools to protect their work.
