After 10+ years of mostly silence, Roche is finally starting to lift the curtain on their nanopore sequencer. It’s based on a combination of Genia’s nanopore platform (acquired in 2014) and Stratos Genomics’ Xpandomer chemistry (acquired in 2020). (See my previous post for some more background.) While Roche didn’t reveal all of the details at their webinar today, like the critical questions of cost and launch schedule, they revealed enough to give a pretty clear picture of what they’re working on: a fast, high throughput, accurate, and versatile DNA sequencing platform. And they’ve backed it up with data.
The two big questions people are asking are “What can it do?” and “How ‘real’ is it?”
What can it do?
As Roche promised in the webinar announcement, it can generate 15B “high quality” reads in a four hour sequencing run. More specifically, reads of up to 300b with ~Q39 quality scores. This makes it look like a very fast NovaSeq X sequencer. So is Roche playing any “tricks” here? Yes and no. These aren’t quite raw reads, but it’s not really a trick. There really are 15B Q39 300b reads generated in four hours. It’s just that these are duplex reads, meaning that both strands of the target DNA are converted into the Xpandomer molecule and read through the nanopore. So, for a 300b read, there is a 600b molecule being read. Data from the two strands (attached via a hairpin structure) are used to generate a sort of intramolecular consensus read. Not all of the Xpandomers are full duplex. The partial duplex molecules, while they don’t count in terms of coverage, throughput, or accuracy, help improve the mapping.
And where there’s a duplex, there’s a simplex. If you’re more interested in longer reads and higher throughputs and less concerned about read quality, the simplex prep (which lacks the hairpin) generates 30B Q20 reads in the 50b-200b range, perfect for single cell transcriptomics. And if you’re interested in splice isoforms, you can get reads up to 1500b (although the longer the reads you’re shooting for, the fewer you get - there’s no free lunch).
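To put the duplex and simplex numbers side by side, here is a quick back-of-envelope sketch. The read counts, maximum lengths, and Q scores are the figures quoted above; treating every read as maximum length is my own simplification, so the yields are upper bounds rather than anything Roche stated. The function and variable names are just illustrative.

```python
# Back-of-envelope comparison of the duplex and simplex modes.
# Read counts, lengths, and Q scores are the webinar figures quoted above.
# ASSUMPTION: every read is at the maximum length, so yields are upper bounds.

def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def max_yield_bases(reads: float, read_length: int) -> float:
    """Upper bound on bases per run if every read were read_length long."""
    return reads * read_length


modes = {
    "duplex": {"reads": 15e9, "max_len": 300, "q": 39},
    "simplex": {"reads": 30e9, "max_len": 200, "q": 20},
}

for name, m in modes.items():
    p_err = phred_to_error_prob(m["q"])
    yield_tb = max_yield_bases(m["reads"], m["max_len"]) / 1e12
    errors_per_read = p_err * m["max_len"]
    print(
        f"{name}: error prob {p_err:.2e}/base, "
        f"up to {yield_tb:.1f} Tb/run, "
        f"~{errors_per_read:.2f} expected errors per max-length read"
    )
```

Q39 works out to roughly one error per 8,000 bases, while Q20 is one error per 100 bases, which is the trade you make for the extra simplex throughput.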
How “real” is it?
Roche showed data from two early access partners, Hartwig Medical Foundation and The Broad. Hartwig, which focuses on WGS-based cancer diagnostics, ran a tumor/normal study and compared the results with data generated on Illumina sequencers. The error rates for SNVs and INDELs looked remarkably similar, with SNV calling slightly outperforming Illumina. Scientists at The Broad ran WGS samples as well as a Perturb-Seq experiment with over 30B reads generated in four hours. Both groups will be presenting their data in more detail at the Roche AGBT workshop. (If anyone is going, please grab some swag for me!) A natural question is whether the instruments were placed in the field at customer sites or if samples were shipped to Roche. At least for Hartwig, the pictures indicate that the instruments were installed and running at the customer site.
Roche Sequencer "PT-017" - prototype #17?
A very "prototype" looking SBX Synthesizer
There was a LOT of info shared - over 90 slides in 90 minutes. There’s only so much I can cover here, but I’ve included some of the details I found particularly interesting.
SBX Technology
“Sequencing by eXpansion”, or SBX, creates a molecule 50X the size of the target DNA. The Xpandomer is created from highly modified nucleotides with a large sidechain loop consisting of PEG reporters and other elements, including a “translocation element” which pauses the molecule in a stepwise fashion as it’s being passed through the nanopore. The polymerase that’s able to incorporate these monster nucleotides is based on Dpo4, which has been described as “poorly processive and error prone”. Fret not, for this is a highly modified version with 10% of the amino acids having been mutated, leading to a mean raw accuracy of 99.3% and the ability to generate molecules greater than 1000 bases.
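For a sense of scale, the 99.3% raw accuracy can be translated into a Phred-style Q score with the standard formula Q = -10 * log10(error rate). This is just my own conversion, not a number Roche presented, and the function name is illustrative.

```python
import math

# Convert a raw per-base accuracy into a Phred-style Q score.
# The 99.3% figure is the mean raw accuracy quoted above; the conversion
# itself is the standard Phred definition, not something Roche showed.

def accuracy_to_phred(accuracy: float) -> float:
    """Return the Phred Q score for a per-base accuracy in (0, 1)."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10 * math.log10(1.0 - accuracy)


raw_q = accuracy_to_phred(0.993)
print(f"99.3% raw accuracy is about Q{raw_q:.1f}")
```

That lands a little above Q21, in line with the ~Q20 quoted for simplex reads.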
This SBX magic happens on a separate “synthesis instrument” (which doesn’t appear to have a polished product name yet, but at least they’ve wisely omitted the internal code name - once those start getting used by the public they never seem to go away). It’s capable of generating four Xpandomer pools in under four hours. While it might have been nice to have this capability included on the sequencer itself, separating it out will likely allow for a more efficient workflow - a single synthesis machine will likely be able to feed multiple sequencers. And, who knows, maybe future iterations/versions will see an all-in-one box (kind of how Illumina moved away from the cBot to having on-instrument bridge amplification).
Read Length
Unlike Illumina sequencers, the Roche sequencer doesn’t have discrete read lengths. The duplex molecules generally fall in a range of 150b to 350b, and there will be a distribution of lengths. In one of the experiments they showed what the range looked like - a distribution around the mean with a long-ish tail of longer reads. The lower quality simplex reads have a much larger range, from 50b up to 1500b, presumably with the ability to dial in the length needed for the particular applications.
Accuracy
This was one of the big worries/unknowns about Roche’s platform. How good was “highly accurate” going to be? While Roche won’t be winning any “highest Q score” awards, accuracies around Q39 are likely MUCH higher than many were expecting. And as it doesn’t look like there are any major systematic errors (e.g., homopolymers), there shouldn’t be any applications that the platform is incapable of working with. It should fit right in with the rest of the sequencers on the market.
One nice feature of Roche’s error profile is that accuracy is independent of insert/read length or base position - the estimated Q score holds steady along the entire length of the read for both simplex and duplex reads. This is due to the single molecule nature of the platform. Also, homopolymers, at least up to 15b, really don’t seem to be an issue.
Roche sequencer Product/Market fit
Before trying to determine how Roche’s sequencer will fit in the market, it’s important to really understand how unusual it is and how it differs from every other platform out there. First, the sequencing chemistry (SBX) is completely decoupled from the detection - they don’t even happen on the same instrument. It’s a “single molecule” sequencer, but it doesn’t technically sequence DNA. It sequences Xpandomer molecules that are mostly PEG. It’s a nanopore sequencer, but unlike THE nanopore sequencer from Oxford Nanopore that everyone is familiar with, it does NOT do long reads. (That said, are 1500b reads “short”? They used to be considered “long” when PacBio first came out, but now “long” reads are usually >10kb and sometimes in the millions of bases. We might need to start using a new term. Mid? Medium?)
In addition to being a really fast, really high throughput sequencer that’s probably causing some people at Illumina, Ultima, Element, and Singular to update their resumes, it’s also extremely versatile. The user can decide to trade off read quality against read length and throughput. It has a “fast mode” that forgoes the linear amplification step (at the expense of needing to start with 2 µg) to achieve start-to-finish runs (from library prep through VCF) in under 6.5 hours. It also uses a “sensor module” (which we don’t dare call a flow cell) that is… are you ready for it? …reusable!
The runs are stated to last from four minutes to over four hours. Roche hasn’t talked about pricing yet (apart from a very general guideline to expect prices in the range of current mid-throughput to high-throughput instruments), but a reusable “consumable” with variable run times/outputs sounds like it could lead to some very interesting and creative pricing models. And all this flexibility sounds like it might be leading to a platform with uniform(-ish) pricing across low to high throughputs. Might this be the first sequencing platform to break the inverse relationship between throughput and sample cost? Getting “NovaSeq X” pricing from a quick run with just a few samples would be really nice.
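To get a feel for what variable run times might mean for output, here is a rough sketch. It assumes output scales linearly with run time from the quoted 15B duplex reads in four hours. That linearity is purely my assumption; Roche didn't say how output relates to run length, and real runs will have startup overhead and pore loading effects.

```python
# Rough estimate of duplex reads for a given run time.
# ASSUMPTION (mine, not Roche's): output scales linearly with run time,
# anchored on the quoted 15B duplex reads in a four hour (240 minute) run.

FULL_RUN_READS = 15e9
FULL_RUN_MINUTES = 240


def estimated_reads(run_minutes: float) -> float:
    """Linearly scaled read count for a run of run_minutes minutes."""
    if run_minutes < 0:
        raise ValueError("run_minutes must be non-negative")
    return FULL_RUN_READS * run_minutes / FULL_RUN_MINUTES


for minutes in (4, 30, 60, 240):
    reads_millions = estimated_reads(minutes) / 1e6
    print(f"{minutes:>3} min run: ~{reads_millions:,.0f}M reads")
```

Under that assumption, even the shortest four minute run would produce a few hundred million reads, which is why per-run pricing on a reusable sensor module could get interesting.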
Final thoughts
This all sounds great, but it is Roche we’re talking about. They don’t exactly have the best track record when it comes to sequencing platforms. To quote a former US president from simpler times: “Fool me once, shame on you. Fool me… you can’t get fooled again!” They’ve fired a warning shot, but now they need to deliver. Of course, this isn’t a startup screaming for attention. It’s Roche, with a reputation that can’t be risked. But the danger is the competition won’t be standing still. We’re naturally comparing what Roche is presenting, a product still being developed, with what other companies have already launched. But it might take Roche another year, give or take, to launch their sequencer - they’ve officially stated early access in 2025 and a launch in 2026. We really need to compare it with what Illumina, Ultima, Element, and others will have released by then. We can only guess what kind of progress they’ll make (although maybe the various AGBT workshops will reveal a bit more).