Edit: Since our site seems to be overwhelmed at the moment, here's a recap:
We’ve been working hard at BillionToOne on a new COVID-19 test that scales testing to everyone in the US. Our test (1) re-purposes existing infrastructure, (2) eliminates time-consuming RNA extraction, and (3) enables a distributed system for COVID-19 testing.
We need 1 million tests per day to end the stay-at-home orders. Schools are still open in Iceland because they test 15x more than the US does, per capita (https://www.washingtonpost.com/world/2020/04/02/free-coronav...).
The first thing we figured out is how to run COVID-19 tests on existing automated Sanger sequencers. One sequencer can process up to 3840 samples per day. Hundreds of sequencers sit idle as excess capacity because they were built for the Human Genome Project over 20 years ago.
It would take only 2 sequencers to surpass the current test capacity for all of California. There are far more than 2 sequencers in California (some individual labs have 10 or more).
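As a sanity check, the arithmetic above can be sketched like this (the per-run breakdown and the California capacity figure are hypothetical placeholders, chosen only to match the post's 3,840/day number):

```python
# Back-of-the-envelope throughput check. The per-run numbers are assumed
# so as to reproduce the post's 3,840/day figure; the California daily
# capacity is a made-up illustrative value, not an official statistic.
samples_per_run = 384                  # one 384-capillary plate (assumed)
runs_per_day = 10                      # assumed ~2.4 h per run
per_sequencer_per_day = samples_per_run * runs_per_day

ca_daily_capacity = 7_000              # hypothetical statewide capacity
sequencers_needed = -(-ca_daily_capacity // per_sequencer_per_day)  # ceil

print(per_sequencer_per_day, sequencers_needed)  # 3840 2
```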
We tweaked the protocol so COVID-19 could be detected from sequencing data using linear regression. Basically, we add ~100 copies of a known DNA sequence to help us calculate how much virus nucleic acid is in the specimen. It works just as well as gold-standard RT-qPCR.
Lab workflow for COVID-19 testing is traditionally (1) specimen accessioning, (2) RNA extraction, (3) RT-qPCR, and (4) reporting. RNA extraction, in particular, has been a huge bottleneck in terms of reagent shortages and labor-intensiveness.
We showed that we can skip RNA extraction entirely without affecting test sensitivity and limit of detection.
By skipping RNA extraction and using automated Sanger sequencers, we think we can get to an additional 200,000 samples per day test capacity in existing clinical labs.
A distributed system is often the only way to operate at massive scale. A fully distributed system could have different sites and labs responsible for each process and dynamically re-allocate resources based on availability and capacity.
The Broad Institute COVID-19 lab has already started doing this. They are asking for specimens to be submitted in a standardized tube format and pre-barcoded. They have essentially distributed the specimen accessioning work.
Because there is a highly developed service industry for Sanger sequencing with <24 hour turnaround, there is an opportunity to further scale up testing by distributing the work to their (currently) idle sequencers.
Distributed testing could scale from 200k to >1 million tests per day, but would require a change in regulations that currently prohibit it.
Thanks to the BillionToOne team for pulling this work together! Next step is to start manufacturing test kits and obtain Emergency Use Authorization from the FDA. We’re eager to work with clinical Lab Directors and contract kit manufacturers.
Edit 2: Link to scientific manuscript: https://www.dropbox.com/s/07esyehsvfpmllc/A%20Highly%20Scala...
The way currently available COVID-19 testing works is by detection of viral RNA. Since the amount of viral RNA in a patient sample is too low to detect directly, we first need to amplify it by PCR. However, this viral RNA is packaged within all sorts of proteins and lipids that could make it inaccessible to amplification unless they are first purified away. Furthermore, the sample is shipped in "viral transport medium", which is essentially a cocktail of chemicals designed to preserve the virus. Unfortunately, these preservatives often have the side effect of interfering with PCR amplification, so these too need to be purified from the sample.
However, since RNA extraction is usually the most laborious part of the assay, there has been a lot of interest in optimizing the amplification so that it is resilient to all of these impurities. The preprint referenced in our manuscript (https://www.biorxiv.org/content/10.1101/2020.03.20.001008v1) gave us the initial idea that this could be possible, and much of it comes down to the choice of amplification method (e.g. the enzymes and buffers used).
However, even when you choose a "good" enzyme and buffer, you will still suffer an amplification penalty, and this will cause you to return a false-negative on some affected samples because there was so little virus in the sample to begin with. The innovation we have is to spike-in a correspondingly low level of DNA to the reaction mixture. That way, if you see the low level of DNA without seeing any viral signal, you can be assured that the amplification still worked and that there truly is no virus in the sample.
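The spike-in control logic can be sketched as a simple three-way call (the threshold and the scalar "signal" values here are hypothetical simplifications; the actual pipeline works on full chromatogram traces, not single numbers):

```python
# Sketch of the spike-in control logic described above. Thresholds and
# signal values are hypothetical; real calls come from chromatograms.
def call_sample(viral_signal: float, spikein_signal: float,
                threshold: float = 0.1) -> str:
    if spikein_signal < threshold:
        # Neither the control nor the virus amplified: the reaction
        # itself failed, so do not report a (false) negative.
        return "invalid"
    if viral_signal >= threshold:
        return "positive"
    # Spike-in amplified but no viral signal: a trustworthy negative.
    return "negative"

print(call_sample(0.0, 0.5))  # negative
print(call_sample(0.4, 0.5))  # positive
print(call_sample(0.0, 0.0))  # invalid
```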
By the way, this robustness is completely expected, as any impurities in VTM would impact spike-in and endogenous viral amplification equally for end-point PCR (so their ratio stays the same). This is not necessarily true for qPCR, where an impurity (caused by lack of RNA extraction) can potentially make a positive sample look like a negative when the viral RNA fails to RT-PCR.
According to an older scientist, Anders Fomsgaard at the Danish Serum Institute, this is how they did it "in the old days". He is the father of one of the authors.
This eliminates supply chain problems for reagents and was shared quickly to help in Spain.
Preprint by Fomsgaard and Rosenstierne:
https://www.medrxiv.org/content/10.1101/2020.03.27.20044495v...
In figure 1A, the workflow includes a standard PCR step before Sanger. Workflow-wise, wouldn't it still have the same bottleneck as a qPCR test, i.e. being limited by 96/384-well instrument runs?
Most clinical laboratories would have 10 to 50 PCR instruments that they can use to run the initial amplification reaction in parallel before Sanger sequencing. Also, Sanger sequencing uses a plate feeder, so you can add new plates on top as the second round of PCR reactions finish.
But, more importantly, qSanger can bypass RNA extraction, which seems to be an important bottleneck in the RT-qPCR workflow.
I had a few questions though:
How do you compare this test to the Abbott machines? Obviously that test is faster, but how does that impact what we can do with it?
For 1m/day to be sufficient, do we need contact tracing programs to be able to find everybody who needs to be tested? How hard will it be to scale these programs?
I think that where the Abbott machines might hit a wall is that they are one at a time, and they require Abbott's consumable test cartridge and device to run (think printer ink / printer). I don't have any firsthand knowledge, but I would anticipate that it is difficult to scale-up manufacturing of the devices rapidly enough to keep pace with the pandemic growth.
We absolutely need contact tracing to find everybody who needs to be tested. We're not working on scaling up contact tracing, but I think several people in the tech community are working on making that easier to perform at scale.
What are your plans for licensing the technology?
If they need our bioinformatics automation & help with set-up, we would license the method for COVID-19 testing for $3-$5/sample as part of each sample that is being put through our pipeline.
If they ask for 96-well plates with all reagents that are ready to use (so that they just need to add VTM), we would work with manufacturers to produce the reactions and plates, and the price of kit (~$15 per test) would include limited license to use our automated bioinformatics calling pipeline.
See: https://www.medrxiv.org/content/10.1101/2020.03.26.20039438v...
https://diagnostics.roche.com/us/en/news-listing/2020/roche-...
That said, it is easier to ship test kits than to scale the instrumentation. Both Roche and Abbott still need to build hundreds of their instruments before the kits that they are shipping out this week can be run at the daily rate they are targeting. I am not sure about Roche, but Abbott estimates it will be the end of June before enough machines have shipped to achieve 50K per day capacity on their instruments.
Another potential problem with new instrumentation is that reimbursement for COVID-19 tests is very low ($30-$50), so it becomes financially difficult for hospitals and laboratories to buy very expensive instruments and also pay for test kits that cost $30-$50 per test, on par with reimbursement.
We try to avoid both issues by utilizing a currently unused Sanger install base and low-cost reagents.
The core of our machine learning is Ax=b :grins:
More seriously, the main reason why traditional Sanger sequencing can't be used for COVID-19 testing is that it would be unclear whether a lack of signal is truly due to a lack of virus, or just because the assay failed (happens all the time!).
What we've done is introduce a reference sequencing signal that is biochemically very similar to viral RNA, but produces a distinct vector of electrical signals that is different from the signals emitted by viral RNA. Since we know what both the reference and viral signals look like, we can perform linear regression analysis to fit the linear combination of viral and reference signals that best match our data.
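A minimal sketch of that least-squares decomposition, using synthetic toy traces rather than real chromatogram data:

```python
import numpy as np

# Toy basis traces: what a pure reference and a pure viral chromatogram
# would look like at each position (synthetic placeholders).
ref_trace   = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
viral_trace = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

# Observed trace: a linear mixture of the two basis traces.
observed = 0.8 * ref_trace + 0.2 * viral_trace

# Solve Ax = b for the mixing coefficients by least squares.
A = np.column_stack([ref_trace, viral_trace])
coeffs, *_ = np.linalg.lstsq(A, observed, rcond=None)
ref_frac, viral_frac = coeffs

print(round(ref_frac, 3), round(viral_frac, 3))  # 0.8 0.2
```

A nonzero viral coefficient (well above the noise floor) would indicate a positive sample; seeing only the reference coefficient confirms the assay worked and the sample is negative.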
The machine learning angle isn't the exciting part here, it's all the rest. Great idea!
We are approaching our peak in 3-4 weeks, and current testing capacity is totally fucked.
I know many people in government and on the official Covid response team, and can get the ball rolling if this is a real possibility.
Do they detect antibodies?
What is the false negative rate for nasal swabs vs anal swabs?
“Most qPCR instruments have low throughput (they can run 1- 48 samples at a time) and they all compete for the same reagents.”
Is it more accurate to say that FDA-approved qPCR machines can run up to 48 samples at a time?
Most qPCR machines can handle 96 samples at a time, and there are versions that go as high as 384 at a time.
The FDA has only approved qPCR machines that go up to 48, correct? Do you know why they’re not running on the more parallel machines?
Thanks!
Given that a Sanger sequencer sequences all kinds of mRNA, would you also find other RNA-based viruses like the flu? How much modification would be necessary to diagnose all kinds of RNA viruses?
Thank you for doing something about the pandemic. It's quite heroic.
Can you please comment on the actual realistic turnaround time for the test?
I was part of a volunteer team that tested 3400 people on Friday/Saturday in Santa Clara County for COVID-19 antibodies [1]. It took a team of 100+ volunteers 10 hours / day just to collect samples.
Stanford, for example, has plenty of automated testing capacity, and even reagents. IMHO, the limiting factors are not that we need new tests, but rather we need (1) lighter regulations (2) funding to buy supplies and (3) massive manpower to scale-up drive through testing
[1] https://www.stanforddaily.com/2020/04/04/stanford-researcher...
Sample collection and accessioning (accessioning is unpacking test tubes one by one and aliquoting them into plates in the lab) is definitely going to require a lot of manpower. I'm hopeful that patients "self swabbing" can help alleviate some of the manpower needs. (Self-swabs are not allowed currently under FDA guidance).
Is self-swabbing allowed there because it's for research purpose, not clinical purpose? The program does tell users whether they test positive though.
If it turns out that it's already spread to most everybody lock downs are just a way to create more homelessness. If it turns out that the only people with the antibodies are the ones in the hospitals we need martial law.
We can disagree on whether a lockdown causes the least amount of societal distress compared to some of the other options, and we can argue about when we would personally choose to enact a lockdown. We could even try to work out what it would take to prevent people losing their homes and dying. $2000/month UBI seems like it might help.
But seriously, doing anything at all is wrong? And the only right response is to do nothing? No ordering more PPE, no preparing for a surge, no rebalancing shifts so the contagion doesn't take out the police force/navy/healthcare workers/etc?
Saying that any possible response is wrong seems like pretending the problem will go away if we pretend it doesn't exist. Which is really hard to do while people are dying.
What they had at the time were models (which predicted asymptomatic cases) and information on how things looked in countries that did not take measures soon enough. Even though China lied and made their numbers smaller than they were, enough was known publicly, and even more by secret services.
Turning this around, if we multiply confirmed deaths by 264, that gives us an estimate of how many cases there are. So, for example, with UK's death count of 6159 this means about 2.4% of the population is infected. Furthermore, on the Diamond Princess only 20% of the people onboard caught the virus under poorly quarantined conditions. So, to extrapolate even further, this would imply that over 12% of the UK population has already been exposed to SARS-CoV-2. This means that the UK should peak at about 50k deaths, without any protective measures.
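Walking through that chain of extrapolations step by step (every input is the commenter's own assumed figure, not verified epidemiological data):

```python
# Reproducing the extrapolation above. All inputs are the commenter's
# assumptions: a 264x case multiplier, a ~66.5M UK population, and the
# Diamond Princess 20% attack-rate ceiling.
uk_deaths = 6159
cases_per_death = 264                  # assumed cases per confirmed death
uk_population = 66_500_000             # approximate UK population

est_cases = uk_deaths * cases_per_death            # ~1.63 million cases
infected_frac = est_cases / uk_population          # ~2.4% infected
attack_rate = 0.20                                 # Diamond Princess ceiling
exposed_frac = infected_frac / attack_rate         # ~12% already exposed
projected_deaths = uk_deaths * attack_rate / infected_frac  # scale to 20%

print(round(infected_frac * 100, 1),
      round(exposed_frac * 100),
      round(projected_deaths / 1000))  # 2.4 12 50
```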
In 2018, the UK had 50k deaths due to flu in excess of normal flu deaths.
https://www.telegraph.co.uk/news/2018/11/30/winter-deaths-hi...
See e.g. https://twitter.com/hsalis/status/1241121806473461760
- Is this test to see if someone currently has it, or if they have the anti-bodies and are (presumably) immune?
- What are the false-positive/false-negative rates? How does this compare to current leading tests?
- What's the cost per test? How does this compare to current leading tests?
This is a test to see if someone has a current COVID-19 infection. Antibody (serological) tests are also important, but since it is estimated that only ~1% of the US has previously contracted COVID-19, it will be a while before serological testing becomes useful at a population level.
Our initial data show no false-positives and no false-negatives out of all specimens assayed. However, it is early days still and none of the leading tests have real-world data on false-positive and false-negative rates. The crucial parameter here to compare test performance is limit of detection (LOD). We showed we could detect as few as 10 molecules of virus, which is on par with the best RT-qPCR tests.
Cost is definitely an important consideration for roll-out of a widespread test. We anticipate that the cost will be about $15 per test.
Your test is perfect? Color me skeptical.
Such tests are immediately of very high value because they allow us to understand immunity to Covid-19, and to actually validate the estimates of cases amongst those who haven’t sought treatment.
These are both of huge value regardless of the percentage of the US population estimate to have previously contracted Covid-19.
By all means market your test which seems like an awesome contribution, but please don’t do so by devaluing other important tools.
[edit: the parent post has been edited to be less dismissive without acknowledgement since I made this comment]
[edit: looks like I’m wrong about the post being edited. Sorry for that. I stand by everything else I say here:
Serological tests are useful and needed right now, not at some future stage. It’s not hard to google to verify this, and it’s irresponsible to downplay the value of a test we need now.]
https://docs.google.com/spreadsheets/d/1Y9_ZrMBhhVLg2xQHtBCc...
The next question is, do you need to test the individual samples then? Not sure how much available material for testing you have, but you might be able to just divide the mix to eliminate bigger groups first.
Or even mix parts with new samples. There must be an ideal procedure (throughput-wise) if you know the range of positives to expect.
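One classic version of this is Dorfman two-stage pooling: test pools first, then retest individuals only in pools that come back positive. A quick sketch of the throughput trade-off (the prevalence and pool-size range are illustrative):

```python
# Dorfman two-stage pooling: expected tests per person as a function of
# pool size and prevalence. Numbers are illustrative, not from the post.
def expected_tests_per_person(pool_size: int, prevalence: float) -> float:
    # Probability that a pool of n samples contains at least one positive.
    p_pool_positive = 1 - (1 - prevalence) ** pool_size
    # One pooled test shared by n people, plus n retests when the pool
    # is positive: 1/n + P(pool positive).
    return 1 / pool_size + p_pool_positive

# At 1% prevalence, find the pool size minimizing tests per person.
best = min(range(2, 33), key=lambda n: expected_tests_per_person(n, 0.01))
print(best, round(expected_tests_per_person(best, 0.01), 3))  # 11 0.196
```

At 1% prevalence the optimum pool size works out to about 11, needing roughly 0.2 tests per person, i.e. a ~5x throughput gain over individual testing, though pooling does dilute each sample and so can raise the effective limit of detection.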
https://www.biorxiv.org/content/10.1101/2020.04.03.024216v1....
https://www.medrxiv.org/content/10.1101/2020.04.05.20054445v...
One million a day seems like a lot until you realize it would take about a year to test everyone in the US... How can we increase this to 10m per day? Is that possible?
California CLIA and FDA regulations for COVID-19 testing have been relaxed significantly over the last few weeks, so the above is perhaps not impossible.
Edit, is the idea (for the non-expert):
1) Repurposing idle gene sequencing equipment left over from Human Genome project
2) Reducing / removing the step of having to extract the RNA of the virus as the marker
3) Making these tests / machines available widely across the country so that the delay to getting a result is minimized ?
3) is almost right--these Sanger sequencing instruments are already widely available across the country for research use. Here in the Bay Area, I can choose from at least 4 different Sanger sequencing services that will run 10,000 samples at $2/sample in 24 hours. For example, see: https://www.mclab.com/DNA-Sequencing-Services.html
https://web.archive.org/web/20200407224229/https://www.billi...
I guess the trick is to find a locus that is specific enough to COVID-19, but can still produce a good frame-shifted oligo that allows you to maximize the resulting chromatogram signal.
I am surprised that the detection limit of a no-extraction PCR can be this sensitive. But it looks like the data checks out. Does this work for every type of sample? I would assume differing buffers, collection methods will influence the PCR?
Don't miss the off-topic collapsed subthread, if you like that kind of thing.
Spreadsheet to play with is here: https://docs.google.com/spreadsheets/d/1IPcX2JYt9mXdaTvj-3mh...
https://hackaday.io/project/27623-coffee-cup-polymerase-chai...
and some synthetic DNA derived from cov2 (seems to cost about $0.09 per base pair)?
There still needs to be tests taken from individuals and conveyed to the labs? many of these tests will be repeats from individuals who might have taken it earlier? etc
We have already generated and put into our scientific manuscript the data that FDA requires for EUA. As a high-complexity CLIA-licensed clinical laboratory, we don't see significant risk in getting the authorization in 2 weeks. However, we need to start working with clinical laboratories across the country today if this is going to scale to hundreds of thousands of tests as soon as we get the EUA. Also, international laboratories don't need EUA and can start using the test immediately. We did not think it would be responsible to hold off on making the technology public when every day is so critical in our response to the current pandemic.
We’re currently in a better situation than a lot of places but it’s by no means without significant changes to daily life. It’s also a lot to do with the early response.
So no, I don't see a reason to say that its politicians are a joke and that Spanish society is selfish and lazy.
This is a problem for anything that draws together lots of people from a large area into the same place - trade shows, sports events, music festivals, cinemas, conventions, busses and trains, food stores, beaches, universities and schools.
We almost certainly haven't hit on the optimal solution - in my country we closed schools, yet kept crowded metro trains running - but it wouldn't have been wise to wait for more data to come in before taking any action.
So yeah, essentially, by keeping contact limited, if someone does get sick you only need to quarantine a limited set of people and close a classroom rather than the whole school. I feel like a lot of the success here has been in being aggressive with contact tracing and in people taking the social responsibility to self-quarantine.
With the details described in the manuscript, the onboarding can be only a few days (or even self-onboarding based on instructions). The only part that is somewhat tricky is the bioinformatics algorithm for positive vs. negative calls, which we can supply.
This is wrong. What's the basis for this comment/idea?
I have founded and run multiple companies. I have worked for small, medium and large (8,000 employee) companies in various capacities. In a 30+ year career I could not name even one company that I would say fits your assertion at all. I am sure they exist. Yet, I am also sure they are an insignificant percentage of the total.
There's a key difference between private enterprise and government: Failure kills.
Again, except for corner cases or companies that exist due to government money (I don't include these in "private enterprise") entrepreneurship, private industry, capitalism if you will, is a survival of the fittest contest. Like it or not, this is reality.
Government operates with completely different metrics. Failure, for the most part, has no real consequences. A simple example of this is the "high speed" train project here in CA. I've lost track at this point. I think the last time I looked they were at $100BN and the whole thing is a massive smelly pile of manure. Nobody, except for taxpayers, will pay for the consequences of that failure...sadly, one of many at the hands of government.
This crisis is highlighting just how messed up things are under government control. Last night my son, who is in university pursuing a degree in CS, said "Dad, do you know COBOL?". When I asked why, he said New Jersey is trying to hire a bunch of COBOL programmers because their payment systems are badly broken, the code is done in COBOL and it is in need of fixing so they can pay people the aid funds the federal government is providing. I mean, this is typical, sad and ridiculous.
No, private enterprise is a universe away from almost anything government touches. The only service I can identify in government that lives (or dies) by similar metrics is the military. They have real and non-trivial consequences for incompetence. Death. And that means they can't run like a state payment system that's 40 years old and grossly outdated. I am sure the military have issues as well, yet, for the most part, given their mission, could not survive in the long run if they did not operate at a certain level of competency.
Nothing to see here, just trying to make a quick buck using the pandemic
It's not a good first impression (especially when promising scale) not to be able to see it.
If you can't be bothered to take some basic steps for web site scaling (Google PageSpeed + a CDN), then the idea that you will carefully and successfully scale a moderate-to-high-complexity medical test to 1 million tests per day seems like a stretch, no?
That said, I 100% support MUCH higher test capacity, both this and IgG/IgM (which should allow for home testing). So good luck!
The website looks terribly designed from a scaling standpoint. Have you run even a basic Google PageSpeed check on it? Why no cache TTL on the custom TrueType font? Why no CDN backing?
This is a bit random, but if you have a "proud" web developer who "knows best" find someone with no pride who can get this sort of thing done.
We believe Google can scale in part because they seem to be able to return web search / autocomplete / Google Assistant responses pretty quickly.