Better HN

0 points · brabel · 4y ago · 0 comments

I clicked the big Download button and selected "all records"; it had downloaded over 3.5 GB before I gave up... which file exactly should I use??

benjamin-lee · 4y ago

Sorry, I completely forgot that the file I used was from six months ago, when I wrote the blog post (and then promptly forgot to publish it). In the last half year, the number of coronavirus sequences has increased dramatically. One way to shrink the download is to filter for only complete, unambiguous sequences, which cuts the count from about 1.6 million to ~100k [1].
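
If you already have a FASTA download, here is a rough local version of that filter in Python. It's a sketch under my own assumptions, not what NCBI Virus does internally: a record counts as "complete" when its FASTA header contains "complete genome", and as "unambiguous" when its sequence contains only A, C, G and T (so N and other IUPAC ambiguity codes fail). The function names are made up for illustration.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

# Bases treated as unambiguous; any IUPAC ambiguity code (N, R, Y, ...) fails the check.
UNAMBIGUOUS = frozenset("ACGT")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Stream (header, sequence) pairs from FASTA text without loading the whole file."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_complete_and_unambiguous(header: str, seq: str) -> bool:
    # Assumption: "complete" is inferred from the "complete genome" wording in the
    # FASTA header; NCBI's own filter uses record metadata instead.
    complete = "complete genome" in header.lower()
    unambiguous = bool(seq) and set(seq.upper()) <= UNAMBIGUOUS
    return complete and unambiguous


def filter_fasta(src: TextIO, dst: TextIO, width: int = 70) -> int:
    """Copy passing records from src to dst (wrapped at `width` columns); return the count kept."""
    kept = 0
    for header, seq in read_fasta(src):
        if is_complete_and_unambiguous(header, seq):
            dst.write(f">{header}\n")
            for i in range(0, len(seq), width):
                dst.write(seq[i:i + width] + "\n")
            kept += 1
    return kept


if __name__ == "__main__":
    demo = io.StringIO(
        ">A1 isolate X, complete genome\nACGTACGT\nACGT\n"
        ">A2 isolate Y, complete genome\nACGTNNNN\n"
        ">A3 isolate Z, partial genome\nACGTACGT\n"
    )
    out = io.StringIO()
    print(filter_fasta(demo, out))  # 1
    print(out.getvalue(), end="")   # only the A1 record survives
```

For a real file, open the input and an output file in a `with` block and pass both handles to `filter_fasta`. Because records are streamed one at a time, memory use stays small even for a multi-gigabyte input.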

Alternatively, the exact file I used for the post is available here for one week (MD5 checksum 3c33c3c4c2610f650c779291668450c9) [2]. Anyone who wants the file is welcome to contact me directly (my email is on the site).
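
If you get the file some other way (e.g. by email), you can check it against that MD5 sum. On Linux, `md5sum` does this directly; the Python sketch below does the same portably, reading in 1 MiB chunks so a multi-gigabyte file isn't loaded into memory at once. The function names are illustrative.

```python
import hashlib
import sys

# Checksum quoted in the comment above for the file used in the post.
EXPECTED_MD5 = "3c33c3c4c2610f650c779291668450c9"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, hashing it chunk by chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches `expected` (compared case-insensitively)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        ok = verify(sys.argv[1])
        print("OK" if ok else "MISMATCH")
        sys.exit(0 if ok else 1)
    print(f"usage: {sys.argv[0]} FILE", file=sys.stderr)
```

Run it as `python check_md5.py <downloaded file>` (the script name is a placeholder); it prints OK and exits 0 only on a match.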

[1] https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType...

[2] https://file.io/nUNc7cG5i8gj

hexo · 4y ago

The file at [2] is already gone :(