Alignment is not restricted to "gapped" alignments only (or there wouldn't be a need for such a distinction!).
You can modify the algorithm to include gapped alignments (by introducing gaps in your search sequence), as it supports a "gap" character ("-"). The gap character will just always count as "not a match."
[1] http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_sear...
I am not sure you understand what gapped alignment is: it is not the alignment of a sequence with known gaps, but an algorithm that determines the best placement of gaps in a query sequence to obtain the highest matching score. This is a very different problem from the one you just described, and is essentially "the hard part" of sequence alignment. [1] http://en.m.wikipedia.org/wiki/Sequence_alignment
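To make the distinction concrete, here is a minimal sketch of gapped (global) alignment using the textbook Needleman-Wunsch dynamic program. It is not taken from the article or from any existing aligner; the function name `global_alignment` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen purely for illustration.

```python
def global_alignment(query, target, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_query, aligned_target) for a global alignment.

    Textbook Needleman-Wunsch with a linear gap penalty. The scoring
    values are illustrative assumptions, not anyone's recommended defaults.
    """
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if query[i - 1] == target[j - 1] else mismatch

    # Each cell holds the best score for aligning query[:i] with target[:j].
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align two bases
                score[i - 1][j] + gap,                     # gap in target
                score[i][j - 1] + gap,                     # gap in query
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(query), len(target)
    aligned_query, aligned_target = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_query.append(query[i - 1])
            aligned_target.append(target[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_query.append(query[i - 1])
            aligned_target.append("-")
            i -= 1
        else:
            aligned_query.append("-")
            aligned_target.append(target[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(aligned_query)), "".join(reversed(aligned_target))
```

The point is that the algorithm decides where the "-" characters go. Nobody supplies them up front, and the table costs O(len(query) * len(target)) time and memory, which is why this is the expensive part.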
As a learning exercise, this is interesting and fine. I am trying very hard to suppress the inner "reviewer" right now. Walking away...not comparing this to existing algorithms which are implemented in highly optimized C/C++, or CUDA, or even hardware. Not asking why other authors would go to such extraordinary lengths if high-level languages were suitable. Not going to ask how conclusions can be drawn about the suitability of Javascript for very computationally-intensive tasks without solving the actual alignment problem rather than a subset, let alone comparing to existing tools. Not going to ask about the application of the tool. It's not a paper. I'm breathing. OK.
The initial point of the tool / algorithm was to find all potential binding sites for a DNA-binding domain of a protein in a 100 kbp to 1 Mbp genome, even those with sequence identity of ~50% or less. This assumes you have a consensus sequence that contains ambiguous nucleotides (for example, one roughly discerned from a sequence logo). It quickly turned into development of a general bioinformatics library in JavaScript, and a chance to see how far and fast I could push V8 at doing these sequence comparisons.
I would love (at some point) to go into significantly more detail and compare what I've written here with existing tools. If you're willing to offer mentorship or guidance (or know somebody who would be), it would be fantastic to present the information in a more thoroughly peer-reviewed context. Otherwise, the post (and associated library) are meant primarily as learning tools for both biologists and developers.
Your initial problem, if you frame it as the desire to simply enumerate all the degenerate sequences and loci, could be solved any number of ways as other commenters have mentioned. Probably I would reach first for a regex. But sure, no crime in learning to implement a new algorithm while also testing the limits of V8. Probably half of my grad school time was spent that way ;)
I think your best bet if you wanted to publish would be to find a use case for in-browser alignment. It would be hard to answer the obvious question, "why not server-side?" though. But who knows, people fairly often take old bioinformatics algorithms and say "but now you can do it on your phone!!". And they do publish.
But as for matching both speed and accuracy of state-of-the-art aligners with Javascript, it is my considered, scholarly opinion that you have no chance in hell. So you shouldn't present it that way. It would be unnecessary to compare (at least for speed) if you weren't making claims about speed.
"Faster", now that's a more interesting title if only because it was compared to something else. (Then again, I reviewed one paper which said the algorithm was faster and more memory efficient than the previous version of itself. But with no numbers.)
"Fastest" is of course much harder to pull off. :)
The aforementioned colleague says he usually editorially rejects those.
Another amusing thing about this title is that this is not the first time "ultra-fast" has been used to describe an aligner in a title. The STAR aligner did too.
No sane reviewer or editor would allow "fastest" unless in the context of "provably fastest" which is probably not within the skillset of most bioinformaticians. If it is just "fastest among the currently available tools" then that claim will be out of date within a year.
If I want to do fast genome alignment, I'll use lastz which is already blazing fast and written in C. If I want to just look for homology, I'll use blast.
There isn't much need for improved alignment algorithms. If it's a big job, most bioinformaticians have access to clusters. If it is a small job, who cares about speed?
That said, I wonder if grep wouldn't be much faster, since this program is only looking for exact matches, which are easily transformed into a regex by replacing the ambiguous nucleotides with something like (A|T|C).
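As a rough sketch of the regex idea, the snippet below expands a consensus sequence written with the standard IUPAC ambiguity codes into a regular expression and reports every (possibly overlapping) hit. The names `consensus_to_regex` and `find_sites` are made up for this example and are not part of the library under discussion.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a consensus sequence into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    # The zero-width lookahead lets matches overlap; the inner group
    # captures the matched text itself.
    return re.compile("(?=(" + body + "))")


def find_sites(consensus, genome):
    """Return (position, matched_text) for every site matching the consensus."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(genome.upper())]


# Example: W means A or T, so GAATC and GATTC match but GAGTC does not.
print(find_sites("GAWTC", "TTGAATCGATTCAGAGTC"))
```

Note that this only reports sites that satisfy the consensus at every position. It does not score partial matches, so it would miss sites with lower sequence identity.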
[1] http://www.cs.utexas.edu/users/moore/publications/sustik-moo...
The article wasn't meant to be highly comprehensive, but if my method is indeed new, I'd be more than happy spending more time writing an article that's a bit more technical. (To note, storing and comparing nucleotide sequences as binary strings isn't novel in and of itself. I haven't found evidence of the method of comparison I've used, however.)
Note also that Intel/AMD CPUs with SSE4+ have a 32-bit/64-bit popcnt instruction with 3-cycle latency and 1-cycle throughput (for both the 32-bit and 64-bit versions), which is faster for counting bits/matches than any of the methods you are using :)
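For readers wondering how a population count turns into a match count, here is a hedged sketch. It assumes a 4-bit one-hot encoding per nucleotide (A=0001, C=0010, G=0100, T=1000, with ambiguity codes as the OR of their members). That encoding, and the names `pack` and `count_matches`, are assumptions for illustration and are not necessarily what the article's library does. Python's `bin(x).count("1")` stands in for the hardware popcnt instruction.

```python
# One bit per base; ambiguity codes are the OR of the bases they stand for.
CODES = {
    "A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000,
    "R": 0b0101, "Y": 0b1010, "S": 0b0110, "W": 0b1001,
    "K": 0b1100, "M": 0b0011,
    "B": 0b1110, "D": 0b1101, "H": 0b1011, "V": 0b0111,
    "N": 0b1111,
}


def pack(seq):
    """Pack a sequence into one integer, four bits per nucleotide."""
    value = 0
    for base in seq.upper():
        value = (value << 4) | CODES[base]
    return value


def count_matches(a, b):
    """Count positions where two equal-length sequences are compatible."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # A position matches when the two 4-bit codes share at least one set bit.
    overlap = pack(a) & pack(b)
    # Fold each nibble onto its lowest bit: bit0 becomes b0|b1|b2|b3.
    folded = overlap | (overlap >> 1)
    folded |= folded >> 2
    # (16**n - 1) // 15 is 0x11...1, i.e. the lowest bit of each of n nibbles.
    mask = (16 ** len(a) - 1) // 15
    # Population count of the surviving bits = number of matching positions.
    return bin(folded & mask).count("1")


print(count_matches("GAWTC", "GATTC"))  # W is compatible with T
print(count_matches("GAWTC", "GAGTC"))  # W is not compatible with G
```

In C, the last step would typically be a compiler popcount intrinsic over fixed-width words rather than an arbitrary-precision integer, which is where the instruction mentioned above comes in.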
[1] http://blog.codinghorror.com/the-principle-of-least-power/
What it really shows is that the FASTA format is just terrible for computational efficiency :(
for comparison.