(I'm the author of ripgrep.)
You’ve saved me tens if not hundreds of hours with ripgrep, and I’ve become a huge evangelist of it at my workplace. When I’m helping someone understand how to debug customer issues, the first thing I tell them is to install ripgrep. Truly a fantastic piece of software.
EDIT: Found their email via git. Always forget about that one.
Yeah, but only when used together with fzf, the other favorite new cross-platform shell utility. I mean, after rg spits out a list you do want to narrow it down and then do something with the files in that list, right?
https://github.com/junegunn/fzf/blob/master/ADVANCED.md#ripg...
Before you do something like that, always ask yourself: "What if everyone else started doing this?"
If the result feels like a nightmare in the making, don't do it.
No, I don't think so. There is no universality implied in my comment or in the specific practice here. You can make value judgments based on specific circumstances. For example:
* How many people try 'cargo install rg' and have it do the wrong thing? I'd say "probably a lot."
* Is 'rg' on its own likely to be a useful or desirable name? No, I don't think so.
This doesn't have to mean that everyone should do it for every possible alias of every crate out there. You can say things like "yeah I think it makes sense to squat a name here to improve failure modes for folks."
Other than that, I have squatted a few names before. I don't see anything wrong with the practice in and of itself. It's when it gets abused that it starts to become a problem.
Even flat namespaces are virtually infinite; a couple of extra names that correct user error do not pose a serious exhaustion risk.
1.) The use of names as a speculative financial instrument (in all shades of grey, up to and including extortion for lapsed or stolen names)
2.) The use of names as vectors of attack, such as by exploiting typos or homographs (such as malicious packages)
3.) The reserving of names you don't have a sincere or immediate intention to use (hoarding/FOMO)
This isn't very much like the situation with domains, which is primarily a result of #1 (there is no market for crates.io names, as far as I'm aware). #3 is a problem to some degree on crates.io; my understanding is that they basically treat it as a human moderation problem. #2 is endemic to all package managers.
By putting a helpful instead of malicious package here, the community (and Richard Dodd in particular) are able to mitigate the hazard of #2 (unless this account is compromised or turns malicious - a better but imperfect situation). If a project called `rg` comes around, they can appeal to moderators to get this name, and probably succeed (as if this were a #3 problem).
This isn't a perfect way to do things by any means, but it seems like a decent balance of concerns to me.
Seems fine to me. Something like one tenth of packages reserving a second name? Not a big deal.
In CPAN, you create a module with a hierarchical name (Net::LDAP), and people inherit from it and extend the namespace to add new functionality (Net::LDAP::Batch). Finding a package that does what you want is [relatively] easy. Old code gets maintained rather than somebody reinventing it for the 72nd time with a hodge-podge of functionality.
I also squatted `memap` and `memap2` for the same reasons.
I wonder if there is an algorithmic way to decide when two crate names are 'near' each other. Then, if you added a crate with `cargo add` and there is another similarly-named crate with much higher usage, a warning could be emitted.
*EDIT* I know there's already https://en.wikipedia.org/wiki/Levenshtein_distance, but I wonder if there is a better measure that looks at e.g. keyboard layouts and likely typos. I'm sure there will have been research done on this.
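Here is a rough sketch of such a check. It assumes a hypothetical `cargo add` hook that knows each crate's download count. Instead of plain Levenshtein it uses the optimal string alignment variant, which also counts swapping two adjacent characters (a very common typo) as a single edit. It does not model keyboard layouts. The `max_dist` and `ratio` thresholds are illustrative and not taken from any real tool.

```rust
use std::collections::HashMap;

/// Optimal string alignment distance: Levenshtein plus adjacent transpositions.
fn osa_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());
    // d[i][j] = edits needed to turn the first i chars of `a` into the first j chars of `b`.
    let mut d = vec![vec![0usize; m + 1]; n + 1];
    for (i, row) in d.iter_mut().enumerate() {
        row[0] = i; // delete all i chars
    }
    for (j, cell) in d[0].iter_mut().enumerate() {
        *cell = j; // insert all j chars
    }
    for i in 1..=n {
        for j in 1..=m {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            d[i][j] = (d[i - 1][j] + 1) // deletion
                .min(d[i][j - 1] + 1) // insertion
                .min(d[i - 1][j - 1] + cost); // substitution or match
            // Adjacent transposition, e.g. "rpigrep" vs "ripgrep".
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                d[i][j] = d[i][j].min(d[i - 2][j - 2] + 1);
            }
        }
    }
    d[n][m]
}

/// Hypothetical `cargo add` check: warn if another crate within `max_dist`
/// edits of `name` has at least `ratio` times as many downloads.
fn typo_warning(
    name: &str,
    downloads: &HashMap<&str, u64>,
    max_dist: usize,
    ratio: u64,
) -> Option<String> {
    // An unpublished name counts as 0 downloads; the rival needs at least 1.
    let threshold = downloads.get(name).copied().unwrap_or(0).saturating_mul(ratio).max(1);
    let mut best: Option<(&str, u64)> = None;
    for (&other, &count) in downloads {
        if other == name || count < threshold || osa_distance(name, other) > max_dist {
            continue;
        }
        // Keep the most-downloaded near match.
        if best.map_or(true, |(_, c)| count > c) {
            best = Some((other, count));
        }
    }
    best.map(|(other, count)| format!("did you mean `{other}` ({count} downloads)?"))
}
```

Take a hypothetical map where `memmap` has 5,000,000 downloads and `memap` is not published. There, `typo_warning("memap", &downloads, 1, 100)` would return a suggestion pointing at `memmap`.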
That way "ripgrep" could include "rg", searching cargo for "rg" brings back "ripgrep", not a second package named "rg", and an install could tell the user the correct name for any attempt to install it.
This also covers typo-squats, so there would be no need for packages like "memap".
Obviously this represents a low-effort vector for massive squatting, so maintainers would need to be responsible for preventing that. They could also add some typos themselves, since they are the ones who see the requests for mistyped package names.
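One way the alias idea could look is sketched below over a hypothetical in-memory index. As far as I know, crates.io has no such alias field. `Index`, `declare_alias` and `install_message` are made-up names. Aliases point at a canonical crate instead of being crates themselves. `declare_alias` refuses names that are already crates or already claimed, which is the kind of anti-squatting check someone would have to enforce.

```rust
use std::collections::HashMap;

/// Hypothetical registry index in which a crate can declare aliases.
#[derive(Default)]
struct Index {
    crates: Vec<String>,
    aliases: HashMap<String, String>, // alias -> canonical crate name
}

impl Index {
    fn publish(&mut self, name: &str) {
        self.crates.push(name.to_string());
    }

    fn is_crate(&self, name: &str) -> bool {
        self.crates.iter().any(|c| c == name)
    }

    /// Reject aliases for unknown crates, aliases that shadow a real crate,
    /// and aliases another crate already claimed.
    fn declare_alias(&mut self, canonical: &str, alias: &str) -> Result<(), String> {
        if !self.is_crate(canonical) {
            return Err(format!("`{canonical}` is not a published crate"));
        }
        if self.is_crate(alias) {
            return Err(format!("`{alias}` is already a crate"));
        }
        if let Some(owner) = self.aliases.get(alias) {
            return Err(format!("`{alias}` is already an alias of `{owner}`"));
        }
        self.aliases.insert(alias.to_string(), canonical.to_string());
        Ok(())
    }

    /// What `cargo install <typed>` might print under this scheme.
    fn install_message(&self, typed: &str) -> String {
        if self.is_crate(typed) {
            format!("installing `{typed}`")
        } else if let Some(canonical) = self.aliases.get(typed) {
            format!("`{typed}` is an alias; run `cargo install {canonical}` instead")
        } else {
            format!("no crate named `{typed}`")
        }
    }
}
```

With `ripgrep` declaring `rg`, a search or install for `rg` leads back to `ripgrep` without a second package ever existing under that name.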
Instead of
println!("You meant to install ripgrep: type `cargo uninstall rg` followed by `cargo install ripgrep`");
use
compile_error!("You meant to …");
so that the install would fail and `cargo uninstall rg` wouldn't be needed.

Not sure how to feel about this... on an individual-package level, it seems a sensible enough idea, but if it becomes a widespread practice, the namespace could get really cluttered.
Namespaces are a solution or mitigation to some problem, but that problem is not malicious typo-squatting.
Crates.io is incredibly cluttered with name squatting. It’s probably the worst package registry for it, even surpassing NPM.
Part of the problem is that they explicitly say name squatting isn’t against the rules.
This installs a library by some authors not affiliated with AWS.
Instead of `pip install awscli`, which is what you expect.
I can't believe that a good way to see what's inside is to make a Rust project, add the crate and then go searching around the local filesystem.
I usually use lib.rs instead: https://lib.rs/crates/rg
That has a link to source: https://docs.rs/crate/rg/0.1.0/source/
And here's the Rust code: https://docs.rs/crate/rg/0.1.0/source/src/main.rs
This one just depends on the correct `scikit-learn` package though.
> You tried to install “pytorch”. The package named for PyTorch is “torch”
https://github.com/fregante/npm-helpful-typosquatting
Here’s what it looks like: https://www.npmjs.com/package/webext
I love Python, but pip/PyPI and imports always felt weird to me because of namespaces, package names, special imports with "as", etc. Maybe this is a bias from having started using them when I was younger; now that I'm more experienced, I already know how to use most package managers.
BTW ripgrep is awesome. I'm learning Rust and it's an inspiration to me, thanks burntsushi!
I can imagine, for example, importing keys only from the authors I think I can trust, and passing a flag to cargo that only allows using those packages for `cargo install` or `cargo add`.
In this case I think just checking the top-level crate's signature (and not its dependencies) would be enough to mitigate a lot of issues, including typo-squatting.
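Here is a minimal sketch of that policy. It assumes hypothetical release metadata that records which key signed a crate. It only compares key IDs against an allowlist. Real signature verification would need an actual cryptographic check, and as far as I know cargo has no such flag today. As suggested above, dependencies are not inspected.

```rust
use std::collections::HashSet;

/// Hypothetical release metadata: the crate name and the ID of the key
/// that signed it, if any.
struct Release<'a> {
    name: &'a str,
    signer_key_id: Option<&'a str>,
}

/// Check only the top-level crate against an allowlist of trusted key IDs.
/// Dependencies are deliberately not inspected, and no cryptography happens here:
/// a real tool would first verify that the signature actually matches the key.
fn check_top_level(release: &Release, trusted: &HashSet<&str>) -> Result<(), String> {
    match release.signer_key_id {
        Some(key) if trusted.contains(key) => Ok(()),
        Some(key) => Err(format!("{} is signed by untrusted key {}", release.name, key)),
        None => Err(format!("{} is unsigned", release.name)),
    }
}
```

A typo-squat published by a stranger fails the check unless its key happens to be on your allowlist, whatever the name looks like.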
Better to just make `cargo install rg` fail so that it never worked in the first place. `cargo install ripgrep` is also more self-describing and gives you a better search engine query.
Let people make the mistake once and learn the correct package name, instead of relying on a hack and potentially introducing confusion later.
The solutions here are non-flat namespacing (which has worse UX, since `cargo install some-tool` now becomes `cargo install whats-their-handle-again/some-tool`) or some kind of content addressing (which is similarly bad for UX, if not worse). Most package indices choose neither, and "solve" the problem by playing whac-a-mole with abuse instead.
This means the first package to squat on the name can use the shorthand version, while other packages with the same name are still allowed in other namespaces (these may be forks or entirely different packages).
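To make the shorthand rule concrete, here is a toy registry sketch. All names are hypothetical. Every package is addressable as `owner/name`, and the first registrant of a bare name also answers to the shorthand.

```rust
use std::collections::HashMap;

/// Hypothetical namespaced registry.
#[derive(Default)]
struct Registry {
    packages: Vec<(String, String)>,    // (owner, name), in publish order
    shorthand: HashMap<String, String>, // bare name -> owner of the first registrant
}

impl Registry {
    fn publish(&mut self, owner: &str, name: &str) -> Result<(), String> {
        if self.packages.iter().any(|(o, n)| o == owner && n == name) {
            return Err(format!("{owner}/{name} already exists"));
        }
        self.packages.push((owner.to_string(), name.to_string()));
        // Only the first registrant of a bare name gets the shorthand.
        self.shorthand.entry(name.to_string()).or_insert_with(|| owner.to_string());
        Ok(())
    }

    /// Resolve either `owner/name` or a bare `name` to a fully qualified id.
    fn resolve(&self, spec: &str) -> Option<String> {
        match spec.split_once('/') {
            Some((owner, name)) => self
                .packages
                .iter()
                .any(|(o, n)| o == owner && n == name)
                .then(|| spec.to_string()),
            None => self.shorthand.get(spec).map(|owner| format!("{owner}/{spec}")),
        }
    }
}
```

Suppose `alice/ripgrep` is published before `bob/ripgrep`. Then `resolve("ripgrep")` gives `alice/ripgrep`, while `bob/ripgrep` still resolves when spelled out in full.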
Edit: Because I'm on a Zoom call that will never end.
"ripgrep is a line-oriented search tool that recursively searches the current directory for a regex pattern. By default, ripgrep will respect gitignore rules and automatically skip hidden files/directories and binary files."
Unfortunately it defaults to parsing a git tree's gitignore file and skipping over files listed in it.
The idea behind it is that it acts as a heuristic for reducing false positives in your search results. For example, ripgrep replaced several little grep wrapper scripts I had in ~/bin.
And fortunately the default behavior is easy to disable. `rg -uuu foo` will search the same stuff as `grep -r foo ./`, but will do it faster.
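The following simplified illustration covers two of the default filters from the quoted description: hidden paths and binary content. It also shows how the number of `-u` flags relaxes them. Per ripgrep's documentation, two `-u` flags also search hidden files, and three also search binary files. The NUL-byte test is an assumed, commonly used binary-detection heuristic, not ripgrep's actual code. Gitignore handling, which a single `-u` turns off, is not modeled at all.

```rust
use std::path::Path;

/// A path is "hidden" if any component other than `.`/`..` starts with a dot.
fn is_hidden(path: &Path) -> bool {
    path.components().any(|c| {
        let s = c.as_os_str().to_string_lossy();
        s.starts_with('.') && s != "." && s != ".."
    })
}

/// Assumed heuristic: content containing a NUL byte is treated as binary.
fn looks_binary(contents: &[u8]) -> bool {
    contents.contains(&0)
}

/// `u_flags` is the number of `-u` flags given: 2+ also searches hidden paths,
/// 3+ also searches binary content. (Level 1, ignoring gitignore, is not modeled.)
fn should_search(path: &Path, contents: &[u8], u_flags: u8) -> bool {
    let hidden_ok = u_flags >= 2 || !is_hidden(path);
    let binary_ok = u_flags >= 3 || !looks_binary(contents);
    hidden_ok && binary_ok
}
```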
That's a feature.
Like, it's the entire point of ripgrep. It's designed to search through the things a developer actually cares about searching through.
If you actually want to search everything, just use grep.
Is that true? How could anybody think that this non-orthogonal monstrosity would make any sense?
If you have `ripgrep-team/ripgrep` rather than `ripgrep`, it doesn't help at all with people typing the wrong thing, like `rg-team/rg`. I fail to see how it helps.
It's even worse with packages that are (currently) authored by a single person: how many people know the name of ripgrep's author? Or rand's? Or bevy's?
If someone can find a meaningful case where ag is faster than ripgrep, then I'm happy to accept a bug report. I'll do my best at that point to give an analysis of the benchmark, and if it's correct, I'll either try to fix it or say why it's hard to fix.
By "meaningful" I mean "something that is noticeable to humans." So for example, reporting a bug because ripgrep took 9ms and ag took 7ms on a tiny repo is one I would consider not meaningful. :)
(Sorry about the verbose caveats, but just trying to head off responses I've got in the past.)
There's even a huge sign with only 12 words pithily explaining what the shop has inside.