Zooming out, the language field breaks into several subfields:
- A large group of Chomsky followers in academia, focused on logical rules but offering very little in the way of algorithmic applicability, or even interest in it.
- A large and well-funded group of ML practitioners, with a lot of algorithmic applicability, but with an arguably very shallow model of language that fails in cases like attribution. Neural networks might yet show improvement, but apparently didn't in this case.
- A small and poorly funded group of "comp ling", attempting to create formalisms (e.g. HPSG) that are still machine-verifiable, and even generative. My girlfriend is doing a PhD in this area, in particular modeling WH questions, so I get some glimpse into it; it's a pity the field is not seeing more interest (and funding).
https://www.google.com/search?q=%22Shirt+without+Stripes%22&
Yes, the English grammatical rules make it unambiguous where it belongs. This is solvable.
Couldn't you just parse the sentence into a dependency tree and look at the relationships to figure that out? CoreNLP got both of your examples right (try it at http://nlp.stanford.edu:8080/corenlp/process, can't link the result directly).
Well, one could argue that it belongs exactly where whoever entered the query put it: before "stripes".
The problem is often that search engines try to be too clever while not offering any kind of "exactly these words, in this order" switch, and that is just a bad user interface.
If it just disregards the word "without", well, that's pretty bad.
I wouldn't be surprised if millions of dollars per year are being lost because of substandard query results like this.
“Shirt -stripes” is unambiguous to a system, yet the first result on Amazon(.ca) is a striped shirt, and the 3rd is sweatpants.
I mean, context is key, right? You're on Amazon and your first search term is "shirts". Unless there is a band called "shirts without stripes", the user wants shirts. The rest of the query is probably some filter on that product. You know shirts sometimes have stripes. It's not a one-size-fits-all algorithm, but it's simple enough that the user should end up with the results they wanted.
> "no evidence of cancer" and "evidence of no cancer" are very different things.
Why is it not as simple as "no belongs to the word it precedes"? Like the unary operator ! (not) in typical computer languages.
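For what it's worth, that rule is trivial to sketch (toy code of my own, not how any real engine tokenizes queries): treat "no"/"without"/"not" as a unary operator that binds to the next word.

```python
# Toy sketch: negation as a unary operator binding to the following word.
# NEGATORS and the parsing rule are invented here for illustration.
NEGATORS = {"no", "without", "not"}

def parse_query(text):
    """Split a query into (required, excluded) term lists."""
    required, excluded = [], []
    tokens = text.lower().split()
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATORS and i + 1 < len(tokens):
            excluded.append(tokens[i + 1])  # negator applies to the next word only
            i += 2
        else:
            required.append(tokens[i])
            i += 1
    return required, excluded

print(parse_query("shirt without stripes"))  # (['shirt'], ['stripes'])
print(parse_query("no evidence of cancer"))  # (['of', 'cancer'], ['evidence'])
```

The second example shows why it isn't that simple: "no evidence of cancer" comes out as require "of cancer", exclude "evidence", because the negation actually scopes over the whole phrase, not just the next word.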
Does she want ice cream? Answer: No, she doesn't. I added a not, so she's reversing the answer as Japanese people do.
The number of times I've been dumbstruck by this is larger than I'd like to admit, and I'm a coder.
There have been some lengthy discussions on HN about vertical search and how Google doesn't always buy up a small company; they litigate.
I'd be curious to see how many sentences with attribution problems actually have other structural issues. If I want to write clearly and without ambiguity, I rewrite sentences that have these problems. Why wouldn't I do the same for search queries?
There’s a nuanced argument that practitioners know how dependent ML is on training data and how sharply accuracy tails off, but that nuance tends to be removed from anything selling to potential customers — which has not been a great way to keep them, in my experience.
Edit: "stripes" not "stripped" ugh
Nobody has solved the common sense knowledge problem yet. A solution for that would qualify as Artificial General Intelligence and pass the Turing Test.
But search engines have come a long way. I even suspect that if search engines placed too much logical or embedding relevance on stop words such as "without", then, on average, their relevance metrics would go down. It is not completely ignored: "shirt with stripes" surfaces more striped shirts than "shirt without stripes", and "shirt -stripes" does what you want it to do.
Searching for "white family USA" shows a lot of interracial families. Here "white" is likely not ignored as much, and thus it surfaces pages with images where that word is explicitly mentioned, which is likely happening when describing race.
You can use Google to find Tori Amos when searching for "redhead female singer sings about rape". Bing surfaces porn sites. DDG surfaces lists (top 100 female singers) type results. The Wikipedia page that Google surfaces does not even contain the word "redhead", yet it falls back to list style results when removing "redhead" from your query, suggesting "redhead" and "Tori Amos" are close in their semantic space. That's impressive progress over 10-20 years back.
[1] https://en.wikipedia.org/wiki/Commonsense_knowledge_(artific...
EDIT: scrap that, I didn't mean Alexa, which is doing AI obviously, but the search engine of Amazon's retail website.
Anyway, NLP is hard and everyone sucks at it. Think about it: just building something that could handle any <N1> <preposition> <N2>, or any other way to express the same request, would mean understanding the relationships of every possible combination of N1 and N2. It means building a generalized world model, which is quite different from simply applying ML to a narrow use case. Cracking that would more or less mean solving general AI, which probably won't happen soon.
Google has not yet discovered how to automate "is this a quality link?" evaluation, since they can't tell the difference between "an amateur who's put in 20 years and just writes haphazardly" and "an SEO professional who uses Markov-generated text to juice links". They have started to select "human-curated" sources of knowledge to promote above search results, which has resulted in various incidents, e.g. a political party's search results showing a parody image. They simply cannot evaluate trust without the data they initially harvested to make their billions, and without curation their algorithm will continue to fail.
Google is a dumbass nowadays, and regularly ignores half your search terms to present you with absolutely irrelevant results that have gotten lots of visits in the past.
Today, it will silently guess at what I want, and rewrite the query. If they have indexed pages that contain the words I put in, but don't meet their freshness/recency/goodness criteria, they will return OTHER pages with content that contains vaguely related words. "Oh, he couldn't have meant that, it's from 6 months ago, and it's niche!"
They'll even show this off by bolding the words I didn't want to search for.
So, if I'm looking for something that isn't popular -- duckduckgo it is. It doesn't do this kind of rewriting, so my queries still work.
This is a case where, while it makes sense to say the sentence, it's not a common use of language, and at the end of the day, the search engine will find what's written down; it's not a natural language processor yet (despite any marketing).
Shirt stores don't advertise "Shirts without stripes - 20% off", they describe them as "Solid shirts" or "Plain shirts". Men's fashion blogs talk about picking "solid shirts" or "plain shirts" for a particular look. If I walked into a clothing store and asked for "shirts without stripes", the sales person would most likely laugh and say "er, you mean you want plain shirts?".
Plain shirts/solid shirts are the most common way to refer to these, and people seem to be searching this way:
https://trends.google.com/trends/explore?date=all&q=solid%20...
Regarding moving towards natural language processing - the "without" part is not as important as knowing the context.
My kids will ask me to get from the bakery things like "the round bread with a hole and seeds", which I know means "sesame bagel", or "the sticky bread", which means "cinnamon twists" - which I understand because I know the context. Sometimes they say "I want the red thingy", and I need to ask a bunch of questions to eventually get at what they want (sometimes it's a red sweater, sometimes it's cranberry juice).
Unless Google starts asking questions back, I don't think there is any way it can give you what you want right away.
Searching "pants" only shows me "trousers", that's a big fail for Google IMO, I'm accessing google.co.uk.
The joke from Zizek: https://www.youtube.com/watch?v=wmJVsaxoQSw
To extend on that, you can think of the human brain as just another (powerful) statistical model.
Doing just that for 10 years, beating hand-coded systems: https://www-nlp.stanford.edu/pubs/SocherLinNgManning_ICML201... [pdf]
> I would guess that most modern NNs from the NLP area (Transformer or LSTM) would be able to correctly differentiate the meaning.
Yes. See demos like: https://demo.allennlp.org/constituency-parsing/MTczNjYyNA== and https://demo.allennlp.org/dependency-parsing/MTczNjYyNg==
> I think there is no fancy NN (yet) behind Google search,
During the deep learning boom, Google made a huge push towards NN-based NLP. SEOs and their PR call these efforts collectively RankBrain: https://en.wikipedia.org/wiki/RankBrain
I think we are on the cusp of combining symbolic/logical operations over the vectors produced by neural networks (or at least, of a major effort there). It could be done by neatly tying up all these different NN-based NLP modules (parsing, semantic distance, knowledge bases, ...) with another set of decision layers stacked on top.
Chomsky: Statistical analysis of snowflakes falling outside the window may predict the next snowflake, but it will do very little for weather prediction, and nothing for climate analysis.
Norvig: Give us enough data and we will get close enough for all practical purposes.
It's easy to say, isn't it? Unfortunately, sticking the word "just" in there doesn't affect the difficulty. I do it all the time, too.
That said, "meaning" is not statistical.
- Shirt Without Stripes: shirts where the description contains both "without" and "stripes". Example: a shirt without a collar, with stripes.
- "Shirt Without Stripes": a mess, with and without stripes, suggesting an unusual search query. In fact, the linked article site is the first result in web search.
- Stripeless shirt: sexy women in strapless shirts
- "stripeless shirt": pictures of Invader Zim...
- "stripeless" shirt: mostly shirts without stripes, but there are some shirts with stripes that are described as stripeless...
The last one may give us a hint at the problem. If you have to mention that a shirt is without stripes, you are probably comparing it to a shirt with stripes. For example, imagine a forum where some guy posts a picture of a shirt with stripes; I can expect some people to ask questions like "do they sell this shirt without stripes?". Or maybe the seller himself has something like "shirt without stripes available here (link)" in the description. So the search engines tie "shirt without stripes" to pictures of shirts with stripes.
I remember an incident where searching for "jew" on Google led to antisemitic websites. The reason was simply that that exact word was rarely used in other contexts. Mainstream and Jewish sources tend to use the words "jews" and "jewish", but not "jew". And because Google doesn't look at the dictionary meanings of words but rather at what people use them for, you get issues like that.
I had a similar problem when I was trying to convince a friend that homeopathy was a complete and utter fraud with absolutely no basis in reality. She was convinced that the internet's overwhelming consensus was that homeopathy was valuable and regular doctors were control-freaks who make things up when they don't know the answers.
To prove her point, she did an internet search for allopathic medicine and showed me how the majority of the results were negative.
https://en.wikipedia.org/wiki/Allopathic_medicine
Just a humorous anecdote, not trying to start any conversations about the relative value of different medical paradigms.
Sometimes I wonder how much my brain has changed to use search engines / how much of it is dedicated to effective googling. Makes me feel like a cyborg.
I think you're overestimating Google's sophistication.
Knowing how the machine will interpret humans is just as important to finding your results.
I guess because they leave it up to the advertiser to determine the negative match words and that seems to always have priority.
Everything is for the best, in this best of all possible search engines -- the Candide fallacy.
If Google isn't under survival pressure to get better (and they aren't) the incentives aren't aligned for them to improve or even to not get worse every year.
If Google is failing first gradually, then suddenly, it might not even be within its institutional power to notice how bad it's become before it's too late.
This assumes that AI wants truth. These three companies' AIs don't necessarily want truth; they want revenue.
"Humans usually don't intuitively understand the word 'no'. Please imagine a non-pink elephant."
It would be great if I could add negative keywords to a website, or mark text as "don't index" or "index with a negative weight". But probably, people would game this in ways I can't imagine.
There is probably a clever ML solution for this, like having meaning-vectors for distinct ideas, and pushing pages that are close to one meaning away from the other meaning. Classification is easy if you have keywords like "painting" and "catholic", but if it is "virgin" or "prayer" then it could be either meaning, so there is never a bullet-proof solution.
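A sketch of the meaning-vector idea, with made-up 3-dimensional embeddings (real systems learn hundreds of dimensions from data; these numbers are invented purely for illustration):

```python
import math

# Toy "meaning vectors": invented 3-d embeddings for a few words.
vectors = {
    "virgin":   [0.9, 0.1, 0.4],   # ambiguous between senses
    "prayer":   [0.8, 0.0, 0.1],
    "painting": [0.1, 0.9, 0.0],
    "catholic": [1.0, 0.0, 0.0],
}

def cosine(a, b):
    """Cosine similarity: how close two meaning-vectors point."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "prayer" sits much closer to "catholic" than "painting" does, so a page
# containing it could be nudged toward the religious meaning.
print(cosine(vectors["prayer"], vectors["catholic"]))
print(cosine(vectors["painting"], vectors["catholic"]))
```

Ambiguous words like "virgin" land in between the clusters, which is exactly why there is never a bullet-proof cutoff.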
The theme of this talk was how they did a study that showed prepositions and articles do have meaning. A big deal was made out of the results.
I think things like this happen when people come, over time, to treat engineering approximations such as bag-of-words as the truth.
There were gasps in the room and a kind of depressed acquiescence: geez, he might be right. And the pendulum indeed swung in that direction, hard, and the field has been overwhelmingly dominated by the statistical machine learning folks on the CS side of the field, while the linguists kind of quietly keep the flame alive in their corner.
But I thought then, and I still think now, that it really just was another swing of the pendulum (which has gone back and forth a few times since the birth of the field in the 1960s). Perhaps it's now time again for someone to ring up the linguists and let them apply their expertise again?
Likewise, Google says I should log into their website for personalized search results, but after years of always clicking on Python 3 results over Python 2.7 results, it never learned to show me the correct result.
Eventually I realized that personalized recommendations are more or less just a thin cover for collecting vast amounts of data with no benefit to the consumer. I believe we have the technology to do better, but we don't use it. In fact, we seem to be using it less and less.
As humans we know immediately that the search is for documents about shirts where stripes are not present. But the term 'without' doesn't make it through to the term compositor step, which feeds terms in as binary relationships. We might write such a relationship as
Q = "shirt" AND NOT "stripes"
You could onebox it (the Google term for a search short-circuit path that recognizes the query pattern and takes some specific action; for example, calculations are a onebox) and then you'd get a box of shirts with no stripes and a bunch of query results with stripes.
You can n-gram it, by ranking the without-stripes n-gram higher than the individual terms, but that doesn't help all that much because English-language documents don't call them "shirts without stripes"; generally they are referred to as "plain shirts" or "solid shirts" (plain-shirt(s) and solid-shirt(s) respectively). But you might do okay punning without-stripes => plain or solid.
From a query perspective you get better accuracy with the query "shirts -stripes". This algorithmic query uses unary minus to indicate a term that should not be on the document but it isn't very friendly to non-engineer searchers.
Finally you can build a punning database, which is often done with misspellings like "britney spears" (ok so I'm dating my tenure with that :-)) which takes construction terms like "without", "with", "except", "exactly" and creates an algorithmic query that is most like the original by simple substitution. This would map "<term> without <term>" => "<term> -<term>". The risk there is that "doctors without borders" might not return the organization on the first page (compare results from "doctors without borders" and "doctors -borders", ouch!)
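That punning substitution plus a protect-list could look something like the following minimal sketch (the regex, phrase list, and function are my own illustration, not Google's actual mechanism):

```python
import re

# Sketch of a "punning database" rewrite: "<term> without <term>" => "<term> -<term>",
# with a protect-list of known phrases that must not be rewritten.
# The protected phrases here are examples only.
PROTECTED_PHRASES = {"doctors without borders", "a day without rain"}

def rewrite(query):
    q = query.lower()
    if q in PROTECTED_PHRASES:
        return q  # known entity: leave the query alone
    return re.sub(r"\bwithout\s+(\w+)", r"-\1", q)

print(rewrite("shirt without stripes"))    # shirt -stripes
print(rewrite("Doctors Without Borders"))  # doctors without borders
```

The hard part, of course, is populating and maintaining that protect-list; miss an entry and "doctors -borders" happens, as described above.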
When people get sucked into search it is this kind of problem that they spend a lot of time and debate on :-)
It's a completely artificial construct. Simply the fact that this Hacker News entry is the #1 search result shows that real human people do not perform this search in significant quantity. But we can quantify that with data to back up the assumption [1][2]. When people want to buy a shirt without stripes, they do not describe the shirt by what it doesn't have.
In fact, it's trivial to cherry pick a random selection of words that on the face of it sounds like something a human might search for, but it turns out never occurs in practice. Add to that the fact that the term is being searched without quotes [3], which results in the negation not actually being attached to anything.
Do you go to a store to buy it along with your Pants Without Suspenders, Socks Without Animal Print, and other items defined purely by what they don't have?
[1] https://trends.google.com/trends/explore?geo=US&q=%22white%2... [2] https://trends.google.com/trends/explore?geo=US&q=%22plain%2... [3] https://trends.google.com/trends/explore?geo=US&q=plain%20sh...
Likewise, here, I would search for solid-colored shirts.
And these services are limited to the content/terminology utilized by the cataloged sites/products.
If I am selling a "black shirt" or a "solid black shirt," it is not google's job to catalog it as a "shirt without stripes," unless I advertise it as a "black shirt without stripes."
I would use natural language to test a service's NLP ability.
Unfortunately people don't search for "solid shirts". At best they search for "plain shirts", but there's a lot of taste to clothing that means people often do want a shirt without stripes, but are open to patterned/plain.
I think searching "shirts without stripes" is very legitimate in fashion.
I say this having built a clothing search function for the company I work for, and one that does not support this sort of query.
There are too many products nowadays to be manually attributed (e.g. pattern=stripes), making it hard to return good results even with entity resolution for queries. We train classifiers to categorize products, including what something is not, using their images and descriptions.
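A heavily simplified, training-free caricature of such a classifier (the keyword lists are invented; the real pipeline described above trains models on images and descriptions):

```python
# Toy pattern classifier: score a product description by keyword evidence.
# Keyword sets are illustrative only.
STRIPE_WORDS = {"stripe", "striped", "stripes", "pinstripe"}
PLAIN_WORDS = {"plain", "solid", "unpatterned"}

def classify_pattern(description):
    words = description.lower().split()
    stripe_score = sum(w in STRIPE_WORDS for w in words)
    plain_score = sum(w in PLAIN_WORDS for w in words)
    if stripe_score > plain_score:
        return "striped"
    if plain_score > stripe_score:
        return "plain"
    return "unknown"  # the honest default when nothing is mentioned

print(classify_pattern("classic solid oxford shirt"))  # plain
print(classify_pattern("bold striped cotton shirt"))   # striped
print(classify_pattern("blue cotton shirt"))           # unknown
```

That "unknown" bucket is the crux: most listings never say whether they are striped, which is why image-based classification is needed at all.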
I'm guessing one of those reflections looks like a turtle? Or maybe a pattern on the floor, wall, or rug?
Although there are examples where I'm unsure if the AI is dumber than my 4yo or smarter than me. This is a result for "truck": https://i.imgur.com/JcgXZAG.jpg
Even (especially?) my 4yo knows those are Brio trains, not trucks. However, trains have components called trucks! https://en.wikipedia.org/wiki/Steam_locomotive_components I'm unsure whether or not any of the wheel assemblies on these toy trains are considered trucks, so either the AI is extremely smart or slightly dumber than a 4yo.
I recently looked for dough scrapers. I wanted to see what's selling best and what's most highly rated. They are everywhere: in dessert & decoration, in utensils, in bakeware, and many other categories. I mean, I get it...
It's not just search that's hard; categorization is also an issue here.
Joking aside, it doesn't surprise me that this isn't being picked up — aren't most of these AI teams more R&D than actual public-facing? Maybe I'm just cynical though.
'shirt no stripes'
on Google returned this web page at top of the organic results.
So at some point, searching for a shirt online will involve this conversation. Even more confusing.
(Although I expect my filter bubble will play a part in that)
After that I facepalmed myself and turned it off.
Normal humans do this all the time, and if I can't do it when speaking to it, it becomes incredibly frustrating, to the point that I never want to do it again. I don't want to plan ahead what to say before I say it.
Granted, it's been a couple of years since I last tried so maybe they're better now.
Larry: "I saw a red Lamborghini in the parking lot!"
Most people will assume Lisa is driving a red Lamborghini and back from Vacation, meanwhile, all the bots are searching for Lamborghini vacations and trying to figure out what's going on in the conversation.
"shirts -stripes" results: https://www.amazon.com/s?k=shirts+-stripes&ref=nb_sb_noss_2
So basically the AI doesn't convert "without x" to "-x" even though the basic capability needed is there. This is why AI is a hard problem, especially when it meets the real world.
It's 2020 and we're still quibbling about the terminology used in SQL, what did we expect?
The state of the art in machine translation (from what I've read at least) is translating from language-A to a language-less "concept space" and then from there to language-B. Could that be done where the output language is something a search engine can use to find what you want correctly?
Given that pattern, I suspect we could see much better results in cases like this.
Today, entering in any tech-related query at all takes you to StackOverflow, end of story. Not only are SO answers quite often outdated (or even terrible advice in general), most of the time I'm not looking for a "here's how you do X", I'm looking for background information on a topic.
Most non-tech queries I put into google are even _more_ useless as the results tend to fall into these categories:
* Wikipedia (okay for _very_ general things, useless for domain-specific knowledge)
* SEO-enhanced blogspam, (a.k.a. "8 Weird Ways to Earn Millions Through Gaming The System!")
* Tweets on twitter (!)
The dev/tech industry desperately needs a search engine that somehow prioritizes _quality_ content, not one-off answers, blogspam, and tweets.

At one point there was a TED talk explaining that social networks were a solved problem now that Facebook was dominant. Recycling was seen as a solved problem until it wasn't, etc.
I wonder how many actual "solved problems" we have.
"birds without flight"
"cars without wheels"
"cats without tails"
"dogs without hair"
"intersections without lights"
"poems without rhyme"
"shirts without collars" (also "sleeves", "shoulders", "buttons", "logos", "pockets", and more)
What if you walked into a store and asked an associate for a shirt without stripes? What would you get?
Probably some further questions for clarification. What about checked shirts? Floral prints? Plaid? Do you want no pattern at all? T-shirt? Polo shirt? Dress shirt?
Granted, the AI results are particularly bad because they give you the one thing that you specifically didn't ask for, but that's also the only information you provided. Defining a query in terms of what you don't want instead of what you do want isn't going to go well.
What if you went to google and said "Show me all the webpages that aren't about elephants"? Sure, you'd get something, but would it be anything useful?
Google has gotten better, it's just HNer expectations that have changed as they expect more and more magic.
For example, the subtitle on the repo is "Stupid AI" when this query has never worked in these search engines, and it won't anytime soon.
You'd think the technical HN crowd would be more advanced than to make the same mistakes that (they complain that) stakeholders/users/gamers make when they mistakenly think everything is much easier than it actually is. Things aren't "stupid" just because they can't yet read your mind.
I would expect there to be an e-commerce site or blog post somewhere containing a page with the exact title "shirts without stripes" and I'd expect it to be the first match.
This thread is an excellent example. The author of the linked page didn't have the decency to actually make a substantive point, instead sharing three screenshots and posting the link here, chumming the HN waters with the kind of stuff that brings in the sharks from far and wide.
Bashing on big cos: Check
Vague pronouncements about AI: Check
Generic side-swipes about 'ad revenue': Check
This is why a coherent thesis is required to even initiate a proper discussion, because in the absence of that it invariably devolves to lowest-common-denominator shit-flinging.
Negations sidestep almost all of the algorithms that try to provide an improved result set, and fall through to pure text relevancy. So try searching on Amazon for "shirt", then search for "shirt -xkxkxkxk". Since xkxkxkxk doesn't match any documents, the negation should have no effect. But it does: it sidesteps all the fancy relevancy work, hardcoded query rewrite rules, domcat rules, demand and sales/impression statistics, etc., and gives you basically awful search results. You don't even get shirts.
Anyone with a programming background knows there is an art to forming useful search queries--it is an acquired skill. I'd personally much rather the engine bring back predictable results given mundane rules and keywords than attempt to understand sentences using an opaque method of understanding.
That said, this seems like an obvious place for improvement where both groups can be made happy.
Given the exact same query, a human creates the environment, and thus the context, themselves.
It may also depend on whom you are asking. Take me, for example: I enter this site to find news about software & tech. And since 'Stripe' is a company name, I assumed the link would lead to a list of shirt shops that do not accept Stripe as a payment method/provider. (Thus some kind of protest-related thing.)
I literally thought about that yesterday and did not open the page, thinking "That's too much for tonight".
Now I see the topic is something very different.
The former returns lots of mixed race couples, mostly not white couples. However the latter returns black couples.
What is going on here? Similar phenomenon perhaps?
What is the expected result, can we agree?
It is a shirt with anything except stripes.
And it's only that smart assistant that automates coping with the deficiencies of a one-size-fits-all central solution, finding me shirts with no stripes by using a rather dumb search engine. (Or "a pizza I would like", etc.)
https://medium.com/pinterest-engineering/pinsage-a-new-graph...
"Don't think of a cow!"
What did you just think of? A cow, of cowrse.
If you want a shirt w/o stripes, just google "plain shirt" or "dress shirt -stripes".
There are several reasons for this, including the following:
1) Natural language understanding for search has gotten a lot better, but it is still not as robust as keyword matching. The upside of delighting some users with natural language understanding doesn't yet justify the downside of making the experience worse for everyone else.
2) Most users today don't use natural language search queries. That is surely a chicken-and-egg problem: perhaps users would love to use natural language search if it worked as well or better than keyword search. But that's where we are today. So, until there's a breakthrough, most search engine developers see more incremental gain from optimizing some form of keyword search than from trying to support natural language search.
3) Even if the search engine understands the search query perfectly, it still has to match that interpretation against the document representation. In general, it's a lot easier to understand a query like "shirt with stripes" than to reliably know which of the shirts in the catalog do or don't have stripes. No one has perfectly clean, complete, or consistent data. We need not just query understanding, but item understanding too.
4) Negation is especially hard. A search index tends to focus on including accurate content rather than exhaustive content. That makes it impossible to distinguish negation from not knowing. It's the classic problem: absence of evidence is not evidence of absence. This is also a problem for keyword and boolean search -- negating a word generally won't negate synonyms or other variations of that word.
5) The people maintaining search indexes and searchers co-evolve to address -- or at least work around -- many of these issues. For example, most shoppers don't search for a "dress without sleeves"; they search for a "sleeveless dress". Everyone is motivated to drive towards a shared vocabulary, and that at least addresses the common cases.
None of this is to say that we shouldn't be striving to improve the way people and search engines communicate. But I'm not convinced that an example like this one sheds much light on the problem.
If you're curious to learn more about query understanding, I suggest you check out https://queryunderstanding.com/introduction-c98740502103
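Point 4 (negation vs. not knowing) is easy to demonstrate with a toy keyword index; the catalog data below is invented for illustration:

```python
# A keyword index cannot tell "not striped" apart from "stripes not mentioned".
catalog = {
    "A": "plain white shirt",
    "B": "shirt with red stripes",
    "C": "blue shirt",  # actually striped, but the seller never said so
}

def search(required, excluded):
    """Return items whose description contains all required words
    and none of the excluded words."""
    results = []
    for item, description in catalog.items():
        words = set(description.split())
        if required <= words and not (excluded & words):
            results.append(item)
    return results

# "shirt -stripes" correctly drops B, but happily returns C: the index only
# knows which words are present, not which facts are true.
print(search({"shirt"}, {"stripes"}))  # ['A', 'C']
```

No amount of cleverness at query time can fix this; the missing knowledge has to come from item understanding.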
Just my theory.
Nobody would describe a plain shirt as a shirt without stripes unless it’s within that context.
The results, of course, show shirts with some kind of stripe, albeit not as prominent as in the English query.
Google can do this now, for example in a prototype. The tough thing is to get it to consumer-grade quality without messing up other searches. The QA process is utterly brutal because one weird search can be a scandal.
Or does the input need to have basic filters applied before handing it to ML? "without X" or "no X" = "-X"? That can be foiled with "shirt without having stripes".
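That foiling is easy to demonstrate; here is the naive pre-ML filter as a one-regex sketch (my own toy code, grabbing only the single next word):

```python
import re

# Naive filter: rewrite "without X" / "no X" to "-X".
def naive_filter(query):
    return re.sub(r"\b(?:without|no)\s+(\w+)", r"-\1", query.lower())

print(naive_filter("shirt without stripes"))         # shirt -stripes
print(naive_filter("shirt without having stripes"))  # shirt -having stripes (foiled!)
```

One intervening word and the rewrite excludes the wrong term, which is why pattern-matching filters alone don't get you very far.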
But make a script that scrapes the top X results for these sites. Get your own AI / humans to rate it.
Make it competitive for these large sites <==> give them an incentive.
The real question is: is “shirts without stripes” really a query people enter? Or is it representative of a real pattern in the data?
Citation needed.
As far as my personal observations go, Google is NOT optimized for the long tail at all. It is always trying to return the most popular results from a cache of the most popular results. Once the cache is exhausted, Google starts to return completely irrelevant trash (anything after the first two pages of search results is pure spam and meaningless keyword soup).
If you try to look up some obscure keyword and find nothing, try again after a couple of months. There is a very high likelihood that you will see dozens of "new" results — most of them from pages that are several years old. Perhaps the actual long-tail searches still happen somewhere in the background, but you are not going to see their output right away — instead you need to wait until they get committed to the nearby cache.
Another alarming change, which happened relatively recently (4-5 years ago), is the tendency to increase the number of results at the expense of match precision. A long time ago, Google actually returned exact results when you quoted a search phrase. Then they started to ignore quotes. Then they started to ignore some search terms, if doing so resulted in a greater number of results. Finally, Google gained the horrifying ability to ignore MOST of the search terms. The OP's example probably has the same cause — Google's NLP knows the meaning of the word "without". But Alphabet Inc. can't afford to hose all those websites that use AdWords to sell you STRIPED SHIRTS. This would mean a loss of money! THE LOSS OF MONEY!!!
And since most search applications are basically just finding you the results with the most keyword matches, with a little bit of extra magic thrown in, the above is basically what you see.
These queries are basically the equivalent of optical illusions in cognitive psychology when studying the visual system -- seeing how the systems break tells you a lot about how they work.
Have you ever seen shirt dresses in your dress shirt queries, or vice versa? The search application doesn't care enough about bigrams and compound words.
Have you ever seen bowls in your bowling queries or fish in your fishing queries? The search application is over-stemming.
Natural language search is a real pain on any general purpose search application, particularly ones that have to deal with titles. The obvious simple fix to this query is to rewrite [x without y] to [x -y], but then when someone goes to search for [a day without rain] or [a year without summer], you are going to totally break those queries.
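The over-stemming failure can be reproduced with a deliberately crude suffix-stripping stemmer (toy code, far cruder than a real Porter stemmer, written purely to illustrate the conflation):

```python
# Crude stemmer: strip common suffixes from sufficiently long words.
def crude_stem(word):
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# "bowling" and "bowls" both collapse to "bowl", so a bowling query matches
# documents about bowls -- exactly the failure described above.
print(crude_stem("bowling"))  # bowl
print(crude_stem("bowls"))    # bowl
print(crude_stem("fishing"))  # fish
```

Real stemmers are more careful, but the same class of collision still shows up whenever the index stems more aggressively than the domain warrants.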
So it's not such a big deal that negation doesn't work.
Also, "shirts -stripes" does seem to work in both Amazon and Google. Or at least, I see no striped shirts.
> in particular, it shows clear insensitivity to the contextual impacts of negation.
As in, "X without Y" sounds like a common enough use case to have its own little parser branch in places as big as Google or Amazon.
So it's essentially the same input, and essentially the same expected output, but there must be quite a knot between understanding the word "without" and literally just using the - operator.
https://www.amazon.ca/s?k=shirt+without+stripes&ref=nb_sb_no...
shirt -stripes
> "Am I going crazy or is it the world around me!?"
Fishbone - Drunk Skitzo: https://youtu.be/SaPGH4Yd_zc?t=231
(Apologies for the snarky low-content flip reply.)
You have to know to search for "solid colored shirt", but when you can't think of this variation of search, or maybe there isn't one, exclusion is your only option, and it's broken.
On Amazon's side of things I would also include the obnoxious "Hey you just bought a pair of sneakers so now I will change all your recommendations to sneakers".
If it's meaningful for some reason, then it works:
https://www.google.com/search?q=woman+without+makeup&tbm=isc...
If it's a user error (like a dumb query), it fails, and that shouldn't be a surprise:
But there probably aren’t many images labelled as stripeless.
I’m not sure why BERT doesn’t try shirt -stripes.
=> "shirt -stripes" works pretty well on google at least
We are in a funny place with UIs.
"shirt without sleeves"
That's something that someone might actually search for. (At least the guys at my gym would!) And Amazon gets it mostly wrong.
Search results could be better? Sure.
Can we find adversarial examples? Almost always.
shirt -stripes
Achilles: Who could deny it?
Tortoise: Good. Likewise, “Cast-iron sinks” is a valid utterance, isn’t it?
Achilles: Indubitably.
Tortoise: Then, putting them together, we get “Politicians lie in cast iron sinks”. Now that’s not the case, is it?
---- Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, 1979
"Vaguely similar to a joke from _the movie_ Ninotchka that _the Slovenian philosopher_ Zizek often uses...."
Give people context. Don't assume people know what you know.
If Google wants to group words by semantics, they should offer a semantic grouping operator, for example "shirts (without stripes)". What if I am looking for song lyrics with these exact words in arbitrary positions?
If what the author wants were implemented, it would make my experience with Google even worse, unless it could also think for me. But then why would it need me in the first place?