> It could also understand that, in the context of hiking, to “prepare” could include things like fitness training as well as finding the right gear.
> fall is the rainy season on Mt. Fuji so you might need a waterproof jacket.
> MUM could also surface helpful subtopics for deeper exploration — like the top-rated gear or best training exercises
> you might see results like where to enjoy the best views of the mountain, onsen in the area and popular souvenir shops
Or, my favorite line:
> MUM would understand the image and connect it with your question to let you know your boots would work just fine. It could then point you to a blog with a list of recommended gear.
(in other words: "Thanks for showing you're interested in hiking gear. Here's a lot of hiking gear you can buy.")
Embedded in this answer seems to be the mindset that only buying things will solve problems.
Don't get me wrong -- I'm not a consumerist luddite, I use my credit card points like any good and proper citizen -- but when your mindset is "all problems can be solved by buying more shit", well, that's a pretty lonely existence.
Google's gotta make money, and helping people buy useful shit is a fine way of doing it, but just don't fall into the mindset trap that every solution in life is just a Google Pay away.
Google's search was at its peak in 2008, when advertising hadn't fully compromised search quality. Google is an advertising business that supports its otherwise money-losing properties. Why would things change in the future? Being able to synthesize data from multiple sources doesn't matter if that quality is then compromised by the realities of Google's business model.
Is there any empirical evidence to back this up? If we’re talking anecdotes, I swear as soon as Google started labeling ads more clearly, people complained more about ads. And if Google really is getting worse, I would expect to find DuckDuckGo failing to get the job done less and less often.
I do share your concerns though. Just look at YouTube as an example. You search for something, and halfway down the page are completely unrelated videos that you watched before. This is because YouTube just wants you to click; they don’t care about you finding what you were after.
Try searching "how valve index works" or "how valve index controllers work". My interpretation of "how it works" is "technical information on how an item operates". Google will interpret this instead as "how well it performs its intended functions" and flood me with both links to purchase the Valve Index as well as endless reviews. Results on Google are not tailored toward retrieval of factual information anymore. They're tailored to ordinary, garden-variety consumers, and obviously designed to sell you a Valve Index.
To this day I still have not found really good information on how the controllers in the Valve Index actually work. All I get are pushes and nudges into getting me to buy something.
e.g. Search for "camera with wifi" versus "camera with wifi reddit". If you're doing any research, you will find the latter more useful. Now I know some will say many people just want to buy the product and will be satisfied with a direct link to purchase, but the thing is, a good search engine will mix in different types of results. What you get here is dozens of virtually identical results, with any genuine info - e.g. a recent post on a reputable personal blog or a social media post - completely buried.
Do any other engines do it better? Maybe not. But Google itself certainly used to do it better, if only because it didn't have the majority of the internet trying to game its algo.
I am a happy paying customer for GCP, Play Books+Movies, etc., but I think they need to step up the quality of their services for paying YouTube customers.
Thanks for your comment.
When did Google start labeling ads more clearly? See: https://searchengineland.com/figz/wp-content/seloads/2016/07...
Today the labeling consists of the letters "Ad" in black next to the result.
source: https://searchengineland.com/search-ad-labeling-history-goog...
No. This is the same HN post as "Facebook is dying, pretty soon all their users will be gone and they'll collapse".
Google would much prefer to be the sole source of your traffic instead of pushing you to other sites. Google's business is advertising. Why would they want to lose that traffic?
Check this article about the Google MUM announcement, which basically says the same thing:
"MUM is part of Google’s long-term shift away from ranked search results and toward the creation of AI algorithms that can answer user questions faster—often without ever clicking a link or leaving Google’s results page. (Think, for example, of the “knowledge panels” that now appear at the top of many search results pages and display an answer from a website so you don’t have to visit the site yourself.) This shift promises to reduce the amount of work it takes to find information through Google. But it’s not clear that this is a problem in need of a solution." [0]
The Google of today is not the Google of 2008. Google in 2008 was a search engine. Today it's an advertising business that would much prefer you not leave Google properties.
[0] https://qz.com/2010802/googles-mum-is-making-search-worse-by...
1. Any remotely commercial search has an entire first page of ads, organic results are pushed way down.
2. Google has made the difference between ads and search results as minimal as possible. I long for the days of the early 00s of big yellow boxes.
3. On many pages the amount of content Google stuffs in at the top before you get to actual search results gets more annoying every year.
Honestly, I wish I had a button that made Google result pages look like they did 15 years ago.
Those are the same thing. If garbage websites can game their way up the search listing then Google is failing.
This is a simple problem of competition. Google doesn't have any, so they don't need to provide a good product. They can optimize for ad placement and revenue instead of search quality because users perceive that they have no real choice but to use Google. If another search engine manages to get some real market share Google results will get much better again.
A few years ago I felt that Bing and Google search were basically on par. Google has definitely upped the ante regarding search in the last couple of years. It may just be that it does more interpretation than you've come to expect so you need to retrain yourself how to query it. There are also occasions where verbatim search is required for technical topics. But Google's search quality has shown real improvements.
This one line is echoed again and again on HN, and yet in my experience all its competitors still pale in comparison. I hate Google now as much as the next HNer for its evil shenanigans, but their search is still superior, and if a browser comes with a default like Bing or DDG (like FF on Linux Mint), the first thing I do is change it back to Google, since the results are truly awful otherwise.
In this way, Google gets all the value from Internet properties they don't own without having to push any traffic to those sources. So, they get their cake and eat it too. They create a way to regurgitate information from the vast trove of info on the Internet without ever having to share traffic with those sources by moving traffic from their search engines to those sites, like they do now.
They get to sell advertising to those who want to capture eyeballs for search results, without having to share any ad revenue with the content providers that are powering that transformer-based search.
Ain't it grand?
- 1000 times more powerful than BERT, but still transformer architecture
- trained on 75+ languages, can transfer knowledge between languages
- can do text and images (not audio and video yet)
- can understand context, go deeper in a topic and generate content
Not much apart from their words about how amazing it is. Paper? Demo?
There is zero possibility that Google accomplished proper "language transfer" with the vast majority of Silicon Valley programmers being native English speakers.
In some languages, if you accidentally use a wrong single syllable in any sentence, you can end up saying something extremely embarrassing--and entirely different. This is the case with many Slavic languages.
This is a memorable "classic" [1]:
> "Tony Henry belted out a version of the Croat[ian] [national] anthem before the 80,000 crowd, but made a blunder at the end. He should have sung 'Mila kuda si planina' (which roughly means 'You know my dear how we love your mountains'). But he instead sang 'Mila kura si planina' which can be interpreted as 'My dear, my penis is a mountain'."
Many languages are much more grammatically complex than English, and also have an unbelievable amount of implicit contextual information derived from the grammatical morphology. For example, Slavic languages tend to be this way. The Slavic language that I speak, Croatian, tends to be very clean, direct, and concise, while being extremely complicated grammatically. Also, we have a lot of the same words for the same thing in Croatian, which, in combination with the complicated grammar, makes it a very expressive language. English, however, can be more expressive in the sense that it allows for more figurative language, like with the usage of idioms.
[1] BBC: Anthem gaffe 'lifted Croatia': http://news.bbc.co.uk/sport2/hi/football/7109058.stm
This speaks to ignorance of who Google employs. A ton of the engineers are immigrants there. When I was on Google Photos in MTV, I'd estimate it being about evenly split between native, English-first speakers, vs people who were either non-native English speakers or grew up with two languages simultaneously (children of first gen immigrants in the US).
Silicon Valley has a huge amount of cultural and ethnic diversity, so I don't know why you would make this mistake.
I don't know the people who worked on this project, but you do realise that Google employs swaths of programmers who are not native English speakers?
Here's my guess: Some team under web search trained a large Transformer-based model, with some adjustments here and there, on a massive dataset of crawled web pages using tons of TPUs. It made an incremental improvement to the search quality metrics and was shipped to production.
Their models accomplish ridiculously powerful things. Tbh I think it's far _more_ likely the answer is "this is crazy powerful, but the engineers didn't feel like writing a blog post about it, and the marketing team hasn't figured out how to monetize it yet".
But judging by the comments here, when Captain Picard asks the ship how long to Starbase 17 at Warp 9, rather than answer, you want it to tell the Captain to visit WarpTravelCalculator.com
If you publish information in this world, there’s nothing preventing people from learning it and rewriting it in a new way. Humans do it all the time and they don’t pay the people they learned it from a portion of proceeds.
Future AI will do this too. I want machine learning to read every book and paper ever written and be able to answer queries and summarize things for me.
We may need to find a better model for encouraging content contribution to society besides copyright and demanding royalties on every use.
1. It mixes math calculations (like warp navigation) with published information like texts.
2. The AI in Star Trek worked to serve the end user, in this case Picard. In our world the AI systems are designed to serve the software's owner, such as Google. It's not trying to give you the best answer. Instead it's trying to provide you responses that make Google the most money, or get them into the positions of power and influence that their leaders want.
3. Star Trek takes place in a world where the Federation doesn't use money and everyone is motivated to put in a hard day's work. On most planets they don't have poverty. This does not fit the societal and cultural dynamic we have now.
> We may need to find a better model for encouraging content contribution to society besides copyright and demanding royalties on every use.
Right now we have a problem where people are trying to step on content creators. I was reading an example where singers were trying to get credited as writers on songs they didn't write, so they could take more of the writers' royalties from sales. We live in a world where some will beg, borrow, steal, plagiarize, and generally try to hurt others to get a leg up. Including many at big businesses who would leverage AI for that.
We may hope for the best but we should plan for the worst.
Sorry to be off-topic but it's hard to get excited about blue sky ventures when the search UI offers no capability for simple things like delivering search results in date order. You can filter results by date, but not sort them.
To be useful, your “sort” would really just need to be another parameter to the existing relevancy model. And if you did that, then people would probably complain that “it’s not a real sort” and we’re back to square one.
Edit: You know what, this probably is simple for Google, because they’re freakin Google. To your point, I guess they probably don’t do this because money.
Exactly this. There are many controls they could have given us to trivially improve search for end users without needing this AI, but they would have made search less good for their customers, the advertisers.
I'm currently in Spain. I'm not Spanish. If I want results that don't have to do with that country, and aren't in Spanish, I need to use Duckduckgo. Google is unable to not give localized results.
1) terms are split into tokens 2) tokens are looked up to find documents 3) documents are ranked by scoring functions
I suspect sorting in chronological order might require too much document metadata to be retrieved at step 2. (A lot of filtering occurs between steps 2 and 3.)
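As a toy sketch of those three steps (invented data, nothing like Google's real pipeline): note that the posting lists built in step 2 carry only document ids, so sorting by date would mean fetching per-document metadata for every candidate before ranking.

```python
# Toy inverted-index pipeline: tokenize -> look up posting lists -> rank.
from collections import defaultdict

# Hypothetical corpus: doc id -> (text, publication date).
docs = {
    1: ("valve index controllers teardown", "2020-01-15"),
    2: ("valve index review best vr headset", "2021-03-02"),
    3: ("how valve index base stations work", "2019-11-30"),
}

# Steps 1-2: build an inverted index mapping token -> set of doc ids.
# Dates are deliberately NOT stored here, mirroring the point above.
index = defaultdict(set)
for doc_id, (text, _date) in docs.items():
    for token in text.split():
        index[token].add(doc_id)

def search(query):
    tokens = query.split()
    # Step 2: intersect posting lists to find candidate documents.
    candidates = set.intersection(*(index[t] for t in tokens))
    # Step 3: rank candidates with a scoring function
    # (here, a trivial term-density score).
    def score(doc_id):
        words = docs[doc_id][0].split()
        return sum(words.count(t) for t in tokens) / len(words)
    return sorted(candidates, key=score, reverse=True)

print(search("valve index"))  # doc 1 ranks first (highest term density)
```

Sorting the final results by date would be cheap; the expensive part is that "final results" only exist after the heavy filtering between steps 2 and 3, and a true chronological sort over all matches would bypass that filtering.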
Meanwhile the Reddits and whatnots can’t afford to not have Google index them, so this is just the price of admission. I wonder if they need an expansion to "do not crawl" rules that lets you specify how the data can be used?
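A hypothetical sketch of what such an extension might look like, in the style of robots.txt (the `Usage-Policy` directive and its values are invented for illustration; no such standard exists today):

```
# robots.txt -- hypothetical usage-policy extension (not a real standard)
User-agent: Googlebot
Allow: /

# Invented directive: permit indexing and snippets,
# but forbid using the content to train generative models.
Usage-Policy: index, snippet, no-model-training
```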
Don't know if it's related, but the above is Arup’s speech for the computer he christened Mumbo-Jumbo.
While Google is building a bigger and "better" Behemoth we should ask if this kind of innovation is really doing anything at all to make the world a better place in a meaningful way. Better monetization of search seems like a way to make the world worse in my opinion.
It's sad to see that they'll be spending so much time, effort and money on this...
"Google MUM MultiTask Unified Model Introduction" https://youtu.be/s7t4lLgINyo
I originally posted the LaMDA video: https://youtu.be/aUSSfo5nCdM
Ah yes, that totally common scenario which I'm faced with all the time.
I love this. It perfectly illustrates the peril we are in with the current state of AI research. That the author would choose this as a problem to solve shows exactly the socioeconomic class they come from, and how that influences the way they solve problems. It may seem like a trivial and meaningless example, but these subtle biases will creep their way into these systems and be amplified. And you can bet that this kind of work is the foundation for what will become the technology that eventually governs every facet of our lives once AGI is a thing.
I, for one, am terrified of the implications that a bougie tech bro AI overlord entails.
If that's your concern though, the good news is that in its purest form, machine learning tends to bend AWAY from this. You need large data sets to get good results, which means these projects tend to sample huge chunks of the general Internet, not just the isolated bubbles of SV types. Of course this still has limits, any data set has limits. You can only scrape data from the net if someone has posted that data in the first place, for example.
But in their initial form, a lot of these models are pretty diverse. That's why AI Dungeon had all kinds of "objectionable" content that kept getting the always-offended on their case: GPT-3 is just built off the general Internet, including a lot of weird, fucked up shit. The real problem is that inevitably someone complains, and they start hacking away at the ideal model to try to make it squeaky clean and ruin it in the process.
If you want to keep the tech from being perverted by "bougie tech bros," focus on the censorship. The models often start off pretty good.
How about the millions of people in rural counties and developing countries without access to vehicles who rely on walking across difficult terrain to make deliveries / get to work / get to school / visit family? Are they also techbros? My grandfather was an electrician in Albania and he would regularly walk dozens of miles on foot including through mountain ranges in order to get between jobs. Granted, this was dozens of years ago, but there's no reason to believe there isn't someone doing the same thing today.
If anything your own upper middle class bias is showing here, because you assume that everyone who navigates terrain is doing so for fun and not because they don't have other options.
https://www.thesun.co.uk/news/10248155/climber-livestreams-d...
Indoor climbing is definitely a SF techie thing. Tons of tech people climbed at Mission Cliffs.
Ask MUM.
Ignoring AI etc, my kids play a couple of games where there is clearly some backend that "knows" Taylor Swift is a Singer, is Female, and has acted in this movie X
You can go a long way in a Turing test with that and I was wondering if folks knew where those graphs were built ?
[1] https://www.wikidata.org/wiki/Wikidata:Main_Page
[2] https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/...
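Knowledge graphs like Wikidata [1] store exactly those facts as subject-predicate-object triples, queried by pattern matching. A minimal sketch with toy data (not the real Wikidata API):

```python
# Toy triple store in the spirit of Wikidata: every fact is a
# (subject, predicate, object) triple.
triples = [
    ("Taylor Swift", "occupation", "singer"),
    ("Taylor Swift", "gender", "female"),
    ("Taylor Swift", "cast member of", "Cats"),
    ("Cats", "instance of", "film"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# Everything the graph "knows" about Taylor Swift:
print(query(subject="Taylor Swift"))
# Which works is she a cast member of?
films = [o for _, _, o in query(subject="Taylor Swift", predicate="cast member of")]
print(films)  # ['Cats']
```

The real thing is queried via SPARQL against the endpoint in [2], with entities and properties identified by opaque ids rather than names (e.g., "cast member" is property P161).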
The key notion here is scale relativity. This is the reason why transformer models have been so, well, transformative. Bigger models are better than smaller models in a proportional manner. That is, they display scale relativity. Where is the limit? Where does this break down? We don't know. We haven't found the ceiling yet.
Another important notion is multimodality. When you can cross-reference your text-based knowledge of an apple with your image-based knowledge of an apple, you can use this information as leverage. Archimedes said, "Give me a place to stand on, and I'll move the Earth." It might seem ridiculous to say that the same is true when it comes to information, but it is. Informational leverage is powerful. Multimodality allows you to make very accurate predictions. The McGurk effect is a nice demonstration of how we do the exact same thing. We rely on visual information from a speaker's lips to predict what they're going to say. In other words: we make use of multimodal leverage.
The twin notions of scale relativity and multimodality explain what makes MUM possible. As some of you have pointed out, there's another aspect that we can't ignore: utility. Google will be using MUM to make money. Which means that they'll have to train MUM to make you spend it. But if you're uncomfortable with this idea, you are uncomfortable with capitalism in general. Which is fair, but I think it's important to keep it in mind.
As I'm sure they've already considered at Google, MUM can be used to revolutionize education. Imagine people all over the world having access to an expert instructor who can answer all of your questions. You might think this sounds like a dream, but we're a mere stone's throw away from achieving it. That's the true power of scale relativity + multimodality: we can now make advanced systems that can communicate with us.
I appreciate the skeptics and naysayers here: you keep the rest of us sane. For that, I thank you. At the same time, I want you to open your eyes to the possibility that something very important and transformative is happening right now. You don't have to go full Kurzweil, but I think you would benefit from reflecting on the opportunities this new technology might offer.
"Is there any work left to be done?"
The short answer is an emphatic “Yes! Dismantling your monster of a corporation!”

Google could search captions on all the YouTube (etc.) videos. Not sure why this doesn't happen. Along with a few other big resources not indexed.
I think the big thing with the article (taken as a workable technology) is that it's not search; it's taking other people's information and transforming it into a Google resource.
Which does add to humanity's knowledge, but it's owned and profited on by Google.
Instead it's an announcement that Google has made a new, even bigger, pile of linear algebra that can sort of answer questions and won't end up like Watson.
I like that they put in a deadpan bit about how they are very ethical when they make and then exploit their huge collections of data found by their spiders. There sure hasn't been any AI controversy at google this quarter, no sir-e!
Hands up everyone who is 100% satisfied with Search ... ... OK no one.
So now we have an unsolved problem left behind in favour of ... chat about mountains ...
"MUM has the potential to transform how Google helps you with complex tasks. Like BERT, MUM is built on a Transformer architecture, but it’s 1,000 times more powerful. MUM not only understands language, but also generates it."
Piss off and while you are at it, get BERT to explain my response to MUM or vice versa.
If MUM can decipher my immediately prior sentence given this input then I might start to get interested.
Yeah, it’d have been more code but you would not have needed to destroy a forest to train the thing.
This is the NLP trade off of the 21st century. The code is easier to write but the model is completely opaque, and you need to really burn a lot of electricity to make it work.
This is basically a meme now. We actually have a pretty good understanding of how the models work. In fact that understanding is how you can do things like build chatbots that don't spew hate.
Also, the electrical cost of training large language models is indeed high (e.g. GPT-3 has 175B params and is estimated at 190,000 kWh to train on GPUs). But the folks who pay the cost (basically OpenAI, Google, MSFT, Facebook, Amazon) are incentivized to make that go down (TPUs are way more efficient than GPUs), and they are incentivized to do it infrequently because it costs $$$.
FWIW Google's datacenters are also technically carbon neutral. I know that's not great because carbon credits don't have the impact that folks think they have, but there is definitely a difference in ecological impact between datacenter electricity and other kinds of energy usage (e.g. cars all burning fossil fuels).
Okay also let's compare to bitcoin, which is the real ecological disaster if we want to talk about inefficient software: ~387,096,774 kWh PER DAY. _and_ incentivizing things like cheap coal, and miners are definitely not using their crypto wealth to purchase carbon offset credits :(
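Back-of-the-envelope, using the two estimates quoted above:

```python
# Rough comparison using the figures cited above (both are estimates).
gpt3_training_kwh = 190_000            # one-off cost to train GPT-3 on GPUs
bitcoin_kwh_per_day = 387_096_774      # Bitcoin network consumption, per day

# How often does Bitcoin burn a whole GPT-3 training run's worth of energy?
seconds = gpt3_training_kwh / bitcoin_kwh_per_day * 24 * 3600
print(f"one GPT-3 training run every {seconds:.0f} seconds")  # ~42 seconds
```

In other words, by these numbers the Bitcoin network consumes a GPT-3-training's worth of electricity roughly every 42 seconds, continuously; the training run happens once.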
There's been a bit of discussion on HN lately about the effectiveness of sophisticated models vs. just good metadata.