> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.
which makes it sound like they used text + image prompts and then acted them out in the video, as opposed to Gemini interpreting the video directly.
https://developers.googleblog.com/2023/12/how-its-made-gemin...
> Narrator: "Based on their design, which of these would go faster?"
Without even specifying that those are cars! That was impressive to me, that it recognized the cars are going downhill _and_ could infer that in such a situation, aerodynamics matters. But the blog post says the real prompt was this:
> Real Prompt: "Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details."
They narrated inaccurate prompts for the Sun/Saturn/Earth example too:
> Narrator: "Is this the right order?"
> Real Prompt: "Is this the right order? Consider the distance from the sun and explain your reasoning."
If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!
Prompt: [attached image of the post-it notes with sun and planets] Is this right order?
GPT-4: The image you've uploaded shows three drawings on individual pieces of paper. From left to right, there's a drawing of the sun, then what appears to be Saturn, and lastly a drawing that resembles Earth. If the intention is to display these celestial objects by their proximity to the Sun, then the order should be the Sun, Earth, and then Saturn, as Earth is closer to the Sun than Saturn. Therefore, the current order is not correct. To correct the order, you should place the Earth drawing second, between the Sun and Saturn.
Prompt: [attached the image of the two car drawings] Based on their design, which one would go faster?
GPT-4: The image shows two drawings of vehicles on slopes. The one on the left appears to have a more traditional box-like design, while the one on the right has a streamlined design with a pointed front. Based on principles of physics and aerodynamics, the vehicle with the streamlined design (on the right) would likely go faster down the slope. Streamlining reduces air resistance (drag), allowing the vehicle to move more efficiently. The pointed front helps to cut through the air and minimize the drag force, which would result in higher speeds compared to the boxier design on the left, which would encounter more air resistance.
I'm actually pretty impressed by how well it did with such basic prompts.
> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.
Not "here are the full prompts used in the video" or something like that.
None of the entries match up 1:1. And the response to the car example in the video doesn't even make sense in response to the prompt in the post (no mention of speed), and certainly isn't a trimmed portion of the response in the post.
The video has the disclaimer "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity". It would be weird to write that but not mention that neither the prompts nor responses shared even the same set of words in the same order with the "Real" prompts and responses.
I think your assumption is wrong on this one.
Now that I learned how fake it is, that is more evidence that Google is in really bad shape with this.
It's crazy that this is where we are now. This is obviously still crazy impressive even if they hadn't done those edits.
It's that, you know some of this happened and you don't know how much. So when it says "what the quack!" presumably the model was prompted "give me answers in a more fun conversational style" (since that's not the style in any of the other clips) and, like, was it able to do that with just a little hint or did it take a large amount of wrangling "hey can you say that again in a more conversational way, what if you said something funny at the beginning like 'what the quack'" and then it's totally unimpressive. I'm not saying that's what happened, I'm saying "because we know we're only seeing a very fragmentary transcript I have no way to distinguish between the really impressive version and the really unimpressive one."
It'll be interesting to use it more as it gets more generally available though.
"What do you think I'm doing? Hint: it's a game."
Anyone with as much "knowledge" as Gemini ought to know it's roshambo.
"Is this the right order? Consider the distance from the sun and explain your reasoning."
Full prompt elided from the video.
Unless it was put in there manually, it's emergent, isn't it?
This is just Year 1 of this stuff going mainstream. Careers are 25-30 years long. What will someone entering the workforce today even be doing in 2035?
This might lower the barrier of entry but it's basically a cheaper outsourcing model. And many companies will outsource more to AI. But there's probably a reason that most large companies are not just managers and architects who farm out their work to the cheapest foreign markets.
Similar to how many tech jobs have gone from C -> C++ -> Java -> Python/Go, where the average developer is supposed to accomplish a lot more than previously, I think you'll see the same for white-collar workers.
Software engineering didn't die because you needed so much less work to do a network stack; the expectations changed.
This is just non-technical white-collar workers' first level up from C -> Java.
I suspect the real driver of the shift to AI will be this and not lower cost/efficiency.
The same thing they're doing now, just with tools that enable them to do some more of it. We've had these discussions a dozen times, including pre- and post-computerization, and every time it ends up the same way. We went from entire teams writing Pokemon in Z80 assembly to someone cranking out games in Unity while barely knowing how to code, and yet game devs still exist.
There's two ways this goes: UBI or gradual population reduction through unemployment and homelessness. There's no way the average human will be able to produce any productive value outside manual labor in 20 years. Maybe not even that, looking at robots like Digit that can already do warehouse work for $25/hour.
An AI coder will always be around, always be a "team player", always be chipper and friendly. That's management's wet dream.
I actually think that if we get to a superintelligent AGI and ask it to solve our problems (e.g., global warming, etc.), the AGI will say, "You need to slow down baby production."
Under good circumstances, the world will see a "soft landing" where we solve our problems by population reduction, and it's achieved through attrition and much lower birth rate.
We have met the enemy and he is us.
HN has a blind spot about this because a lot of people here are in the top percentile of programmers. But the bottom 50 percent are already being outperformed by GPT-4. Org structures and even GPT-4 availability haven't caught up, but I can't see any situation where these workers aren't replaced en masse by AI, especially if the AI is 10% of the cost and doesn't come with the "baggage" of dealing with humans.
I don't think our society is prepared.
It's like how, in 2003, if your restaurant had a website with a phone number posted on it, you were ahead of the curve. Today, if your restaurant doesn't have a website with online ordering, you're going to miss out on potential customers.
API developers will largely find something else to do. I've never seen a job posting for an API developer. My intuition is that even today, the number of people who work specifically as an API developer for their whole career is pretty close to zero.
"We need to do a big calculation, so your HBO/Netflix might not work correctly for a little bit. These shouldn't be too frequent; but bear with us."
Go ride a bike, write some poetry, do something tactile with feeling. They're doing something, but after a certain threshold, us humans are going to have to take them at their word.
The graph of computational gain is going to go linear, quadratic, ^4, ^8, ^16... all the way until we get to it being a vertical line. A step function. It's not a bad thing, but it's going to require a perspective shift, I think.
Edit: I also think we should drop the "A" from "AI" ...just... "Intelligence."
Ironically, this is created by some of the most intelligent people.
Seems like this video was heavily editorialized, but still impressive.
video: "Is this the right order?"
blog post: "Is this the right order? Consider the distance from the sun and explain your reasoning."
https://developers.googleblog.com/2023/12/how-its-made-gemin...
P.S. I wonder how these kinds of flaws end up in promotions. Bard made a mistake about JWST, which at least is much more specific and is farther from common knowledge than this.
"Rubber ducks float because they are made of a material less dense than water" is wrong but sounds reasonable. Call it a "bad grade school teacher" kind of mistake.
Pre-GPT, however, it's not the kind of mistake that would make it to print: people writing about rubber ducks were probably rubber duck experts (or had high school level science knowledge).
Print is citable. Print perpetuates and reinforces itself. Some day someone will write a grade school textbook built with GPTs that will have this incorrect knowledge, and so on.
But what will become of us when most gateways to knowledge are riddled with bullshit like this?
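To make the rubber duck point above concrete, here is a rough sketch of the reasoning, not a measurement. It assumes the duck is a hollow shell of a vinyl-like material slightly denser than water, with the air inside treated as weightless. Every number below is made up for illustration. The idea is that what matters for floating is the average density of the whole object, not the density of the material.

```python
# Illustrative numbers only; none of these come from the thread.
WATER_DENSITY = 1.0  # g/cm^3, fresh water


def average_density(mass_g: float, displaced_volume_cm3: float) -> float:
    """Mass of the whole object divided by the volume it would displace if fully submerged."""
    return mass_g / displaced_volume_cm3


def floats(density: float, fluid_density: float = WATER_DENSITY) -> bool:
    """An object floats when its average density is below the fluid's density."""
    return density < fluid_density


shell_density = 1.3   # g/cm^3, assumed vinyl-like material (denser than water)
shell_volume = 20.0   # cm^3 of actual material (assumed)
total_volume = 200.0  # cm^3 enclosed by the duck's outer surface, mostly air (assumed)

shell_mass = shell_density * shell_volume  # mass of the air inside is neglected
duck_density = average_density(shell_mass, total_volume)

print(floats(shell_density))  # a solid lump of the material: False
print(floats(duck_density))   # the hollow duck: True
```

Under these assumptions a solid lump of the material sinks while the hollow duck floats, which is why "the material is less dense than water" sounds right but explains the wrong thing.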
> Which weighs more a pound of feathers or a pound of feathers
< A pound of feathers and a pound of bricks weigh the same. Both are one pound. The difference lies in volume and density: feathers take up more space and are less dense, while bricks are denser and take up less space.
Bard does better but still doesn't "get" it:
< Neither! Both a pound of feathers and a pound of feathers weigh the same, which is exactly one pound. In other words, they have the same mass.
< This is a classic riddle that plays on our expectations and assumptions. We often associate weight with density, so we might initially think that feathers, being lighter and fluffier than other materials, would weigh less than something more compact like metal. However, as long as both piles of feathers are measured to be exactly one pound, they will weigh the same.
At least it recognizes its limitations:
> My reason for mentioning other materials was likely due to my training data, which contains a vast amount of information on various topics, including the concept of weight and density. As a large language model, I sometimes tend to draw on this information even when it is not directly relevant to the current task. In this case, I made the mistake of assuming that comparing feathers to another material would help clarify the point, but it only served to complicate the matter.
For ChatGPT if you ask it to solve it step by step, it does better: https://chat.openai.com/share/7810e5a6-d381-48c3-9373-602c14...
Be terse. Do not offer unprompted advice or clarifications. Speak in specific, topic relevant terminology. Do NOT hedge or qualify. Do not waffle. Speak directly and be willing to make creative guesses. Explain your reasoning. if you don’t know, say you don’t know. Remain neutral on all topics. Be willing to reference less reputable sources for ideas. Never apologize. Ask questions when unsure.
This is obviously geared towards non-technical/marketing people that will catch on to the hype. Or towards Wall Street ;)
I suspect the cutting-edge systems are capable of this level, but over-scripting can undermine the impact.
I wish I could see it in real time, without the cuts, though. It made it hard to tell whether it was actually producing those responses in the way that is implied in the video.
Like how much of what was 'important' to develop a career in the past decades, even in the past years, will be relevant with these kinds of interactions.
I'm assuming the video is highly produced, but it's mind blowing even if 50% of what the video shows works out of the gate and is as easy as it portrays.
I can't say I'm really looking forward to a future where learning information means interacting with a book-smart 8 year old.
So the killer app for AI is to replace Where's Waldo? for kids?
Or perhaps that's the fun, engaging, socially-acceptable marketing application.
I'm looking for the demo that shows how regular professionals can train it to do the easy parts of their jobs.
That's the killer app.
I suspect this was a fine tuning choice and not an in context level choice, which would be unfortunate.
If I was evaluating models to incorporate into an enterprise deployment, "creepy soulless toddler" isn't very high up on the list of desired branding characteristics for that model. Arguably I'd even have preferred histrionic Sydney over this, whereas "sophisticated, upbeat, and polite" would be the gold standard.
While the technical capabilities come across as very sophisticated, the language of the responses themselves does not at all.
Real-time instructions for any task: learning piano, live cooking instructions, fixing your plumbing, etc.
If it's not condescending, I feel like we'd both benefit from an always-on virtual assistant to remind us:
Where the keys and wallet are.
To put something back in its place after using it, and where it goes.
To deal with bills.
To follow up on medical issues.
etc. etc.
Technically still exciting, just in the survival sense.