The challenge in making something like this, or Copilot / Ghostwriter, work well is meeting users where they are. Spreadsheet users don't want to deal with API keys or know what temperature is - but anyone (like this tweet) can set up direct API use with generic models in 10 minutes. This document has all the code to do so ;). [1]
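For the curious, here's a minimal sketch of that direct-API setup in Python. Everything below (the helper names, the prompt format, the model name) is illustrative, and it assumes an `OPENAI_API_KEY` environment variable and the `/v1/completions` endpoint:

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/completions"

def build_prompt(instruction: str, cell_value: str) -> str:
    # One prompt per cell: the column-level instruction plus that row's value.
    return f"{instruction}\n\nInput: {cell_value}\nOutput:"

def gpt3(instruction: str, cell_value: str) -> str:
    # Send one completion request per spreadsheet cell.
    payload = {
        "model": "text-davinci-003",  # illustrative; any completions model works
        "prompt": build_prompt(instruction, cell_value),
        "max_tokens": 64,
        "temperature": 0,  # data cleanup wants determinism, not creativity
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["choices"][0]["text"].strip()
```

Loop that over a column and you've reproduced the core of the tool - which is exactly the point: the hard part isn't the ten lines of glue, it's the packaging.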
For non-engineers - or folks who need a reliable and familiar syntax to use at scale and across their org - promptloop [2] is the best way to do that. All the comments in here are great though. We have been live with users since the summer - no waitlist. And as a note: despite the name, "prompt engineering" has almost nothing to do with making this work at scale.
[1] https://docs.google.com/spreadsheets/d/1lSpiz2dIswCXGIQfE69d... [2] https://www.promptloop.com/
Maybe not good to reveal customer names this way, unless they already disclosed it publicly
There must be a set of projects which are cost-prohibitive now because they require paying humans, but which will become feasible exactly because of this tech. For a good portion of these, a higher-than-human error rate will also be tolerable, or at least correctable via a small degree of human intervention.
There's also increased variance in human accuracy. You might be able to train 100 people, but 10k? A model is consistent all the way through.
It only works the opposite way, where machines and AI handle the trivial cases and humans handle the non-trivial ones. Many people actually genuinely like to solve hard problems which require thinking and skill, most people strongly dislike mundane repetitive tasks.
Makes total sense to me.
Human-first scenarios will be rarer, and probably limited to where a human has to do it by law. Made-up example: border control checking that passport photos match faces. A human checks, and if they click OK, the AI double-checks.
But when the AI is capable of something the person can't do (like Stable Diffusion creating images compared to me) the AI should take first chair.
So the question isn't which one you can add to your tool faster. The question is, if I already have this AI tool setup, is it worth setting up the USPS API to go from 95% accuracy to 99.9% accuracy. For countless applications, it wouldn't be. Obviously if you need to scale and need to ensure accuracy, it's a different story.
Better one would be "based on these three columns, generate a cold outbound email for the person..."
It would suck to be on the receiving end of those, but the use case makes much more sense.
And yet if P(someone unknown is a robot) gets too large, it's going to be a weird adjustment period.
Non-exact outputs are actually a feature and not a bug for other use cases - but this takes a bit of use to really see.
I get the feeling that my visual system and the language I use are respectively pretty bad at processing and conveying precise information from a plot (beyond simple descriptors like "A is larger than B" or "f(x) has a maximum"). I guess I would find it mildly surprising if any Vision-Language model were able to perform those tasks very well, because the representations in question seem pretty poorly suited.
I get that popular diffusion models for image generation are doing a bad job composing concepts in a scene and keeping relationships constant over the image--even if Stable Diffusion could write in human script, it's a bad bet that the contents of a legend would match a pie chart that it drew. But other Vision-Language models, designed for image captioning or visual question answering, rather than generating diverse, stylistic images, are pretty good at that compositional information (up to, again, the "simple descriptions" level of granularity I mentioned before.)
Note: I'm the founder :) Happy to answer any questions.
Reply below with some sample data/problem and I'll reply with a demo to see if we can solve it out of the box!
> 0 day 7 hour 31 min 42 sec
I've never seen rolling waitlists, it's kind of strange tbh
Me too - different project and different labelling company, but the same conclusion: it's better to do it in house. Labelling is hard. You need to see, talk with, and train your labelling team.
"I tried parsing your messy input. Here's what I came up with. Please make sure it's correct then proceed with the checkout."
Maybe 1 in my past 2y of many, many spreadsheets has been financing related. I think you might be overgeneralizing to an ungeneralizably large group -- the set of all human spreadsheets.
"I need to input a number of variables and find their sum and average [or even more]. And I need to see how the outputs change if I change an input...".
Is there any reason to think the situation has substantially improved since then?
Formal | Informal
Lane, Thomas | Tommy Lane
Brooks, Sarah | Sarah Brooks
Yun, Christopher |
Doe, Kaitlyn |
Styles, Chris |
…
Automating something like this is extremely hard with an algorithm and extremely easy with ML. Even better, many people who use spreadsheets aren’t very familiar with coding and software, so they do things manually even in cases where the formula is simple.

> Lane, Thomas => Thomas Layne
> Brooks, Sarah => Sarah Brooksy
> Yun, Christopher => Chris Yun
> Doe, Kaitlyn => KD
> Styles, Chris => Chris Spice, Chris Chasm
I'm sure the bot overcomplicated an otherwise simple task, but I think there's always gonna be some creative error if we rely on things like that. It's funny though because these results are plausible for what a real person might come up with as informal nicknames for their friends.
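For contrast, the mechanical half of that column really is formula territory. Here's a sketch (function name is made up) of the deterministic part - the nickname step, Christopher => Chris, is the part that genuinely needs ML:

```python
def informal(formal: str) -> str:
    # Handles only the mechanical "Last, First" -> "First Last" reordering;
    # choosing a nickname (Christopher -> Chris) is where ML earns its keep.
    last, first = (part.strip() for part in formal.split(",", 1))
    return f"{first} {last}"

informal("Yun, Christopher")  # -> "Christopher Yun", not "Chris Yun"
```

Which is also why the creative errors above sting: the model got the easy, formula-able part wrong ("Layne", "Brooksy") while freelancing on the hard part.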
And for the second Kindle review, it summarized one point from the actual review, then completely made up two additional points!
Really impressive Sheets extension, but you'd have to be so careful what you applied this to.
I wonder if this means the AI is dumb or that the AI is smart enough to notice that humans just make shit up sometimes, like when they're not reading carefully or when they need filler.
Generates Python, then executes it.
I'm sure the USPS is already doing this and more, and if not, there's probably some AI jobs lined up for it :)
Here volume matters, and all misses are just lost data which I'm fine with. The general purpose nature of the tool makes it tremendous. There was a time when I would have easily paid $0.05 / query for this. The only problem with the spreadsheet setting is that I don't want it to repeatedly execute and charge me so I'll be forced to use `=GPT3()` and then copy-paste "as values" back into the same place which is annoying.
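One way around that re-execution problem, sketched in plain Python rather than an actual sheet extension (the wrapper name and cache size are made up): memoize on the cell's input, so recalculation re-bills nothing.

```python
import functools

def make_cached(completion_call):
    # Wrap a paid completion call so an identical prompt is billed once,
    # no matter how many times the sheet recalculates.
    @functools.lru_cache(maxsize=4096)
    def cached(prompt: str) -> str:
        return completion_call(prompt)
    return cached

# Usage, with a stand-in for the real API call:
calls = []
def fake_api(prompt):
    calls.append(prompt)  # each append = one billable request
    return f"completion for {prompt!r}"

gpt3 = make_cached(fake_api)
gpt3("clean this address")
gpt3("clean this address")  # cache hit: no second charge
# len(calls) == 1
```

It's the programmatic equivalent of the copy-paste "as values" trick, minus the annoyance.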
Like: give me a list of all customers from London who purchased a laptop with more than 16GB of RAM in January and used a coupon between 10% and 25%. Sort it by price paid.
Just ran your exact query through OpenAI's Codex (model: code-davinci-002), and this was the result:
SELECT * FROM customers WHERE city = 'London' AND purchase_date BETWEEN '2019-01-01' AND '2019-01-31' AND product_name = 'laptop' AND product_ram > 16 AND coupon_percentage BETWEEN 10 AND 25 ORDER BY price_paid DESC;
I'd say it's pretty damn accurate.
Agreed. Sooner or later a company is going to do this with its customers, in ways that are fine 95% of the time but cause outrage or even harm on outliers.
And if that company is anyone like Google, it'll be almost impossible for the customers to speak to a human to rectify things.
Also, once this is normal and ubiquitous, people will come along who game it, and the AI will be too dumb to recognize them. The real humans will all have been fired - game over, we're stuck with shitty systems, and everyone goes crazy.
I just lost the game.
But if you allow some false negatives, such as trying to detect if a bot is a bot, I think that could work? But I feel like the technology to write fake text is inevitably going to outpace the ability to detect it.
But if someone uses this to do 90% of the work and then just edits it to make it personal and sound like themselves, then it's just a great time saving tool.
I mean, in this exact example, 70 years ago you'd have had to address each thank-you card by hand, from scratch. 10 years ago you could use a spreadsheet just like this to automatically print off mailing labels from your address list. It didn't make things worse, just different.
This is just the next step in automation.
This is still way too optimistic. Reading through something that's "almost right", seeing the errors when you already basically know what it says / what it's meant to say, and fixing them, is hard. People won't do it well, and so even in this scenario we often end up with something much worse than if it was just written directly.
There is a lot of evidence for this, from the generally low quality of lightly-edited speech-to-text material, to how hard it is to look at a bunch of code and find all of the bugs without any extra computer-generated information, to how hard editing text for readability can be without serious restructuring.
I could extrapolate in my extremely judgmental way that the person who does that probably has a grandiose sense of how valuable their own time is, first of all, and secondly an impractical and sheepishly obedient devotion to big weddings with guest-lists longer than the list of people they actually give a shit about. Increase efficiency in your life further upstream, by inviting fewer people! (Yeah right, might as well tell them to save money by shopping less and taking fewer trips. Like that would ever work!)
But I digress, and anyway don't take any of that too seriously, as 20 years ago I was saying the same kinds of things about mobile phones... like "Who do you think you are, a surgeon, with that phone?" Notice it's inherently a scarcity-based viewpoint, based on the previous however-many years when mobile phones really were the province only of doctors and the like. Now they're everywhere... So, bottom line, I think the thank-you notes are a lousy use of the tech, but just like the trivial discretionary conversations I hear people having on their mobile phones now that they're ubiquitous, this WILL be used for thank-you notes!
The software would save people 80% of the work, and most are lazy enough to release it as is instead of fixing the remaining 20%. That laziness will end up forcing legislation to flag and eventually ban or deprioritize all GPT content, which will result in a war of adversarial behaviors trying to hide generated stuff among the real. Can’t have nice things!
Let alone flagging/deprioritizing it via some draconian legislation?
Cue Fry "I'm scare-roused" meme...
The hilarious one is changing the zip code to 90210. The AI is basically accusing you of a typo, because you obviously meant that more famous zip code.
General purpose AIs in situations where more targeted, simpler solutions are needed are going to be incredibly dangerous. Sure this AI can fly a plane 99.999% of the time, but every once in a while it does a nose dive because of reasons we cannot possibly understand or debug.
So of course a human developer made an AI that makes bad data.
Only they aren't. Check the video again, they come out fine.
Edit: Oh dang, you're all right, several of them have wrong digits. :l
I remember in like 2007 or something, in the early days of Facebook, someone made a CLI interface to the FB API. And I wrote a random-timed daily cron job that ran a Bash script that checked "which of my FB friends have their birthday today", went through that list, selected a random greeting from like 15 different ones I'd put into an array, and posted this to the wall of person $i. Complete with a "blacklist" with names of close friends and family, where the script instead sent me an email reminder to write a manual, genuine post.
I used to have a golfed version of that script as my Slashdot signature.