I said “GPT-3 is not exactly a good tool for programming”, but that actually meant “GPT-3 is not exactly a good tool to program in”. OP implemented a string-reversing algorithm in GPT-3, and my comment was made in the exact same context. In other words, I was treating GPT-3 as a kind of programming language.
I think that sometime in the near future, knowing how to phrase something to GPT, DALL-E, etc. will be a very valuable skill for humans to have.
Actually, after thousands of prompts to mini DALL-E, I've found that the more you treat the prompt as a programming language rather than as natural language, the better and more accurate the results are. In that regard, operator-first works better, almost like Lisp. I tried prompts with parentheses, but the nesting didn't affect the results.
I think that with the modern bombardment of information, everyone needs to be an information analyst and programmer, an information analyst and engineer, an information analyst and doctor. DALL-E will help us construct images that follow mnemonic rules which can be represented in art. That way we can memorize many corners of the information we want to remember, and know how not to lose the plot of the project in question. Like an image for every function, or an image for every module, or for every enum and trait.
ColorForth did exist in the past; most probably we can make an ArtForth with the speed and ease of modern tools.
In this way I think these language transformers will be much better for searching information. Not because of their great comprehension abilities or indexing prowess, but because their behavior will be static and the training data reasonably good. Soon enough someone will find better ways to display their learned associations and they'll become great search engines (if you can index the content relevant to you that is).
I'd be curious to see what scaling up the size of the vocabulary would do to improve these results in a model like GPT-3...
A rare word like blithe is tokenized into two BPE tokens: bl and ithe, whereas common words like the get their own token.
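For intuition, here's a toy sketch of the BPE training loop (Sennrich-style pair merging; the corpus, merge count, and resulting splits are illustrative, not OpenAI's actual vocabulary):

```python
from collections import Counter

def pair_counts(vocab):
    # count adjacent symbol pairs, weighted by word frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # replace every occurrence of the symbol pair with one merged symbol
    merged = {}
    for word, freq in vocab.items():
        symbols, out, i = word.split(), [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[' '.join(out)] = freq
    return merged

# toy corpus: "the" is frequent, "blithe" is rare
corpus = ['the'] * 50 + ['blithe'] * 2 + ['this'] * 10
vocab = dict(Counter(' '.join(w) for w in corpus))
for _ in range(5):  # a real tokenizer runs tens of thousands of merges
    pairs = pair_counts(vocab)
    if not pairs:
        break
    vocab = merge_pair(max(pairs, key=pairs.get), vocab)
print(vocab)  # {'the': 50, 'bl i the': 2, 'this': 10}
```

Frequent words get merged into single tokens quickly, while rare words are left as multiple sub-word pieces; in the real GPT-2/3 vocabulary "blithe" comes out as 'bl' + 'ithe'.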
Has GPT-3 really been "taught" anything here? If you don't provide an explicit example as the context of your input, GPT-3 does not retain the ability to reverse words.
You can also fine-tune GPT-3 to retain the ability to reason through problems. For example, check out this work on reasoning for grade school math: https://openai.com/blog/grade-school-math/
It has performed a novel (to it) task based on instructions, and this is IMHO remarkable. It should be possible to make it retain and recall this procedure.
Everything non-sci-fi AI does is “just” an algorithm, so it won’t live up to standards of human abilities, precisely because we know how this result has been obtained.
That is also why it has such a strong propensity to lose the plot once you are outside of that window size and it's generating new content based on self-generated content.
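A minimal sketch of that failure mode (hypothetical code, not GPT-3's actual implementation): generation only ever conditions on the last `window` tokens, so once the original prompt scrolls out of the window, the model is conditioning purely on its own output.

```python
def generate(model_step, prompt_tokens, n_new, window=2048):
    # model_step: any function from a token context to the next token
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        context = tokens[-window:]   # anything older is invisible to the model
        tokens.append(model_step(context))
    return tokens

# toy "model": echoes the first token it can still see
toy = lambda ctx: ctx[0]
out = generate(toy, [1, 2, 3, 4], n_new=3, window=3)
print(out)  # [1, 2, 3, 4, 2, 3, 4] -- token 1, the original prompt, is never seen again
```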
Build a character array in Python for the string "hellohackernews":
['h', 'e', 'l', 'l', 'o', 'h', 'a', 'c', 'k', 'e', 'r', 'n', 'e', 'w', 's']
Reverse the order of the python array characters:
['s', 'w', 'e', 'n', 'r', 'a', 'k', 'c', 'a', 'h', 'o', 'l', 'l', 'e', 'h']
Er, maybe not...
Reassemble the second array into a string without spaces:
"swenrakcaholles"
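For reference, the three steps of that prompt, executed in actual Python:

```python
s = "hellohackernews"

chars = list(s)        # build a character array
rev = chars[::-1]      # reverse the order of the array
result = ''.join(rev)  # reassemble into a string without spaces

print(result)  # swenrekcaholleh
```

The correct reversal is "swenrekcaholleh", which differs from GPT-3's answer in two positions.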
If anthropomorphizing bothers you, then we could just use "prompting", but I feel teaching is a good enough approximation here.
If the author had played with the temperature and retried their failing prompt enough times, or simply reworded it slightly, they might also have gotten the correct answer.
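For context, temperature rescales the model's output distribution before sampling. A minimal sketch (standard softmax temperature; not OpenAI's internals):

```python
import math
import random

def sample_with_temperature(logits, temperature):
    # temperature 0 degenerates to greedy argmax;
    # higher temperatures flatten the distribution, so retries vary more
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]
```

At temperature 0 the model always gives the same completion; turning it up makes re-rolling the same prompt explore different answers.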
Can we get a GPT-N-3 this way to do SAT?
- Joscha Bach 16 May 2022
Create a Python program to reverse a string:
It produces:

    def reverse(s):
        return s[::-1]
And that isn't even the code-specific model. GPT-3 is just the world's largest char-RNN, right?
What GPT-3 doesn't seem to have yet is large temporal coherence and a stable motivational and qualitative structure that gives value to sentient lives. I do think it's possible there's some traces of sentience in those large models and we should be aware of that to prevent unnecessary suffering and poor quality of existence.
(I'd guess that the answer is "N/A" because we can't even approximate the complexity of the base algorithms operating in the biological brain, just the number of connections. or maybe we can?)
I didn’t know that. Seems like it would confuse it during training. Anyone able to explain?
Not sure if the same thing happens here, though.
You can use HuggingFace's GPT-2 tokenizer as well. (Some of OpenAI's GPT-3 notebooks do just that.)
No, BPEs are more complex: you have a whole additional layer of preprocessing, with all sorts of strange and counterintuitive downstream effects and brand new ways to screw up (fun quiz question: everyone knows that BPEs use '<|endoftext|>' tokens to denote document breaks; what does the string '<|endoftext|>' encode to?). BPEs are reliably one of the ways that OA API users screw up, especially when trying to work with longer completions or context windows.
But a character is a character.
> and scales badly in memory/compute)
Actually very competitive: https://arxiv.org/abs/2105.13626#google (Especially if you account for all the time and effort and subtle bugs caused by BPEs.)
“Okay, could you show me on the whiteboard how you might go about writing a program that can reverse a string?”
“Great, so I’m going to start by initializing a simple transformer-based neural network with 175 billion parameters and 96 attention layers, and I’m going to train it on a corpus of 45 terabytes of data tokenized into about 500 billion tokens…”