I said “GPT-3 is not exactly a good tool for programming”, but that actually meant “GPT-3 is not exactly a good tool to program in”. OP implemented a string-reversing algorithm in GPT-3, and my comment was made in the exact same context. In other words, I was treating GPT-3 as a kind of programming language.
I think that sometime in the near future, knowing how to phrase something to GPT, DALL-E, etc. will be a very valuable skill for humans to have.
Actually, after thousands of prompts to mini DALL-E, I've found that the more you treat the prompt as a programming language rather than as natural language, the better and more accurate the results are. In that regard, operator-first works better, almost like Lisp. I tried prompts with parentheses, but the nesting didn't affect the results.
I think that with the modern bombardment of information, everyone needs to be an information analyst and programmer, an information analyst and engineer, an information analyst and doctor. DALL-E will help us construct images that follow mnemonic rules which can be represented in art. That way we can memorize many corners of the information we want to remember, and know how not to lose the plot of the project in question. Like an image for every function, or an image for every module, or for every enum and trait.
ColorForth did exist in the past; most probably we can make an ArtForth with the speed and ease of modern tools.
In this way I think these language transformers will be much better for searching information. Not because of their great comprehension abilities or indexing prowess, but because their behavior will be static and the training data reasonably good. Soon enough someone will find better ways to display their learned associations and they'll become great search engines (if you can index the content relevant to you that is).
I'd be curious to see what scaling up the size of the vocabulary would do to improve these results in a model like GPT-3...
A rare word like blithe is tokenized into two BPE tokens: bl and ithe, whereas common words like the get their own token.
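For intuition, here's a toy sketch of the BPE training loop (Sennrich-style pair merging; the corpus, merge count, and resulting splits are illustrative, not OpenAI's actual vocabulary):

```python
from collections import Counter

def pair_counts(vocab):
    # count adjacent symbol pairs, weighted by word frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # replace every occurrence of the symbol pair with one merged symbol
    merged = {}
    for word, freq in vocab.items():
        symbols, out, i = word.split(), [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[' '.join(out)] = freq
    return merged

# toy corpus: "the" is frequent, "blithe" is rare
corpus = ['the'] * 50 + ['blithe'] * 2 + ['this'] * 10
vocab = dict(Counter(' '.join(w) for w in corpus))
for _ in range(5):  # a real tokenizer runs tens of thousands of merges
    pairs = pair_counts(vocab)
    if not pairs:
        break
    vocab = merge_pair(max(pairs, key=pairs.get), vocab)
print(vocab)  # {'the': 50, 'bl i the': 2, 'this': 10}
```

Frequent words get merged into single tokens quickly, while rare words are left as multiple sub-word pieces; in the real GPT-2/3 vocabulary "blithe" comes out as 'bl' + 'ithe'.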
Has GPT-3 really been "taught" anything here? If you don't provide an explicit example as the context of your input, GPT-3 does not retain the ability to reverse words.
You can also fine-tune GPT-3 to retain the ability to reason through problems. For example, check out this work on reasoning for grade school math: https://openai.com/blog/grade-school-math/
It has performed a novel (to it) task based on instructions, and this is IMHO remarkable. It should be possible to make it retain and recall this procedure.
Everything non-sci-fi AI does is “just” an algorithm, so it won’t live up to standards of human abilities, precisely because we know how this result has been obtained.
That is also why it has such a strong propensity to lose the plot once you are outside of that window size and it's generating new content based on self-generated content.
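A minimal sketch of that failure mode (hypothetical code, not GPT-3's actual implementation): generation only ever conditions on the last `window` tokens, so once the original prompt scrolls out of the window, the model is conditioning purely on its own output.

```python
def generate(model_step, prompt_tokens, n_new, window=2048):
    # model_step: any function from a token context to the next token
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        context = tokens[-window:]   # anything older is invisible to the model
        tokens.append(model_step(context))
    return tokens

# toy "model": echoes the first token it can still see
toy = lambda ctx: ctx[0]
out = generate(toy, [1, 2, 3, 4], n_new=3, window=3)
print(out)  # [1, 2, 3, 4, 2, 3, 4] -- token 1, the original prompt, is never seen again
```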
Build a character array in Python for the string "hellohackernews":
['h', 'e', 'l', 'l', 'o', 'h', 'a', 'c', 'k', 'e', 'r', 'n', 'e', 'w', 's']
Reverse the order of the python array characters:
['s', 'w', 'e', 'n', 'r', 'a', 'k', 'c', 'a', 'h', 'o', 'l', 'l', 'e', 'h']
Er, maybe not...
Reassemble the second array into a string without spaces:
"swenrakcaholles"
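For reference, the three steps of that prompt, executed in actual Python:

```python
s = "hellohackernews"

chars = list(s)        # build a character array
rev = chars[::-1]      # reverse the order of the array
result = ''.join(rev)  # reassemble into a string without spaces

print(result)  # swenrekcaholleh
```

The correct reversal is "swenrekcaholleh", which differs from GPT-3's answer in two positions.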
If anthropomorphizing bothers you, then we could just use "prompting", but I feel teaching is a good enough approximation here.
If the author had played with the temperature and retried their failing prompt enough times, or simply reworded it slightly, they might also have gotten the correct answer.
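For context, temperature rescales the model's output distribution before sampling. A minimal sketch (standard softmax temperature; not OpenAI's internals):

```python
import math
import random

def sample_with_temperature(logits, temperature):
    # temperature 0 degenerates to greedy argmax;
    # higher temperatures flatten the distribution, so retries vary more
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]
```

At temperature 0 the model always gives the same completion; turning it up makes re-rolling the same prompt explore different answers.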
Can we get a GPT-N-3 this way to do SAT?
- Joscha Bach 16 May 2022
Create a Python program to reverse a string:
It produces:

    def reverse(s):
        return s[::-1]
And that isn't even the code-specific model. GPT-3 is just the world's largest char-RNN, right?
What GPT-3 doesn't seem to have yet is large temporal coherence and a stable motivational and qualitative structure that gives value to sentient lives. I do think it's possible there's some traces of sentience in those large models and we should be aware of that to prevent unnecessary suffering and poor quality of existence.
(I'd guess that the answer is "N/A" because we can't even approximate the complexity of the base algorithms operating in the biological brain, just the number of connections. or maybe we can?)
I didn’t know that. Seems like it would confuse it during training. Anyone able to explain?
Not sure if the same thing happens here, though.
You can use HuggingFace's GPT-2 tokenizer as well. (Some of OpenAI's GPT-3 notebooks do just that.)
No, BPEs are more complex: you have a whole additional layer of preprocessing, with all sorts of strange and counterintuitive downstream effects and brand new ways to screw up (fun quiz question: everyone knows that BPEs use '<|endoftext|>' tokens to denote document breaks; what does the string '<|endoftext|>' encode to?). BPEs are reliably one of the ways that OA API users screw up, especially when trying to work with longer completions or context windows.
But a character is a character.
> and scales badly in memory/compute)
Actually very competitive: https://arxiv.org/abs/2105.13626#google (Especially if you account for all the time and effort and subtle bugs caused by BPEs.)
“Okay, could you show me on the whiteboard how you might go about writing a program that can reverse a string?”
“Great, so I’m going to start by initializing a simple transformer-based neural network with 175 billion parameters and 96 attention layers, and I’m going to train it on a corpus of 45 terabytes of data tokenized into about 500 billion tokens…”