Show HN: less than 650 LOC trainable GPT only using NumPy (opens in new tab)

(github.com)

90 pointsjoennlae2y ago18 comments

18 comments

I think people are forgetting that transformer architectures are a wider field from GPT and predate GPT3 by 3+ years. Referring to transformer architectures using a branded commercial nomer (GPT) is just going to help cement OpenAI’s brand exposure and soon regulatory capture.

For comparison this would be like referring to convonets as Inception architectures back during the CV boom (or VGGnets before that)

tverbeure2y ago

FWIW: the GitHub project description says “GPT-like”. It’s the title here that dropped the “like”.

Paul-Craft2y ago

Mo Gawdat has famously said that GPT-4 was something like "4300 lines of code," and that he could have written that when he was a kid. He's clearly a smart man, so, I think we could extrapolate his comments to claim that a smart college student with some CS knowledge could have written it. These sorts of "GPT in $X LOC" demos pretty much confirm it.

__loam2y ago

Regarding regulatory capture, I listened to an interview with Lena Khan, the current head of the FTC, and this exact thing came up as something regulators are worried about. I think regulators are aware of the danger of letting industry insiders regulate their own industry, so I'm hopeful for some sensible regulations that help promote rather than harm competition. The FTC also exists to prevent monopoly.

PartiallyTyped2y ago

The most interesting thing in this whole saga is that decoder only models (aka causal transformers like GPT) are as effective as they are.

jimmyl022y ago

One small difference is that the GPT architecture is just the decoder stack of the original transformer as opposed to the full encoder decoder stack in the original.

I agree the branding play on GPTs in general is pretty smart and strong from OpenAI though.

cchance2y ago

Honestly i feel like the fact that everyone is just calling LLM's GPT at this point doesn't really help OpenAI, ChatGPT would, but the fact is that unlike "googling" something became synonymous for searching on the internet, GPT != OpenAI-ing something, GPT just became what people call LLM's it seems like lately, the fact the term isn't the name of the company or the full name "chatgpt-ing" sort of breaks that hold i feel like.

1 more reply

joennlaeOP2y ago

The author here: I absolutely agree with you. I went for a bit more catchy title.

quickthrower22y ago

Is GPT subject to trademark. It stands for Generative Pre-training Transformer?

joennlaeOP2y ago

They are still applying: https://tmsearch.uspto.gov/bin/showfield?f=doc&state=4805:wl...

1 more reply

gfaure2y ago

Nice! The README mentions `LayerNorm` is implemented here, but while it's in the equivalence tests with PyTorch, I don't see it in the implementation.

dauertewigkeit2y ago

It's part of the TensorLi definition where all the magic happens.

joennlaeOP2y ago

That is true. I went for a simple implementation of the layer norm and included it in the tensorli definition. But it would have been better to define it as a moduli for clarity.

p1esk2y ago

I wonder how easy it would be to port this library from numpy to cupy.

joennlaeOP2y ago

This would be interesting to consider. But at the moment, nothing is optimized, so many things must be tackled first (especially in the backwards path, for example, buffering) to justify moving to cupy. The goal was to use it as an educational exercise for me.

eslaught2y ago

Or cuNumeric: https://developer.nvidia.com/cunumeric

j / k navigate · click thread line to collapse

18 comments

cuuupid2y ago

For comparison this would be like referring to convonets as Inception architectures back during the CV boom (or VGGnets before that)

tverbeure2y ago

FWIW: the GitHub project description says “GPT-like”. It’s the title here that dropped the “like”.

Paul-Craft2y ago

__loam2y ago

PartiallyTyped2y ago

The most interesting thing in this whole saga is that decoder only models (aka causal transformers like GPT) are as effective as they are.

jimmyl022y ago

One small difference is that the GPT architecture is just the decoder stack of the original transformer as opposed to the full encoder decoder stack in the original.

I agree the branding play on GPTs in general is pretty smart and strong from OpenAI though.

cchance2y ago

1 more reply

joennlaeOP2y ago

The author here: I absolutely agree with you. I went for a bit more catchy title.

quickthrower22y ago

Is GPT subject to trademark. It stands for Generative Pre-training Transformer?

joennlaeOP2y ago

They are still applying: https://tmsearch.uspto.gov/bin/showfield?f=doc&state=4805:wl...

1 more reply

gfaure2y ago

Nice! The README mentions `LayerNorm` is implemented here, but while it's in the equivalence tests with PyTorch, I don't see it in the implementation.

dauertewigkeit2y ago

It's part of the TensorLi definition where all the magic happens.

joennlaeOP2y ago

That is true. I went for a simple implementation of the layer norm and included it in the tensorli definition. But it would have been better to define it as a moduli for clarity.

p1esk2y ago

I wonder how easy it would be to port this library from numpy to cupy.

joennlaeOP2y ago

eslaught2y ago

Or cuNumeric: https://developer.nvidia.com/cunumeric

j / k navigate · click thread line to collapse