From a random web search, it seems the sizes above Large are: Extra Large, Jumbo, Extra Jumbo, Giant, Colossal, Super Colossal, Mammoth, Super Mammoth, Atlas.
You mean the EU, right? The UK isn't covered by the AI act.
/s
We could take a page from Trump’s book and call them “Beautiful” LLMs. Then we’d have “Big Beautiful LLMs” or just “BBLs” for short.
Surely that wouldn’t cause any confusion when Googling.
I’ve seen corporate slogans fired off from the shoulders of viral creatives. Synergy-beams glittering in the darkness of org charts. Thought leadership gone rogue… All these moments will be lost to NDAs and non-disparagement clauses, like engagement metrics in a sea of pivot decks.
Time to leverage.
XXLLM: ~1T (GPT4/4.5, Claude Opus, Gemini Pro)
XLLM: 300~500B (4o, o1, Sonnet)
LLM: 20~200B (4o, GPT3, Claude, Llama 3 70B, Gemma 27B)
~~zone of emergence~~
MLM: 7~14B (4o-mini, Claude Haiku, T5, LLaMA, MPT)
SLM: 1~3B (GPT2, Replit, Phi, Dall-E)
~~zone of generality~~
XSLM: <1B (Stable Diffusion, BERT)
4XSLM: <100M (TinyStories)
teensy 4B to 29B
smol 30B to 59B
mid 60B to 99B
biggg 100B to 299B
yuuge 300B+
I really appreciated the way they managed to come up with a new naming scheme each time, usually used exactly once.
But the quality of Apple Intelligence shows us what happens when you use a tiny ultra-low-wattage LLM. There’s a whole subreddit dedicated to its notable fails: https://www.reddit.com/r/AppleIntelligenceFail/top/?t=all
One example of this is “Sorry I was very drunk and went home and crashed straight into bed” being summarized by Apple Intelligence as ”Drunk and crashed”.
I actually applied to YC in like ~2014 or so for this:
JotPlot - I wanted a histogram-style timeline of comms between me and others - such that I had a sankey-ish diagram of when, with whom, and via which method I spoke with folks, and each node was the message, call, text, meta links...
I think it's still viable - but my thought process is currently too chaotic to pull it off.
Basically looking at a timeline of your comms and thoughts and expanding into links of thought - now with LLMs you could have a Throw Tag of some sort whereby the bot does research expanding on certain things and serves up a site for that idea on LOCALHOST (i.e. your phone), so that you can pull up data relevant to the convo - and it's all in a timeline of thought/stream of consciousness.
hopefully you can visualize it...
It's like saying Automated ATM. Whoever wrote it barely knows what the acronym means.
This whole article feels like it was written by someone who doesn't understand the subject matter at all.
I.e., when pretty much every tool or script I used before doesn't work anymore and I need a special tool (gsutil, bq, dask, slurm), it's a mind shift.
Amen. There is an active effort to create an Internet Archive based in Europe, just… in case.
https://vancouversun.com/news/local-news/the-internet-archiv...
(Edited: apparently just a new HQ and not THE HQ)
[1] https://www.nbcnews.com/politics/donald-trump/trump-quest-co...
The physical assets are stored in the blast radius of an oil refinery. They don't have air conditioning. Take the tour and they tell you the site runs slower on hot days. Great mission, but atrociously managed.
Under attack for a number of reasons, mostly absurd. But a few are painfully valid.
EDIT: asking Claude:
Based on historical data, major refinery explosions in developed countries might occur at a rate of approximately 1 in 1,000 to 1 in 2,000 refinery-years of operation. Using this very rough estimate, a single refinery might have approximately a 50% chance of experiencing a significant explosion somewhere between 700-1,400 years of continuous operation.
;)
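For what it's worth, the arithmetic in Claude's quoted estimate checks out: with a constant annual probability p, the chance of at least one event in t years is 1 - (1 - p)^t, so the 50% point is t = ln(0.5)/ln(1 - p). A quick sketch (the 1-in-1,000 and 1-in-2,000 rates are Claude's rough guess, not verified data):

```python
import math

def years_to_half_chance(p_per_year):
    """Years of operation until the chance of at least one event hits 50%,
    assuming a constant independent annual probability p_per_year."""
    return math.log(0.5) / math.log(1.0 - p_per_year)

print(round(years_to_half_chance(1 / 1000)))  # ~693 years at 1-in-1,000
print(round(years_to_half_chance(1 / 2000)))  # ~1386 years at 1-in-2,000
```

So "between 700-1,400 years" is roughly right given those (unverified) base rates.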
(less socratic: I have a fraction of a fraction of jart's experience, but have enough experience via maintaining a cross-platform llama.cpp wrapper to know there's a ton of ways to interpret that bag o' floats and you need a lot of ancillary information.)
Then again, CPUs will be fast enough that you'd probably just emulate amd64 and run it as CPU-only.
If I want to read a post, a book, a forum, I want to read exactly that, not a simulacrum built by arcane mathematical algorithms.
Tangent: I was thinking the other day: these are not AI in the sense that they are not primarily intelligence. I still don't see much evidence of that. What they do give me is superhuman memory. The main thing I use them for is search, research, and a "rubber duck" that talks back, and it's like having an intern who has memorized the library and the entire Internet. They occasionally hallucinate or make mistakes -- compression artifacts -- but it's there.
So it's more AM -- artificial memory.
Edit: as a reply pointed out: this is Vannevar Bush's Memex, kind of.
Of course they vary widely in quality.
And I'm sure they have or will have the ability to influence the responses so you only see what they want you to see.
“Vannevar Bush's 1945 article "As We May Think". Bush envisioned the memex as a device in which individuals would compress and store all of their books, records, and communications, "mechanized so that it may be consulted with exceeding speed and flexibility".”
Correction: you occasionally notice when they hallucinate or make mistakes.
To me intelligence describes something much more capable than what I see in these things, even the bleeding edge ones. At least so far.
https://lcamtuf.coredump.cx/lossifizer/
I think a fun experiment could be to see at what setting the average human can no longer decipher the text.
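lcamtuf's lossifizer degrades text in its own way; as a much cruder stand-in (a toy of my own, not how the actual tool works), you could corrupt a fixed fraction of characters and test at what rate readers stop being able to decipher the text:

```python
import random

def corrupt(text, p, seed=0):
    """Replace roughly a fraction p of non-space characters with '#'.
    A crude text-degradation toy for readability experiments; the real
    lossifizer uses a different degradation method entirely."""
    rng = random.Random(seed)  # fixed seed so trials are reproducible
    return "".join("#" if ch != " " and rng.random() < p else ch
                   for ch in text)

sample = "the quick brown fox jumps over the lazy dog"
for p in (0.1, 0.3, 0.5):
    print(p, corrupt(sample, p))
```

You'd then show subjects the output at increasing p and record the point where transcription accuracy collapses.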
I think "it's just compression" and "it's just parroting" are flawed metaphors, especially when the model was trained with RLHF and RL/reasoning. Maybe a better metaphor is "the LLM is like a piano: I play the keyboard and it makes 'music'". Or maybe it's a bicycle: I push the pedals and it takes me where I point it.
Yes!
> artificial memory
Well, "yes", kind of.
> Memex
After a flood?! Not really. Vannevar Bush - As we may think - http://web.mit.edu/STS.035/www/PDFs/think.pdf
First, there is no objective dividing line. It is a matter of degree relative to something else. Any language that suggests otherwise should be refined or ejected from our culture and language. Language’s evolution doesn’t have to be a nosedive.
Second, there are many definitions of intelligence; some are more useful than others. Along with many, I like Stuart Russell’s definition: the degree to which an agent can accomplish a task. This definition requires being clear about the agent and the task. I mention this so often I feel like a permalink is needed. It isn’t “my” idea at all; it is simply the result of smart people decomplecting the idea so we’re not mired in needless confusion.
I rant about word meanings often because deep thinking people need to lay claim to words and shape culture accordingly. I say this often: don’t cede the battle of meaning to the least common denominators of apathy, ignorance, confusion, or marketing.
Some might call this kind of thinking elitist. No. This is what taking responsibility looks like. We could never have built modern science (or most rigorous fields of knowledge) with imprecise thinking.
I’m so done with sloppy mainstream phrasing of “intelligence”. Shit is getting real (so to speak), companies are changing the world, governments are racing to stay in the game, jobs will be created and lost, and humanity might transcend, improve, stagnate, or die.
If humans, meanwhile, can’t be bothered to talk about intelligence in a meaningful way, then, frankly, I think we’re … abdicating responsibility, tempting fate, or asking to be in the next Mike Judge movie.
It has always been like that, in the past people wrote on paper, and most of it was never archived. At some point it was just lost.
I inherited many boxes of notes, books and documents from my grandparents. Most of it was just meaningless to me. I had to throw away a lot of it and only kept a few thousand pages of various documents. The other stuff is just lost forever. And that’s probably fine.
Archives are very important, but nowadays the most difficult part is to select what to archive. There is so much content added to the internet every second, only a fraction of it can be archived.
I don't think the big scientific publishers (now, in our time) will ever fail, they are RICH!
It might be possible to create an LLM that can write a custom vintage game or program on demand in machine code and simultaneously generate assets like sprites, especially if you use the latest reinforcement learning techniques.
Are there any search experiences that allow me to search like it's 1999? I'd love to be able to re-create the experience of finding random passion project blogs that give a small snapshot of things people and business were using the web for back then.
Just with pre-LLM knowledge.
Personally I'd like it if all knowledge and information (K & I) were readily available and accessible (pretty sure most people share the same sentiment), despite the copyright holders' consistent business decisions to hoard their K & I behind paywalls and/or registration (I'm looking at you, Apple and X/Twitter). Some people hate Google for organizing the world's information by feeding on and thriving through advertisements, but in the long run the information does get organized and kind of preserved in many Internet data formats, lossy or not. After all, it was Google who originally designed the transformer that enabled the LLM weights that are now apparently a piece of history.
I feel like the more people use GenAI, the less intelligent they become. Like the rest of this society, it seems designed to suck the life force out of humans and return useless crap instead.
> oh HELL YEAH they will be. future historians are gonna have a fucking field day with us.
> imagine some poor academic in 2147 booting up "vintage llm.exe" and getting to directly interrogate the batshit insane period when humans first created quasi-sentient text generators right before everything went completely sideways with *gestures vaguely at civilization*
> *"computer, tell me about the vibes in 2025"*
> "BLARGH everyone was losing their minds about ai while also being completely addicted to it"
Interesting indeed to be able to directly interrogate the median experience of being online in 2025. (Also my apologies for slop-posting; I slapped so much custom prompting on it that I hope you'll find the output amusing enough.)