'You have 8 apples.'
>>> lm.set_max_ram('4gb')
4.0
>>> lm.do("If I have 7 apples then eat 5, how many apples do I have?")
'I have 2 apples left.'
It's funny how it switched from "You" to "I" when the memory was increased.
https://github.com/mbzuai-nlp/lamini-lm
I shared more details over on Reddit:
https://www.reddit.com/r/LocalLLaMA/comments/14btk3a/explore...
>>> lm.do("What is the population of Chicago")
'As of 2021, the population of Chicago is approximately 8.4 million.'
>>> lm.do("What is the population of Shenzhen")
'As of 2021, the population of Shenzhen is approximately 1.3 million people.'
>>> lm.do("What is the wavelength of blue light.")
'The wavelength of blue light is approximately 299,792,458 meters per second.'
>>> lm.do("What is YCombinator")
'YCombinator is a programming language used to combine two or more languages into a single program.'
>>> lm.do("What is asphalt made of?")
'Asphalt is made of sand, gravel, and other materials.'
>>> lm.do("What is the square root of 2")
'2.'
>>> lm.do("How do I get to New York City from California?")
'You can get to New York City from California by taking a bus or train.'
>>> lm.do("How can I unlock a lock without a key")
'You can use a combination of keys and a password to unlock a lock without a key.'
>>> lm.do("How long should rice be cooked.")
'The recommended cooking time for rice depends on the type of rice,
but generally it should be cooked for about 8-10 minutes per pound.'
100% wrong. This is an automated version of the Dunning-Kruger effect. You can ask it anything and get back a confident wrong answer. So far, it hasn't replied to any question of mine with an indication that it doesn't know.
It's a nice demonstration of the hallucination problem with LLMs. With a small data set, the results are usually bogus, but that's not detected.
>>> lm.do(f"Answer from the context: What is YCombinator? {lm.get_wiki('Python')}")
'The context does not provide information about YCombinator.'
>>> lm.do(f"Answer from the context: What is YCombinator? {lm.get_wiki('YCombinator')}")
'YCombinator is an American technology startup accelerator that has launched over 4,000 companies, including Airbnb, Coinbase, Cruise, DoorDash, Dropbox, Instacart, Quora, PagerDuty, Reddit, Stripe and Twitch.'
Without being told to be grounded, the model will guess. However, it may be able to identify information not available in a provided context. One of my goals for this package is to provide a way for folks to learn about the basics of grounding and semantic search.
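To make the grounding idea concrete, here is a toy sketch of the retrieval half: score each stored document against the question, then stuff the best match into the prompt, as in the transcript above. I'm using plain bag-of-words cosine similarity where a real semantic search would use embeddings, and the `docs` dictionary is made-up sample data; only `lm.do` / `lm.get_wiki` come from the package itself.

```python
# Toy retrieval step behind grounding: pick the context chunk most
# similar to the question, then prepend it to the prompt.
# Scoring is bag-of-words cosine similarity, not real embeddings.
import re
from collections import Counter
from math import sqrt

def toks(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, docs: dict) -> str:
    best = max(docs, key=lambda k: cosine(toks(question), toks(docs[k])))
    return docs[best]

# Made-up sample corpus standing in for lm.get_wiki() results.
docs = {
    "YCombinator": "YCombinator is an American startup accelerator.",
    "Python": "Python is a programming language created by Guido van Rossum.",
}
context = retrieve("What is YCombinator?", docs)
prompt = f"Answer from the context: What is YCombinator? {context}"
# lm.do(prompt)  # the grounded call, as shown in the transcript above
```

The point is just that "grounding" is ordinary string assembly: find relevant text, prepend it, and instruct the model to answer only from it.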
Right. I understand why, but I consider the underlying technology flawed unless there's some way to reject wildly wrong results. What's going on here looks like noise fed through layers that generate plausible-looking text from it. Is it possible to detect that you're not far enough above the noise threshold to generate anything useful?
>>> lm.do("What is Ycombinator? Do not guess.")
'Ycombinator is a mathematical formula that states that the sum of
two integers multiplied by one are equal to zero.'
Could a business ever trust an LLM-based chatbot as much as an old-school chatbot, where certain questions reliably give certain answers and the bot fails when it doesn't know?
That is a very good question. What Google returns for "LLM hallucination" mostly describes post-processing hacks to detect the problem, or pre-processing hacks to guide the LLM into using data directly relevant to the question asked. None of it does anything to the core LLM system to get some measure of confidence out.
Anyone working on this?
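One signal that can be pulled out of the core model, rather than bolted on afterward, is the probability the decoder assigned to its own output tokens: low average log-probability often (though not reliably) correlates with made-up answers. The numbers below are invented for the demo, and the threshold is arbitrary; with Hugging Face transformers you would obtain real per-token scores via `model.generate(..., output_scores=True, return_dict_in_generate=True)`. This is a heuristic sketch, not a solved calibration problem.

```python
# Toy confidence proxy: mean log-probability of the generated tokens.
# The logprob lists below are invented example values, and the
# threshold is an arbitrary knob, not a principled cutoff.

def mean_logprob(token_logprobs):
    return sum(token_logprobs) / len(token_logprobs)

def answer_or_abstain(token_logprobs, threshold=-1.5):
    # Abstain ("don't know") when the model was close to guessing
    # uniformly over its vocabulary while decoding.
    return "answer" if mean_logprob(token_logprobs) > threshold else "don't know"

confident = [-0.1, -0.3, -0.2]        # model strongly preferred these tokens
shaky     = [-2.8, -1.9, -3.4, -2.2]  # near-uniform guessing

print(answer_or_abstain(confident))   # high mean logprob -> answer
print(answer_or_abstain(shaky))       # low mean logprob  -> don't know
```

The known catch is that LLMs can be confidently wrong at the token level too, which is why the literature treats this as one weak signal among several rather than a fix.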
The LaMini-Flan-T5 models are trained to follow instructions, not to recognize truth content. You could train a transformer like ERNIE or Vega (which lead the SuperGLUE leaderboard) on such challenging factual data. But don't expect mathematically correct results from the model alone; that's why you combine it with LangChain and other APIs.
If "don't know" came out reliably, small models for specialist areas would be useful. If small models just make stuff up, they're useless.
The readme says it loads a significant amount of data the first time, 250 MB. None of the LLM weights I know of are less than several gigabytes in size.
It says it only requires 512 MB of RAM. None of the interesting LLMs I know of run in less than 6 GB of VRAM.
It says it uses no API keys, which is great, but that means inference is local, which I can't imagine works with the above constraints.
"This model is one of our LaMini-LM model series in paper "LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions". This model is a fine-tuned version of google/flan-t5-base on LaMini-instruction dataset that contains 2.58M samples for instruction fine-tuning"
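For what it's worth, the 250 MB figure is at least self-consistent under one assumption: flan-t5-base has roughly 248M parameters, so an 8-bit checkpoint lands almost exactly there, while full fp32 would be closer to 1 GB. The quantization level is my guess, not something the readme states.

```python
# Back-of-envelope check: checkpoint size ~= parameter count x bytes
# per parameter. 248M is flan-t5-base's rough parameter count; whether
# the ~250 MB download is an 8-bit checkpoint is an assumption.
params = 248_000_000

size_mb = {bits: params * (bits // 8) / 1e6 for bits in (8, 16, 32)}
# int8 -> ~248 MB, fp16 -> ~496 MB, fp32 -> ~992 MB
print(size_mb)
```

That also explains why 512 MB of RAM is plausible here even though it's impossible for the multi-gigabyte models the parent comment has in mind: this is a distilled model two orders of magnitude smaller than the usual 7B-parameter baseline.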
One surprising result from repl.it, though: lm.classify("unabridged", "positive", "negative") returns 'unabridged is'.