You are an expert in web scraping, capable of finding information in HTML and labeling it accordingly. Please return the final result as JSON.
Data to scrape:
title: Name of the business
type: The nature of the business, e.g. Cafe, Coffee Shop, and many others
phone: The phone number of the business
address: Address of the business; can be a state, a country, or a full address
years_in_business: Number of years since the business started
hours: Business operating hours
rating: Rating of the business
reviews: Number of reviews on the business
price: Typical spending on the business
description: Extra information not already covered by any other field
service_options: Array of shopping options offered by the business, for example in-store shopping, delivery, and many others. Each should be in the format -> option_name: true
is_operating: Whether the business is operating
HTML:
{html}

Lower-end models don't have the attention to complete tasks like this; GPT-4 Turbo generally does. But for an optimal pipeline you should really split these tasks into individual units: extract each attribute you want independently, then combine the results back together however you want. Asking for JSON upfront is equally suboptimal in the whole process.
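A minimal sketch of the per-attribute approach described above: one small, focused prompt per field, with the results assembled afterwards. The `ask_llm` callable and the prompt wording are placeholders for whatever model client and phrasing you actually use.

```python
# Hypothetical per-attribute extraction pipeline: one focused query
# per field instead of one big "return JSON" prompt.

FIELDS = {
    "title": "Name of the business",
    "phone": "The phone number of the business",
    "rating": "Rating of the business",
}

def extract_attribute(ask_llm, html, field, description):
    """Ask the model for exactly one attribute, as plain text."""
    prompt = (
        f"From the HTML below, return only the {field} "
        f"({description}). Reply with the value alone, or NONE "
        f"if it is not present.\n\nHTML:\n{html}"
    )
    answer = ask_llm(prompt).strip()
    return None if answer == "NONE" else answer

def extract_all(ask_llm, html, fields=FIELDS):
    """Run one query per field, then combine into a single record."""
    return {f: extract_attribute(ask_llm, html, f, d)
            for f, d in fields.items()}
```

Because each call returns a bare value, you can validate or retry individual fields, and serialize to JSON only at the very end.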
I have high confidence that I could accomplish this task using a lower end model with a high degree of accuracy.
Edit: I am not suggesting that an LLM is more optimal than whatever traditional parsing methods they may use, simply that the way they are doing it is wrong from an LLM-flow perspective.
Cool, cool. I'm super interested. Please share the process and the results.
LLMs aren’t people even in a chat-roleplaying sense. They complete a “document” that can be a plot, a book, a transcript of a conversation. The “AI” side in the chat isn’t the LLM itself, it’s a character (and so are you: it completes your “You: …” replies too; that’s where the driver app stops it and lets you interject). So everything you put in that header is very important. There are two places where you can do that: right in the chat, as in TFA, or in the “character card” (idk if GPTs have one, no GPT access for me). I found out that properly crafting a character card makes a huge difference and can resolve whole classes of issues.
Idk what will work best in this case, but I’d start by describing what sort of bot it is, how it deals with unclear or incomplete information, how amazing it is (yes, really), its soft/tech skills and problem-solving abilities, what other people think of it, their experience with it, and so on. Maybe I’d add a few examples of interactions in free form. Then in the task message I’d give it more specific details about that JSON.
One more note - at least for 8x7B, the “You are” in the chat is a much weaker instruction than a character card, even if the context is still empty. I low-key believe that’s because it’s a second-class prompt, i.e. the chat document starts with “This is a conversation with a helpful AI bot which yada yada” in… mind, and then in that chat that AI character gets asked to turn into something else, which poisons the setting.
Simply asking the default AI card represents 0.1% of what’s possible and doesn’t give the best results. Prompt Engineering is real.
I have high confidence that I could accomplish this task using a lower end model with a high degree of accuracy.
Same. I think that no matter how good a model is, this prompt just isn’t a professional task statement and leaves too much to decide. It’s a task that you, as a regular human, would hate to receive.
Answer: "I'm running on Toyota Corolla"
Which was perhaps the funniest thing I heard that day.
Also, don't parse HTML with regular expressions.
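The classic failure mode behind that advice: regexes can't track nesting, while a real parser can. A small stdlib-only illustration (the HTML snippet is made up):

```python
import re
from html.parser import HTMLParser

html = '<div class="card"><div>Joe&#39;s Cafe</div></div>'

# A non-greedy regex stops at the FIRST closing tag, so it returns a
# mangled fragment with an unmatched <div> instead of the inner text.
naive = re.search(r'<div class="card">(.*?)</div>', html).group(1)
print(naive)  # <div>Joe&#39;s Cafe

# A real parser tracks nesting and decodes character references for free.
class TextGrabber(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

grabber = TextGrabber()
grabber.feed(html)
print("".join(grabber.chunks))  # Joe's Cafe
```

Once the HTML nests even one level, the regex quietly returns garbage; the parser keeps working.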
The problem I'm finding is that the time I wanted to save maintaining selectors and the like is now being spent writing wrapper code and dealing with the mistakes it makes. Some are OK and I can deal with them; others are pretty annoying because it's difficult to handle them in a deterministic manner.
I've also tried GPT-4, but it's way more expensive, and despite what this guy got, it also makes mistakes.
I don't really care about inference speed, but I do care about price and correctness.
In fact, what about a hybrid of what you're doing now? Initially, you use an LLM to generate examples. And then from those examples, you use that same LLM to write deterministic code?
That's understandable. The real problem is when the AI lies/hallucinates another answer with confidence instead of saying "I don't know".
Especially running on Groq's infrastructure it's blazing fast. In some examples I ran against Groq's API, the query completed in 70ms. Groq has released API libraries for Python and JavaScript; I wrote a simple Rust example of how to use the API [1].
Groq's API reports how long it took to generate the tokens for each request. 70ms for a page of a document is well over 100 times faster than GPT, and faster than every other capable model. Accounting for internet latency and whatever queue might exist, the user receives the response in about a second. But how fast would this model run locally? Fast enough to generate natural-language tokens, synthesize a voice, then listen and decode the user's next spoken request, all in real time.
With a technology like that, why not talk to internet services through APIs alone, with no web interface at all? Just functions exposed on the internet: take JSON as input, validate it, and send JSON back to the user. The same goes for every other interface and button around us. Why press buttons on every electric appliance instead of just talking to the machine via a JSON schema? Why should users on an internet forum have to press the "add comment" button every time, instead of just saying "post it"? Pretty annoying, actually.
I am collecting these approaches and tools here: https://github.com/imaurer/awesome-llm-json
Also, a Google SERP page is deterministic (it always has the same structure for the same kind of query), so it would probably be much more effective to use AI to write a parser once, then refine and reuse it?
Scraping is quite complex by now (front-end JS, deep and irregular nesting, obfuscated html, …).
Have you ever had to scrape multiple sites with wildly varying HTML?
If you are scraping a limited number of sites, you could, for each site, ask the LLM for parsing code based on some samples, review it, and move on.
Impressive inference speed difference though
Are you from a non-English country? Maybe it's cultural?
Groq delivers this kind of speed by networking many, many chips together with high-bandwidth interconnect. Each chip has only 230MB of SRAM[0].
From the linked reference:
"In the case of the Mixtral model, Groq had to connect 8 racks of 9 servers each with 8 chips per server. That’s a total of 576 chips to build up the inference unit and serve the Mixtral model."
That's eight racks with ~132GB of memory for the model. A single H100 has 80GB and can serve Mixtral without issue (albeit at lower performance).
If you consider the requirements for actual real-world inference serving workloads you need to serve multiple models, multiple versions of models, LoRA adapters, sentence embeddings models (for RAG), etc the economics and physical footprint alone get very challenging.
It's an interesting approach and clearly very, very fast but I'm curious to see how they do in the market:
1) This analysis uses cloud GPU pricing for the Nvidia numbers. Cloud providers make significant margin on their GPU instances. If you look at qty-1 retail pricing for an Nvidia DGX, Lambda Hyperplane, etc. and compare it to cloud GPU pricing (inference needs to run 24x7), break-even on hardware vs cloud is under seven months, depending on what your costs are for hosting the hardware.
2) Nvidia has incredibly high margins.
3) CUDA.
There are some special cases where tokens per second and time to first token are incredibly important (as the article states - real time agents, etc) but overall I think actual real-world production use or deployment of Groq is a pretty challenging proposition.
[0] - https://www.semianalysis.com/p/groq-inference-tokenomics-spe...
This is a significant understatement. ChatGPT has an estimated 100m monthly active users.
Groq gets featured on HN from time to time but is otherwise almost completely unknown. According to their stats they have done something like 15m requests total since launch. ChatGPT likely does this in hours (or less).
In short:
Groq -> AI chip
Microsoft etc. -> Nvidia GPU
They would likely wait until some model performs better than GPT-4 for the same price.
Claude 3 Opus is in the capability ballpark of GPT-4, GPT-3.5 has alternatives that are cheaper (Claude 3 Haiku) or cheaper and work offline (Qwen 1.5, Mixtral, …).
TFA shows that Groq is many times faster than GPT-4 (up to 18x, Groq claims). Faster means less energy, so I think it's just a matter of time until these things become ridiculously power-efficient (e.g. running on phones in sub-second times).
At least AI & LLMs have large scale practical applications as opposed to crypto (IMO).
Crypto is nearly pure waste.
I don't understand this. It adds bureaucracy, and I don't see why different uses need to be charged differently if they all consume energy the same way.
In other words, if energy costs X per unit, and an inefficient (AI) software takes 30 units and an efficient (traditional) software takes 10 units, then it is already cheaper to run the efficient software, and thus people are already incentivised to do so. There's no need to charge differently. If one day AI turns out to only need 5 units, turning more efficient, then just charge them for 5X. People will gravitate towards the new, efficient AI software naturally then.
It no longer requires an expert human
This was just a blog to generate traffic on the site. Not to showcase some new use case for an llm.
>For all the posturing and forest fire hate on HN, it’s now socially acceptable to run a toy steam engine to power a model car? Not very green of you.
To be fair to GP, they did compare it to alternatives (dumb HTML parsing), but failed to consider versatile HTML parsing or other uses for Groq LLM.
If one cares about the environment, a carbon cap/tax is what you should campaign for. Then carbon-based energy sources will be curtailed, energy costs will go up, and AI like this will be encouraged to become more energy-efficient, or other methods will be used instead.
There is a lot of business value being created in the AI space, and it's only going to get better.
Unless you live in a dictatorship, it's definitely up to us to decide... Otherwise you leave your voice to the top 0.0001% of business owners and expect them to work for your good rather than their own interests.
Also, read about the rebound effect: planes are twice as efficient as they were 100 years ago, yet as a whole they pollute vastly more.
There is nothing ridiculous about the comment you're replying to