It leaves Claude and ChatGPT's coding looking like they are from a different century. It's hard to believe these changes are coming on the scale of weeks and months. Last month I could not believe how good Claude was. Today I'm not sure how I could continue programming without Google Gemini in my toolkit.
Gemini AI Studio is such a giant leap ahead in programming I have to pinch myself when I'm using it.
Apart from the apologising. It's silly when the AI apologises with ever more sincere apologies. There should be no apologies from AIs.
The Omega Directive: https://snth.prose.sh/the_omega_directive
Each new release is “game changing”.
The implication being the last release y’all said was “game changing” is now “from a different century”.
Do you see it?
For this to be an accurate and true assessment means you were wrong both before and wrong now.
Are you suggesting that a rush to hyperbole which you don't like means advances in a technology aren't groundbreaking?
Or is it that if there is more than one impressive advance in a technology, any advance before the latest wasn't worthy of admiration at the time?
Not really my idea of good.
Every time, in the last three or four weeks, that there is a post here about Gemini, the top comment or one of the top comments is something along these lines. And every time I spend a few minutes running empirical tests to check whether I made a mistake in cancelling my paid Gemini account after giving up on it...
So I just did a couple of tests sending the same prompt on some AWS-related questions to Gemini Pro 2.5 (free) and paid Claude, and no, Claude is still better.
Anyone else concerned about these kinds of statements? Make no mistake, everyone: we are living in an LLM bubble (not an AI bubble, as none of these companies are actually interested in AI as such, i.e. moving towards AGI). They are all trying to commercialise LLMs with some minor tweaks. I don't expect LLMs to make the kind of progress made by the first 3 iterations of GPT. And when the insanely hyped overvaluations crash, the bubble WILL burst. You'd better hope there is some money left to run these kinds of tools at a profit, or you will be back at Stack Overflow trying to relearn all the skills you lost using generative coding tools.
I couldn’t find a way to use Gemini like a prepaid plan. I ain’t giving my credit card to Google for an LLM that can easily charge me hundreds or thousands of EUR.
// Moved to foo.ts
Ok, great. That’s what git is for.
// Loop over the users array
Ya. I can read code at a CS101 level, thanks.
https://developers.google.com/gemini-code-assist/docs/overvi...
They all seem to work remarkably well writing TypeScript or Python, but in my experience they fall short when it comes to shell and, more broadly, DevOps.
I want something running in a VM I can safely let all tools execute without human confirmation and I want to write my own tools and plug them in.
Right now a pro max subscription with Claude code plus my MCP servers seems to be the sweet spot, and a cursory look at the Google ecosystem didn’t identify anything like it. Am I overlooking something?
It's my daily driver so far. I switch between the Claude and Gemini models depending on the type of work I'm doing. When I know exactly what I want, I use Claude. When I'm experimenting and discovering, I use Gemini.
Would you expect that to be Google employing cost-saving measures?
That doesn't mean it's worse than the others just not much better. I haven't found anything that worked better than o1-preview so far. How are you using it?
It's pretty useful as long as you hold it back from writing code too early, or too generally, or sometimes at all. It's a chronic over-writer of code, too. Ignoring most of what it attempts to write and using it to explore the design space without ever getting bogged down in code and other implementation details is great though.
I've been doing something that's new to me but is going to be all over the training data (subscription service using stripe) and have often been able to pivot the planned design of different aspects before writing a single line of code because I can get all the data it already has regurgitated in the context of my particular tech stack and use case.
> For fun, I tried asking a bunch of AI models to figure out what shapes those arrays have. Here were the results:
Based on the results from the top 8 state-of-the-art AI models, Gemini is the best and consistently got the right results:
[1] I don't like NumPy (204 comments):
https://news.ycombinator.com/item?id=43996431
[2] I don't like NumPy: I don’t like NumPy indexing:
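The arrays from the linked post aren't reproduced here, but the flavor of the problem (NumPy's advanced-indexing shape rules) can be sketched with made-up shapes:

```python
import numpy as np

# Hypothetical arrays; the linked post's originals aren't shown here.
A = np.zeros((10, 20, 30))

# A slice plus a single advanced (list) index keeps dims in place:
print(A[:, [0, 1, 2], :].shape)    # (10, 3, 30)

# But two advanced indices separated by a slice are broadcast together,
# and their result dimension moves to the FRONT -- the classic gotcha:
print(A[[0, 1], :, [0, 1]].shape)  # (2, 20)
```

Models that "consistently got the right results" have to get cases like the second one right, which is exactly where humans slip too.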
Is there any concrete example that makes it really obvious? I had no such success with it so far and i would really like to see the clear cut between the gemini and the others.
Essentially we were hoping to tie that to data inputs and have a system to regularly output the visualisation but with dynamic values. I bet my colleague it would one shot it: it did.
What I’ve also found is that even a sloppy prompt still somehow is reading my mind on what to do, even though I’ve expressed myself poorly.
Inversely, I’ve really found myself rejecting suggestions from ChatGPT, even o4-mini-high. It’s just doing so much random crap I didn’t ask and the code is… let’s say not as “Gemini” as I’d prefer.
But more seriously, they need to uncap temperature and allow more samplers if they want to really flex on their competition.
But seriously, yeah, Gemini is pretty great.
Most of my roles were in small teams building quick ad-hoc analyses for business leaders in large multi-billion-dollar businesses. For example, one DB was Oracle E-Business Suite. It had been set up ~20 years prior, with enhancements along the way. There were only a handful of people in the company who knew what the fields helpfully named like ATTR_000349857 meant. Everyone was overworked with urgent requests (and occasional layoffs), and no one bothered to spend time on documenting the database. I suppose this topic fits under "Provide business-specific context". Great, hire someone to understand and document all that crap. Ain't going to happen.
The other roles also fit this pattern, different systems but urgent needs, always a drought of people who understood both the business and the database.
Occasionally we'd get some "AI team" come looking for "data". After many mindless meetings with no clear objective except "increase profits", they'd quietly disappear.
I use Stack Overflow a lot to pull SQL examples and tweak them. A lot of AI talk feels like hype. To start with, I'd suggest not redefining words like "hallucination", "intelligence" etc. which mean something else in English. Maybe call it "advanced algorithms" and stop with the hype. Also the surveillance and extraction of user data for advertising. Thank you.
> fields helpfully named like ATTR_000349857
> Everyone was overworked
> no one bothered to spend time on documenting the database.
I can't blame AI for not being helpful here, nothing short of divine intervention can fix that.
I feel like that's actually true now with LLMs -- if some query I write doesn't get one-shotted, I don't bother with a galaxy-brain prompt; I just shelve it 'til next month and the next big OpenAI/Anthropic/Google model will usually crush it.
One month to write some code with an LLM: that's quite the opposite of the promised productivity gain.
Feels like innovation in AI is rapidly changing from paradigm-shifting to incremental.
But I don't see how this is good news at all from a societal POV.
The last 15 or so years have seen an unprecedented rise in salaries for engineers, especially software engineers. This has brought interest in the profession from people who would normally not have considered SW as a profession. I think this is both good and bad: it has brought new-found wealth to more people, but it may have also diluted the quality of the talent pool. That said, I think it was mostly good.
Now with this game-changing efficiency from these AI tools, I'm sure we've seen an end to the glory days in terms of salaries for the SW profession.
With this gone, where else could relatively normal people achieve financial independence? Definitely not in the service industry.
Very sad.
But I don't see how this is good news at all from a societal POV.
Think about all the lamplighters who lost their jobs. Streetlights just turn on now? Lamplighting used to be considered a stable job! And what about the ice cutters…
For real tho, it's not like there's nothing left to do — we still have potholes to fix, t-shirts to fold and illnesses to cure. Just the fact that many people continue to believe that wars are justified by resource scarcity shows we need technological progress.
Only one of these things interests me. The hype of AI is threatening to kill something I actually enjoy doing. If the hype actualises, I'll likely find myself having to do something I don't enjoy. That being said, if programming can be automated, then probably every white collar job is under serious threat.
These days not so much.
Learning comes through struggle and it's too easy to bypass that struggle now. It's so much easier to get the answers from AI.
The amount and complexity of software will expand to its very outer bounds for which specialists will be required.
There are plenty of folks making a living using platforms like Salesforce and “clicks not code,” but it never led to an implosion of the SE job market. Just expanded the tech job pool. And it’s hard to imagine how that would have happened if everything needed to be coded.
Like how a growth in medical-paraprofessionals didn’t negate the need for doctors and nurses.
I personally don't think we are ever going to get to the point where I can give a simple prompt and have an LLM generate a complex app ready to run. Think about what that would require:
1. The LLM would have to read my mind and extrapolate all the minute decisions I would make to implement the app based on the prompt.
2. Assuming the LLM can get past (1), it would have to basically be AGI to be able to implement pretty much whatever I can dream up.
3. If 1 and 2 above are somehow achieved, it would be economically very valuable, and you can bet that functionality is not going to be casually enabled in LLMs for just anyone to use.
(Also, just by market logic, rare skills in demand are always paid more; I'm not sure why you're calling it an "exclusion". The education system in a lot of places might have that function, but that's a separate issue not helped by LLMs writing SQL?)
With all this money sloshing around, it takes only a little imagination to think of ways of channeling some of it to working people without employing them to write pointless (or in some cases actively harmful) software.
It's the cleanest way to give the right context and the best place to pull a human in the loop.
A human can validate and create all important metrics (e.g. what does "monthly active users" really mean) then an LLM can use that metric definition whenever asked for MAU.
With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLMs are much more consistent at writing a small JSON query vs. hundreds of lines of SQL.
We[0] use cube[1] for this. It's the best open-source semantic layer, but there are a couple of closed-source options too.
My last company wrote a post on this in 2021[2]. Looks like the acquirer stopped paying for the blog hosting, but the HN post is still up.
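To make the idea concrete: a minimal sketch of a semantic layer, with hypothetical metric and table names (this is not cube's actual schema format). The human-validated definition lives in one place, and the LLM only ever emits the small JSON query:

```python
# Sketch of the semantic-layer idea: a human validates the metric
# definition once; the LLM only emits a small JSON query against it.
# All names and the schema shape here are hypothetical.
METRICS = {
    "monthly_active_users": {
        "sql": "COUNT(DISTINCT user_id)",
        "table": "events",
        "time_dimension": "events.created_at",
    }
}

def compile_query(query: dict) -> str:
    """Compile a tiny JSON-style query into SQL using the metric store."""
    m = METRICS[query["measure"]]
    start, end = query["dateRange"]
    return (
        f"SELECT {m['sql']} AS {query['measure']} "
        f"FROM {m['table']} "
        f"WHERE {m['time_dimension']} BETWEEN '{start}' AND '{end}'"
    )

print(compile_query({
    "measure": "monthly_active_users",
    "dateRange": ["2024-01-01", "2024-01-31"],
}))
```

Whatever "monthly active users" really means is decided once by a human in `METRICS`, and every generated query inherits it.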
I’m sorry, I can’t. The tail is wagging the dog.
dang, can you delete my account and scrub my history? I’m serious.
{
  "dimensions": [
    "users.state",
    "users.city",
    "orders.status"
  ],
  "measures": [
    "orders.count"
  ],
  "filters": [
    {
      "member": "users.state",
      "operator": "notEquals",
      "values": ["us-wa"]
    }
  ],
  "timeDimensions": [
    {
      "dimension": "orders.created_at",
      "dateRange": ["2020-01-01", "2021-01-01"]
    }
  ],
  "limit": 10
}
than this:

SELECT
  users.state,
  users.city,
  orders.status,
  sum(orders.count)
FROM orders
CROSS JOIN users
WHERE
  users.state != 'us-wa'
  AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
GROUP BY 1, 2, 3
LIMIT 10;

^ kids, this is what AI-induced brainrot looks like.
You should have written your comment in JSON instead of raw English.
In all seriousness, I have some complaints about SQL (I think LINQ’s reordering of it is a good idea), but there’s no need to invent another layer in order for LLMs to be able to wrangle it.
But I would never use one that forced me to express my queries in JSON. The best implementations integrate right into the database so they become an integral part of your regular SQL queries, and as such are also available to all your tools.
In my experience, from using the Exasol Semantic Layer, it can be a totally seamless experience.
If you had something on the other side to hallucinate the API itself you could have a program that dreams itself into existence as you use it.
Then it apologizes and gives the right answer. It's weird. We really need a new word for what they're doing, 'cos it ain't thinking.
A writer won't think that LLMs are good at creative writing. In fact, I'm pretty sure they'd think LLMs are terrible at creative writing.
In other words, to an expert in their field, they're not that good - at least not yet.
But to someone who is not an expert, they're unbelievably good - they're enabled to do something they had zero ability to do before.
Sometimes when I want to fine-tune a query, I challenge the AI to provide a better solution. I give it the already-optimized query and ask for better. I have never got a better answer, sometimes because the AI is hallucinating, sometimes because the changes it proposes don't work in a way that is beneficial. It is like an idiot parrot telling you what it overheard in the brothel: good info if it is a war brothel frequented by enemy officers in 1916, but not these days.
AI is just increasing the frequency of things turning to custard :)
That's what read replicas with read-only access are for. Production db servers should not be open to random queries and usage by people. That's only for the app to use.
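The connection-level version of that idea can be sketched with SQLite's read-only open mode standing in for a real read replica or read-only database role:

```python
import os
import sqlite3
import tempfile

# Sketch: humans (and LLM-generated queries) get a read-only handle;
# only the app writes. SQLite's mode=ro stands in for a real replica.
path = os.path.join(tempfile.mkdtemp(), "app.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE users (id INTEGER)")
rw.commit()
rw.close()

ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
ro.execute("SELECT * FROM users")               # reads are fine
try:
    ro.execute("INSERT INTO users VALUES (1)")  # writes are refused
except sqlite3.OperationalError as e:
    print(e)  # attempt to write a readonly database
```

In production the same split is done with a read replica plus a role that only has SELECT grants, but the principle is identical: the risky surface simply isn't writable.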
This was my experience as well. However, I have observed that things have been improving in this regard. Newer LLMs do perform much better. And I suspect they will continue to get better over time.
At least for the only OLAP DB I use often, Amazon Redshift, that's a solved problem with Workload Management Queues. You can restrict those users' ability to consume too many resources.
For queries that are used for OLTP, I usually try to keep things relatively simple. If there is a need for read queries that consume more resources, those go to read replicas when strong consistency isn't required.
[1] https://www.malloydata.dev/ [2] https://docs.malloydata.dev/documentation/user_guides/malloy... [3] https://github.com/malloydata/publisher
LLMs make some things that were difficult very easy now.
Good article!
I don't need AI to generate perfect SQL, because I am never going to trust the output enough to copy/paste it — the risk of subtle semantic errors is too high, even if the code validates.
Instead, I find it helpful for AI to suggest approaches — after which I will manually craft the SQL, starting from scratch.
I also tend to turn to AI for advising me on difficult use cases, and most of the time it's for production code rather than one-offs. The easy cases, I just write myself because it's more mental effort to review code for subtle errors than it is to write it.
It seems to me that this skeptical mindset is consonant with handling AI output with care.
Do you need an expert to verify if the answer from AI is correct? How is it time saved refining prompts instead of SQL? Is it typing time? How can you know the results are correct if you aren't able to do it yourself? Why should a junior (sorcerer's apprentice) be trusted in charge of using AI?

No matter the domain, from art to code to business rules, you still need an expert to verify the results. Would they (and their company) be in a better place designing a solution to a problem themselves, knowing their own assumptions? Or just checking off a list of happy-path results without FULL knowledge of the underlying design?

This is not just a change from hand-crafting to line-production, it's a change from deterministic problem-solving to near-enough-is-good-enough, sold as the new truth in problem-solving. It smells wrong.
We recently did the first speed run where Louie.ai beat teams of professional cybersecurity analysts in an open competition, Splunk's annual Boss of the SOC. Think writing queries, wrangling Python, and scanning through 100+ log sources to answer frustratingly sloppy database questions:
- We get 100% correct on the basic stuff in the first half, where most people take 5-15 minutes per question, and 50% correct in the second half, where most people take 15-45+ minutes per question and most teams time out.
- ... Louie does a median 2-3 min per question irrespective of the expected difficulty, so about 10X faster than a team of 5 (wall clock) and 30X less work (person-hours). Louie isn't burnt out at the end ;-)
- This doesn't happen out-of-the-box with frontier models, including fancy reasoning ones. Likewise, letting the typical tool here burn tokens until it finds an answer would cost more than a new hire, which is why we measure it as a speed run rather than a deceptively uncapped auto-solve count.
- The frontier models DO have good intuition, understand many errors, and, for popular languages, DO generate good text2query. We are generally happy with OpenAI, for example, so it's more about how Louie and the operator use it.
- We found we had to add in key context and strategies. You see a bit of this in Claude Code and Cursor, except those are quite generic, so they would have failed here as well. Intuitively, in coding you want to use types/lint/tests; database work has similar-but-different issues. But there is a lot more, by domain, in my experience, so expecting tools to just work is unlikely to pan out. Having domain-relevant patterns baked in, which you can extend, is key, and so are learning loops.
A bit more on louie's speed run here: https://www.linkedin.com/posts/leo-meyerovich-09649219_genai...
This is our first attempt at the speed run. I expect Louie to improve: my answers represent the current floor, not the ceiling of where things are (dizzyingly) going. Happy to answer any other q's where data might help!
> Do you need an expert to verify if the answer from AI is correct?
If the underlying data has a quality issue that is not obvious to a human, the AI will miss it too. Otherwise, the AI will catch it for you. But I would argue it's highly probable that your expert would have missed it too... So no, it's not a silver bullet yet; the AI model often lacks the context humans have, and the capacity to take a step back.
> How is it time saved refining prompts instead of SQL?
I wouldn't call that "prompting". It's just a chat. I'm at least ~10x faster (for reasonably complex & interesting queries).
There isn't one perfect solution to SQL queries against complex systems.
A sudoku has one solution.
A reasonably well-optimised SQL solution is what the good use of SQL tries to achieve. And it can be the difference between a total lock-up and a fast running of a script that keeps the rest of a complex system from falling over.
So for example, I was mucking around with ffmpeg and mkv files, and instead of searching for the answer to my thought-bubble (which I doubt would have been "quick" or "productive" on google), I straight up asked it what I wanted to know;
> are there any features for mkv files like what ffmpeg does when making mp4 files with the option `--movflags faststart`?
And it gave me a great answer! (...the answer happened to be based upon our prior conversation of av1 encoding, and so it told me about increasing the I-frame frequency).
Another example from today: I was trying to build mp4v2 but ran into drama because I don't want to take the easy road and install all the programs needed to "build" (I've taken to doing my hobby coding as if I'm on a corporate PC without admin rights (Windows)). I also don't know about "cmake" and stuff, but I went and downloaded the portable zip and moved the exe to my `%user-path%/tools/` folder, but it gave an error. I did a quick search, but the Google results were grim, so I went to ChatGPT. I said;

> I'm trying to build this project off github, but I don't have cmake installed because I can't, so I'm using a portable version. It's giving me this error though: [*error*]
And the aforementioned error was pretty generic, but chat-gpt still gave a fantastic response along the lines of; > Ok, first off, you must not have all the files that cmake.exe needs in the same folder, so to fix do ..[stuff, including explicit powershell commands to set PATH variables, as I had told it I was using powershell before].
> And once cmake is fixed, you still need [this and that].
> For [this], and because you want portable, here's how to setup Ninja [...]
> For [that], and even though you said you dont want to install things, you might consider ..[MSVC instructions].
> If not, you can ..[mingw-w64 instructions].

Another example, this one not about coding. I asked;

> I'm wondering if it would be beneficial to add an electric-assist motor to an existing petrol vehicle. There are some 2010-era SUVs that have relatively uneconomical petrol engines, which may be good candidates. That is because some of them are RWD, whilst some are AWD. The AWD gearbox and transfer case could be fitted to the RWD, leaving the transfer's front "output" unconnected. Could an electric motor then be connected to this shaft, hence making it an input?
It gave a decent answer, but it was focused on the "front diff" and "front driveshaft" and stuff like that. It hadn't quite grasped what I was implying, although it knew what it was talking about! It brought up various things that I knew were relevant (the "domain knowledge" aspect), so I brought some of those things into my reply (like about the viscous coupling and torque split);

> I mentioned the AWD gearbox+transfer into a RWD-only vehicle, thus keeping it RWD only. Thus both petrol+electric would be "driving" at the same time, but I imagine the electric would reduce the effort required from the petrol. The transfer case is a simple "differential" type, without any control or viscous couplings or anything - just simple gear ratio differences that normally torque-split 35% to the front and 65% to the rear. So I imagine the open differential could handle the 2 different input speeds and "combine" them to 1 output?
That was enough to "fix" its answer (see below). And IMO, it was a good answer!

I'm posting this because I read a thread on here yesterday/2-days-ago about people struggling with their AI's context/conversation getting "poisoned" (their word). So whilst I don't use AI that much, I also haven't had issues with it, and maybe that's because of the way I converse with it?
---------
"Edit": Well, the conversation was too long for HN, so I put it here - https://gist.github.com/neRok00/53e97988e1a3e41f3a688a75fe3b...
Is it to build a copilot for a data analyst or to get business insight without going through an analyst?
If it’s the latter - then imho no amount of text to sql sophistication will solve the problem because it’s impossible for a non analyst to understand if the sql is correct or sufficient.
These don’t seem like text2sql problems:
> Why did we hit only 80% of our daily ecommerce transactions yesterday?
> Why is customer acquisition cost trending up?
> Why was the campaign in NYC worse than the same in SF?
Correct, but I would propose two things to add to your analysis:
1. Natural language text is a universal input to LLM systems
2. text2sql makes the foundation of retrieving the information that can help answer these higher-level questions
And so in my mind, the goals for text2sql might be a copilot (near-term), but the long-term is to have a good foundation for automating text2sql calls, comparing results, and pulling them into a larger workflow precisely to help answer the kinds of questions you're proposing.
There's clearly much work needed to achieve that goal.
But ofc the real issue is that if your report metrics change last minute, you're unlikely to get a good report. That's a symptom of not thinking much about your metrics.
Also, reports/analyses generally take time because the underlying data are messy, lots of business knowledge is encoded "out of band", and the data infrastructure is poor. The smarter analytics leaders will use the AI push to invest in the foundations.
I assume a useful goal would be to guide development of the system in coordination with experts, test it, have the AI explain all trade offs, potential bugs, sense check it against expected results etc.
Taste is hard to automate. Real insight is hard to automate. But a domain expert who isn’t an “analyst” can go extremely far with well designed automation and a sense of what rational results should look like. Obviously the state of the art isn’t perfect but you asked about goals, so those would be my goals.
> awesome-Text2SQL: https://github.com/eosphoros-ai/Awesome-Text2SQL
> Awesome-code-llm > Benchmarks > Text to SQL: https://github.com/codefuse-ai/Awesome-Code-LLM#text-to-sql
My recent endeavour was with Gemini 2.5:
- Write me a simple todo app on cloudflare with auth0 authentication.
- Here's a simple todo on cloudflare. We import the @auth0-cloudflare and...
- Does that @auth0-cloudflare exist?
- Oh, it doesn't. I can give you a walkthrough on how to set up an account on auth0. Would you like me to?
- Yes, please.
- Here. I'm going to write the walkthrough in a document... (proceed to create an empty document)
- That seems to be an empty document.
- Oh, my bad. I'll produce it once more. (proceed to create another empty document)
- Seems like your md parsing library is broken, can you write it in chat instead?
- Yes... (your gemini trial has expired, would you like to pay $100 to continue?)

Claude and Gemini are pretty decent at providing a small and tight function definition with well-defined parameters and output, but anything big and it starts losing shit left and right.
All vibecoding sessions I've seen have been pretty dead-easy stuff with a lot of boilerplate; maybe I'm weird for just not writing a lot of boilerplate and relying on well-built expressive abstractions...
Quote: "While sampling, after every token, our inference engine will determine which tokens are valid to be produced next based on the previously generated tokens and the rules within the grammar that indicate which tokens are valid next. We then use this list of tokens to mask the next sampling step, which effectively lowers the probability of invalid tokens to 0. Because we have preprocessed the schema, we can use a cached data structure to do this efficiently, with minimal latency overhead."
I.e., mask any tokens that would produce something that isn't valid SQL in the given dialect, or further, isn't a valid query for the given schema. I assume some structured-outputs capability is built into most assistants nowadays, so they have probably already explored this.
[1] https://openai.com/index/introducing-structured-outputs-in-t...
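A toy sketch of the mechanism the quote describes, with a made-up five-state grammar and token list standing in for a real SQL grammar and real model logits:

```python
import math
import random

# Toy sketch of constrained decoding: after each token, mask the logits
# of tokens the grammar disallows (probability forced to 0).
# The "grammar" is a tiny state machine, not a real SQL grammar.
GRAMMAR = {
    "start": {"SELECT": "cols"},
    "cols":  {"*": "from", "id": "from"},
    "from":  {"FROM": "table"},
    "table": {"users": "done", "orders": "done"},
    "done":  {},
}
VOCAB = ["SELECT", "FROM", "*", "id", "users", "orders", "DROP"]

def sample_constrained(logits_fn, rng):
    state, out = "start", []
    while GRAMMAR[state]:
        allowed = GRAMMAR[state]
        logits = logits_fn(out)
        # Mask step: invalid tokens get weight 0, valid ones keep exp(logit).
        weights = [math.exp(l) if t in allowed else 0.0
                   for t, l in zip(VOCAB, logits)]
        tok = rng.choices(VOCAB, weights=weights)[0]
        out.append(tok)
        state = allowed[tok]
    return out

uniform = lambda prefix: [0.0] * len(VOCAB)  # stand-in for model logits
print(" ".join(sample_constrained(uniform, random.Random(0))))
# e.g. "SELECT id FROM orders" -- never "DROP", whatever the model prefers
```

A real implementation precomputes which tokens are valid per grammar state (as the quote notes, from a preprocessed schema) so the mask lookup adds minimal latency per sampling step.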
At the moment GCP are at 76%, humans are at 93%.
Thus, this is mainly just a tool for true experts to do less work and still get paid the same, not a tool for beginners to rise to the level of experts.
Obviously being able to at least read a bit of SQL and understanding the basic idea of relational databases helps loads.
But how do you know if the SQL is correct, or just happened to return results that match for one particular case?
Sounds like a bunch of bespoke not-AI work is being done to make up for LLM limitations that point blank can’t be resolved.
I wish developers would make use of long table names and column names. For example, pcat_extension could have been named release_schema_1_0.product_category_extension. And cat_id2 could have been named category_id2.
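One low-cost way to retrofit readable names without touching the tables is a view layer. A sketch using the names from the comment above (SQLite here purely for demonstration; the extra column name is a guess):

```python
import sqlite3

# Sketch: when renaming columns in place isn't possible, a view can
# expose readable names over the legacy ones.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pcat_extension (cat_id2 TEXT, attr1 TEXT)")
con.execute("INSERT INTO pcat_extension VALUES ('books', 'x')")
con.execute("""
    CREATE VIEW product_category_extension AS
    SELECT cat_id2 AS category_id2,
           attr1   AS attribute_1
    FROM pcat_extension
""")
row = con.execute(
    "SELECT category_id2 FROM product_category_extension").fetchone()
print(row)  # ('books',)
```

Both humans and LLMs can then query `product_category_extension.category_id2` while the legacy schema stays untouched.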
I’m curious to know what people are doing to measure whether the customer got what they were looking for. Thumbs up/down seems insufficient to me.
The ability of the LLM to perform purely depends on having good knowledge of what is going to get asked and how, which is more complex than it sounds.
What techniques are people having success with?
Works pretty well for me, where you can typically get within the range of human2human variance.
> If the user is a technical analyst or a developer asking a vague question, giving them a reasonable, but perhaps not 100% correct SQL query is a good starting point
> Out of the box, LLMs are particularly good at tasks like creative writing, summarizing or extracting information from documents.
I don't -think- this was written by an LLM, but it really pulls me out of the technical article.
I see the promise for green-field projects.
I'm certain they'll get there soon, they're just not there yet.
As a quick aside there's one thing I wish SQL had that would make writing queries so much faster. At work we're using a DSL that has one operator that automatically generates joins from foreign key columns, just like
credit.CLIENT->NAME
And you get the clients table automatically joined into the query. Having to write ten to twenty joins for every query is by far the worst thing; everything else about writing SQL is not that bad.

It's not like it's some obscure thing, it's absolutely ubiquitous.
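For illustration, a hedged sketch of how an arrow operator like `credit.CLIENT->NAME` could expand into a join from declared foreign keys (all table and column names are hypothetical):

```python
# Sketch: expand a path like credit.CLIENT->NAME into a join clause.
# Foreign keys are declared once; everything here is hypothetical.
FOREIGN_KEYS = {
    ("credit", "CLIENT"): ("clients", "id"),  # credit.CLIENT -> clients.id
}

def expand_path(base_table: str, path: str) -> tuple:
    """Turn 'CLIENT->NAME' on base_table into (join_clause, select_expr)."""
    fk_col, target_col = path.split("->")
    target_table, target_pk = FOREIGN_KEYS[(base_table, fk_col)]
    join = (f"JOIN {target_table} ON "
            f"{base_table}.{fk_col} = {target_table}.{target_pk}")
    return join, f"{target_table}.{target_col}"

join, expr = expand_path("credit", "CLIENT->NAME")
print(f"SELECT {expr} FROM credit {join}")
# SELECT clients.NAME FROM credit JOIN clients ON credit.CLIENT = clients.id
```

The whole win of the DSL is that the `FOREIGN_KEYS` knowledge is declared once instead of being re-spelled in every query.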
Relatively speaking it's not very complicated, it's widely documented, has vast learning resources, and has some of the best ROI of any DSL. It's funny to joke that it looks like line noise, but really, there is not a lot to learn to understand 90% of the expressions people actually write.
It takes far longer to tell an AI what you want than to write a regex yourself.
With an AI prompt you'll have to do the same thing, just more verbosely.
You will have to do what every programmer hates: write a full formal specification in English.
https://blog.codinghorror.com/regular-expressions-now-you-ha...
[0]: https://blog.cloudflare.com/details-of-the-cloudflare-outage...
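The point in miniature: the full English specification is longer than the regex it describes (a made-up example):

```python
import re

# The English spec:
#   "match a version string: one or more digits, followed by zero or
#    more groups of a dot and one or more digits, and nothing else"
# ...is longer than the regex itself:
VERSION = re.compile(r"^\d+(\.\d+)*$")

print(bool(VERSION.match("1.24.0")))  # True
print(bool(VERSION.match("1..2")))    # False
```

Either way you end up writing the precise spec; with the regex you at least write it in a notation built for the job.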
Step 2...