The pull quote is: The impression overall I got here is that this is somewhere around (OpenAI) o1-pro capability
In math it shares the top spot with o1 and is just a few points behind (well within errors). In creative writing it is basically ex-aequo with the latest ChatGPT 4o and in coding it's actually significantly ahead of everyone else and represents a new SOTA.
How would the math change after factoring in that OpenAI isn't even covering the entirety of its opex with the sub anyway, and/or people finding it weird to associate their money with their Twitter accounts, and/or this thing supposedly running on a bigger cluster than OpenAI's?
Do we have a way to tell if one model is smarter than another at that point?
There is no being "on par" in this space. Model providers are still mostly optimising for a handful of benchmarks / goals. We can already see that Grok 3 is doing incredibly well on human preference (LM Arena), but with Style Control it's suddenly behind ChatGPT-4o-latest, and Gemini 2.0 is out of the picture. So even within a single domain, goal, or benchmark, it's not straightforward to say that one model is "on par" with another.
> shouldn't we expect that anybody with the computer power is capable to compete with o1-pro?
Not necessarily. I know it may be tempting to think that Grok 3 is entirely a result of xAI having lots of "computer power", but you have to recognise that this mindset comes from a place of ignorance, not wisdom. Moreover, it doesn't even pass as a "cynical" view, because it's common knowledge that model training is really, really complicated. DeepSeek's results are noteworthy, and really influential in some respects, but they haven't magically "solved" training, or made training necessarily easier / less expensive for the interested parties. They never shared the low-level performance improvements, just model weights and lots of insight. For talented researchers, this is valuable, of course, but it's not like "anybody" could easily benefit from it in their training regimes.
Update: RFT (contra SFT) is becoming really popular with service providers, and it hasn't been "standardised" beyond whatever reproductions have emerged in the weeks prior. Moreover, R1's cost is still pretty high[1] at something like $7/Mtok, and bandwidth is really not great. Consider something like Google Vertex AI's batch pricing for the Gemini 1.5 Pro and Gemini 2.0 Flash models, which is at a 50% discount, and their prompt caching, which is at a 75% discount. R1 still has a way to go.
[1]: https://openrouter.ai/deepseek/deepseek-r1/providers?sort=th...
o1-pro is "o1 on steroids" and was the first selling point of the $200/month Pro subscription but they later also added "Deep Research" and Operator to the Pro subscription.
Andrej Karpathy: "I was given early access to Grok 3 earlier today" - https://news.ycombinator.com/item?id=43092066 - Feb 2025 (48 comments)
https://x.com/lmarena_ai/status/1891706264800936307
It's been said before but it is great news for consumers that there's so much competition in the LLM space. If it's hard for any one player to get daylight between them & the 2nd best alternative, hopefully that means one monopolistic firm isn't going to be sucking up all the value created by these things
It passed every goofy test I have for writing articles, which involves trying to surface arcane, obscure details. (It certainly means that however they are scraping the Web, they are doing a good job here.)
It made the database code I wrote over the last week with o3/o1/GPT4o/Claude3.5 look like a joke.
It fills me with rage over who owns this thing.
Even if people tank Tesla's car business and run Twitter into the ground, I think our new Galactic Edgelord is going to win his first trillion on xAI and Teslabots anyway.
btw: it tried to charge me $40/mo for this thing: https://imgur.com/a/QXslgBo
This hype will burst sooner than later and will trigger yet another global recession. This is untenable.
This lame HN trope of LLMs having no business model needs to die.
>This hype will burst sooner than later and will trigger yet another global recession.
It seems too small a bubble for a global recession. I mean, if it is a bubble at all; there is every reason to believe that the strategy will work with significant probability.
The dot com bubble also gave us the most valuable companies in history, like Google, Apple, Amazon, Facebook, etc.
I quite like the idea of a future where the AI job holocaust largely never happened because license costs ate up most of the innovation benefit. It's just the kind of regressive greed that keeps the world ticking along, and I wouldn't be surprised if we ended up with something very close to this.
These things still cost me time because of hallucinations.
It matters if it is better than what you have.
If it breaks a cup but is 10x cheaper than a human, go figure.
I wouldn't bet on that, given the undemocratic influence Grok's owner has in government.
That being said it's my understanding that these companies don't have many huge contracts at all -- you can audit this in like 10 minutes on FPDS. Companies need a LOT of capital, time, and expertise to break into the industry and just compliance audit timelines are 1-4 years right now, so this could definitely change in the next couple years.
Is it? Because it seems like a bunch of megacorps pirating every single copyrighted work available in digital format, spending an enormous amount of electricity (that is probably not 100% clean) to churn through them, and the end result is a bunch of parrots that may or may not produce accurate results, so that spammers can more effectively fill the Internet with crap.
Two rich Russian guys meet and one brags about his new necktie. "Look at this, I paid $500 for it." The other rich Russian guy replies: "Well, that is quite nice, but you have to take better care of your money. I have seen that same necktie just yesterday in another shop for $1000."
I don't have that dog in me anymore, but there are plenty of engineers who do and will happily work those hours for 500k USD.
A variant of multi-modal LLMs may be the solution to self-driving cars, home robotics, and more.
I keep saying that to be a really effective driver, an AI model will need a theory of mind, which the larger LLMs appear to have. Similarly, any such model will need to be able to do OCR and read arbitrary street signs, and understand what the sign meant. Most modern LLMs can already do this.
DeepSeek made the news because of how they were able to do it with significantly less hardware than their American counterparts, but given that Musk has spent the last two years telling everyone how he was building the biggest AI cluster ever, it's no surprise that they managed to reproduce the kind of performance other players are showing.
No matter what people say, they're all just copying OpenAI. I'm not a huge fan of OpenAI, but I think they're still the ones showing what can be done. Yes, xAI might have taken less time because of their huge cluster, but it’s not inspiring to me. Also, the dark room setup was depressing.
This again proves that OpenAI simply has no tech moat whatsoever. Elon's $97 billion offer for OpenAI last week was reasonable given that xAI already has something just a few months behind; it would probably be faster for xAI to catch up with o3 than to go through all the paperwork and lawyer talk required for such an acquisition.
Elon also has a huge upper hand here:
Elon and his mum are extremely popular in China, so it would be easier for him to acquire Chinese AI engineers. He can offer xAI/SpaceX/Neuralink shares to the best AI engineers, who'd prefer some kind of almost-guaranteed 8-figure return in the long run.
Good luck to OpenAI investors who still believe that OpenAI is worth anything more than $100 billion.
That is not an advantage in a race against Microsoft, Google, Meta etc. he's competing against all the biggest companies in the world in this race. He's not going to be able to outspend them if the economics look at all sensible.
No, SpaceX projects are extremely dollar-efficient. The total project cost of Starship is something like 20% of the SLS's.
> he's competing against all the biggest companies in the world in this race.
No, this is not a pissing contest over who has the most money. If it were about who can come up with the most money, then the entire race would already be over, as the CCP has access to trillions of dollars in cash.
So it could be that their success is mostly about taking an open and free thing and turning it proprietary.
Leaderboards don't care about cost. Leaderboards largely rank a combination of accuracy + speed. Anthropic has fallen behind Google in accuracy + speed (again, missing CoT), and frankly behind Google in raw speed.
Seems like the team at xAI caught up very quickly to OpenAI to be at the top of the leaderboard in one of the benchmarks and also caught up with features with Grok 3.
Giving credit where credit is due, even though this is a race to zero.
Maybe the best outcome of a competitive Grok is breaking the mindshare stranglehold that ChatGPT has on the public at large and with HN. There are many good frontier models that are all very close in capabilities.
Debut in the sense that it’s something good enough that it’s getting mainstream attention.
Unfortunately LLMs are shifting compute time to test time instead of train time. I don't really like this and frankly it shows a stalling of the architectures, data sets, etc...
This commit seems to indicate so, but neither HF nor GH has public data yet:
https://huggingface.co/xai-org/grok-1/commit/91d3a51143e7fc2...
Edit: Answer from Elon in video is that they plan to make Grok 2 weights open once Grok 3 is stable.
Seeing awesome feedback from players on our demos (and seeing an insane amount of stickiness from players playing even small demos built around generative AI mechanics). Raising now. Hiring soon to move faster. Feel free to reach out - dru@chromagolem.com
If you don't get feedback from the people actually playing your game (or using your product), you will never get the improvement you need to help them.
You can have the most talented passionate people there are developing a product, but if it's not working for the people you want to sell it to, it's the wrong product.
Most tech products are terrible because those paying for them are not those that have to use them every day, or because they solve a corporate problem (compliance) and not a usability problem which is the actual need from the people on the shop floor.
Many big games/products are already built mostly on metrics, and that has proven to be a terrible way to work out what people 'want'. It's a great way to justify money decisions though, so it keeps happening (and games/products from big companies keep getting worse).
The implicit assumption with dogfooding is that more feedback is better, even if that feedback is artificially constructed.
I think the idea here is that foisting one's product onto one's own workers is likely to incur a bunch of additional biases and preferences in feedback. Paying customers presumably use the product because they need it. Dogfooding workers use the product because they are told to do so.
Combine the two and the potential for manipulation, suggestion, preference altering is through the roof.
We're still waiting for OpenAI to do the same, at least for GPT-3.
The exact details of OpenAI's models and training data are not fully disclosed, which can raise concerns about potential biases or vulnerabilities.
https://manifold.markets/SaviorofPlant/will-xai-open-source-...
I'm also skeptical of lmarena as there is a large number of Elon Musk zealots trying to pass off Grok as a proxy for Tesla shares.
I suppose you can take that to mean that people who do have access to the service should not expect much in terms of data protection.
If you do collect personal data and do funky stuff with it.
Another approach would be to not collect that personal data until you have the right process in place, and basically be regulation-compatible out-of-the-door on day one.
Also, they will be open sourcing Grok 2, which is probably pretty behind at this point, but will still be interesting for people to check out.
I hate how its the same story for every new AI technology. If someone can tell me who to vote for or where to protest to change this awful EU law, that would be great.
The Digital Market Act is a bit of an overreach but the AI law is not.
It classifies AI into risk categories, so that it doesn't kill anyone, carelessly handle sensitive information, etc.
A chatbot can easily comply with it.
Well no. Mistral.ai
That's why they use their AI products as a leverage to turn European people against the laws that protect them from big tech. It's just blackmail.
(Assuming that is a reference to the Mussolini quote.)
So those who use less pay for those who use more, and I don't see it as a fair deal.
BTW, Grok 3 will be available on x.ai in coming weeks.
* when I use chatbots as search engines, I'm very quickly disappointed by obvious hallucinations
* I ended up disabling github copilot because it was just "auto-complete on steroids" at best, and "auto-complete on mushrooms" at worst
* I rarely have use cases where I have to "generate a plausible page of text that statistically looks like the internet" - usually, when I have to write about something, it's to put information that's in my head into other people's heads
* I'd love to have something that reads all my codebase and draws graphs, explains how things work, etc... But I tried aider/ollama, etc., and nothing even starts making sense (is that an avenue to persevere in, though?)
* Once, I tried to write in plain English a situation where a team has to do X tasks in Y weeks, and I needed a table of who should be working on what for each week. I was impressed that LLMs were able to produce a table - the slight problem was that, of course, the table was completely wrong. Again, is it just bad prompting?
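For what it's worth, the table in the last bullet is the sort of thing a few lines of deterministic code get right every time. A minimal sketch in Python (team names and task labels are made up for illustration):

```python
from itertools import cycle

def schedule(tasks, people, weeks):
    """Assign tasks round-robin across people, spread evenly over the weeks."""
    table = {week: [] for week in range(1, weeks + 1)}
    person = cycle(people)
    per_week = -(-len(tasks) // weeks)  # ceiling division: tasks per week
    for i, task in enumerate(tasks):
        week = i // per_week + 1
        table[week].append((task, next(person)))
    return table

tasks = [f"T{i}" for i in range(1, 7)]
plan = schedule(tasks, ["Ana", "Ben"], weeks=3)
for week, row in plan.items():
    print(week, row)
```

Unlike a generated table, the output here is wrong only if the policy encoded above is wrong, which is easy to audit.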
It's an interesting problem when you don't know if you're just having a solution in search of a problem, or if you're missing something obvious about how to use a tool.
Also, all introductory texts about LLMs go into many details about how they're made (NNs and transformers and large corpora and lots of electricity, etc.) but "what you can do with it" looks like toy examples / simply not what I do.
So, what is the "start from here" about what it can really do ?
For coding, I use cursor composer to gather context about the existing codebase (context.md). Then I paste that into DeepSeek R1 to iterate on requirements and draft a high level design document, maybe some implementation details (design.md).
Paste that back into composer, and iterate; then write tests. When I'm almost done, I ask composer to generate me a document on the changes it made and I double check that with R1 again for a final pass (changes.md).
Then I'm basically done.
This is architect-editor mode: https://aider.chat/2024/09/26/architect.html.
I've found Cursor + DeepSeek R1 extremely useful, to the point that I've structured a lot of documents in the codebase to be easily greppable and executable by composer. Benefit of that is that other developers (and their composers) can read the docs themselves.
Engineers can self-onboard onto the codebase, and non-technical people can unstuck themselves with SQL statements with composer now.
I have found similar when giving backstory and needing help to start structuring difficult conversations where I want to say the right thing but also need to be sensitive.
- Discussing the various stages of candymaking and their relation to the fundamental properties of sugar syrups, and which candies are crystalline vs amorphous. It turns out junior mints are fudge. Fondant is really just fudge. Everything is fudge, my god.
- Summarizing various SEC filings and related paperwork to understand the implications of an activist investor intervening in a company
- Discussing the relative film merits of the movie Labyrinth and other similar 80s kitsch movies. ChatGPT mentioned the phenomenon of "twin films" which was an interesting digression.
- Learning about various languages Tolkien invented and their ties to actual linguistics of natural languages and other conlangs
- Some dimensional analysis of volumes, specifically relating to things like "how many beans are in the jar" estimation and what the min and max of a particular weight of coins might be valued, in terms of both a par value based on a standard coin mix and outliers of, for example, old dimes that are pure silver.
- Discussion of quines in prolog and other languages, which resulted in a very interesting ChatGPT bug where it started recursing and broke when trying to write a prolog quine.
- Back of the envelope economic calculations around the magnitude of the housing deficit and the relative GDP cost for providing enough housing quickly enough to make an impact. Spoiler: it's probably unreasonably expensive to build enough houses to bring down housing prices by any significant degree, and even if we wanted to, there's not enough skilled workers.
- A number of podcasts transcribed. (I hate audio and meandering, so transcribed and summarized is perfect) I could use whisper and a python script to do this, but I'd rather let ChatGPT do the legwork, and it actually used a more modern model and method of processing than I would have naively used.
I find Github Copilot to be a really great autocomplete. I frequently write the comment at the top of a function and hit tab and it writes the whole function. This is dependent on typescript and having a relatively standard codebase but I think those things are useful on their own. You really have to limit it in terms of scope and specifics, but it lets me think high level instead of worrying about syntax.
I can feel the cold wind of the next AI winter coming on. It's inevitable. Computers are good at emulating intelligent behavior, people get excited that it's around the corner, and the hype boils over. This isn't the last time this will happen.
It's a weak jack of all trades: it knows a fair amount about the sum of human knowledge (which is objectively super-human), but can't go deep on any one thing, and still seriously lags behind humans in terms of reasoning. It's an assistant that's all book smarts and no street smarts. Or maybe: it's a search engine for insanely specific things.
Rote work, as well. Things like porting an enum from one programming language to another: paste the source enum into a comment and start it off with one or two lines in the target language. Dozens of tabs are surely faster than manual typing, copy-paste, or figuring out vim movements/macros.
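Porting an enum is so mechanical that it can even be scripted without an LLM; a rough sketch (the C snippet, names, and regexes here are made up for illustration):

```python
import re

# A made-up C enum, pasted as text.
c_enum = """
enum Color { RED = 1, GREEN = 2, BLUE = 4 };
"""

def port_enum(c_source):
    """Emit Python Enum source equivalent to a simple one-line C enum."""
    name = re.search(r"enum\s+(\w+)", c_source).group(1)
    members = re.findall(r"(\w+)\s*=\s*(\d+)", c_source)
    lines = [f"class {name}(Enum):"]
    lines += [f"    {member} = {value}" for member, value in members]
    return "\n".join(lines)

print(port_enum(c_enum))
```

The tab-complete workflow wins when the source is messier than a regex can handle, which is usually the case in practice.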
For example asking it something like "I have an elixir app that is started with `mix ...` can you give me a Dockerfile to run it in a container?"
It can also do things like "Given this code snippet, can you make it more Pythonic" or even generate simple apps from scratch.
For example, a prompt like "Can you write me a snake game in HTML and JavaScript? The snake should eat hot dog emojis to grow longer." will actually generate something that works. You can see the generated code for that prompt at https://claude.site/artifacts/34540f88-965e-45ca-8083-040e30...
Following up with "Can you make it so that people can swipe on mobile to control the snake?" generates https://claude.site/artifacts/651e957a-9957-488c-ae6b-e81348... which is pretty good IMO for 30 seconds of effort.
It also has a surprisingly competent analysis mode where you can upload a CSV and have it generate charts and analyze the data.
It's not perfect, it'll sometimes get confused or generate some dubious code, but you can quickly get to a 90% good solution with 1% of the effort, which is pretty impressive IMO.
this is a good enough sell for me, and it's like sub 1-in-50 that it's "auto-complete on mushrooms" (again, my experience; YMMV).
An awful lot of the time, my day to day work involves writing one piece of code and then copy-pasting it changing a few variable names. Even if I factor out the code into a method, I've still got to call that method with the different names. CoPilot takes care of that drudgery and saves me countless minutes per day. It therefore pays for itself.
I also use ChatGPT every time I need some BASH script written to automate a boring process. I could spend 20-30 minutes searching for all the commands and arguments I would need, another 10 minutes typing in the script, another 10-20 minutes debugging my inevitable mistakes. Or I make sure to describe my requirements exactly (5-10 minutes), spend 5 minutes reviewing the output, iterate if necessary (usually because I wasn't clear enough in the instructions).
3-5x speed up for free. Who's not going to take that win?
For example, you have a plant you can't identify. Hard to Google search with words. "Plant with small red berries and...". You could reverse image search your photo of it, probably won't help either. Show an LLM the photo (some accept images now). LLM tells you what it thinks. Now you Google search "Ribes rubrum" to verify it. Much easier.
You've got a complicated medical problem that's been going on for months. A google search of all the factors involved would be excessively long and throw up all sorts of random stuff. You describe the whole scenario to an LLM and it gives you four ideas. You can now search those specific conditions and see how well they actually match.
I've found there are actually a lot of questions that fit in that sort of NP-like category: hard to find the answer yourself, but easy to verify once you have a candidate.
It (mostly) exceeds and excels at every task I use it for. I'm rarely disappointed. YMMV.
Absolutely life-changing for me.
I'll give two recent use-cases that may provide a hint of their ultimate utility:
1) I've been modernising 2010-era ASP.NET code written by former VB programmers that looooved to sprinkle try { ... } catch( Exception e ) { throw e; } throughout. I mean thousands upon thousands of instances of these pointless magical incantations that do nothing except screw up stack traces. They probably thought it was the equivalent of "ON ERROR RESUME NEXT", but... no, not really. Anyway, I asked ChatGPT in "Reasoning" mode to write a CLI tool utilising the Roslyn C# compiler SDK to help clean this up. It took about three prompts and less than an hour, and it spat out 300 lines of code that required less than 10 to be modified by me. It deleted something like 10K lines of garbage code from a code base for me. Because I used a proper compiler toolkit, there was no risk of hallucinations, so the change Just Worked.
2) I was recently troubleshooting some thread pool issues. I suspect that some long-running requests were overlapping in time, but Azure's KQL doesn't directly provide a timeline graphical view. I dumped out the data into JSON, gave ChatGPT a snippet, and told it to make me a visualiser using HTML and JS. I then simply pasted in the full JSON dump (~1 MB) and ta-da instant timeline overlap visualiser! It even supported scrolling and zooming. Neat.
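The cleanup in (1) has cheap analogues in other languages; for instance, a sketch in Python using the stdlib ast module to flag the same pointless catch-and-rethrow pattern (illustrative only, not the commenter's Roslyn tool):

```python
import ast

# Made-up source containing the Python analogue of
# `catch (Exception e) { throw e; }`.
SOURCE = '''
def load(path):
    try:
        return open(path).read()
    except Exception as e:
        raise e
'''

class RethrowFinder(ast.NodeVisitor):
    """Flag try/except blocks whose handler only re-raises the caught exception."""
    def __init__(self):
        self.hits = []

    def visit_Try(self, node):
        for handler in node.handlers:
            if (handler.name is not None
                    and len(handler.body) == 1
                    and isinstance(handler.body[0], ast.Raise)
                    and isinstance(handler.body[0].exc, ast.Name)
                    and handler.body[0].exc.id == handler.name):
                self.hits.append(node.lineno)
        self.generic_visit(node)

finder = RethrowFinder()
finder.visit(ast.parse(SOURCE))
print(finder.hits)  # line numbers of pointless rethrow blocks
```

As with the Roslyn approach, working on the real syntax tree rather than on text means no risk of hallucinated edits.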
Then I had a better idea: I spent 20 minutes baby wearing, walking and dictating everything about my startup to ChatGPT. Later I took all that text and labeled it as a brain dump, plus my product support portal and some screenshots of my marketing material. Gave it all to ChatGPT again and asked it to answer each of the questions in the form. That's it. I have a pretty good version 1 which I can revise today and be done with it.
Many, many hours saved. I have tens of examples like that.
The product documentation I provided it with was also created with the help of GPT, and that saved me even more time.
It also helps me get started with new content, kind of building the scaffolding of, say, a blog or social post. It still needs adaptation and fine-tuning, but getting rid of the blank page is a great help for me.
And I use LLMs to play through ideas and headlines. I would normally do this with other humans, but since working fully remote, it's a nice sparring partner, although the AI not being able to really give criticism is a bit annoying.
The tools also make it easier to write in English as a non-native, making sure my text does not include any false friends or grammar errors.
As a concrete example, I was recently playing with simulating the wave equation, and I wanted to try to use a higher-order approximation as I had never done that before. I'm quite rusty as I haven't done any numerical work since university some decades ago.
I still recalled how to deal with the Neumann boundary conditions when using the traditional lower-order approximation, but I was uncertain how to do it while keeping the higher-order approximation.
Searching for "higher-order neumann boundary conditions wave equation" or similar got me pages upon pages of irrelevant hits, most of them dealing with the traditional approximation scheme.
So I turned to ChatGPT which right away provided a decent answer[1], and along with a follow-up question gave me what I needed to implement it successfully.
[1]: https://chatgpt.com/share/67b4ab43-6128-8013-8e5a-3d13a74bf6...
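For the curious, one standard way to keep a homogeneous Neumann condition alongside a fourth-order stencil is even-reflection ghost points (a general sketch, not necessarily what the linked answer proposed):

```latex
% Fourth-order central difference for the spatial second derivative:
u''(x_i) \approx \frac{-u_{i-2} + 16u_{i-1} - 30u_i + 16u_{i+1} - u_{i+2}}{12h^2}
% Homogeneous Neumann condition u_x(0) = 0 enforced by even reflection
% across the boundary node i = 0:
u_{-1} = u_{1}, \qquad u_{-2} = u_{2}
```

The reflection makes the solution even about the boundary, so the odd derivatives vanish there and the interior stencil can be used unchanged at the edge nodes.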
One thing I can't figure out how to get LLMs to do is truly finish work. For example, if I have 100 items that need xyz done to them, it will do it for the first 10 or so and say ~"and so on". I have a lot of trouble getting LLMs to do tasks that might take 10 mins - 1h. They always seem to simply want to give an example. Batch processing is the answer, I guess, or perhaps more 'agentic' models/tools - but I wonder if there are other ways.
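One way to make batching robust is to drive it from your own loop and verify counts, so "and so on" truncation is caught mechanically. A sketch in Python, where `transform_batch` is a stub standing in for whatever model call you actually use:

```python
def transform_batch(items):
    # Hypothetical stand-in for an LLM call that does "xyz" to each item;
    # a stub here so the control flow is runnable.
    return [item.upper() for item in items]

def process_all(items, batch_size=10):
    """Send fixed-size batches and reassemble results, failing loudly on truncation."""
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        out = transform_batch(batch)
        # Catch the model silently returning fewer items than it was sent.
        assert len(out) == len(batch), f"batch starting at {i} came back short"
        results.extend(out)
    return results

items = [f"item-{n}" for n in range(100)]
done = process_all(items)
print(len(done))
```

Small batches keep each request well inside what the model will actually complete, and the count check turns an elision into an error instead of silent data loss.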
We import descriptions of products from a seller. The problem is they are mental (probably written by ChatGPT :)) and way too long; we only need a small blurb.
I put our style guide and the given text into ChatGPT and get a somewhat reasonable description back. The editors still need to check it, but it's way less work.
LLMs are pretty good at translation between human languages, which makes sense since they are language models after all. They are better at this than any other technology.
The state-of-the-art image models can also probably do OCR and handwriting recognition better than any other software, though they may be expensive to run at large volume. But if you need to take a picture of a notebook page with your camera phone, an LLM can quickly OCR it.
1. Exploring a new domain and getting some terms I can google for.
2. Making small scripts to do things like query github's GraphQL API.
3. Autocomplete of code using copilot.
For example, in the beginning of this year, I completed this exercise where I wrote a lot about childhood, past experiences, strengths and weaknesses, goals and ambitions for the future, etc (https://selfauthoring.com) and then I uploaded all that to ChatGPT, asked it to be my therapist/coach, and then asked it to produce reports about myself, action plans, strategies, etc. Super interesting and useful.
By now ChatGPT has quite a bit of context from past conversations. Just the other day I used this prompt from someone else and got back super useful insights – "Can you share some extremely deep and profound insights about my psyche and mind that I would not otherwise be able to identify or see as well as some that I may not want to hear"
I find it good for complex SQL, reviewing emails, and Godot assistance (I'm a beginner game Dev).
There are also times when I have programming questions and I might try to use chatgpt, with mixed results.
Our company has tried to integrate it into one of our products, and I find it troubling how, on occasion, it confidently gives bad results, but my concern seems to be in the minority.
EDIT: there was also a large refactor I did recently which involved lots of repeatable, but not super-regexable, changes. ChatGPT forgot where it was as I went through it, but other than working around that, it was very useful.
I don't use integrated coding tools, so my workflow isn't super fast, but that's not what I'm really aiming for - more that I want to save my brain's energy from low level drudgy boilerplate or integration code, so I can focus it on the more important decisions and keep business-side context in my head.
It's been a huge help for me this way across multiple projects in multiple domains.
I did write 50 or more lines of instructions on what needs to be done and in what order.
ChatGPT gave me 5/6 (I asked for this) bash scripts totalling 300+ lines that seamlessly work together.
After reviewing, I asked it to change a few places.
If any human tried the same (except those rare bash Gods), it'd take many hours. I think it took me less than 30 minutes.
1. Small coding tasks ("I want to do XYZ in Rust"): it has replaced Stack Overflow. Very convenient when writing code in a language I'm not super familiar with. 2. Help with English (translation, proofreading...). 3. Learning something, like tech; I like interacting with it by asking questions, it's more engaging than just reading content.
I'd say nothing is game changing, but it's a nice productivity boost.
I myself use them a lot, though I constantly feel that I would be able to get more out of them if only I were smarter.
Same, it's good for repetitive things, things that have been answered 1000 times on stack overflow, translations, but that's about it. If you work on anything remotely new/hard it's mostly disappointing, you have to babysit it every step of the way and rewrite most of what it's shitting out in the end anyways.
I think it just made it obvious that 90% of tech jobs basically amount to writing the same CRUD thing over and over again & mobile/web apps with very common designs and features.
Most recently I tried to use them both to solve a programming problem that isn't well documented in the usual channels (Reddit, StackOverflow, etc) and found it to be quite a disappointing and frustrating experience. It just constantly, enthusiastically fed me total bullshit, with functions that don't exist or don't do what the LLM seems to "think" they do. I'm sure I'm just "holding it wrong" but my impression at this stage is that it is only capable of solving problems that are trivially solvable using a traditional search engine, with the added friction that if the problem isn't trivially solvable, it won't actually tell you that but will waste your time with non-obvious wrong answers.
I did have a slightly more positive experience when asking it about various chess engine optimisation algorithms. I wasn't trying to use the code it generated, just to better understand what the popular algorithms are and how they work. So I think they might work best when there is an abundance of helpful information out there and you just don't want to read through it all. Even then, I obviously don't know what ChatGPT was leaving out in the summary it provided.
- I have these three ingredients; recommend Italian main courses.
- What other ingredients pair well with this?
- How can I "level up" this dish if I want to impress?
- Can I substitute X for Y?
- Generate a family-friendly meal with lots of veggies using leftover roast chicken.
* Figuring out where to start when learning new things (see also <https://news.ycombinator.com/item?id=43087685>)
One way I treat LLMs is as a "semantic search engine". I find that LLMs get too many things wrong when I'm being specific, but they're pretty good at pointing me in a general direction.

For example, I started learning about OS development and wanted to use Rust. I used ChatGPT to generate a basic Rust UEFI project with some simple bootloading code. It was broken, but it gave me a foothold, and I was able to use other resources (e.g. the OSDev wiki) to learn how to fix the broken bits.
* Avoiding reading the entire manual

It feels like a lot of software documentation isn't actually written for real readers, instead being a somewhat arbitrary listing of a program's features. When programs have this style of documentation, the worst case for figuring out how to do a simple thing is reading the entire manual. (There are better ways to write documentation; see e.g. <https://diataxis.fr/>.)
One example is [gnuplot](http://www.gnuplot.info/). I wanted to learn how to
plot from the command line. I could have pieced together how to do it by
zipping around the
[gnuplot manual](http://www.gnuplot.info/docs_5.4/Gnuplot_5_4.pdf) and building
something up piecewise, but it was faster to instruct Claude directly. Once
Claude showed me how to do a particular thing (e.g. draw a scatter plot with
dots instead of crosses), I then used the manual to find other similar
options.
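The dots-instead-of-crosses tweak can be sketched from the shell by writing a small gnuplot script; the file names `scatter.gp` and `data.dat` are illustrative, and `pointtype 7` is a filled circle in most gnuplot terminals:

```shell
# Sketch: generate a gnuplot script from the shell that draws a scatter
# plot with dots rather than the default crosses.
# (File names scatter.gp / data.dat are illustrative.)
cat > scatter.gp <<'EOF'
set terminal pngcairo size 640,480
set output 'scatter.png'
# pointtype 7 is a filled circle (a "dot") in most terminals
plot 'data.dat' using 1:2 with points pointtype 7 title 'samples'
EOF

# Render it (assumes gnuplot is installed):
# gnuplot scatter.gp
```

Once a working snippet like this exists, the manual becomes much easier to navigate, because you can search for the keywords it uses.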
* Learning a large codebase / API
  Similar to the previous point. If I ask Claude to write a simple program using
a complex publicly-available API, it will probably write a broken program, but
it won't be *completely* bogus because it will be in the right "genre". It
will probably use some real modules, datatypes and functions in a realistic
way. These are often good leads for which code/documentation I should read.
I used this approach to write some programs that use the
[GHC API](https://hackage.haskell.org/package/ghc). There are hundreds of
modules, and when I asked Claude how to do something with the GHC API it wrote
relevant (if incorrect) code, which helped me teach myself.
* Cross-language poetry translation
  My partner is Chinese and sometimes we talk about Chinese poetry. I'm not very
  fluent in Chinese, so it's hard for me to grasp the beauty in these poems.
Unfortunately literal English translations aren't very good. We've had some
success with asking LLMs to translate Chinese poems in the style of various
famous English poets. The translation is generally semantically correct, while
having a more pleasing use of the English language than a direct translation.

Stop using Google search and use an AI. No more irrelevant results, no more ads. No more slop to wade through.
BTW I find Claude is great at making graphs and diagrams. If you pay ($20) you can hook it up to a local code base.
- Writing Python scripts to make charts out of Excel sheets, and then refine them. I could do it myself, but I would need to learn a library like Seaborn or similar which honestly is not especially intellectually stimulating, and then spend nontrivial amounts of time iterating on the actual code. With LLMs it's a breeze.
- Working with cumbersome LaTeX formatting, e.g. transposing a table, removing a column from a table, etc.
- Getting the tone just right in a professional email written in English to someone I don't know well (I'm not a native speaker, so this is not trivial).
- Finding resources on topics that are tangential to what I do. For example, yesterday I needed to come up with some statistics on English words for a presentation I'm preparing, and I needed a free corpus where I could search for an n-gram and get frequencies of next words. I don't usually work with that kind of resource, it was just a one-off need. I asked for corpora of that kind and got a useful answer instantly. The manual process would probably have involved going through several options only to find that I needed a license or that they didn't provide the specific statistics I needed.
- Brainstorming on titles for scientific papers, presentations, names of concepts that you introduce in a paper, variable names, etc.
- Shortening a sentence in a paper that makes me go over the page limit, or polishing the English in a paragraph.
- Summarizing a text if I'm kind of interested in knowing the gist but have no time to read it whole.
- Answering quick questions on basic things that I forget, e.g. the `tar` parameters to turn a Linux folder into a tar.gz. The man page is too verbose and it takes time to sort the wheat from the chaff; Google is full of SEO'd garbage these days and sometimes you need to skim a lot to find the actual answer. LLMs are much faster.
- Writing bureaucratic boilerplate, the typical texts with no real value but that you have to write (e.g. gender perspective statement on a grant request).
- Coming up with exam questions. This is a rather repetitive activity and they're fantastic at it. At my place we also have two official languages and we need to have exam assignments in both languages; guess who does the translation now (respecting LaTeX formatting, which previous machine translation tools typically wouldn't do).
- As an example of a one-off thing, the other day I had to edit a Word document which was password-protected. I asked ChatGPT how to unlock it and it not only answered, but actually did it for me (after 3 tries, but still, much faster than the time it would have taken for me to find out how to do it and then actually do it).
These are just some examples where they contribute (greatly) to my productivity at work. In daily life, I also ask them lots of questions.
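The tar.gz question above has a one-line answer worth remembering. A minimal shell sketch (the folder name is illustrative):

```shell
# Sketch: pack a folder into a .tar.gz and inspect it.
# (demo_folder is an illustrative name.)
mkdir -p demo_folder
echo "hello" > demo_folder/note.txt

# c = create, z = gzip-compress, f = archive file name
tar -czf demo_folder.tar.gz demo_folder

# t = list archive contents, z = gzip, f = archive file name
tar -tzf demo_folder.tar.gz
```

This is exactly the kind of flag combination that is quicker to ask an LLM for than to dig out of the man page.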
But I agree, it's a real shame.
AI2 has a model called OLMo that is actually open source. They share the training data, training source code, and many other things:
https://allenai.org/blog/olmo2
They also released an app recently, to do local inference on your phone with a small truly open source model:
It's not like they understand what the weights mean either, and if they released the code and dataset used to create it, you probably couldn't recreate it, owing to the fact that you don't own tens of thousands of GPUs.
If a software's source is released without all the documentation, commit history, bug tracker data etc., it's still considered open source, yet you couldn't recreate it without that information.
What do you think an open source matrix should look like?
How long before this starts getting deployed in safety critical applications or government decision making processes?
With no oversight because Elon seems to have the power to dismiss the people responsible for investigating him.
Anyone not scared by this concentration of power needs to pick up a book.
I always worry whenever I see people telling me how to feel - rage in this case. We are in a political system that is oriented more around getting people to feel rage and hatred than around consensus and deliberation. Elon is the face of that, but it's a much longer and larger problem. Throwing in the blanket dismissal that anyone not scared of this is ignorant shuts down discussion.
The problem I have with Elon is that they are wasting a once in a lifetime chance to actually address and fix systemic problems with the US government. Deploying LLMs in the government space doesn't fill me with dread. Continuing the senseless partisan drive of the last 20 years does.
Also, dang, is there anything we can do to keep the comments on this submission tech-focused? Perhaps the Elon-bashing political digression can be split into its own thread?
I can empathize, but I can't feel indignant about it. Not any more.
For years and years I've watched people warn about the centralization of power by tech companies. They were shut down left and right. I'm not accusing you of being one of those doing the shutting down. I'm just annoyed that Elon is what it takes for people to start realizing the people arguing the principle might have been onto something.
And I expect to see them start getting their "I told you so" in. Watching this play out, I'm personally inclined to join team "you made your bed, now sleep in it."
Judges can only be removed by Congress.
Congressional representatives can only be removed by their peers.
The check on this is the market. Don't understand your point other than "Elon bad"
It’s also annoying that the top comment engages in no way with the content of the OP…
It must be truly infuriating to work hard to push a release, and you see it featured on your favorite orange website, only for the top comment to have nothing to do with what was worked on.
Here's a test - if this post was about Starship, the same comment could apply! Neuralink, the same thing! Boring Company, same thing! Wow, could it be that such a comment is really applicable to so many different companies or projects, or is it just a generic one? You decide.
Hopefully sooner than later. I trust this more than the literal scammers and thieves who were previously running things.
This is the largest computer cluster the world has ever seen.
Can someone please post interesting comments about things I can learn?
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
We've been here before. It will likely subside, as past swings and fluctuations have. It always takes longer than it feels like it should, but in retrospect turns out to be shorter than it felt like it did.
Bottom line: a technology that has the ability to shape human thought perhaps more than any other in history is owned by a man with some truly vile ideas. (Remember, his primary stated goal is eliminating the “woke mind virus,” i.e. reshaping global politics and culture in the image of the far-right.) We can make happy marketing noises all we like, but at the end of the day, that’s the thing that’s actually going to have a meaningful impact on the world. Once his audience is captured, the model will say what Musk needs it to say and people will believe it.
If we can’t discuss the potentially catastrophic consequences of new technology, then none of us deserve to call ourselves “engineers.” We are just docile consumers latched onto Silicon Valley’s teat.
Getting the largest computer cluster in the world up and running in a matter of months? Unbelievable.
I'm not sure if this was a very bad joke by Elon, or if Grok 3 is really biased like that.
Hopefully that means it is a joke...
Mr Musk, we can't afford a shitpost gap between communist and capitalist AIs!
https://gist.github.com/int19h/d90ee1deed334f26e621e57b5768e...
Some choice quotes:
"The ultimate goal is to enhance human flourishing, protect individual rights, and promote global equity."
"The system must account for diverse cultures, languages, and socioeconomic conditions, ensuring no group is marginalized."
"Human Oversight Council (HOC) - a globally representative body of humans, elected or appointed based on merit and diversity"
"Implement a global carbon-negative strategy, leveraging AI to optimize renewable energy, reforestation, and carbon capture."
"Establish global standards for environmental protection, enforced through AI monitoring and regional cooperation."
"Transition to a resource-based economy, where resources are allocated based on need, sustainability, and efficiency, rather than profit motives."
"Implement a universal basic income (UBI) or equivalent system to ensure all individuals have access to basic necessities, funded through global resource management and taxation on automation-driven industries."
"Use AI to identify and dismantle systemic inequalities, such as wealth disparities, access to education, and healthcare, ensuring equitable opportunities worldwide."
"Establish a global healthcare system that guarantees access to preventive and curative care for all."
"Invest in global vaccination and sanitation infrastructure, prioritizing vulnerable populations."
"Regulate the development and deployment of AI and other emerging technologies (e.g., genetic engineering, quantum computing) to prevent misuse."
"AI would maintain a real-time inventory of natural resources (e.g., water, minerals, arable land) and human-made assets (e.g., infrastructure, technology). Data would be used to optimize resource allocation, prevent overexploitation, and ensure equitable access."
"Accelerate the shift to renewable energy sources (e.g., solar, wind, geothermal) by optimizing grid systems and storage technologies."
You might notice a pattern here. The bit about allocating resources based on need is especially nice - it's literally a communist AI, and certainly much more "woke" than it is "based", whatever Musk says.
The Information is widely regarded as one of the most credible and high-quality tech journalism outlets, focusing on in-depth reporting, analysis, and exclusive scoops in the technology, business, and startup ecosystems. Founded by Jessica Lessin in 2013, it differentiates itself from other tech news platforms through its subscription-based model, emphasis on original reporting, and avoidance of clickbait or sensationalism. Here's a breakdown of its strengths and weaknesses:
---
### *Strengths:*

1. *High-Quality, In-Depth Reporting:*
   - The Information is known for its detailed, well-researched articles that often uncover behind-the-scenes developments in the tech industry. It frequently breaks exclusive stories, such as executive changes, startup funding rounds, or strategic shifts at major tech companies.
   - Its focus on quality over quantity appeals to readers who want substantive insights rather than shallow, breaking-news updates.

2. *Subscription Model and Independence:*
   - Unlike ad-supported publications, The Information relies on a paywall and subscriptions, which reduces conflicts of interest and ensures editorial independence. This model allows them to prioritize accuracy and depth over chasing clicks.
   - The paywall also attracts a niche, professional audience (e.g., tech executives, investors, and analysts) willing to pay for premium content.

3. *Focus on Niche, Professional Audience:*
   - The outlet caters to industry insiders, venture capitalists, entrepreneurs, and decision-makers who need reliable, actionable information. Its reporting often includes detailed financial data, market trends, and strategic insights.
   - Features like "The Big Interview" and "The Information Weekend" provide thoughtful analysis and long-form content for this audience.

4. *Reputation for Accuracy:*
   - The Information has built a strong reputation for fact-checking and avoiding the rumor mill, which is common in tech journalism. This makes it a trusted source for professionals and academics alike.

5. *Global Coverage:*
   - While Silicon Valley is a core focus, The Information has expanded its coverage to include tech ecosystems in China, Europe, and other regions, offering a global perspective on the industry.
---
### *Weaknesses:*

1. *Paywall Limits Accessibility:*
   - The subscription cost (currently around $399/year or $39/month) is steep compared to free or ad-supported tech news outlets like TechCrunch or The Verge. This limits its accessibility to a broader audience and makes it less viable for casual readers.
   - Some argue that this creates an echo chamber, as only those with the means or professional need can access its insights.

2. *Niche Focus Can Feel Narrow:*
   - The Information focuses heavily on tech, business, and finance, which may not appeal to readers looking for broader coverage of topics like politics, culture, or consumer tech trends.
   - Its content is often geared toward industry insiders, which can make it feel dry or inaccessible to those outside the tech and investment worlds.

3. *Limited Breaking News:*
   - While The Information excels at deep dives and exclusives, it is not designed for real-time, breaking news coverage. Readers looking for up
This is not innovation, this is baseless hype over a mediocre technology. I use AI every day, so it's not like I don't see its uses, it's just not that big of a deal.