> The published version of a proof is always condensed. And even if you take all the math that has been published in the history of mankind, it’s still small compared to what these models are trained on.
> And people only publish the success stories. The data that are really precious are from when someone tries something, and it doesn’t quite work, but they know how to fix it. But they only publish the successful thing, not the process.
- Terence Tao (https://www.scientificamerican.com/article/ai-will-become-ma...)
Personally, I think failures on their own are valuable. Others can come in and branch off from a decision you made that instead leads to success. Maybe the idea can be applied to a different domain. Maybe your failure clarified something for someone.
Came here to say the same thing. Actually, I guess I did say the same thing, just in a much more long-winded form. Needless to say, I concur with you 100%.
I don’t think this is true. Neural networks were not completely novel when they started to work; someone just used a novel piece: the GPU. Whatever the next thing is, it will probably be a remix of preexisting components.
My point was more a game-theoretic one. There is just no chance I would beat the frontier labs if I tried the same things with less compute and less people. (Of course there is almost 0 chance I would beat them at all.)
> Assume you are racing a Formula 1 car. You are in last place. You are a worse driver in a worse car. If you follow the same strategy as the cars in front of you, pit at the same time and choose the same tires, you will certainly lose. The only chance you have is to pick a different strategy.
So why model brains and neurons at all? You are outgunned by at least 300,000 years of evolution and 117 billion training sessions.
The project is currently on ice: I created something that builds a network of layers, but ran into a wall figuring out how to have that network wire itself over time and become representative of whatever it has learned. I'll take some time to go through this, see what it may spark, and try to start working on mine again.
The network in the article doesn't have explicit layers. It's a graph which is initialised with a completely random connectivity matrix. The inputs and outputs are also wired randomly in the beginning (an input could be connected to a neuron which is also connected to an output for example, or the input could be connected to a neuron which has no post-synaptic neurons).
It was the job of the optimisation algorithm to figure out the graph topology over training.
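As a concrete picture of that kind of setup, here is a minimal sketch of a randomly wired graph network. All names, sizes, and the activation choice are illustrative assumptions, not the article's actual code; the point is just that connectivity, inputs, and outputs all start random, with no guaranteed path between them:

```python
import numpy as np

rng = np.random.default_rng(0)

N_NEURONS, N_IN, N_OUT = 64, 4, 2
DENSITY = 0.1  # fraction of possible connections present at init

# Random connectivity: W[i, j] is the weight from neuron j to neuron i.
mask = rng.random((N_NEURONS, N_NEURONS)) < DENSITY
W = np.where(mask, rng.normal(0.0, 1.0, (N_NEURONS, N_NEURONS)), 0.0)

# Inputs and outputs are attached to random neurons; nothing guarantees
# a path from any input to any output at initialisation.
input_targets = rng.integers(0, N_NEURONS, size=N_IN)
output_sources = rng.integers(0, N_NEURONS, size=N_OUT)

def step(state, inputs):
    """One synchronous update of the whole graph."""
    drive = W @ state
    drive[input_targets] += inputs
    new_state = np.tanh(drive)
    return new_state, new_state[output_sources]

state = np.zeros(N_NEURONS)
state, out = step(state, np.array([1.0, 0.0, 0.5, -0.5]))
```

An optimiser would then prune, add, and reweight entries of `W` over training to discover a useful topology.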
The networks were highly configurable: number of layers; number of "sections" within a layer, where a section is a semi-independent chunk; number of neurons and synapses; types of neurons and synapses; amount of recurrence; and so on. But I tended to steer the GA in directions that I saw tended to work. These were some of my findings:
1-Feed-forward tended to work better than heavily recurrent. Many times I would see a little recurrence in the best brains, but that might have been because, given the mutation percentages, it was difficult to get a brain that didn't have any at all.
2-The best brains tended to have between 6 and 10 layers, and the middle layers tended to be small, as if information was being consolidated before fanning out to the motor-control neurons.
3-Activation functions: I let the GA choose randomly per neuron, per section of a layer, per layer, per brain, etc. I was surprised that binary step frequently won out over things like sigmoid.
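A genome for this kind of GA search might look something like the following sketch. The field names and ranges here are hypothetical, chosen only to mirror the findings above (6-10 layers, mostly feed-forward, binary step in the activation pool), not the actual encoding used:

```python
import math
import random

# A small pool of candidate activations the GA can pick from.
ACTIVATIONS = {
    "binary_step": lambda x: 1.0 if x >= 0.0 else 0.0,
    "sigmoid":     lambda x: 1.0 / (1.0 + math.exp(-x)),
    "relu":        lambda x: max(0.0, x),
}

def random_genome(rng):
    """One candidate 'brain' configuration for the GA to evolve."""
    n_layers = rng.randint(6, 10)  # the range the author saw win most often
    return {
        "layers": [
            {
                "n_sections": rng.randint(1, 4),   # semi-independent chunks
                "n_neurons": rng.randint(4, 32),
                # Activation chosen per layer here; per-neuron or per-section
                # granularity is the same idea with a finer-grained choice.
                "activation": rng.choice(list(ACTIVATIONS)),
            }
            for _ in range(n_layers)
        ],
        "recurrence_fraction": rng.random() * 0.1,  # keep it mostly feed-forward
    }

rng = random.Random(42)
g = random_genome(rng)
```

Mutation and crossover would then operate on these dictionaries, and fitness would come from running the decoded brain in the environment.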
I hope not. I think this is GREAT work even if the result was ultimately less than what was desired. And I want to encourage the author, and other people who might make similar attempts. I think we need more people "taking a stab" and trying different ideas. You might or might not succeed, but in almost every case the absolute worst scenario is that you learn something that might be useful later. If taking on something like this motivates someone to spend time studying differential equations, then I say "great!" Or if it motivates someone to study neuroscience, or electronics (maybe somebody decides to try realizing a neural network in purpose built hardware, for example) then also "Great!" Do it.
About the only serious negative (aside from allusions to opportunity cost) that I can see for making an effort like this, would be if somebody gets really deep in it and winds up blowing a shit-ton of money on the project, whether that be for cloud compute cycles, custom hardware, or whatever. I wouldn't necessarily recommend maxing out your credit cards and draining your retirement account unless you have VERY solid evidence that you're on the right path!
You are a worse driver in a worse car. If you follow the same strategy as the cars in front of you, pit at the same time and choose the same tires, you will certainly lose. The only chance you have is to pick a different strategy.
Yes, exactly. I adhere to a similar mindset. I do AI research in my spare time. And I cannot possibly afford to spend the kind of money on training ginormous ANN's that OpenAI, Microsoft, Google, Twitter, Meta, IBM, etc. can spend. To even try would be completely ludicrous. There is simply no path where an independent solo researcher can beat those guys playing that game. So the only recourse is to change the rules and play a different game. That's no guarantee of success of course, but I'll take a tiny, even minuscule, chance of achieving something over just ramming my head into the wall over and over again in some Sisyphean attempt to compete head to head in a game I know a priori that I simply cannot win.
Anyway.. to the OP: great work, and thanks for sharing. And I hope you decide to make other attempts in the future, and share those results as well. Likewise to anybody else who has considered trying something like this.
To your point, I wasn't pretending that this work is novel or something that the AI community should take seriously. If anything, my point was that you can just do things.
I also feel like in the SWE community folks are generally concerned that LLMs are getting considerably better at doing our jobs. This was a poetic attempt at trying to regain some agency and not just let life happen _to_ you.
If and when I achieve anything useful I definitely will. Writing up failed experiments? Yes in principle, per the above. Finding time is probably the biggest challenge. The intention is definitely there though.
But even aside from that, bits and pieces of what I'm working on at any given time dribble out via my participation in various online forums, GitHub discussions[1], and posts here, on LinkedIn, etc.
> If anything, my point was that you can just do things.
Yep, yep. Absolutely. Again, even if the outcome isn't some earth shattering new discovery, you've still gained something from the process (in all likelihood).
> This was a poetic attempt at trying to regain some agency and not just let life happen _to_ you.
Well said. That very much echoes a lot of my own philosophy on life. Just do something at least. To me, I'd rather fail trying to do something, than just give up and do nothing.
What's wrong with gradient descent?
A highly async actor model is something I’ve wanted to explore, and combined with a highly multi-core architecture clocked very, very low, it seems like it could be power efficient too.
I was considering using go + channels for this
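As a rough sketch of the idea, here is an actor-style neuron with Python's stdlib queues standing in for Go channels (each neuron would be a goroutine in the Go version). The neuron model, thresholds, and topology are all made up for illustration:

```python
import queue
import threading

def neuron(inbox, outboxes, threshold=1.0, leak=0.5):
    """Actor-style neuron: accumulates incoming messages into a leaky
    potential and emits a spike (1.0) to its outboxes on crossing threshold."""
    potential = 0.0
    while True:
        msg = inbox.get()
        if msg is None:               # shutdown sentinel, forwarded downstream
            for out in outboxes:
                out.put(None)
            return
        potential = potential * leak + msg
        if potential >= threshold:
            potential = 0.0
            for out in outboxes:
                out.put(1.0)

# Two neurons in a chain: stimulus -> a -> b -> sink
a_in, b_in, sink = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=neuron, args=(a_in, [b_in])),
    threading.Thread(target=neuron, args=(b_in, [sink])),
]
for t in threads:
    t.start()
for _ in range(3):
    a_in.put(0.6)   # sub-threshold inputs that accumulate until a spike
a_in.put(None)
for t in threads:
    t.join()
```

The appeal is that idle neurons consume nothing but a blocked receive, which is what makes the low-clock, many-core pairing plausible.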
I think the problem isn't that it's a "bad idea" in some intrinsic sense, but that you really have to have a problem that it fits like a glove. By the nature of the math, if you can only use 4 of your 128 cores 50% of the time, your performance just tanks no matter how fast you're going the other 50% of the time.
Contra the occasional "Everyone Else Is Stupid And We Just Need To Get Off Of von Neumann Architectures To Reach Nirvana" post, CPUs are shaped the way they are for a reason; being able to bring very highly concentrated power to bear on a specific problem is very flexible, especially when you can move the focus around very quickly as a CPU can. (Not instantaneously, but quickly, and this switching penalty is something that can be engineered around.) A lot of the rest of the problem space has been eaten by GPUs. This sort of "lots of low powered computers networked together" still fits in between them somewhat, but there's not a lot of space left anymore. They can communicate better in some ways than GPU cores can communicate with each other, but that is also a problem that can be engineered around.
If you squint really hard, it's possible that computers are sort of wandering in this direction, though. Being low power means it's also low heat. Putting "efficiency cores" on CPU dies is sort of, kind of starting down a road that could end up at the GreenArrays idea. Still, it's hard to imagine what even all of Windows would do with 128 efficiency cores. Maybe if someone comes up with a brilliant innovation on current AI architectures that requires some sort of additional cross-talk between the neural layers that simply requires this sort of architecture to work, you could see this pop up... which I suppose brings us back around to the original idea. But it's hard to imagine what that architecture could be, where the communication is vital on a nanosecond-by-nanosecond level and can't just be a separate phase of processing a neural net.
I'm not sure I understand this point. If you're using a work-stealing threadpool servicing tasks in your actor model there's no reason you shouldn't get ~100% CPU utilisation provided you are driving the input hard enough (i.e. sampling often from your inputs).
If you want to look at more serious work the Spiking Neural Net community has made models which actually work and are power efficient.
Ask me how I know.
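For anyone unfamiliar, the basic unit in most SNN work is the leaky integrate-and-fire neuron, which is only a few lines. This is a generic textbook sketch with made-up parameters, not any particular framework's API:

```python
import math

def lif_trace(inputs, tau=20.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """Leaky integrate-and-fire: the membrane potential decays toward zero
    with time constant tau, jumps with each input current, and emits a
    spike (resetting) when it reaches v_th."""
    v, spikes = 0.0, []
    decay = math.exp(-dt / tau)
    for i in inputs:
        v = v * decay + i
        if v >= v_th:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes

# A constant sub-threshold drive: the neuron spikes only every few steps.
spikes = lif_trace([0.3] * 10)
```

The claimed power efficiency comes from the event-driven nature of the model: between spikes there is nothing to compute, which neuromorphic hardware exploits directly.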
Regarding specific reading, three books I think you would love are [1] The Self-Assembling Brain, [2] The Archaeology of Mind, and [3] Evolutionary Optimization Algorithms.
People can talk whatever sh!t they want, but this pushes us closer to actual AGI than anything the (useful but) dead-end LLM craze is pushing us towards, and you thoughtfully made an effort in that direction.
The most basic function of learning and intelligence is habituation to stimuli, which even an amoeba can handle, but which not a single LLM does.
Thanks again for this.
Of course, the existing knowledge can also be helpful at times. So ideally, different people tackle these problems knowing different amounts, and different parts, of the existing work.
I followed them for a long time, but I never really heard of them beating other approaches.
Also I was wondering about the source of the original quote, https://quoteinvestigator.com/2018/03/20/devil/