> The published version of a proof is always condensed. And even if you take all the math that has been published in the history of mankind, it’s still small compared to what these models are trained on.
> And people only publish the success stories. The data that are really precious are from when someone tries something, and it doesn’t quite work, but they know how to fix it. But they only publish the successful thing, not the process.
- Terence Tao (https://www.scientificamerican.com/article/ai-will-become-ma...)
Personally, I think failures on their own are valuable. Others can come in and branch off from a decision you made that instead leads to success. Maybe the idea can be applied to a different domain. Maybe your failure clarified something for someone.
Came here to say the same thing. Actually, I guess I did say the same thing, just in a much more long-winded form. Needless to say, I concur with you 100%.
I don’t think this is true. Neural networks were not completely novel when they started to work; someone just used a novel piece: the GPU. Whatever the next thing is, it will probably be a remix of preexisting components.
My point was more a game-theoretic one. There is just no chance I would beat the frontier labs if I tried the same things with less compute and less people. (Of course there is almost 0 chance I would beat them at all.)
> Assume you are racing a Formula 1 car. You are in last place. You are a worse driver in a worse car. If you follow the same strategy as the cars in front of you, pit at the same time and choose the same tires, you will certainly lose. The only chance you have is to pick a different strategy.
So why model brains and neurons at all? You are outgunned by at least 300,000 years of evolution and 117 billion training sessions.
The project is currently on ice: I created something that builds a network of layers, but ran into a wall figuring out how to have that network wire itself over time and become representative of whatever it has learned. I'll take some time to go through this, see what it may spark, and try to start working on mine again.
The network in the article doesn't have explicit layers. It's a graph which is initialised with a completely random connectivity matrix. The inputs and outputs are also wired randomly in the beginning (an input could be connected to a neuron which is also connected to an output for example, or the input could be connected to a neuron which has no post-synaptic neurons).
It was the job of the optimisation algorithm to figure out the graph topology over training.
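As a concrete picture of that kind of setup, here is a minimal sketch of a randomly wired graph network. All names, sizes, and the activation choice are illustrative assumptions, not the article's actual code; the point is just that connectivity, inputs, and outputs all start random, with no guaranteed path between them:

```python
import numpy as np

rng = np.random.default_rng(0)

N_NEURONS, N_IN, N_OUT = 64, 4, 2
DENSITY = 0.1  # fraction of possible connections present at init

# Random connectivity: W[i, j] is the weight from neuron j to neuron i.
mask = rng.random((N_NEURONS, N_NEURONS)) < DENSITY
W = np.where(mask, rng.normal(0.0, 1.0, (N_NEURONS, N_NEURONS)), 0.0)

# Inputs and outputs are attached to random neurons; nothing guarantees
# a path from any input to any output at initialisation.
input_targets = rng.integers(0, N_NEURONS, size=N_IN)
output_sources = rng.integers(0, N_NEURONS, size=N_OUT)

def step(state, inputs):
    """One synchronous update of the whole graph."""
    drive = W @ state
    drive[input_targets] += inputs
    new_state = np.tanh(drive)
    return new_state, new_state[output_sources]

state = np.zeros(N_NEURONS)
state, out = step(state, np.array([1.0, 0.0, 0.5, -0.5]))
```

An optimiser would then prune, add, and reweight entries of `W` over training to discover a useful topology.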
The networks were highly configurable: number of layers; number of "sections" within a layer, where a section is a semi-independent chunk; number of neurons and synapses; types of neurons and synapses; amount of recurrence; and so on. But I tended to steer the GA in directions that I saw tended to work. These were some of my findings:
1-Feed-forward tended to work better than heavily recurrent. Many times I would see a little recurrence in the best brains, but that might have been because, given the mutation percentages, it was difficult to get a brain that didn't have any at all.
2-The best brains tended to have between 6 and 10 layers, and the middle layers tended to be small, as if information was being consolidated before fanning out to the motor-control neurons.
3-Activation functions: I let the GA choose randomly per neuron, per section of a layer, per layer, per brain, etc. I was surprised that binary step frequently won out over things like sigmoid.
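A genome for this kind of GA search might look something like the following sketch. The field names and ranges here are hypothetical, chosen only to mirror the findings above (6-10 layers, mostly feed-forward, binary step in the activation pool), not the actual encoding used:

```python
import math
import random

# A small pool of candidate activations the GA can pick from.
ACTIVATIONS = {
    "binary_step": lambda x: 1.0 if x >= 0.0 else 0.0,
    "sigmoid":     lambda x: 1.0 / (1.0 + math.exp(-x)),
    "relu":        lambda x: max(0.0, x),
}

def random_genome(rng):
    """One candidate 'brain' configuration for the GA to evolve."""
    n_layers = rng.randint(6, 10)  # the range the author saw win most often
    return {
        "layers": [
            {
                "n_sections": rng.randint(1, 4),   # semi-independent chunks
                "n_neurons": rng.randint(4, 32),
                # Activation chosen per layer here; per-neuron or per-section
                # granularity is the same idea with a finer-grained choice.
                "activation": rng.choice(list(ACTIVATIONS)),
            }
            for _ in range(n_layers)
        ],
        "recurrence_fraction": rng.random() * 0.1,  # keep it mostly feed-forward
    }

rng = random.Random(42)
g = random_genome(rng)
```

Mutation and crossover would then operate on these dictionaries, and fitness would come from running the decoded brain in the environment.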
I hope not. I think this is GREAT work even if the result was ultimately less than what was desired. And I want to encourage the author, and other people who might make similar attempts. I think we need more people "taking a stab" and trying different ideas. You might or might not succeed, but in almost every case the absolute worst scenario is that you learn something that might be useful later. If taking on something like this motivates someone to spend time studying differential equations, then I say "great!" Or if it motivates someone to study neuroscience, or electronics (maybe somebody decides to try realizing a neural network in purpose built hardware, for example) then also "Great!" Do it.
About the only serious negative (aside from allusions to opportunity cost) that I can see for making an effort like this, would be if somebody gets really deep in it and winds up blowing a shit-ton of money on the project, whether that be for cloud compute cycles, custom hardware, or whatever. I wouldn't necessarily recommend maxing out your credit cards and draining your retirement account unless you have VERY solid evidence that you're on the right path!
You are a worse driver in a worse car. If you follow the same strategy as the cars in front of you, pit at the same time and choose the same tires, you will certainly lose. The only chance you have is to pick a different strategy.
Yes, exactly. I adhere to a similar mindset. I do AI research in my spare time. And I cannot possibly afford to spend the kind of money on training ginormous ANN's that OpenAI, Microsoft, Google, Twitter, Meta, IBM, etc. can spend. To even try would be completely ludicrous. There is simply no path where an independent solo researcher can beat those guys playing that game. So the only recourse is to change the rules and play a different game. That's no guarantee of success of course, but I'll take a tiny, even minuscule, chance of achieving something over just ramming my head into the wall over and over again in some Sisyphean attempt to compete head to head in a game I know a priori that I simply cannot win.
Anyway.. to the OP: great work, and thanks for sharing. And I hope you decide to make other attempts in the future, and share those results as well. Likewise to anybody else who has considered trying something like this.
To your point, I wasn't pretending that this work is novel or something that the AI community should take seriously. If anything, my point was that you can just do things.
I also feel like in the SWE community folks are generally concerned that LLMs are getting considerably better at doing our jobs. This was a poetic attempt at trying to regain some agency and not just let life happen _to_ you.
If and when I achieve anything useful I definitely will. Writing up failed experiments? Yes in principle, per the above. Finding time is probably the biggest challenge. The intention is definitely there though.
But even aside from that, bits and pieces of what I'm working on at any given time dribble out via my participation in various online forums, GitHub discussions[1], and posts here, on LinkedIn, etc.
> If anything, my point was that you can just do things.
Yep, yep. Absolutely. Again, even if the outcome isn't some earth shattering new discovery, you've still gained something from the process (in all likelihood).
> This was a poetic attempt at trying to regain some agency and not just let life happen _to_ you.
Well said. That very much echoes a lot of my own philosophy on life. Just do something at least. To me, I'd rather fail trying to do something, than just give up and do nothing.
What's wrong with gradient descent?
A highly async actor model is something I’ve wanted to explore, and combined with a highly multi-core architecture clocked very, very low, it seems like it could be power efficient too.
I was considering using go + channels for this
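As a rough sketch of the idea, here is an actor-style neuron with Python's stdlib queues standing in for Go channels (each neuron would be a goroutine in the Go version). The neuron model, thresholds, and topology are all made up for illustration:

```python
import queue
import threading

def neuron(inbox, outboxes, threshold=1.0, leak=0.5):
    """Actor-style neuron: accumulates incoming messages into a leaky
    potential and emits a spike (1.0) to its outboxes on crossing threshold."""
    potential = 0.0
    while True:
        msg = inbox.get()
        if msg is None:               # shutdown sentinel, forwarded downstream
            for out in outboxes:
                out.put(None)
            return
        potential = potential * leak + msg
        if potential >= threshold:
            potential = 0.0
            for out in outboxes:
                out.put(1.0)

# Two neurons in a chain: stimulus -> a -> b -> sink
a_in, b_in, sink = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=neuron, args=(a_in, [b_in])),
    threading.Thread(target=neuron, args=(b_in, [sink])),
]
for t in threads:
    t.start()
for _ in range(3):
    a_in.put(0.6)   # sub-threshold inputs that accumulate until a spike
a_in.put(None)
for t in threads:
    t.join()
```

The appeal is that idle neurons consume nothing but a blocked receive, which is what makes the low-clock, many-core pairing plausible.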
I think the problem isn't that it's a "bad idea" in some intrinsic sense, but that you really have to have a problem that it fits like a glove. By the nature of the math, if you can only use 4 of your 128 cores 50% of the time, your performance just tanks no matter how fast you're going the other 50% of the time.
Contra the occasional "Everyone Else Is Stupid And We Just Need To Get Off Of von Neumann Architectures To Reach Nirvana" post, CPUs are shaped the way they are for a reason; being able to bring very highly concentrated power to bear on a specific problem is very flexible, especially when you can move the focus around very quickly as a CPU can. (Not instantaneously, but quickly, and this switching penalty is something that can be engineered around.) A lot of the rest of the problem space has been eaten by GPUs. This sort of "lots of low powered computers networked together" still fits in between them somewhat, but there's not a lot of space left anymore. They can communicate better in some ways than GPU cores can communicate with each other, but that is also a problem that can be engineered around.
If you squint really hard, it's possible that computers are sort of wandering in this direction, though. Being low power means it's also low heat. Putting "efficiency cores" on CPU dies is sort of, kind of starting down a road that could end up at the GreenArrays idea. Still, it's hard to imagine what even all of Windows would do with 128 efficiency cores. Maybe if someone comes up with a brilliant innovation on current AI architectures that requires some sort of additional cross-talk between the neural layers that simply requires this sort of architecture to work, you could see this pop up... which I suppose brings us back around to the original idea. But it's hard to imagine what that architecture could be, where the communication is vital on a nanosecond-by-nanosecond level and can't just be a separate phase of processing a neural net.
I'm not sure I understand this point. If you're using a work-stealing threadpool servicing tasks in your actor model there's no reason you shouldn't get ~100% CPU utilisation provided you are driving the input hard enough (i.e. sampling often from your inputs).
If you want to look at more serious work the Spiking Neural Net community has made models which actually work and are power efficient.
Ask me how I know.
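For anyone unfamiliar, the basic unit in most SNN work is the leaky integrate-and-fire neuron, which is only a few lines. This is a generic textbook sketch with made-up parameters, not any particular framework's API:

```python
import math

def lif_trace(inputs, tau=20.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """Leaky integrate-and-fire: the membrane potential decays toward zero
    with time constant tau, jumps with each input current, and emits a
    spike (resetting) when it reaches v_th."""
    v, spikes = 0.0, []
    decay = math.exp(-dt / tau)
    for i in inputs:
        v = v * decay + i
        if v >= v_th:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes

# A constant sub-threshold drive: the neuron spikes only every few steps.
spikes = lif_trace([0.3] * 10)
```

The claimed power efficiency comes from the event-driven nature of the model: between spikes there is nothing to compute, which neuromorphic hardware exploits directly.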
Regarding specific reading, three books I think you would love are [1] The Self-Assembling Brain, [2] The Archaeology of Mind, and [3] Evolutionary Optimization Algorithms.
People can talk whatever sh!t they want, but this pushes us closer to actual AGI than anything the (useful but) dead-end LLM craze is pushing us towards, and you thoughtfully made an effort in that direction.
The most basic function of learning and intelligence is habituation to stimuli, which even an amoeba can handle, but which not a single LLM does.
Thanks again for this.
Of course, the existing knowledge can also be helpful at times. So ideally, different people tackle these problems knowing different amounts, and different parts, of the existing work.
I followed them for a long time, but I never really heard of them beating other approaches.
Also I was wondering about the source of the original quote, https://quoteinvestigator.com/2018/03/20/devil/