Still, it's a sad state of affairs that Apple seems to still be fixing bugs based on which blog posts get the most attention on the internet, but I guess once they started that approach, it's hard to stop and go back to setting priorities on their own.
I can almost guarantee there is no way they could read this blog post, escalate it internally, get the appropriate approval for the work item, actually work on the fix, get it through QA, and get it live in production in 3 days. That would only happen for really critical issues, and this is definitely not critical enough for that.
I don't think that fix is specific to this, but it's absolutely true that MLX is trying to leverage every advantage it can find on specific hardware, so it's possible it made a bad choice on a particular device.
But the phenomenon is another thing. Apple's numerical APIs are producing inconsistent results on a minority of devices. That is something worth Apple's attention.
My mind instantly answered that with "bright", which is what you get when you combine the sun and moon radicals to make 明 (https://en.wiktionary.org/wiki/%E6%98%8E).
Anyway, that question is not without reasonable answers. "Full Moon" might make sense too. No obvious deterministic answer, though, naturally.
Edit: Spoiler -
It's 'Eclipse'
Eclipse, obviously.
I'll just add that if you think this advice applies to you, it's the Barnum effect - https://en.wikipedia.org/wiki/Barnum_effect
"Monsoon," says ChatGPT.
It’s a reasonable Tarot question.
But it's still surprising that the LLM doesn't work on the iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.
But what got me about this is that:
* every other Apple device delivered the same results
* Apple's own LLM silently failed on this device
To me, that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.
It is commutative (except for NaN). It isn't associative though.
There's a C++26 paper about compile time math optimizations with a good overview and discussion about some of these issues [P1383]. The paper explicitly states:
1. It is acceptable for evaluation of mathematical functions to differ between translation time and runtime.
2. It is acceptable for constant evaluation of mathematical functions to differ between platforms.
So C++ has very much accepted the fact that floating point functions should not be presumed to give identical results in all circumstances.
Now, it is of course possible to ensure that floating point-related functions give identical results on all your target machines, but it's usually not worth the hassle.
[P1383]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p13...
a * b = b * a for all "normal" floating point numbers.
"Well, now it's Feb. 1st and I have an iPhone 17 Pro Max to test with and... everything works as expected. So it's pretty safe to say that THAT specific instance of iPhone 16 Pro Max was hardware-defective."
[1] as the author knows (“MLX uses Metal to compile tensor operations for this accelerator. Somewhere in that stack, the computations are going very wrong”) there’s lots of soft- and firmware in-between the code being run and the hardware of the neural engine. The issue might well be somewhere in those.
The best way I know of to do math on my phone is the HP Prime emulator.
https://pcalc.com/mac/thirty.html
My other favorite calculator is Free42, or its larger-display version, Plus42:
https://thomasokken.com/plus42/
For a CAS tool on a pocket mobile device, I haven't found anything better than MathStudio (formerly SpaceTime):
You can run that in your web browser, but they maintain a mobile app version. It's like a self-hosted Wolfram Alpha.
They do have some new AI math app that's regularly updated.
Honestly, the main beef I have with Calculator.app is that on a screen this big, I ought to be able to see several previous calculations and scroll up if needed. I don't want an exact replica of a 1990s 4-function calculator like the default is (ok, it has more digits and the ability to paste, but besides that, adds almost nothing).
Also it does some level of symbolic evaluation: sin^-1(cos^-1(tan^-1(tan(cos(sin(9)))))) == 9, which is a better result than many standalone calculators give.
Also it has a library of built-in unit conversions, including live-updating currency conversions. You won’t see that on a TI-89!
And I just discovered it actually has built-in 2D/3D graphing. Now the question is whether it allows parametric graphing like the macOS one…
All that said, the TI-8X family obviously holds a special place in my heart, as TI-BASIC was my first language. I just don’t see a reason to use one day to day any more.
I use the NumWorks emulator app whenever I need something more advanced. It's pretty good https://www.numworks.com/simulator/
It is astonishing how often the ANE is smeared on here, largely by people who seem to have literally zero idea what they're talking about. The criticism is often pushed by people who bizarrely need to wave a flag for one camp or the other.
MLX doesn't use the ANE for one simple reason: Apple hid the ANE behind CoreML, exposing zero public APIs to utilize it directly, and MLX, being basically an experimental ground, wanted to hand-roll their implementation around the GPU/CPU. They literally, directly state this as the reason. People inventing technical reasons for why MLX doesn't use the ANE are basically manufacturing fan fiction. This isn't to say the ANE would be suitable for a lot of MLX tasks; it is highly optimized, power-efficient inference hardware that doesn't work for many purposes. But its exclusion is not due to technical unsuitability.
Further, the ANE on both my Mac and my iPhone is constantly augmenting and improving my experience. Little stuff like extracting contents from images. Ever browse in Safari and notice that you can highlight text in an image almost instantly after loading a page? Every image has its context and features detected effortlessly. Zero fans cycling up. Power usage at a trickle. It just works. It's the same way that when I take a photo I can search "Maine Coon" and get pictures of my cats: the ANE is used for subject and feature extraction. Computational photography massively leverages the ANE.
At a trickle of power.
Scam? Yeah, I like my battery lasting for more than a couple of minutes.
Apple intended ANE to bring their own NN augmentations to the OS and thus the user experience, and even the availability in CoreML as a runtime engine is more limited than what Apple's own software can do. Apple basically limits the runtime usage to ensure that no third party apps inhibit or restrict Apple's own use of this hardware.
Typing on my iPhone in the last few months (~6 months?) has been absolutely atrocious. I've tried disabling/enabling every combination of keyboard settings I can think of, but the predictive text just randomly breaks, or it just gives up and stops correcting anything at all.
https://news.ycombinator.com/item?id=46232528 ("iPhone Typos? It's Not Just You - The iOS Keyboard is Broken")
At least the machine didn't say it was seven!
Did you file a radar? (silently laughing while writing this, but maybe there's someone left at Apple who reads those)
> - MiniMax can't fit on an iPhone.
They asked MiniMax on their computer to make an iPhone app that didn't work.
It didn't work using the Apple Intelligence API. So then:
* They asked MiniMax to use MLX instead. It didn't work.
* They Googled and found a thread where Apple Intelligence also didn't work for other people, but only sometimes.
* They HAND WROTE the MLX code. It didn't work. They isolated the step where the results diverged.
> Better to dig in a bit more.
The author already did 100% of the digging and then some.
Look, I am usually an AI rage-enthusiast. But in this case the author did every single bit of homework I would expect and more, and still found a bug. They rewrote the test harness code without an LLM. I don't find the results surprising insofar as I wouldn't expect MAC operations to converge across platforms, but the fact that Apple's own LLM doesn't work on their own hardware, and the results are an order of magnitude off, is a reasonable bug report in my book.
Fascinating that the claim is that Apple Intelligence doesn't work altogether. Quite a scandal.
EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained it wasn't minimax! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.
nothing to see here.