macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt

(developer.apple.com)

540 points | guiand | 5mo ago | 291 comments

I follow the MLX team on Twitter and they sometimes post about using MLX on two or more joined together Macs to run models that need more than 512GB of RAM.

A couple of examples:

Kimi K2 Thinking (1 trillion parameters): https://x.com/awnihannun/status/1986601104130646266

DeepSeek R1 (671B): https://x.com/awnihannun/status/1881915166922863045 - that one came with setup instructions in a Gist: https://gist.github.com/awni/ec071fd27940698edd14a4191855bba...

awnihannun5mo ago

For a bit more context, those posts are using pipeline parallelism. For N machines put the first L/N layers on machine 1, next L/N layers on machine 2, etc. With pipeline parallelism you don't get a speedup over one machine - it just buys you the ability to use larger models than you can fit on a single machine.
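
To make the layer split concrete, here is a minimal sketch in plain Python. The names `assign_layers` and `run_pipeline` are made up for illustration and are not MLX APIs; each "layer" is just a callable, and the machines are simulated by running their blocks one after another.

```python
def assign_layers(num_layers: int, num_machines: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into contiguous blocks, one per machine."""
    base, extra = divmod(num_layers, num_machines)
    blocks = []
    start = 0
    for machine in range(num_machines):
        # The first `extra` machines take one additional layer each.
        size = base + (1 if machine < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def run_pipeline(x, layers, blocks):
    """Pass the activation through each machine's block in order.

    Only one block is active at a time for a single request, which is why
    this layout adds capacity but no per-request speedup.
    """
    for block in blocks:          # hop from machine to machine
        for i in block:           # layers resident on that machine
            x = layers[i](x)
    return x


if __name__ == "__main__":
    toy_layers = [lambda v, i=i: v + i for i in range(8)]
    print(assign_layers(8, 4))
    print(run_pipeline(0, toy_layers, assign_layers(8, 4)))
```

The sequential loop in `run_pipeline` is the whole point: for one request, machine 2 waits on machine 1, so the extra machines only buy memory.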

The release in Tahoe 26.2 will enable us to do fast tensor parallelism in MLX. Each layer of the model is sharded across all machines. With this type of parallelism you can get close to N-times faster for N machines. The main challenge is latency since you have to do much more frequent communication.
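
A rough sketch of what "each layer is sharded across all machines" means, using plain Python lists instead of MLX arrays. Everything here is illustrative (the helper names are invented, the activation function between the two matrices is left out, and the "all-reduce" is a local sum standing in for the network step): the first weight matrix is split by columns, the second by rows, and each machine's partial result is summed at the end.

```python
def matmul(a, b):
    """Naive matrix multiply on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w, n):
    """Split matrix w into n equal column blocks (one per machine)."""
    width = len(w[0]) // n
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(n)]


def split_rows(w, n):
    """Split matrix w into n equal row blocks (one per machine)."""
    height = len(w) // n
    return [w[i * height:(i + 1) * height] for i in range(n)]


def all_reduce_sum(partials):
    """Element-wise sum of every machine's partial output.

    In a real cluster this is the communication step that happens per layer.
    """
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


def tensor_parallel_forward(x, w1, w2, n):
    """Compute x @ w1 @ w2 with both weights sharded across n machines."""
    w1_shards = split_columns(w1, n)
    w2_shards = split_rows(w2, n)
    # Each machine works only on its own shards, independently of the others.
    partials = [matmul(matmul(x, a), b) for a, b in zip(w1_shards, w2_shards)]
    return all_reduce_sum(partials)


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w1 = [[1, 0, 2, 1], [0, 1, 1, 3]]
    w2 = [[1, 2], [0, 1], [3, 0], [1, 1]]
    print(tensor_parallel_forward(x, w1, w2, 2))
    print(matmul(matmul(x, w1), w2))  # same result, computed on one machine
```

Each machine does roughly 1/N of the multiply work, but the sum at the end has to cross the interconnect for every sharded layer, which is where the latency sensitivity comes from.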

dpe825mo ago

> The main challenge is latency since you have to do much more frequent communication.

Earlier this year I experimented with building a cluster to do tensor parallelism across large cache CPUs (AMD EPYC 7773X have 768MB of L3). My thought was to keep an entire model in SRAM and take advantage of the crazy memory bandwidth between CPU cores and their cache, and use Infiniband between nodes for the scatter/gather operations.

Turns out the sum of intra-core latency and PCIe latency absolutely dominates. The Infiniband fabric is damn fast once you get data to it, but getting it there quickly is a struggle. CXL would help but I didn't have the budget for newer hardware. Perhaps modern Apple hardware is better for this than x86 stuff.

wmf5mo ago

That's how Groq works. A cluster of LPUv2s would probably be faster and cheaper than an Infiniband cluster of Epycs.

aimanbenbaha5mo ago

Exo-Labs is an open source project that allows this too (pipeline parallelism, I mean, not the latter). It's device agnostic, meaning you can daisy-chain anything you have that has memory and the implementation will intelligently shard model layers across them. It's slow, but it scales linearly with concurrent requests.

Exo-Labs: https://github.com/exo-explore/exo

liuliu5mo ago

But that's only for prefilling right? Or is it beneficial for decoding too (I guess you can do KV lookup on shards, not sure how much speed-up that will be though).

zackangelo5mo ago

No you use tensor parallelism in both cases.

The way it typically works in an attention block is: smaller portions of the Q, K and V linear layers are assigned to each node and are processed independently. Attention, RoPE, norm, etc. are run on the node-specific output of that. Then, when the output linear layer is applied, an "all reduce" is computed, which combines the output of all the nodes.

EDIT: just realized it wasn't clear -- this means that each node ends up holding a portion of the KV cache specific to its KV tensor shards. This can change based on the specific style of attention (e.g., in GQA where there are fewer KV heads than ranks you end up having to do some replication etc)
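
A small sketch of the head bookkeeping, assuming the common scheme where query heads are split evenly across ranks and each rank keeps whichever KV heads its query heads need. The function name and the head counts in the example are hypothetical, not taken from any particular model or framework.

```python
def heads_for_rank(rank, num_ranks, num_q_heads, num_kv_heads):
    """Return (query head indices, KV head indices) held by one rank."""
    assert num_q_heads % num_ranks == 0, "query heads must divide evenly across ranks"
    assert num_q_heads % num_kv_heads == 0, "query heads must group evenly over KV heads"
    q_per_rank = num_q_heads // num_ranks
    q_heads = list(range(rank * q_per_rank, (rank + 1) * q_per_rank))
    # In GQA, `group` consecutive query heads share one KV head.
    group = num_q_heads // num_kv_heads
    kv_heads = sorted({q // group for q in q_heads})
    return q_heads, kv_heads


if __name__ == "__main__":
    # 4 ranks, 32 query heads, 8 KV heads: each rank owns 2 distinct KV heads.
    for r in range(4):
        print(r, heads_for_rank(r, 4, 32, 8)[1])
    # 16 ranks with the same model: neighbouring ranks need the same KV head,
    # so that KV head (and its slice of the cache) is replicated.
    for r in range(4):
        print(r, heads_for_rank(r, 16, 32, 8)[1])
```

Each rank's KV cache only needs entries for the KV heads returned here, which is the "portion of the KV cache" mentioned above.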

monster_truck5mo ago

Even if it wasn't outright beneficial for decoding by itself, it would still allow you to connect a second machine running a smaller, more heavily quantized version of the model for speculative decoding which can net you >4x without quality loss
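
For anyone unfamiliar with the idea, here is a toy sketch of greedy speculative decoding. The "models" are plain functions from a token list to the next token, and all names are invented for the example; a real system would verify the whole draft in one batched pass of the big model rather than in a Python loop.

```python
def speculative_decode(target_next, draft_next, prompt, num_tokens, k=4):
    """Generate num_tokens tokens that match the target model's greedy output."""
    tokens = list(prompt)
    goal = len(prompt) + num_tokens
    while len(tokens) < goal:
        # 1. The small draft model cheaply proposes k tokens.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target model checks each proposed position.
        for i, guess in enumerate(proposal):
            expected = target_next(tokens + proposal[:i])
            if guess != expected:
                # Keep the agreed prefix, then take the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            tokens.extend(proposal)  # every draft token was accepted
    return tokens[:goal]


def greedy_decode(target_next, prompt, num_tokens):
    """Plain one-token-at-a-time decoding, for comparison."""
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(target_next(tokens))
    return tokens


if __name__ == "__main__":
    target = lambda t: (t[-1] + 1) % 10
    draft = lambda t: 0 if t[-1] == 4 else (t[-1] + 1) % 10  # wrong after a 4
    print(speculative_decode(target, draft, [0], 12))
    print(greedy_decode(target, [0], 12))
```

Because a draft token is only kept when the target model would have produced the same token, the output is identical to the target's own greedy decoding; the speedup depends entirely on how often the draft guesses right.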

anemll5mo ago

Tensor Parallel test with RDMA last week https://x.com/anemll/status/1996349871260107102

Note fast sync workaround

andy995mo ago

I’m hoping this isn’t as attractive as it sounds for non-hobbyists because the performance won’t scale well to parallel workloads or even context processing, where parallelism can be better used.

Hopefully this makes it really nice for people that want to experiment with LLMs and have a local model but means well funded companies won’t have any reason to grab them all vs GPUs.

api5mo ago

No way buying a bunch of minis could be as efficient as much denser GPU racks. You have to consider all the logistics and power draw, and high end nVidia stuff and probably even AMD stuff is faster than M series GPUs.

What this does offer is a good alternative to GPUs for smaller scale use and research. At small scale it’s probably competitive.

Apple wants to dominate the pro and serious amateur niches. Feels like they’re realizing that local LLMs and AI research is part of that, is the kind of thing end users would want big machines to do.

gumboshoes5mo ago

Exactly: The AI appliance market. A new kind of home or small-business server.

FuckButtons5mo ago

Power draw? An entire Mac Pro running flat out uses less power than 1 5090. If you have a workload that needs a huge memory footprint then the TCO of the Macs, even with their markup, may be lower.

codazoda5mo ago

I haven’t looked yet but I might be a candidate for something like this, maybe. I’m RAM constrained and, to a lesser extent, CPU constrained. It would be nice to offload some of that. That said, I don’t think I would buy a cluster of Macs for that. I’d probably buy a machine that can take a GPU.

ChrisMarshallNY5mo ago

I’m not particularly interested in training models, but it would be nice to have eGPUs again. When Apple Silicon came out, support for them dried up. I sold my old BlackMagic eGPU.

That said, the need for them also faded. The new chips have performance every bit as good as the eGPU-enhanced Intel chips.

willtemperley5mo ago

I think it’s going to be great for smaller shops that want on premise private cloud. I’m hoping this will be a win for in-memory analytics on macOS.

bigyabai5mo ago

The lack of official Linux/BSD support is enough to make it DOA for any serious large-scale deployment. Until Apple figures out what they're doing on that front, you've got nothing to worry about.

mjlee5mo ago

Why? AWS manages to do it (https://aws.amazon.com/ec2/instance-types/mac/). Smaller companies too - https://macstadium.com

Having used both professionally, once you understand how to drive Apple's MDM, Mac OS is as easy to sysadmin as Linux. I'll grant you it's a steep learning curve, but so is Linux/BSD if you're coming at it fresh.

In certain ways it's easier - if you buy a device through Apple Business you can have it so that you (or someone working in a remote location) can take it out of the shrink wrap, connect it to the internet, and get a configured and managed device automatically. No PXE boot, no disk imaging, no having it shipped to you to configure and ship out again. If you've done it properly the user can't interrupt/corrupt the process.

The only thing they're really missing is an iLO. I can imagine how AWS solved that, but I'd love to know.

Eggpants5mo ago

Not sure I understand, Mac OS is BSD based. https://en.wikipedia.org/wiki/Darwin_(operating_system)

CamperBob25mo ago

Almost the most impressive thing about that is the power consumption. ~50 watts for both of them? Am I reading it wrong?

wmf5mo ago

Yeah, two Mac Studios is going to be ~400 W.

CamperBob25mo ago

What am I missing? https://i.imgur.com/YpcnlCH.png

(Edit: interesting, thanks. So the underlying OS APIs that supply the power-consumption figures reported by asitop are just outright broken. The discrepancy is far too large to chalk up to static power losses or die-specific calibration factors that the video talks about.)

m-s-y5mo ago

Can confirm. My M3 Ultra tops out at 210W when ComfyUI or ollama is running flat out. Confirmed via smart plug.

btown5mo ago

It would be incredibly ironic if, with Apple's relatively stable supply chain relative to the chaos of the RAM market these days (projected to last for years), Apple compute became known as a cost-effective way to build medium-sized clusters for inference.

andy995mo ago

It’s gonna suck if all the good Macs get gobbled up by commercial users.

icedchai5mo ago

Outside of YouTube influencers, I doubt many home users are buying a 512G RAM Mac Studio.

FireBeyond5mo ago

I doubt many of them are, either.

When the 2019 Mac Pro came out, it was "amazing" how many still photography YouTubers all got launch day deliveries of the same BTO Mac Pro, with exactly the same spec:

18 core CPU, 384GB memory, Vega II Duo GPU and an 8TB SSD.

Or, more likely, Apple worked with them and made sure each of them had this Mac on launch day, while they waited for the model they actually ordered. Because they sure as hell didn't need an $18,000 computer for Lightroom.

DrStartup5mo ago

I'm neither and have 2. 24/7 async inference against github issues. Free. (once you buy the macs that is)

7e5mo ago

That product can still steal fab slots from cheaper, more prosumer products.

kridsdale15mo ago

I did. Admittedly it was for video processing at 8k which uses more than 128gb of ram, but I am NOT a YouTuber.

mirekrusin5mo ago

Of course they're not. Everybody is waiting for next generation that will run LLMs faster to start buying.

mschuster915mo ago

it's not like regular people can afford this kind of Apple machine anyway.

teeray5mo ago

It’s just depressing that the “PC in every home” era is being rapidly pulled out from under our feet by all these supply shocks.

teaearlgraycold5mo ago

It already is depending on your needs.

reilly30005mo ago

dang I wish I could share md tables.

Here’s a text edition: For $50k the inference hardware market forces a trade-off between capacity and throughput:

* Apple M3 Ultra Cluster ($50k): Maximizes capacity (3TB). It is the only option in this price class capable of running 3T+ parameter models (e.g., Kimi k2), albeit at low speeds (~15 t/s).

* NVIDIA RTX 6000 Workstation ($50k): Maximizes throughput (>80 t/s). It is superior for training and inference but is hard-capped at 384GB VRAM, restricting model size to <400B parameters.

To achieve both high capacity (3TB) and high throughput (>100 t/s) requires a ~$270,000 NVIDIA GH200 cluster and data center infrastructure. The Apple cluster provides 87% of that capacity for 18% of the cost.

mechagodzilla5mo ago

You can keep scaling down! I spent $2k on an old dual-socket xeon workstation with 768GB of RAM - I can run Deepseek-R1 at ~1-2 tokens/sec.

Weryj5mo ago

Just keep going! 2TB of swap disk for 0.0000001 t/sec

kergonath5mo ago

Hang on, starting benchmarks on my Raspberry Pi.

jacquesm5mo ago

I did the same, then put in 14 3090's. It's a little bit power hungry but fairly impressive performance wise. The hardest parts are power distribution and riser cards but I found good solutions for both.

r0b055mo ago

I think 14 3090's are more than a little power hungry!

tucnak5mo ago

You get occasional accounts of 3090 home-superscalers who put up eight, ten, fourteen cards. I normally attribute this to obsessive-compulsive behaviour. What kind of motherboard did you end up using, and what's the bi-directional bandwidth you're seeing? Something tells me you're not using EPYC 9005's with up to 256x PCIe 5.0 lanes per socket or something... Also: I find it hard to believe the "performance" claims when your rig is pulling 3 kW from the wall (assuming undervolting at 200W per card?). The electricity costs alone would surely make this intractable, i.e. the same as running six washing machines all at once.
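
The back-of-the-envelope behind that power figure, as a quick sketch. The 14 cards and 200 W per card come from this thread; the electricity price is a made-up placeholder, so plug in your own tariff.

```python
def rig_power_kw(num_cards: int, watts_per_card: float) -> float:
    """GPU power draw in kilowatts (ignores CPU, fans and PSU losses)."""
    return num_cards * watts_per_card / 1000


def daily_cost(power_kw: float, price_per_kwh: float, hours: float = 24.0) -> float:
    """Electricity cost of running flat out for `hours` hours."""
    return power_kw * hours * price_per_kwh


if __name__ == "__main__":
    gpu_kw = rig_power_kw(14, 200)          # 2.8 kW for the cards alone
    ASSUMED_PRICE_PER_KWH = 0.15            # hypothetical tariff, not from the thread
    print(f"GPUs: {gpu_kw:.1f} kW")
    print(f"Per day at full load: {daily_cost(gpu_kw, ASSUMED_PRICE_PER_KWH):.2f}")
```

The cards alone land at 2.8 kW, so roughly 3 kW at the wall is plausible once the rest of the system is included.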

ternus5mo ago

And if you get bored of that, you can flip the RAM for more than you spent on the whole system!

a0125mo ago

And heat the whole house in parallel

rpastuszak5mo ago

Nice! What do you use it for?

mechagodzilla5mo ago

1-2 tokens/sec is perfectly fine for 'asynchronous' queries, and the open-weight models are pretty close to frontier-quality (maybe a few months behind?). I frequently use it for a variety of research topics, doing feasibility studies for wacky ideas, some prototypy coding tasks. I usually give it a prompt and come back half an hour later to see the results (although the thinking traces are sufficiently entertaining that sometimes it's fun to just read as it comes out). Being able to see the full thinking traces (and pause and alter/correct them if needed) is one of my favorite aspects of being able to run these models locally. The thinking traces are frequently just as or more useful than the final outputs.

icedchai5mo ago

For $50K, you could buy 25 Framework desktop motherboards (128G VRAM each w/Strix Halo, so over 3TB total). Not sure how you'll cluster all of them but it might be fun to try. ;)

sspiff5mo ago

There is no way to achieve a high throughput low latency connection between 25 Strix Halo systems. After accounting for storage and network, there are barely any PCIe lanes left to link two of them together.

You might be able to use USB4 but unsure how the latency is for that.

0manrho5mo ago

In general I agree with you, the IO options exposed by Strix Halo are pretty limited, but if we're getting technical you can tunnel PCIe over USB4v2 by the spec in a way that's functionally similar to Thunderbolt 5. That gives you essentially 3 sets of native PCIe4x4 from the chipset and an additional 2 sets tunnelled over USB4v2. TB5 and USB4 controllers are not made equal, so in practice YMMV. Regardless of USB4v2 or TB5, you'll take a minor latency hit.

Strix Halo IO topology: https://www.techpowerup.com/cpu-specs/ryzen-ai-max-395.c3994

Framework's mainboard implements 2 of those PCIe4x4 GPP interfaces as M.2 PHYs, to which you can connect a standard PCIe AIC (like a NIC or DPU) with a passive adapter. Interestingly, it also exposes that 3rd x4 GPP as a standard x4 length PCIe CEM slot, though the system/case isn't compatible with actually installing a standard PCIe add-in card in there without getting hacky with it, especially as it's not an open-ended slot.

You absolutely could slap 1x SSD in there for local storage, and then attach up to 4x RDMA-supporting NICs to a RoCE-enabled switch (or InfiniBand if you're feeling special) to build out a Strix Halo cluster (and you could do similar with Mac Studios, to be fair). You could get really extra by using a DPU/SmartNIC that allows you to boot from an NVMe-oF SAN to leverage all 5 sets of PCIe4x4 for connectivity without any local storage, but we're hitting a complexity/cost threshold with that that I doubt most people want to cross. Or if they are willing to cross that threshold, they'd also be looking at other solutions better suited to that that don't require as many workarounds.

Apple's solution is better for a small cluster, both in pure connectivity terms and also with respect to its memory advantages, but Strix Halo is doable. However, in both cases, scaling up beyond 3 or especially 4 nodes you rapidly enter complexity and cost territory that is better served by nodes that are less restrictive, unless you have some very niche reason to use either Macs (especially non-pro) or Strix Halo specifically.

bee_rider5mo ago

Do they need fast storage, in this application? Their OS could be on some old SATA drive or whatever. The whole goal is to get them on a fast network together; the models could be stored on some network filesystem as well, right?

icedchai5mo ago

I figured, but it's good to have confirmation.

3abiton5mo ago

You could use llama.cpp rpc mode over "network" via usb4/thunderbolt connection

3abiton5mo ago

What's the math on the $50k NVIDIA cluster? My understanding is these things cost ~$8k and you can at least get 5 for $40k, that's around half a TB.

That being said, for inference Macs still remain the best, and the M5 Ultra will be an even better value with its better PP.

reilly30005mo ago

• GPUs: 4x NVIDIA RTX 6000 Blackwell (96GB VRAM each) • Cost: 4 × $9,000 = $36,000

• CPU: AMD Ryzen Threadripper PRO 7995WX (96-Core) • Cost: $10,000

• Motherboard: WRX90 Chipset (supports 7x PCIe Gen5 slots) • Cost: $1,200

• RAM: 512GB DDR5 ECC Registered • Cost: $2,000

• Chassis & Power: Supermicro or specialized Workstation case + 2x 1600W PSUs. • Cost: $1,500

• Total Cost: ~$50,700

It’s a bit maximalist, but if you had to spend $50k it’s going to be about as fast as you can make it.
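
The totals from the parts list above can be checked with a few lines. The prices are the ones quoted in the list; the dictionary keys and function names are just labels for this sketch.

```python
# Prices in USD, as quoted in the parts list above.
PARTS = {
    "gpus": 4 * 9_000,          # 4x RTX 6000 Blackwell
    "cpu": 10_000,              # Threadripper PRO 7995WX
    "motherboard": 1_200,       # WRX90
    "ram": 2_000,               # 512GB DDR5 ECC
    "chassis_and_power": 1_500,
}

GPU_COUNT = 4
VRAM_PER_GPU_GB = 96


def total_cost(parts: dict) -> int:
    """Sum of all component prices."""
    return sum(parts.values())


def total_vram_gb(gpu_count: int, vram_per_gpu_gb: int) -> int:
    """Combined VRAM across all GPUs."""
    return gpu_count * vram_per_gpu_gb


if __name__ == "__main__":
    print(f"Total cost: ${total_cost(PARTS):,}")
    print(f"Total VRAM: {total_vram_gb(GPU_COUNT, VRAM_PER_GPU_GB)} GB")
```

That reproduces the ~$50,700 total and the 384GB VRAM cap mentioned earlier in the thread.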

broretore5mo ago

This is basically a tinybox pro?

FuckButtons5mo ago

Are you factoring in the above comment about as yet un-implemented parallel speed up in there? For on prem inference without any kind of asic this seems quite a bargain relatively speaking.

conradev5mo ago

Apple deploys LPDDR5X for the energy efficiency and cost (lower is better), whereas NVIDIA will always prefer GDDR and HBM for performance and cost (higher is better).

_zoltan_5mo ago

the GH/GB compute has LPDDR5X - a single or dual GPU shares 480GB, depending if it's GH or GB, in addition to the HBM memory, with NVLink C2C - it's not bad!

wtallis5mo ago

Essentially, the Grace CPU is a memory and IO expander that happens to have a bunch of ARM CPU cores filling in the interior of the die, while the perimeter is all PHYs for LPDDR5 and NVLink and PCIe.

yieldcrv5mo ago

15 t/s is way too slow for anything but chatting, call and response, and you don't need a 3T parameter model for that

Wake me up when the situation improves

rbanffy5mo ago

Just wait for the M5-Ultra with a terabyte of RAM.

dsrtslnd235mo ago

what about a GB300 workstation with 784GB unified mem?

rbanffy5mo ago

That thing will be extremely expensive I guess. And neither CPU nor GPU have that much memory. It's also not a great workstation either - macOS is a lot more comfortable to use.

wmf5mo ago

$95K

rbanffy5mo ago

I miss the time you could go to Apple's website and build the most obscene computer possible. With the M series, all options got a lot more limited. IIRC, an x86 Mac Pro with 1.5 TB of RAM, a big GPU and the two accelerators would yield an eye watering hardware bill.

Now you need to add 8 $5K monitors to get something similarly ludicrous.

dsrtslnd235mo ago

do you have a source for that? I am trying to find pricing information but was not successful yet.

geerlingguy5mo ago

This implies you'd run more than one Mac Studio in a cluster, and I have a few concerns regarding Mac clustering (as someone who's managed a number of tiny clusters, with various hardware):

1. The power button is in an awkward location, meaning rackmounting them (either 10" or 19" rack) is a bit cumbersome (at best)

2. Thunderbolt is great for peripherals, but as a semi-permanent interconnect, I have worries over the port's physical stability... wish they made a Mac with QSFP :)

3. Cabling will be important, as I've had tons of issues with TB4 and TB5 devices with anything but the most expensive Cable Matters and Apple cables I've tested (and even then...)

4. macOS remote management is not nearly as efficient as Linux, at least if you're using open source / built-in tooling

To that last point, I've been trying to figure out a way to, for example, upgrade to macOS 26.2 from 26.1 remotely, without a GUI, but it looks like you _have_ to use something like Screen Sharing or an IP KVM to log into the UI, to click the right buttons to initiate the upgrade.

Trying "sudo softwareupdate -i -a" will install minor updates, but not full OS upgrades, at least AFAICT.

wlesieutre5mo ago

For #2, OWC puts a screw hole above their dock's thunderbolt ports so that you can attach a stabilizer around the cord

https://www.owc.com/solutions/thunderbolt-dock

It's a poor imitation of old ports that had screws on the cables, but should help reduce inadvertent port stress.

The screw only works with limited devices (ie not the Mac Studio end of the cord) but it can also be adhesive mounted.

https://eshop.macsales.com/item/OWC/CLINGON1PK/

crote5mo ago

That screw hole is just the regular locking USB-C variant, is it not?

See for example:

https://www.startech.com/en-jp/cables/usb31cctlkv50cm

wlesieutre5mo ago

Looks like it! Thanks for pointing this out, I had no idea it was a standard.

Apparently since 2016 https://www.usb.org/sites/default/files/documents/usb_type-c...

So for any permanent Thunderbolt GPU setups, they should really be using this type of cable

TheJoeMan5mo ago

Now that’s one way to enforce not inserting a USB upside-down.

eurleif5mo ago

I have no experience with this, but for what it's worth, looks like there's a rack mounting enclosure available which mechanically extends the power switch: https://www.sonnetstore.com/products/rackmac-studio

geerlingguy5mo ago

I have something similar from MyElectronics, and it works, but it's a bit expensive, and still imprecise. At least the power button isn't in the back corner underneath!

rsync5mo ago

"... Thunderbolt is great for peripherals, but as a semi-permanent interconnect, I have worries over the port's physical stability ..."

Thunderbolt as a server interconnect displeases me aesthetically but my conclusion is the opposite of yours:

If the systems are locked into place as servers in a rack the movements and stresses on the cable are much lower than when it is used as a peripheral interconnect for a desktop or laptop, yes ?

827a5mo ago

This is a semi-solved problem e.g. https://www.sonnetstore.com/products/thunderlok-a

Apple’s chassis do not support it. But conceptually that’s not a Thunderbolt problem, it’s an Apple problem. You could probably drill into the Mac Studio chassis to create mount points.

broretore5mo ago

You could also epoxy it.

cromniomancer5mo ago

VNC over SSH tunneling always worked well for me before I had Apple Remote Desktop available, though I don't recall if I ever initiated a connection attempt from anything other than macOS...

erase-install can be run non-interactively when the correct arguments are used. I've only ever used it with an MDM in play so YMMV:

https://github.com/grahampugh/erase-install

ThomasBb5mo ago

With MDM solutions you can not only get software update management, but even full LOM for models that support this. There are free and open source MDM out there.

827a5mo ago

They do still sell the Mac Pro in a rack mount configuration. But, it was never updated for M3 Ultra, and feels not long for this world.

colechristensen5mo ago

There are open source MDM projects, I'm not familiar but https://github.com/micromdm/nanohub might do the job for OS upgrades.

badc0ffee5mo ago

> To that last point, I've been trying to figure out a way to, for example, upgrade to macOS 26.2 from 26.1 remotely,

I think you can do this if you install a MDM profile on the Macs and use some kind of management software like Jamf.

timc35mo ago

It’s been terrible for years/forever. Even Xserves didn’t really meet the needs of a professional data centre. And it’s got worse as a server OS because it’s not a core focus. Don’t understand why anyone tries to bother - apart from this MLX use case or as a ProRes render farm.

crote5mo ago

iOS build runner. Good luck developing cross-platform apps without a Mac!

jeroenhd5mo ago

Practically, just run the macos-inside-kvm-inside-docker command. Not very fast, but you can compile the entire thing outside of the VM, all you need is the final incantations to get Apple's signatures on there.

Legally, you probably need a Mac. Or rent access to one, that's probably cheaper.

int32_645mo ago

Apple should setup their own giant cloud of M chips with tons of vram, make Metal as good as possible for AI purposes, then market the cloud as allowing self-hosted models for companies and individuals that care about privacy. They would clean up in all kinds of sectors whose data can't touch the big LLM companies.

wmf5mo ago

That exists but it's only for iUsers running Apple models. https://security.apple.com/blog/private-cloud-compute/

make35mo ago

The advantages of having a single big memory per gpu are not as big in a data center where you can just shard things between machines and use the very fast interconnect, saturating the much faster compute cores of a non Apple GPU from Nvidia or AMD

timsneath5mo ago

Also see https://www.engadget.com/ai/you-can-turn-a-cluster-of-macs-i...

FridgeSeal5mo ago

That’s great for AI people, but can we use this for other distributed workloads that aren’t ML?

geerlingguy5mo ago

I've been testing HPL and mpirun a little, not yet with this new RDMA capability (it seems like Ring is currently the supported method)... but it was a little rough around the edges.

See: https://ml-explore.github.io/mlx/build/html/usage/distribute...

dagmx5mo ago

Sure, there’s nothing about it that’s tied to ML. It’s a faster interconnect; use it for many kinds of shared compute scenarios.

storus5mo ago

Is there any way to connect DGX Sparks to this via USB4? Right now only 10GbE can be used despite both Spark and MacStudio having vastly faster options.

zackangelo5mo ago

Sparks are built for this and actually have ConnectX-7 NICs built in! You just need to get the SFPs for them. This means you can natively cluster them at 200Gbps.

wtallis5mo ago

That doesn't answer the question, which was how to get a high-speed interconnect between a Mac and a DGX Spark. The most likely solution would be a Thunderbolt PCIe enclosure and a 100Gb+ NIC, and passive DAC cables. The tricky part would be macOS drivers for said NIC.

zackangelo5mo ago

You’re right I misunderstood.

I’m not sure if it would be of much utility because this would presumably be for tensor parallel workloads. In that case you want the ranks in your cluster to be uniform or else everything will be forced to run at the speed of the slowest rank.

You could run pipeline parallel but not sure it’d be that much better than what we already have.

irusensei5mo ago

I am waiting for the M5 Studio, but due to the current price of hardware I'm not sure it will be at a level that I would call affordable. Currently I'm watching for news, and if there is any announcement that prices will go up I'll probably settle for an M4 Max.

piskov5mo ago

George Hotz got NVIDIA GPUs running on Macs with his tinygrad via USB4

https://x.com/__tinygrad__/status/1980082660920918045

throawayonthe5mo ago

https://social.treehouse.systems/@janne/115509948515319437 nvidia on a 2023 Mac Pro running linux :p

piskov5mo ago

Geohotz stuff anyone can run today

zeristor5mo ago

Will Apple be able to ramp up M3 Ultra MacStudios if this becomes a big thing?

Is this part of Apple’s plan of building out server side AI support using their own hardware?

If so they would need more physical data centres.

I’m guessing they too would be constrained by RAM.

kjkjadksj5mo ago

Remember when they enabled egpu over thunderbolt and no one cared because the thunderbolt housing cost almost as much as your macbook outright? Yeah. Thunderbolt is a racket. It’s a god damned cord. Why is it $50.

wmf5mo ago

In this case Thunderbolt is much much cheaper than 100G Ethernet.

(The cord is $50 because it contains two active chips BTW.)

geerlingguy5mo ago

Yeah, even decent 40 Gbps QSFP+ DAC cables are usually $30+, and those don't have active electronics in them like Thunderbolt does.

The ability to also deliver 240W (IIRC?) over the same cable is also a bit different here, it's more like FireWire than a standard networking cable.

pjmlp5mo ago

Maybe Apple should rethink bringing back Mac Pro desktops with pluggable GPUs, like that one in the corner still playing with its Intel and AMD toys, instead of a big box full of air and pro audio cards only.

reaperducer5mo ago

As someone not involved in this space at all, is this similar to the old MacOS Xgrid?

https://en.wikipedia.org/wiki/Xgrid

wmf5mo ago

No.

650REDHAIR5mo ago

Do we think TB4 is on the table or is there a technical limitation?

cluckindan5mo ago

This sounds like a plug’n’play physical attack vector.

guiandOP5mo ago

For security, the feature requires setting a special option with the recovery mode command line:

rdma_ctl enable

pstuart5mo ago

I imagine that M5 Ultra with Thunderbolt 5 could be a decent contender for building plug and play AI clusters. Not cheap, but neither is Nvidia.

baq5mo ago

at current memory prices today's cheap is yesterday's obscenely expensive - Apple's current RAM upgrade prices are cheap

whimsicalism5mo ago

nvidia is absolutely cheaper per flop

FlacksonFive5mo ago

To acquire, maybe, but to power?

whimsicalism5mo ago

machine capex currently dominates power

adastra225mo ago

FLOPS are not what matters here.

whimsicalism5mo ago

also cheaper memory bandwidth. where are you claiming that M5 wins?

thatwasunusual5mo ago

Can someone do an ELI5, and explain why this is important?

wmf5mo ago

It's faster and lower latency than standard Thunderbolt networking. Low latency makes AI clusters faster.
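
A crude model of why latency matters so much, assuming each layer needs a fixed number of synchronizations between machines per generated token. All the numbers below (layer count, syncs per layer, the two latency values) are hypothetical placeholders, not measurements of Thunderbolt or RDMA.

```python
def comm_ms_per_token(num_layers: int, syncs_per_layer: int, latency_us: float) -> float:
    """Time per token spent purely waiting on the interconnect, in milliseconds."""
    return num_layers * syncs_per_layer * latency_us / 1000


def max_tokens_per_sec(comm_ms: float, compute_ms: float) -> float:
    """Upper bound on tokens/sec when communication and compute do not overlap."""
    return 1000 / (comm_ms + compute_ms)


if __name__ == "__main__":
    LAYERS = 60            # hypothetical model depth
    SYNCS_PER_LAYER = 2    # hypothetical: one sync after attention, one after the MLP
    COMPUTE_MS = 20.0      # hypothetical per-token compute time
    for label, latency_us in [("slow link", 300.0), ("fast link", 10.0)]:
        comm = comm_ms_per_token(LAYERS, SYNCS_PER_LAYER, latency_us)
        print(label, f"{comm:.1f} ms waiting,",
              f"{max_tokens_per_sec(comm, COMPUTE_MS):.1f} tok/s at best")
```

The data exchanged per sync is small, so the fixed per-message latency, multiplied by the number of syncs per token, is what ends up on the critical path.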

yalogin5mo ago

As someone that is not familiar with RDMA, does it mean I can connect multiple Macs and run inference? If so it’s great!

wmf5mo ago

You've been able to run inference on multiple Macs for around a year but now it's much faster.

daft_pink5mo ago

Hoping Apple has secured plentiful DDR5 to use in their machines so we can buy M5 chips with massive amounts of RAM soon.

colechristensen5mo ago

Apple tends to book its fab time / supplier capacity years in advance

lossolo5mo ago

I hope so, I want to replace my M1 Pro with a MacBook Pro with M5 Pro when they release it next year.

colechristensen5mo ago

I mostly want the M5 Pro because my choice of an M4 Air this year with 24 GB of RAM is turning out to be less than I want with the things I'm doing these days.

TheRealPomax5mo ago

Is this... good? Why is this something that the underlying OS itself should be involved in at all?

wmf5mo ago

Networking is part of the OS's job.

jamesfmilne5mo ago

Anyone found any APIs related to this?

I'd have some other uses for RDMA between Macs.

jamesfmilne5mo ago

I found some useful clues here. Looks like it uses the regular InfiniBand RDMA APIs.

https://github.com/Anemll/mlx-rdma/commit/a901dbd3f9eeefc628...

jeffbee5mo ago

Very cool. It requires a fully-connected mesh so the scaling limit here would seem to be 6 Mac Studio M3 Ultra, up to 3TB of unified memory to work with.
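
The arithmetic for a fully-connected mesh, as a short sketch. The 6-node count is from the comment above and 512GB per node is the top Mac Studio configuration mentioned elsewhere in the thread; the function names are invented for the example.

```python
def mesh_ports_per_node(num_nodes: int) -> int:
    """Each node needs a direct link to every other node."""
    return num_nodes - 1


def mesh_cable_count(num_nodes: int) -> int:
    """Number of point-to-point links in a full mesh: n * (n - 1) / 2."""
    return num_nodes * (num_nodes - 1) // 2


def total_memory_gb(num_nodes: int, memory_per_node_gb: int) -> int:
    """Combined unified memory across the cluster."""
    return num_nodes * memory_per_node_gb


if __name__ == "__main__":
    NODES = 6
    MEMORY_PER_NODE_GB = 512
    print("ports used per node:", mesh_ports_per_node(NODES))
    print("cables:", mesh_cable_count(NODES))
    print("memory:", total_memory_gb(NODES, MEMORY_PER_NODE_GB), "GB")
```

Cable count grows quadratically with node count, which is another reason a mesh like this stays small.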

PunchyHamster5mo ago

I'm sure someone will figure out how to make a Thunderbolt switch/router

huslage5mo ago

I don't believe the standard supports such a thing. But I wonder if TB6 will.

kmeisthax5mo ago

RDMA is a networking standard, it's supposed to be switched. The reason why it's being done over Thunderbolt is that it's the only cheap/prosumer I/O standard with enough bandwidth to make this work. Like, 100Gbit Ethernet cards are several hundred dollars minimum, for two ports, and you have to deal with SFP+ cabling. Thunderbolt is just way nicer[0].

The way this capability is exposed in the OS is that the computers negotiate an Ethernet bridge on top of the TB link. I suspect they're actually exposing PCIe Ethernet NICs to each other, but I'm not sure. But either way, a "Thunderbolt router" would just be a computer with a shitton of USB-C ports (in the same way that an "Ethernet router" is just a computer with a shitton of Ethernet ports). I suspect the biggest hurdle would actually just be sourcing an SoC with a lot of switching fabric but not a lot of compute. Like, you'd need Threadripper levels of connectivity but with like, one or two actual CPU cores.

[0] Like, last time I had to swap work laptops, I just plugged a TB cable between them and did an `rsync`.

nickysielicki5mo ago

This is such a weird project. Like where is this running at scale? Where’s the realistic plan to ever run this at scale? What’s the end goal here?

Don’t get me wrong... It’s super cool, but I fail to understand why money is being spent on this.

aurareturn5mo ago

The end goal is that Macs become good local LLM inference machines and for AI devs to keep using Macs.

nickysielicki5mo ago

The former will never happen and the latter is a certainty.

aurareturn5mo ago

The former is already true and will become even more true when M5 Pro/Max/Ultra release.

novok5mo ago

Now we need some hardware that is rackmount friendly and an OS that is not fiddly as hell to manage in a data center or as a headless server, and we are off to the races! And no, custom racks are not 'rackmount friendly'.

joeframbach5mo ago

So, the Powerbook Duo Dock?

nottorp5mo ago

It's good to sell shovels :)

DesiLurker5mo ago

Does this mean an eGPU might finally work with a MacBook Pro or Studio?

wmf5mo ago

No.

sebnukem25mo ago

I didn't know they skipped 10 version numbers.

badc0ffee5mo ago

They switched to using the year.

ComputerGuru5mo ago

Imagine if the Xserve was never killed off. Discontinued 14 years ago, now!

icedchai5mo ago

If it was still around, it would probably still be stuck on M2, just like the Mac Pro.

0manrho5mo ago

Just for reference:

Thunderbolt 5's stated "80Gbps" bandwidth comes with some caveats. That's the figure either for DisplayPort bandwidth itself or, more often in practice, for the data channel (PCIe 4.0 x4, ~64Gbps) combined with the display channels (<=80Gbps if used in concert with data channels), and it can potentially also do unidirectional 120Gbps of data for some display output scenarios.

If Apple's silicon follows spec, then you're most likely limited to PCIe 4.0 x4 (~64Gbps) of bandwidth per TB port, with a slight latency hit due to the controller. That latency hit is ItDepends(TM), but if you're not using any other I/O on that controller/cable (such as DisplayPort), it's likely to be less than 15% overhead vs. native on average. Depending on drivers, firmware, configuration, use case, cable length, how Apple implemented TB5, etc., exact figures vary. And just like how a 60FPS average doesn't mean every frame is exactly 1/60th of a second long, it's entirely possible that individual packets or niche scenarios could see significantly more latency/overhead.
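
For anyone wondering where the ~64Gbps figure comes from, here is a minimal sketch of the arithmetic. It uses the published PCIe 4.0 numbers (16 GT/s per lane, 128b/130b line encoding) and ignores packet and protocol overhead; whether Apple's Thunderbolt 5 implementation tunnels PCIe exactly this way is an assumption, and `pcie_payload_gbps` is a hypothetical helper name.

```python
GT_PER_LANE = 16.0    # PCIe 4.0 signalling rate per lane, in GT/s
LANES = 4             # x4 link, as tunnelled over one Thunderbolt port (assumed)
ENCODING = 128 / 130  # 128b/130b line encoding efficiency


def pcie_payload_gbps(gt_per_lane: float = GT_PER_LANE,
                      lanes: int = LANES,
                      encoding: float = ENCODING) -> float:
    """Link bandwidth after line encoding, before packet/protocol overhead."""
    return gt_per_lane * lanes * encoding


gbps = pcie_payload_gbps()
print(f"{gbps:.1f} Gbps ~= {gbps / 8:.2f} GB/s")
```

The raw rate is 64 GT/s; after encoding that is roughly 63 Gbps, or a little under 8 GB/s per port, before any of the latency overhead described above.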

As a point of reference, Nvidia RTX Pro (formerly known as Quadro) workstation cards of the Ada generation and older, along with most modern consumer graphics cards, are PCIe 4 (or less, depending on how old we're talking), and the new RTX Pro Blackwell cards are PCIe 5. Though comparing a Mac Studio M4 Max, for example, to an Nvidia GPU is akin to comparing Apples to Green Oranges.

However, I mention the GPUs not just to recognize the 800lb AI compute gorilla in the room, but also because, while it's possible to pool a pair of 24GB VRAM GPUs to achieve a 48GB VRAM pool between them (be it through a shared PCIe bus or over NVLink), the performance does not scale linearly due to PCIe/NVLink's limitations, to say nothing of the software, configuration and optimization side of things also being a challenge to realizing max throughput in practice.

The same is true of a pair of TB5-equipped Macs with 128GB of memory each: using TB5 to achieve a 256GB pool will take a substantial performance hit compared to an otherwise equivalent Mac with 256GB (capacities chosen are arbitrary to illustrate the point). The exact penalty really depends on the use case and how sensitive it is to the latency overhead of using TB5, as well as the bandwidth limitation.

It's also worth noting that it's entirely possible with RDMA solutions (no matter the specifics) to see worse performance than using a single machine if you haven't properly optimized and configured things. This is not hating on the technology, but a warning from experience for people who may have never dabbled: don't expect "2x", or even better than 1x, performance simply from stringing a cable between two devices.
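
To illustrate how a second machine can make things slower, here is a toy model (a sketch under stated assumptions, not a benchmark): compute time divides evenly across machines, while every synchronization point adds a fixed interconnect latency. All of the numbers below (50 ms of compute per token, 120 syncs per token, the latency values) are hypothetical, as are the function names.

```python
def token_time_ms(nodes: int, compute_ms: float,
                  syncs_per_token: int, sync_latency_ms: float) -> float:
    """Toy model: compute splits evenly; each sync costs a fixed latency."""
    if nodes == 1:
        return compute_ms  # a single machine never touches the interconnect
    return compute_ms / nodes + syncs_per_token * sync_latency_ms


def speedup(nodes: int, compute_ms: float,
            syncs_per_token: int, sync_latency_ms: float) -> float:
    """Speedup relative to one machine (values below 1.0 mean slower)."""
    return compute_ms / token_time_ms(nodes, compute_ms,
                                      syncs_per_token, sync_latency_ms)


# Hypothetical workload: 50 ms of compute and 120 syncs per token.
for latency_ms in (0.01, 0.1, 0.5):
    print(f"{latency_ms} ms/sync -> {speedup(2, 50.0, 120, latency_ms):.2f}x")
```

With these made-up numbers, two machines get close to 2x when each sync is very cheap, and drop below 1x once per-sync latency grows large enough to outweigh the halved compute time.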

All that said, glad to see this from Apple. Long overdue in my opinion, as I doubt we'll see them implement an optical network port with anywhere near that bandwidth or RoCEv2 support, much less expose a native (not via TB) PCIe port on anything that's a non-Pro model.

EDIT: Note, many Mac SKUs have multiple TB5 ports, but it's unclear to me what the underlying architecture/topology is there, and thus I can't speculate on what kind of overhead or total capacity any given device supports when attempting to use multiple TB links for more bandwidth/parallelism. If anyone's got an SoC diagram or similar reference data that actually tells us how the TB controller(s) are uplinked to the rest of the SoC, I could go into more depth there. I'm not an Apple silicon/macOS expert. I do however have lots of experience with RDMA/RoCE/IB clusters, NVMeoF deployments, SXM/NVLink'd devices and generally engineering low latency/high performance network fabrics for distributed compute and storage (more on the infrastructure/hardware/ops side than on the software side), so this is my general wheelhouse, but Apple has been a relative blind spot for me due to their ecosystem generally lacking features/support for things like this.

givemeethekeys5mo ago

Would this also work for gaming?

AndroTux5mo ago

londons_explore5mo ago

Nobody's gonna take them seriously till they make something rack mounted and that isn't made of titanium with pentalobe screws...

moralestapia5mo ago

You might not know this but, for a while, Mac Mini clusters were a thing and they were capex and opex effective. That same setup is kind of making a comeback.

fennecbutt5mo ago

They were only a thing for CI/compilation related to Apple's OSes, because their walled garden locks other platforms out. You're building an iPhone or Mac app? Well, your CI needs to be on a cluster of Apple machines.

londons_explore5mo ago

It's in a similar vein to the PS2 Linux cluster or someone trying to use vape CPUs as web servers...

It might be cost effective, but the supplier is still saying "you get no support, and in fact we might even put roadblocks in your way because you aren't the target customer".

moralestapia5mo ago

True.

I'm sure Apple could make a killing on the server side, unfortunately their income from their other products is so big that even if that's a 10B/year opportunity they'll be like "yawn, yeah, whatever".

unit1495mo ago

GarageBand DAW + macOS 14.4 Roland Juno-D7 synthesizer, for 8-bit audio complementary compact disk format as AIFF, WAV, or MIDI appliance, in which under SLA-royalties licenses, binary 44.1 kHz sample rate sets the reproducer for reference level.

[1]: https://www.apple.com/legal/sla/docs/GarageBand.pdf

schmuckonwheels5mo ago

That's nice but

Liquid (gl)ass still sucks.

nodesocket5mo ago

Can we get proper HDR support first in macOS? If I enable HDR on my LG OLED monitor it looks completely washed out and blacks are grey. Windows 11 HDR works fine.

Razengan5mo ago

Really? I thought it's always been that HDR was notorious on Windows, hopeless on Linux, and only really worked in a plug-and-play manner on Mac, unless your display has an incorrect profile or something.

https://www.youtube.com/shorts/sx9TUNv80RE

masspro5mo ago

macOS does wash out SDR content in HDR mode, specifically on non-Apple monitors. An HDR video playing in windowed mode will look fine, but all the UI around it has black and white levels very close to grey.

Edit: to be clear, macOS itself (Cocoa elements) is all SDR content and thus washed out.

crazygringo5mo ago

Define "washed out"?

The white and black levels of the UX are supposed to stay in SDR. That's a feature not a bug.

If you mean the interface isn't bright enough, that's intended behavior.

If the black point is somehow raised, then that's bizarre and definitely unintended behavior. And I honestly can't even imagine what could be causing that to happen. It does seem like it would have to be a serious macOS bug.

You should post a photo of your monitor, comparing a black #000 image in Preview with a pitch-black frame from a video. People edit HDR video on Macs, and I've never heard of this happening before.

Starmina5mo ago

That's intended behavior for monitors limited in peak brightness.

robflynn5mo ago

Oh, that explains why it looked so odd when I enabled HDR on my Studio.

adastra225mo ago

Huh, so that’s why HDR looks like shit on my Mac Studio.

heavyset_go5mo ago

Works well on Linux, just toggle a checkmark in the settings.

m-ack-toddler5mo ago

AI is arguably more important than whatever gaming gimmick you're talking about.

stego-tech5mo ago

This doesn’t remotely surprise me, and I can guess Apple’s AI endgame:

* They already cleared the first hurdle to adoption by shoving inference accelerators into their chip designs by default. It’s why Apple is so far ahead of their peers in local device AI compute, and will be for some time.

* I suspect this introduction isn’t just for large clusters, but also a testing ground of sorts to see where the bottlenecks lie for distributed inference in practice.

* Depending on the telemetry they get back from OSes using this feature, my suspicion is they’ll deploy some form of distributed local AI inference system that leverages their devices tied to a given iCloud account or on the LAN to perform inference against larger models, but without bogging down any individual device (or at least the primary device in use)

For the endgame, I’m picturing a dynamically sharded model across local devices that shifts how much of the model is loaded on any given device depending on utilization, essentially creating local-only inferencing for privacy and security of their end users. Throw the same engines into, say, HomePods or AppleTVs, or even a local AI box, and voila, you’re golden.

EDIT: If you're thinking, "but big models need the lower latency of Thunderbolt" or "you can't do that over Wi-Fi for such huge models", you're thinking too narrowly. Think about the devices Apple consumers own, their interconnectedness, and the underutilized but standardized hardware within them with predictable OSes. Suddenly you're not jamming existing models onto substandard hardware or networks, but rethinking how to run models effectively over consumer distributed compute. Different set of problems.

wmf5mo ago

> inference accelerators ... It’s why Apple is so far ahead of their peers in local device AI compute, and will be for some time.

Not really. llama.cpp was just using the GPU when it took off. Apple's advantage is more VRAM capacity.

> this introduction isn’t just for large clusters

It doesn't work for large clusters at all; it's limited to 6-7 Macs and most people will probably use just 2 Macs.

fwip5mo ago

RDMA over Thunderbolt has so much more bandwidth (and lower latency) than Apple's ecosystem of mostly-wireless devices that I can't see how any learnings here would transfer.

stego-tech5mo ago

You're thinking, "You can't put modern models on that sort of distributed compute network", which is technically correct.

I was thinking, "How could we package or run these kinds of large models or workloads across a consumer's distributed compute?" The Engineer in me got as far as "Enumerate devices on network via mDNS or Bonjour, compare keys against iCloud device keys or otherwise perform authentication, share utilization telemetry and permit workload scheduling/balance" before I realized that's probably what they're testing here to a degree, even if they're using RDMA.
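
The scheduling/balance step in that list can be sketched as follows. This is purely illustrative and not anything Apple is known to ship: it assumes each device has already been discovered and authenticated and has reported its free memory, then splits a model's layers across devices in proportion to that memory. The device names, memory figures, layer count and the `assign_layers` helper are all hypothetical.

```python
def assign_layers(total_layers: int, free_mem_gb: dict[str, float]) -> dict[str, int]:
    """Split layers across devices in proportion to reported free memory.

    Uses largest-remainder rounding so the per-device counts sum to total_layers.
    """
    total_mem = sum(free_mem_gb.values())
    if total_mem <= 0:
        raise ValueError("no free memory reported")
    # Ideal (fractional) share of layers for each device.
    exact = {name: total_layers * mem / total_mem
             for name, mem in free_mem_gb.items()}
    # Start from the rounded-down share...
    plan = {name: int(share) for name, share in exact.items()}
    leftover = total_layers - sum(plan.values())
    # ...then hand the remaining layers to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda n: exact[n] - plan[n], reverse=True)
    for name in by_remainder[:leftover]:
        plan[name] += 1
    return plan


# Hypothetical telemetry: free memory in GB reported by each device.
devices = {"mac-studio": 96.0, "macbook": 24.0, "apple-tv": 4.0}
print(assign_layers(61, devices))
```

Re-running the same function whenever the reported free memory changes would give the "shifts depending on utilization" behaviour described earlier in the thread, although a real system would also have to account for link latency and the cost of moving weights around.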

threecheese5mo ago

I think you are spot on, and this fits perfectly within my mental model of HomeKit; tasks are distributed to various devices within the network based on capabilities and authentication, and given a very fast bus Apple can scale the heck out of this.

stego-tech5mo ago

Consumers generally have far more compute than they think; it's just all distributed across devices and hard to utilize effectively over unreliable interfaces (e.g. Wi-Fi). If Apple (or anyone, really) could figure out a way to utilize that at modern scales, I wager privacy-conscious consumers would gladly trade some latency in responses in favor of superior overall model performance - heck, branding it as "deep thinking" might even pull more customers in via marketing alone ("thinks longer, for better results" or some vaguely-not-suable marketing slogan). It could even be made into an API for things like batch image or video rendering, but without the hassle of setting up an app-specific render farm.

There's definitely something there, but Apple's really the only player set up to capitalize on it via their halo effect with devices and operating systems. Everyone else is too fragmented to make it happen.

TheRealPomax5mo ago

IS this... good? Why is this something that the underlying OS itself should be involved in at all?

wmf5mo ago

Networking is part of the OS's job.

jamesfmilne5mo ago

Anyone found any APIs related to this?

I'd have some other uses for RDMA between Macs.

jamesfmilne5mo ago

I found some useful clues here. Looks like it uses the regular InfiniBand RDMA APIs.

https://github.com/Anemll/mlx-rdma/commit/a901dbd3f9eeefc628...

jeffbee5mo ago

Very cool. It requires a fully-connected mesh so the scaling limit here would seem to be 6 Mac Studio M3 Ultra, up to 3TB of unified memory to work with.

PunchyHamster5mo ago

I'm sure someone will figure out how to make thunderbolt switch/router

huslage5mo ago

I don't believe the standard supports such a thing. But I wonder if TB6 will.

kmeisthax5mo ago

[0] Like, last time I had to swap work laptops, I just plugged a TB cable between them and did an `rsync`.

1 more reply

nickysielicki5mo ago

This is such a weird project. Like where is this running at scale? Where’s the realistic plan to ever run this at scale? What’s the end goal here?

Don’t get me wrong... It’s super cool, but I fail to understand why money is being spent on this.

aurareturn5mo ago

The end goal is that Macs become good local LLM inference machines and for AI devs to keep using Macs.

nickysielicki5mo ago

The former will never happen and the latter is a certainty.

aurareturn5mo ago

The former is already true and will become even more true when M5 Pro/Max/Ultra release.

novok5mo ago

joeframbach5mo ago

So, the Powerbook Duo Dock?

nottorp5mo ago

It's good to sell shovels :)

DesiLurker5mo ago

Does this mean an eGPU might finally work with a MacBook Pro or Studio?

wmf5mo ago

No.

sebnukem25mo ago

I didn't know they skipped 10 version numbers.

badc0ffee5mo ago

They switched to using the year.

ComputerGuru5mo ago

Imagine if the Xserve was never killed off. Discontinued 14 years ago, now!

icedchai5mo ago

If it was still around, it would probably still be stuck on M2, just like the Mac Pro.

0manrho5mo ago

Just for reference:

givemeethekeys5mo ago

Would this also work for gaming?

AndroTux5mo ago

londons_explore5mo ago

Nobody's gonna take them seriously till they make something rack mounted and that isn't made of titanium with pentalobe screws...

moralestapia5mo ago

You might not know this, but for a while Mac Mini clusters were a thing, and they were cost-effective in both capex and opex. That same setup is kind of making a comeback.

fennecbutt5mo ago

londons_explore5mo ago

It's in a similar vein to the PS2 Linux cluster or someone trying to use vape CPUs as web servers...

It might be cost effective, but the supplier is still saying "you get no support, and in fact we might even put roadblocks in your way because you aren't the target customer".

moralestapia5mo ago

True.

unit1495mo ago

[1]: https://www.apple.com/legal/sla/docs/GarageBand.pdf

schmuckonwheels5mo ago

That's nice but

Liquid (gl)ass still sucks.

nodesocket5mo ago

Can we get proper HDR support first in macOS? If I enable HDR on my LG OLED monitor it looks completely washed out and blacks are grey. Windows 11 HDR works fine.

Razengan5mo ago

https://www.youtube.com/shorts/sx9TUNv80RE

masspro5mo ago

Edit: to be clear, macOS itself (Cocoa elements) is all SDR content and thus washed out.

crazygringo5mo ago

Define "washed out"?

The white and black levels of the UX are supposed to stay in SDR. That's a feature not a bug.

If you mean the interface isn't bright enough, that's intended behavior.

You should post a photo of your monitor, comparing a black #000 image in Preview with a pitch-black frame from a video. People edit HDR video on Macs, and I've never heard of this happening before.

Starmina5mo ago

That's intended behavior for monitors limited in peak brightness.

robflynn5mo ago

Oh, that explains why it looked so odd when I enabled HDR on my Studio.

adastra225mo ago

Huh, so that’s why HDR looks like shit on my Mac Studio.

heavyset_go5mo ago

Works well on Linux, just toggle a checkmark in the settings.

m-ack-toddler5mo ago

AI is arguably more important than whatever gaming gimmick you're talking about.

stego-tech5mo ago

This doesn’t remotely surprise me, and I can guess Apple’s AI endgame:

* I suspect this introduction isn’t just for large clusters, but also a testing ground of sorts to see where the bottlenecks lie for distributed inference in practice.

wmf5mo ago

> inference accelerators ... It’s why Apple is so far ahead of their peers in local device AI compute, and will be for some time.

Not really. llama.cpp was just using the GPU when it took off. Apple's advantage is more VRAM capacity.

> this introduction isn’t just for large clusters

It doesn't work for large clusters at all; it's limited to 6-7 Macs and most people will probably use just 2 Macs.

fwip5mo ago

RDMA over Thunderbolt has so much more bandwidth (and so much lower latency) than Apple's system of mostly-wireless devices that I can't see how any learnings here would transfer.

stego-tech5mo ago

You're thinking, "You can't put modern models on that sort of distributed compute network", which is technically correct.

threecheese5mo ago

stego-tech5mo ago
