I'd assume the Mac mini has a less extensive PCIe/TB subsystem.
No idea what people are doing with all those PCIe slots except for NVMe cards. I wonder how hard it would be to talk to a PCIe FPGA.
You aggregate PCIe lanes (x16, x8, x4/Thunderbolt, x1). You could also build mesh networks from SerDes, but then instead of PCIe switches you would need SerDes switches or routers (Ethernet, NVLink, InfiniBand).
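To put rough numbers on lane aggregation, here is a minimal sketch that estimates one-direction link bandwidth from lane count. It assumes the standard per-lane signalling rates and 128b/130b line encoding for PCIe Gen3 to Gen5; the function and table names are made up for illustration, and packet-level protocol overhead is ignored.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency by PCIe generation.
# Assumption: Gen3/4/5 all use 128b/130b encoding.
PCIE_GENERATIONS = {
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def link_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, after line encoding only."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    gt_per_s, encoding = PCIE_GENERATIONS[gen]
    return gt_per_s * encoding * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    for lanes in (1, 4, 8, 16):
        print(f"Gen4 x{lanes}: {link_bandwidth_gbytes(4, lanes):.2f} GB/s")
```

Bandwidth scales linearly with lane count in this model, which is why an x16 slot is worth sixteen x1 links before any switch or protocol overhead is counted.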
You need those high-speed links between chips for much more than SSD/NVMe cards: other NAS boxes, processors, Ethernet/internet, cameras, Wi-Fi, optics, DRAM, SRAM, power, etc. They are used for inter-core communication (between processors or between chiplets), between networked PCBs, between DRAM chips (DDR5 is just another SerDes protocol), flash chips, camera chips, and so on: any chip that talks at faster than 250 Mbps.
I aggregate all the M4 Mac mini ports into an M4 cluster by mesh networking all its SerDes/PCIe with FPGAs into a very cheap, low-power supercomputer with exaflop performance. Cheaper than NVIDIA. I'm sure Apple does the same in their data centers.
My talk [1] on Wafer Scale Integration and free space optics goes deeper into how and why SerDes and PCIe will be replaced by fiber optics and free space optics for power reasons. I'm sure several parallel 2 GHz optical lambdas per fiber (but no SerDes!) will be the next step in Apple Silicon as well: most of the M4 power budget already goes to the off-chip SerDes/Thunderbolt networking links.
That sounds super interesting, do you happen to have some further information on that? Is it just a bunch of FPGAs issuing DMA TLPs?
M4 supercomputers are cheaper, and they will also have lower Capex and Opex than most datacenter hardware.
>do you happen to have some further information on that?
Yes, the information is in my highly detailed custom documentation for the programmers and buyers of 'my' Apple Silicon supercomputer, its Squeak and OMeta DSL programming languages and its adaptive compiler. You can contact me for this highly technical report and several scientific papers (email in my profile).
Do you know of people who might buy a super computer based on better specifications? Or even just buyers who will go for 'the lowest Capex and the lowest Opex supercomputer in 2025-2027'?
Because the problem with HPC is that almost all funders and managers buy supercomputers with a safe brand name (Nvidia, AMD, Intel) at triple the cost, and seldom from a supercomputer researcher like myself. But some do, if they understand why. I have been designing, selling, programming and operating supercomputers since 1984 (I was 20 years old then); this M4 Apple Silicon cluster will be my ninth supercomputer. I prefer to build them from the ground up with our own chip and wafer scale integration designs, but when an off-the-shelf chip is good enough I'll sell that instead. Price/performance/watt is what counts; ease of programming is a secondary consideration next to the performance you achieve. Alan Kay argues you should rewrite your software from scratch [2] and do your own hardware [3], so that is what I've done since I learned from him.
>Is it just a bunch of FPGAs issuing DMA TLPs?
No. The FPGAs are optional, for when you want to flatten the inter-core (= inter-SRAM cache) networking with switches or routers into a topology with fewer hops for the message passing, such as a Slim Fly diameter-two topology [4].
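To illustrate what "diameter two" buys you, here is a small sketch that measures the worst-case hop count of a network with breadth-first search. It compares a 10-node ring with the Petersen graph, a well-known 10-node, degree-3 graph of diameter 2. The Petersen graph is only a stand-in chosen for brevity; it is not the Slim Fly construction itself, and all function names are hypothetical.

```python
from collections import deque


def diameter(adjacency: dict[int, list[int]]) -> int:
    """Longest shortest path (in hops) between any two nodes, via BFS from each node."""
    worst = 0
    for start in adjacency:
        dist = {start: 0}
        pending = deque([start])
        while pending:
            node = pending.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    pending.append(neighbour)
        if len(dist) != len(adjacency):
            raise ValueError("graph is not connected")
        worst = max(worst, max(dist.values()))
    return worst


def ring(n: int) -> dict[int, list[int]]:
    """Each node links only to its two neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def petersen() -> dict[int, list[int]]:
    """Petersen graph: outer 5-cycle, five spokes, inner pentagram."""
    adjacency: dict[int, list[int]] = {i: [] for i in range(10)}

    def link(a: int, b: int) -> None:
        adjacency[a].append(b)
        adjacency[b].append(a)

    for i in range(5):
        link(i, (i + 1) % 5)          # outer cycle
        link(i, i + 5)                # spoke
        link(5 + i, 5 + (i + 2) % 5)  # inner pentagram
    return adjacency


if __name__ == "__main__":
    print("ring diameter:", diameter(ring(10)))
    print("petersen diameter:", diameter(petersen()))
```

With one extra port per node (three links instead of two), the worst-case path drops from five hops to two, which is the kind of trade a low-diameter topology makes.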
DMA (Direct Memory Access) TLPs (Transaction Layer Packets) are one of the worst ways of doing inter-core and inter-SRAM communication, and on PCIe they carry a huge 30% protocol overhead at triple the cost. Intel (and most other chip companies like NVIDIA, Altera, AMD/Xilinx) can't design proper chips because they don't want to learn about software [2]. Apple Silicon is marginally better.
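As a back-of-the-envelope check on per-packet overhead, here is a sketch of how the overhead fraction of a single TLP depends on payload size. The byte counts are illustrative assumptions, not figures from this thread: a 16-byte header (4 DW, 64-bit addressing), 4 bytes of framing and sequence number, and a 4-byte link CRC. It ignores DLLPs, flow-control credits and read-completion traffic, so real links can do worse.

```python
def tlp_overhead_fraction(
    payload_bytes: int,
    header_bytes: int = 16,   # assumed 4-DW header
    framing_bytes: int = 4,   # assumed framing token plus sequence number
    lcrc_bytes: int = 4,      # assumed link CRC
) -> float:
    """Fraction of bytes on the wire that are not payload, for one TLP."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    overhead = header_bytes + framing_bytes + lcrc_bytes
    return overhead / (payload_bytes + overhead)


if __name__ == "__main__":
    for payload in (32, 64, 128, 256, 512):
        print(f"{payload:4d} B payload: {tlp_overhead_fraction(payload):.1%} overhead")
```

Under these assumptions the overhead is driven almost entirely by payload size: small messages pay proportionally far more than large DMA bursts, which matters for fine-grained inter-core traffic.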
You should use pure message passing between any process, preferably in a programming language and a VM that uses pure message passing at the lowest level (Squeak, Erlang). Even better if you then map those software messages directly to message passing hardware as in my custom chips [3].
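For readers who have not used the style, here is a minimal sketch of message passing between two workers, written in Python only because it is widely readable. It is not how Squeak or Erlang implement it, and it says nothing about the custom hardware mentioned above. The `Actor` class, the message layout and the `None` stop signal are all hypothetical choices; Python threads still share memory, so the isolation here is by convention only.

```python
import queue
import threading


class Actor(threading.Thread):
    """A worker that is reachable only through its mailbox."""

    def __init__(self, handler):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()
        self._handler = handler

    def send(self, message) -> None:
        self.mailbox.put(message)

    def run(self) -> None:
        while True:
            message = self.mailbox.get()
            if message is None:  # hypothetical stop signal
                break
            self._handler(message)


def sum_of_squares(values) -> int:
    """Ask a squaring actor for each square and add up the replies."""
    values = list(values)
    replies = queue.Queue()
    # The actor never touches caller state; it only answers on the reply queue.
    squarer = Actor(lambda msg: msg["reply_to"].put(msg["value"] ** 2))
    squarer.start()
    for value in values:
        squarer.send({"value": value, "reply_to": replies})
    total = sum(replies.get() for _ in values)
    squarer.send(None)
    squarer.join()
    return total


if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3]))
```

The point of the pattern is that the sender and receiver share nothing but messages, so the same program structure can in principle be mapped onto separate cores, machines or links.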
The reason to reverse-engineer the Apple Silicon instructions for the CPU, GPU and ANE is to be able to adapt my adaptive compiler to M4 chips, but also to repurpose PCIe for low-level message passing with much better performance and latency than DMA TLPs.
To conclude, if you want to get the cheapest Capex and Opex M4 Mac mini supercomputer you need to rewrite your supercomputing software in a high level language and message passing system like the parallel Squeak Smalltalk VM [3] with adaptive load balancing compilation. C, C++, Swift, MPI or CUDA would result in sub-optimal software performance and orders of magnitude more lines of code when optimal performance of parallel software is the goal.
[1] https://en.wikipedia.org/wiki/System_X_(supercomputer)
[2] https://www.youtube.com/watch?v=ubaX1Smg6pY
Talk [6] on free space optical interconnects without SerDes some day showing up on low power Apple Silicon (around M6-M8 models).