Most of the information in the linked article is very outdated (~16 months old). Since then, we have decided to ditch the idea of having separate DRAM and "External I/O" interfaces and instead put our chip-to-chip interface on all four sides of the chip. The chip-to-chip interface uses the same protocol as our Network on Chip and extends the same 2D mesh across chips. We are also looking into (and have a sketched-out plan for) interfacing this I/O directly with HBM dies that can sit in the same MCM package. As for supporting other memories and I/O, we are leaning towards "adapter chips" that would convert our chip-to-chip interface to DDR4, Ethernet, InfiniBand, etc.
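To make the "same 2D mesh across chips" idea concrete, here is a rough sketch of how a global mesh coordinate could be split into a chip coordinate and a local core coordinate, and how a route could be checked for chip-to-chip crossings. Everything in it is an assumption for illustration rather than the actual design: the 4x4 core layout per chip, the function names (`locate`, `xy_route`, `chip_crossings`), and the use of simple X-then-Y dimension-order routing.

```python
# Illustrative sketch only: the 4x4 tile size, the names, and XY routing
# are assumptions, not details of the real chip.
CORES_PER_SIDE = 4  # assumed 4x4 layout for a 16-core chip


def locate(gx, gy, n=CORES_PER_SIDE):
    """Split a global mesh coordinate into (chip coord, local core coord)."""
    return (gx // n, gy // n), (gx % n, gy % n)


def xy_route(src, dst):
    """Dimension-order route: walk along X first, then Y. Returns all nodes visited."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def chip_crossings(path, n=CORES_PER_SIDE):
    """Count hops that leave one chip and enter a neighbor (chip-to-chip links)."""
    chips = [locate(x, y, n)[0] for x, y in path]
    return sum(1 for a, b in zip(chips, chips[1:]) if a != b)


if __name__ == "__main__":
    route = xy_route((1, 1), (6, 2))
    print(route)
    print("chip-to-chip hops:", chip_crossings(route))
```

The point of the sketch is that, with one protocol on and off chip, a route never needs translating at a chip boundary: a chip-to-chip hop is just another mesh hop that happens to cross an edge.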
As for bandwidth numbers, the aggregate bandwidth of the test chip we have just taped out (16 cores + 2 chip-to-chip I/O macros on TSMC 28nm, 12mm^2 in size) is 60GB/s. For the planned production chip, we will be over 256GB/s. I have a good feeling we will be a fair margin higher than that, but I would rather under-promise and over-deliver.