GCC's impact was possible because, together with GAS (the assembler), it made a 100% open source toolchain feasible. Yes, more software was necessary for a complete system (linker, libc, etc.), but GCC made it possible to build from the ground up.
Also, yes, the initial GCC was worse than any decent proprietary tool chain at the time, but it got better and better because each improvement built on all the earlier open sourced efforts.
Think about how hard Linux kernel development would have been if it had to rely on different proprietary tool chains for every target architecture (and possibly chip version).
Hardware definition languages (Verilog/VHDL, etc) enable high level chip design like high level programming languages, but making the physical chip requires a PDK (process design kit) that encodes how each critical silicon feature is built.
So a chip built for TSMC 28nm contains TSMC proprietary material and is essentially unportable. It can take several years to move a major chip from one foundry to another (or even a shrink at the same foundry), and the proprietary tool chains preclude a development process that can incrementally improve portability.
This announcement is a major step toward a similar foundation being available for silicon design. It is very important that it is a large complex chip, rather than just a research development vehicle.
[disclaimer - past life as OpenPOWER participant]
we've developed a dynamically SIMD-partitionable-maskable set of "base primitives" for example, so you set a "mask" and it automatically subdivides the 64-bit adder into two halves. but we didn't leave it there, we did shift, multiply, less-than, greater-than - everything.
https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/iee... https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/iee...
can you imagine doing that in VHDL or Verilog? tens of engineers needed, or some sort of macro-auto-generated code (treating VHDL / Verilog as a machine-code compiler target).
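to make the "dynamic partitioning" idea concrete, here's a minimal plain-Python model of the behaviour described above (a 64-bit adder that a mask bit splits into two independent 32-bit lanes). this is an illustration only, not the actual nmigen code in the ieee754fpu repository, and the single 32-bit split point is an assumption for simplicity:

```python
MASK64 = (1 << 64) - 1

def partitioned_add(a, b, split_at_32):
    """Model of a dynamically partitionable 64-bit adder.

    In hardware the partitioning is done by gating the carry chain
    at the partition point; here we just model the observable result.
    """
    if not split_at_32:
        # mask clear: one ordinary 64-bit add
        return (a + b) & MASK64
    # mask set: two independent 32-bit lanes, with no carry
    # propagating from bit 31 into bit 32
    lo = (a + b) & 0xFFFFFFFF
    hi = ((a >> 32) + (b >> 32)) & 0xFFFFFFFF
    return (hi << 32) | lo
```

the key observable difference is the carry: with the mask set, an overflow in the low lane wraps inside that lane instead of rippling into the high lane.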
the reason for doing this - planning it well in advance - is because we're doing Cray-style Vectors (Draft SVP64) with polymorphic element-width over-rides. yes, really. the "base" operation is 64-bit, but you can over-ride the source and destination operation width.
the reason why we're using our own Cell Library is actually down to transparency. we want customers to be able to compile the GDS-II files themselves, fully automated, no involvement from us, no manual intervention.
ironically, as an aside: Staf's Cells are 30% smaller (by area) than the Foundry equivalents.
There is a huge amount of great stuff going on in this area.
Tim Ansell - Skywater PDK: Fully open source manufacturable PDK for a 130nm process
Staf developed actual IOpad Cells (from scratch), actual Standard Cells and a 4k SRAM block: we did not use the NDA'd TSMC Cell Libraries here.
if we had used Skywater 130nm we would have been forced to ditch LIP6.fr (i cannot express enough how hard Jean-Paul Chaput has worked on coriolis2 for the past 18 months), we would not have been able to test the IOpads that Staf developed... yeah.
bottom line is we used a complete independent VLSI toolchain - fully automated - that has nothing to do with the USA or DARPA Military funding - and was developed with European expertise.
It may be 180nm (1999-era technology), but that's still hugely important. The world of semiconductor design is incredibly closed source and secretive.
Staf will also "protect" you from the Foundry NDAs. you develop with a "symbolic" version of the Cell Library, he runs the "Real" one and sends it to IMEC on your behalf. here's Staf's "symbolic" Cell Library, it's based on FreePDK45 https://gitlab.com/Chips4Makers/c4m-pdk-freepdk45/-/releases
Coriolis2 - http://coriolis.lip6.fr/ - is entirely Libre-Licensed. it's fully automated, you don't have to do any "hand-editing", it has unit tests (so you have demos you can look at and also check you installed everything right). we have some automated setup scripts for it if you're interested: https://git.libre-soc.org/?p=dev-env-setup.git;a=blob;f=cori...
LIP6 have a Silicon-proven ENTIRELY Libre Cell Library called nsxlib, if you really want to go that route. it's Silicon-proven in 360nm and 180nm.
Also, LIP6 have a relationship with a small town in Japan, which has a 2 micron fab used for "training" of the town's employees. submission for that is entirely free. i know this exists but have not used it, and don't know more details, but i can probably put you in touch with Sorbonne University if you're serious.
and if you really really want to do "at home" stuff, Libre-Silicon is developing a 2in wafer fab, using Ultra-Violet DLPs and high-accuracy stepper motors, that you'll be able to buy and operate from your garage or lab. think "3D printing". i think they're aiming for 2000 nm or something (2 micron)? really big, but proves the concept.
Neither one has published an easily-replicable process, meaning I can't really repeat what they've done. IMO what this space needs is an open source build plan/BoM, with a cottage industry of people selling DiY and pre-assembled kits. Once the 3d printing community got there, that's when things took off -- before kits or at least build guides with proper BoMs, it was just disparate individuals doing their own thing.
Connect me with anyone who's got a good approach to building some sort of replicable open-source fab though, and I'll quit my job and join the project full-time (that's not a joke: I'm serious).
[1] http://sam.zeloof.xyz/category/semiconductor/ [2] https://libresilicon.com/
180nm is still far and away the world's most heavily-used geometry, because the price-performance (bang per buck, however you want to put it) is so extremely high.
an 8in wafer is USD 600 and that's extremely low. for any power MOSFET, power transistor, diode or other high-current semiconductor you absolutely don't want small features (detailed tiny tracks): you want MASSIVE ones.
why on earth would you waste money on tiny features, it's like using the latest 0.15mm 3D printing nozzles to 3D print a massive 300x300x300 mm cube that's going to be used for nothing more than a foot-stool. you want a 1.2mm nozzle for that!
then for any processor below 300 mhz, you can get away with 180nm. need only an 8 mhz 8-bit or 4-bit washing machine or microwave processor, or something to go in a cheap digital watch? 180nm is your best bet: you'll get tens of thousands of < 1 mm^2 ASICs on a single wafer which means you're well below $0.05 per individual die.
a 28nm 8in wafer would be about... 10x that cost, you'd end up with exactly the same transistor (or 8 mhz 8-bit processor), why would you pay more money for what you don't need?
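the per-die arithmetic above is easy to check. a back-of-envelope sketch, using the $600 wafer figure from the comment and an assumed 85% usable-area fraction (edge loss, scribe lines, yield margin - my assumption, not a quoted foundry number):

```python
import math

wafer_cost_usd = 600.0       # 180nm 8-inch wafer, figure from the comment
wafer_diameter_mm = 200.0    # 8 inches ~= 200 mm
die_area_mm2 = 1.0           # "< 1 mm^2" ASIC, rounded up to be conservative
usable_fraction = 0.85       # assumed: edge loss, scribe lines, yield margin

wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2   # ~31,416 mm^2
dies_per_wafer = int(wafer_area_mm2 * usable_fraction / die_area_mm2)
cost_per_die = wafer_cost_usd / dies_per_wafer            # ~$0.02
```

even with conservative assumptions you land comfortably under the $0.05-per-die figure.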
btw the real reason why there's a chip shortage: the Automotive industry, who are cheap bar-stewards, wanted even lower than $600 per 8in wafer so they went with 360nm and cruder geometry. that's equipment that's even older than the 1990s, like 40+ years in some cases.
so then the stupidity hit, and they stopped ordering. then 18 months later they phone up these old Foundries and say, "ok, we're ready to start ordering again". and the Foundries say, "oh, we switched off the equipment, and it cooled down and got damaged (just like that massive Electric plant in S. Australia that was de-commissioned, the concrete cracked when they switched it off, and it's completely unsafe to start up again). you were our only customer for the past 30 years, so we scrapped it all. you'll have to now compete with the consumer-grade smaller geometry Fabs like everyone else".
which is something that none of the Automotive companies have told their Governments, because then they can't go crying "boo hoo hoo, we can't make chips any more at the price that we demand, waaa, waaaa, i wannnt myyy monneeeeey"
and now of course they can't use the old masks, because those were designed for 360nm and cruder geometries, they have to redesign the entire ASIC for 180nm and that's why you can't now get onto 180nm and other MPW Programmes because the frickin Automotive Industry has jammed them all to hell.
In my opinion, an area of interest going forward into the next decade of more safety-critical software written by smaller and smaller orgs (e.g. eVTOL companies, sensor companies, etc) is continuing to push forward which objectives can be accomplished by formal means instead of primarily through testing.
An NXP or IBM processor might be great, and might be mature, and might be very well tested -- but I, as a safety-critical software developer, have little way of demonstrating that to certification authorities. The availability of open-source processor designs and, in the future, traceable and accountable conversion from those HDL designs to netlists, to masks, and then to silicon, gives a path to showing that portions of a processor are correct-by-design, and thus a path to the goal of showing that my machine-code-as-authored(-by-an-assembler) and machine-code-as-executed(-by-a-processor) semantics match.
The Talos is currently the only fully libre computer available for high-perf computing, and it uses POWER9 CPUs. If you want a fully free CPU, your choices are either very dated CPUs or POWER.
Many distros (inc. Debian, and most source-based ones) support ppc64/POWER officially quite well and go out of their way to ensure a high degree of portability.
The fact that the POWER architecture may be niche is not a problem since so much software can be compiled for it. See the Talos workstations: https://www.raptorcs.com/TALOSII/ and the PowerPC notebook: https://www.powerpc-notebook.org/en/
For people who are willing to use niche hardware for more control over what is running, this seems like a very important step.
What's good about this is that the source is available and can be verified to some degree against the hardware (by decapping it). That puts a lot of constraints on what kinds of secret back doors people can build that we didn't have before.
Off topic: where did you get this rule?
In other words, this chip isn't even remotely open-source.
What they sent to the foundry isn't the "ghost cells" (which don't have transistors in them and therefore don't work).
This fails the most basic requirements of being open source.
Coriolis2 source code: http://coriolis.lip6.fr/
Chips4Makers FlexLib Cell Library based on FreePDK45: https://gitlab.com/Chips4Makers/c4m-pdk-freepdk45/-/releases
Automated Layout scripts for generation of GDS-II Files: https://git.libre-soc.org/?p=soclayout.git;a=summary
please do try to get your facts right and not mislead people by making false claims, eh?
the problem with this particular irate individual is that he's assumed that because TSMC's DRC rules are only accessible under NDA that automatically absof*** everything was also "fake open source".
idiot.
sigh.
clearly didn't read the article.
whilst both Staf Verhaegen and LIP6.fr signed the TSMC Foundry NDA, we in the Libre-SOC team did not. we therefore worked entirely in the Libre world, honoured our commitment to full transparency, whilst Staf and Jean-Paul and the rest of the team from LIP6 worked extremely hard "in parallel".
the ASIC can therefore be compiled with three different Cell Libraries:
* LIP6.fr's 180nm "nsxlib" - this is a silicon-proven 180nm Cell Library

* Staf's FreePDK45 "symbolic" cell library using FlexLib (as the name says, it uses the Academic FreePDK45 DRC)

* the NDA'd TSMC 180nm "real" variant of Staf's FlexLib
i was therefore able to "prepare" work for Jean-Paul, via the parallel track, commit it to the PUBLIC REPOSITORY (the one that's open, that our resident idiot didn't bother to check existed or even ask where it is), which saved Jean-Paul time whilst he focussed on fixing issues in coriolis2.
it was a LOT of work.
on top of that, because it's an entirely separate processor, to get it to do anything you actually have to have a Remote Procedure Call system, operating over Shared Memory!
oink.
so the process for running a GPU shader binary is as follows:
step 1: fire up a compiler (in userspace)

step 2: compiler takes the shader IR and turns it into GPU assembler

step 3: the userspace program (game, blender, whatever) triggers the linux kernel (or windows kernel) to upload that GPU binary to the GPU

step 4: the kernel copies that GPU binary over Shared Memory Bus (usually PCIe)

step 5: now we unwind back to userspace (with a context-switch) and want to actually run something (OpenGL call)

step 6: the OpenGL call (or Vulkan) gets some function call parameters and some data

step 7: the userspace library (MESA) "packs" (marshalls) those function call parameters into serialised data

step 8: the userspace library triggers the linux (windows) kernel to "upload" the serialised function call parameters - again over Shared Memory Bus

step 9: the kernel waits for that to happen

step 10: the userspace proceeds (after a context-switch) and waits for notification that the function call has completed...
... i'm not going to bother filling in the rest of the details, you get the general idea that this is completely insane and goes a long way towards explaining why GPU Cards are so expensive and why it takes YEARS to reverse-engineer GPU drivers.
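the "marshalling" in step 7 can be sketched in a few lines. this is a hypothetical minimal wire format for illustration only - it is emphatically NOT MESA's actual protocol, and the opcode number is made up:

```python
import struct

# hypothetical opcode number for a draw call - not MESA's real wire format
OP_DRAW_ARRAYS = 0x42

def marshal_call(opcode, *args):
    """Pack a function call into bytes for the shared-memory buffer (step 7):
    little-endian opcode, argument count, then the 32-bit arguments."""
    return struct.pack(f"<II{len(args)}i", opcode, len(args), *args)

def unmarshal_call(data):
    """The GPU-side firmware unpacks the same buffer before dispatching."""
    opcode, argc = struct.unpack_from("<II", data)
    args = struct.unpack_from(f"<{argc}i", data, 8)
    return opcode, list(args)
```

every single function call pays this pack/copy/unpack round-trip (plus the kernel context-switches around it) - which is exactly the overhead the hybrid architecture below avoids.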
in the Libre-SOC architecture - which is termed a "Hybrid" one, the following happens:
step 1: the compiler is fired up (in userspace, just like above)

step 2: compiler takes the shader IR and turns it into *NATIVE* (Power ISA with Cray-style Vectors and some custom opcodes) assembler

step 3: userspace program JIT EXECUTES THAT BINARY NATIVELY RIGHT THERE RIGHT THEN
done.
did you see any kernel context-switches in that simple 3-step process? that's because there aren't any needed.
now, the thing is - answering your question a bit more - that "just having vector capabilities" is nowhere near enough. the lesson has been learned from Nyuzi, Larrabee, and others: if you simply create a high-performance general-purpose Vector ISA, you have successfully created something that absolutely sucks at GPU workloads: about TWENTY FIVE PERCENT (one quarter) of the capability of a modern GPU for the same power consumption.
therefore, you need to add SIN, COS, ATAN2, LOG2, and other opcodes, but you need to add them with "reduced accuracy" (like, only 12 bit or so) because that's all that's needed for 3D.
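to show what "reduced accuracy" buys you: a degree-7 polynomial is already good to roughly 12 bits on the first quadrant, where a full IEEE754 sin needs far more work. this is an illustration of the accuracy trade-off, not the actual Libre-SOC opcode implementation:

```python
def sin_12bit(x):
    """Degree-7 Taylor polynomial for sin(x) on [0, pi/2], in Horner form.

    Worst-case error is about 1.6e-4, i.e. roughly 12 good bits -
    plenty for 3D shading, and far cheaper than full-precision sin.
    """
    x2 = x * x
    return x * (1 - x2 / 6 * (1 - x2 / 20 * (1 - x2 / 42)))
```

in hardware the equivalent would be a handful of multiply-adds (or a small table plus interpolation), versus the large range-reduction-plus-polynomial pipeline a full-accuracy sin requires.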
you need to add Texture caches, and Texture interpolation opcodes (takes 4 pixels @ 00 01 10 11 square coordinates, plus two FP XY numbers between 0.0 and 1.0, and interpolates the pixels in 2D).
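what that interpolation opcode computes, per colour channel, is a standard bilinear blend (the encoding of the real opcode is not shown here; this is just the arithmetic):

```python
def texture_interp(p00, p01, p10, p11, x, y):
    """Bilinear blend of four corner pixels at (0,0) (0,1) (1,0) (1,1).

    x and y are the fractional FP coordinates in [0.0, 1.0]; a texture
    opcode does this blend per colour channel in one instruction.
    """
    top = p00 * (1.0 - x) + p10 * x      # interpolate along x at y = 0
    bottom = p01 * (1.0 - x) + p11 * x   # interpolate along x at y = 1
    return top * (1.0 - y) + bottom * y  # then blend the two rows along y
```

doing this with general-purpose scalar instructions costs around eight multiplies and a load of adds per channel per pixel, which is exactly why GPUs fold it into one opcode backed by a Texture cache.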
you need to add YUV2RGB and other pixel-format-conversion opcodes that are in the Vulkan Specification...
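for reference, here is the standard BT.601 full-range YUV-to-RGB arithmetic such an opcode would implement (one common variant - Vulkan actually specifies several Y'CbCr models and ranges):

```python
def yuv2rgb(y, u, v):
    """BT.601 full-range YUV -> RGB: the per-pixel conversion a single
    pixel-format opcode would perform on an entire vector of pixels."""
    d = u - 128  # centre the chroma channels
    e = v - 128
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    # clamp each channel back into the valid 8-bit range
    return tuple(max(0, min(255, round(ch))) for ch in (r, g, b))
```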
and many more.
but, we first had to actually, like, y'know, have a core that can actually execute instructions at all? :) and that's what this first Test ASIC is: a first step.
https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/...
i'm currently in the middle of a rabbit-hole exploration of being able to do in-place RADIX-2 FFT, DCT and DFT butterflies, the target is a general purpose function to cover each of those, in around 25 Vector instructions.
not 2,000 optimised loop-unrolled instructions specifically crafted for RADIX-8, another for RADIX-16, another for RADIX-32 ..... RADIX-4096 (as is the case in ffmpeg): 25 instructions FOR ANY 2^N FFT.
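for anyone unfamiliar with what's being vectorised: here's the classic iterative in-place radix-2 butterfly loop, in Python for readability (this is the textbook Cooley-Tukey algorithm, not the SVP64 assembler itself - the point is that SVP64 aims to express this whole routine, for ANY power-of-two size, in roughly 25 vector instructions):

```python
import cmath

def fft_inplace(a):
    """Iterative in-place radix-2 DIT FFT; len(a) must be a power of two.

    The inner loop is the "butterfly": u+v and u-v with a twiddle factor,
    written back over the inputs, so no scratch buffer is needed.
    """
    n = len(a)
    # bit-reversal permutation so the butterflies can work in place
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # butterfly stages: span 2, 4, 8, ... n
    length = 2
    while length <= n:
        w = cmath.exp(-2j * cmath.pi / length)  # twiddle for this stage
        for start in range(0, n, length):
            wn = 1.0
            for k in range(length // 2):
                u = a[start + k]
                v = a[start + k + length // 2] * wn
                a[start + k] = u + v                 # butterfly, in place
                a[start + k + length // 2] = u - v
                wn *= w
        length <<= 1
```

the three nested loops (stage, block, butterfly) are what the hand-unrolled per-radix assembler in ffmpeg flattens out; a general vector ISA with the right loop primitives keeps them as loops.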
btw if you're interested in "real-world" SVP64 Vector Assembler we have the beginnings of an ffmpeg MP3 CODEC inner loop:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=medi...
that's under 100 instructions, more than 4x less assembler for the same job in PPC64. and 6.5 times less assembler than ffmpeg's optimised x86 apply_window_float.S
you will no doubt be aware of the huge power savings that brings due to reduced L1 cache usage.
it's 64-bit, LE/BE, and it's implementing a "Finite State Machine" (similar technique to picorv32, if you know that design). this because we wanted to keep it REALLY basic, and also very clear as a Reference Design, none of the "optimised pipelined decoders and issuers" that you normally find, which make it really, really difficult to see what the hell is going on.
bear in mind this includes SVP64: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple...
if you go back several revisions, the non-Vectorised version is like... 400 lines?
* In a few years (maybe 5?), it might be possible to build a computer that you can trust has no intentional back doors in the CPU, but is modern enough to run software from within the last decade.
* If this catches on, and is used by enough people, economies of scale might kick in, and bring costs for advanced custom chips down by an order of magnitude (if the cpu is small enough, and if more fab capacity is built). Not Intel/AMD/ARM parts - those prices will remain stable, at first.
* Maybe we can have another decent consumer-grade router? No, this is a pipe-dream.
* Our Amiga accelerator boards will become SMOKING fast.