For those who aren't familiar, control flow - i.e. anything that isn't a directed acyclic graph - is the hard part of HLS. This looks like a fairly nice syntax compared to the bastardisations of C that Intel and Xilinx pursue for HLS, but I'm not sure this is bringing anything new to the table.
As for the examples, I'm kind of flummoxed that they haven't given any details on what the examples synthesize to. For example, how many logic blocks does the CRC32 use? How many clock cycles? What about the throughput? I'm going to sound like a grumpy old man now, but it's important because it's very difficult to get performant code as a hardware engineer. Generally it involves having a fair idea of how the code is going to synthesize. What is damn near impossible is figuring out what you want to synthesize to, and then guessing the shibboleth that the compiler wants in order to produce that code. Given that they haven't tackled the difficult problems like control flow, folding, resource sharing etc., it makes me hesitant to believe they've produced something phenomenal.
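For concreteness, a CRC32 is exactly the kind of design where those numbers matter: a bit-serial implementation takes 8 cycles per byte but little area, while unrolling the inner loop trades area for throughput. Here's a rough Python sketch of the reference computation (standard reflected CRC-32 with polynomial 0xEDB88320 - generic CRC code for illustration, not taken from the XLS example):

```python
# Reference bit-serial CRC-32 (reflected, polynomial 0xEDB88320) -- the same
# algorithm a one-bit-per-cycle hardware loop would implement.
CRC32_POLY = 0xEDB88320

def crc32_bitwise(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        # In hardware: either 8 clock cycles, or 8 unrolled XOR/shift stages.
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

print(hex(crc32_bitwise(b"123456789")))  # standard check value: 0xcbf43926
```

Whether the compiler gives you the 8-cycle loop or the unrolled XOR tree is precisely the kind of synthesis detail the announcement doesn't report.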
We have been targeting some Lattice FPGAs for prototyping purposes, but we've mostly been doing designs for ASIC processes, which is why details are a little sparse for FPGAs you get off the shelf; it's a priority for us to fill those in. We have some interactive demos that show FPGA synthesis stats (cell counts, generated Verilog), let you toy with the pipeline frequency, and integrate with the [IR visualizer](https://google.github.io/xls/ir_visualization/#screenshot); we'll try to open source that as soon as possible. The OSS tools (SymbiFlow) that some of our colleagues collaborate on can do synthesis in just a few seconds, so it can feel pretty cool to see these things in near-real-time.
We fold over resources in time with a sequential generator, but we still have a ways to go. We expect a bunch of problems will map nicely onto concurrent processes: they're Turing complete and nice for the compiler to reason about.
I'm a big believer that phenomenal is really effort and solving real-world pain points, integrated over time -- it's a journey! We're intending to do blog posts as we hit big milestones, so keep an eye out!
They seem to have a simulation framework for these tools that isn't just "re-use an existing simulator", and it apparently does use LLVM for codegen but that's the easy part. Actual simulation performance numbers would be really interesting to see vs actual RTL sims.
    fn decode_i_instruction(ins: u32) -> (u12, u5, u3, u5, u7) {
      let imm_11_0 = (ins >> u32:20);
      let rs1 = (ins >> u32:15) & u32:0x1F;
      let funct3 = (ins >> u32:12) & u32:0x07;
      let rd = (ins >> u32:7) & u32:0x1F;
      let opcode = ins & u32:0x7F;
      (imm_11_0 as u12, rs1 as u5, funct3 as u3, rd as u5, opcode as u7)
    }
Here's the SystemVerilog solution:

    {imm_11_0, rs1, funct3, rd, opcode} <= ins;
Obviously, in software, you can't slice data in the same way since, as far as I can tell, it's assuming all variables are a certain size and so there's no natural way of bit slicing. Frankly, combinational logic is not where I expect the most interesting differences. Sequential logic is surely more interesting.
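To make the comparison concrete, here's a rough Python sketch (my own illustration, not from the post) of the same shift-and-mask decode as the DSLX example; without sized bit-vector types you have to mask explicitly, which is exactly the "no natural bit slicing" point. The test value 0x00500093 encodes the RISC-V instruction `addi x1, x0, 5`:

```python
def decode_i_instruction(ins: int):
    """Shift-and-mask decode of a RISC-V I-type instruction, mirroring the
    DSLX example: the masks stand in for the sized-type casts (u12, u5, ...)."""
    imm_11_0 = (ins >> 20) & 0xFFF
    rs1 = (ins >> 15) & 0x1F
    funct3 = (ins >> 12) & 0x07
    rd = (ins >> 7) & 0x1F
    opcode = ins & 0x7F
    return imm_11_0, rs1, funct3, rd, opcode

# 0x00500093 encodes `addi x1, x0, 5`; opcode 19 is 0x13 (OP-IMM).
print(decode_i_instruction(0x00500093))  # (5, 0, 0, 1, 19)
```

In the SystemVerilog version the concatenation assignment does all of this in one line because every field already has a declared bit width.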
Note that there are differences though: there seems to be no type inference, there's for .. in, different array syntax, and match arms delimited by ";" instead of ",".
But it has a lot of the cool stuff from Rust: pattern matching, expression-orientedness (let ... = match { ... }), etc.
Also, other syntax is similar: the fn foo() -> Type syntax, although something similar can be achieved in C++ as well.
Often, such hardware is written using hardware description languages [1] like Verilog or VHDL. These languages are very low level and, in the opinion of some, a little clumsy to use.
XLS aims to provide a system for High-level synthesis [2]. The benefit of such systems is that you can more easily map interesting algorithms to hardware without being super low level.
[1] https://en.wikipedia.org/wiki/Hardware_description_language
[2] https://en.wikipedia.org/wiki/High-level_synthesis
Not sure what happened to it. Maybe it did not optimize things enough.
https://en.wikipedia.org/wiki/Handel-C
https://babbage.cs.qc.cuny.edu/courses/cs345/Manuals/HandelC...
Most programs are loaded into memory, and parts of those programs are moved to registers and are used to load data into other registers. That data is, in turn, sent to logic units like adders that add two registers together or comparators that compare two registers' values. The generality comes at a cost in terms of power and time but offers flexibility in return.
That is very different from something like a light switch where you flip the switch and the result continuously reflects that input within the limits of the speed of light.
If you are willing to sacrifice flexibility, translating your code into hardware gives you a device that runs the same processing on its inputs continuously at the speed of light subject to your information processing constraints (e.g. derivations of the original input still need to be calculated prior to use).
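A loose software analogy for the two models (my own sketch, not from the comment): the general-purpose machine interprets instructions step by step over registers, while the "light switch" circuit is one fixed, continuous function of its inputs:

```python
# General-purpose model: fetch/decode/execute over a register file, one
# instruction at a time -- flexible, but every step costs time and power.
def run_program(program, regs):
    for op, dst, a, b in program:
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "cmp":
            regs[dst] = int(regs[a] > regs[b])
    return regs

# Fixed-function ("light switch") model: the whole computation is a single
# function of the inputs, like a combinational circuit -- no flexibility,
# but the outputs just continuously reflect the inputs.
def fixed_circuit(x, y):
    return x + y, int(x > y)

regs = run_program([("add", "r2", "r0", "r1"), ("cmp", "r3", "r0", "r1")],
                   {"r0": 7, "r1": 3})
print((regs["r2"], regs["r3"]), fixed_circuit(7, 3))  # same results both ways
```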
Traditionally, separate languages and greater hardware knowledge requirements made custom circuits less accessible. This project brings more standard, higher level languages into the set of valid specifications for custom electronics.
The $1M question, however, is how this experience pans out as you try to squeeze out the last bit of timing margin. I don't know, but I'm eager to find out.
ADD: this parallels the situation with CUDA, where writing a first working implementation is usually easy, but by the time you have a heavily optimized version ...
https://github.com/mmastrac/oblivious-cpu/blob/master/hidecp...
I was thinking about turning it into a full language at some point, but they beat me to it (and I love the Rust syntax!).
Even with modern RTL, we have a synthesizing compiler optimizing our design within a cycle boundary, trying to manage fanouts and close timing by duplicating paths and optimizing redundant Boolean formulas. Some will even do some forms of cross-stage optimization.
If you think of XLS's starting point as "mostly structural" akin to RTL (instead of "loops where you push a button and produce a whole chip") it's really an up-leveling process, where there's a compiler layer underneath you that can assist you in exploring the design space, ideally more quickly and effectively, and trying to give you a flexible substrate to make that happen (by describing bits of functionality as much as possible in latency insensitive ways).
I like to think of it like [Advanced Chess](https://en.wikipedia.org/wiki/Advanced_chess) -- keep the human intuition but permit the use of lots of cycles for design process assist. It appears from what we've seen so far that when you have a "lifted" representation of your design such that tools can work with it well, composition and exploration become more possible, fun, and fruitful! I expect over time we'll have a mode where you still require that everything close timing in a single cycle, for when you explicitly want all the control you had / don't care so much for the assist; then you just get the benefits of the tooling / fast simulation infrastructure that works with the same program representation. It's a great space to be working in as somebody who loves compilers, tools, and systems: there's so much you could do, there's incredible opportunity!
I'm sure Google will use XLS for their internal digital design work, but I don't expect this to ever gain widespread support. (not because HLS is inherently bad, but because of the culture)
This is categorically not true. There have been repeated projects to re-invent hardware description languages. They don't fail because hardware engineers are conservative, they fail because they don't produce good enough results.
Intel has a team of hundreds of engineers working on HLS, Xilinx probably has almost as many, and there are lots of smaller companies working on their own things, like Maxeler. They haven't taken off because it's an unsolved problem to automate some of the things you do in Verilog efficiently.
Take this language for example - it cannot express any control flow. It's feed-forward only. Which essentially means it is impossible to express most of the difficult parts of the problems people solve in hardware. I hate Verilog, and I would love a better solution, but this language is like designing a software programming language that has no concept of run-time conditionals.
As a hardware designer who's never been a fan of SystemVerilog but continues to use it, I think this is inaccurate. There are two main issues that mean I currently choose SystemVerilog (though I would certainly be happy to replace it).
1. Tooling. Verilog or SystemVerilog (at least bits of it) is widely supported across the EDA ecosystem. Any new HDL thus needs to compile down to Verilog to be usable for anything serious. Most do indeed do this, but there can be a major issue with mapping the language. Any issues you get in the compiled Verilog need to be mentally mapped back to the initial language. Depending upon the HDL this can be rather hard, especially if there's serious name mangling going on.
2. New HDLs don't seem to optimize for the kinds of issues I have, and may make dealing with the issues I do have worse. Most of my career I've been working on CPUs and GPUs. Implementation results matter (so power, max frequency and silicon area), and to hit the targets you want to hit you often need to do some slightly crazy stuff. You also need a very good mental model of how the implemented design (i.e. what gates you get, where they get placed and how they're connected) is produced from the HDL, and in turn know how to alter the HDL to get a better result in gates. A typical example is dealing with timing paths: you may need to knock a few gates off a path to meet a frequency goal, which requires you to a) map the gates back to HDL constructs so you can see what bit of RTL is causing the issues, and b) do some of the slightly crazy stuff: hyper-specific optimisations that rely on a deep understanding of the micro-architecture.
New HDLs often have nice things like decent type systems and generative capabilities, but lose the low-level, easy mental mapping of RTL to gates you get with Verilog. I don't find much of my time, for instance, is spent dealing with Verilog's awful type system (including the time spent dealing with bugs that arise from it). It's frustrating, but making it better wouldn't have a transformative effect on my work.
I do spend lots of time mentally mapping gates back to RTL to then try and work out better ways to write the RTL to improve implementation results. This often comes back to, say, seeing that an input to an AND gate is very late, and realising you can make another version of that signal that won't break functional correctness 90% of the time, with a fix-up applied to deal with the other 10% of cases in some other less timing-critical part of the design (e.g. in a CPU pipeline the fix-up would be causing a replay or killing an instruction further down the pipeline). Due to the mapping issue I brought up in 1., new HDLs often make this harder. Taking a higher-level approach to the design can also make such fixes very fiddly or impossible to do without hacking up the design in a major way.
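A classic relative of this trick (my own illustration, not the commenter's exact scenario) is the carry-select adder: rather than waiting for a late carry to ripple through, compute the upper half of the sum for both possible carry-ins in parallel and let the late signal merely select between them, knocking the long carry chain off the critical path. A behavioural Python sketch:

```python
MASK16 = 0xFFFF

def carry_select_add32(a: int, b: int) -> int:
    """32-bit add split into two 16-bit halves. The upper half is computed
    speculatively for carry-in 0 and carry-in 1; the late carry signal then
    just drives a mux instead of a 16-bit ripple chain."""
    lo = (a & MASK16) + (b & MASK16)
    carry = lo >> 16                             # the "late" signal on the critical path
    hi0 = ((a >> 16) + (b >> 16)) & MASK16       # speculative: carry-in = 0
    hi1 = ((a >> 16) + (b >> 16) + 1) & MASK16   # speculative: carry-in = 1
    hi = hi1 if carry else hi0                   # cheap select when carry arrives
    return ((hi << 16) | (lo & MASK16)) & 0xFFFFFFFF

print(hex(carry_select_add32(0x0001FFFF, 0x00000001)))  # 0x20000
```

The cost, as with the commenter's fix-up trick, is extra area (two speculative copies) in exchange for a shorter timing path - the kind of trade-off that's hard to express when the HDL-to-gates mapping is opaque.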
That said, my only major experience with a CPU design not using Verilog/SystemVerilog was building a couple of CPUs for my PhD in Bluespec SystemVerilog. I kind of liked the language, but ultimately, due to 1. and 2., didn't think it really did much for me over SystemVerilog.
If you're building hardware with less tight constraints, then yes, some of the new HDLs around could work very well for you, and yes, hardware designers can be very conservative about changing their ways, but it simply isn't the case that this is the only thing holding back adoption of new HDLs.
I do need to spend some more time getting to grips with what's now available and up and coming, but I can't say I've seen anything that, for my job at least, provides a major jump over SystemVerilog.
...Are you sure?
The possibility of using a single codebase to generate both a software emulator and a hardware implementation is incredible, from a hardware preservation point of view.
(I'm sorry.)