For those who aren't familiar, control flow - i.e. anything that isn't a directed acyclic graph - is the hard part of HLS. This looks like a fairly nice syntax compared to the bastardisations of C that Intel and Xilinx pursue for HLS, but I'm not sure this is bringing anything new to the table.
As for the examples, I'm kind of flummoxed that they haven't given any details on what the examples synthesize to. For example, how many logic blocks does the CRC32 use? How many clock cycles? What about the throughput? I'm going to sound like a grumpy old man now, but it's important because it's very difficult to get performant code as a hardware engineer. Generally it involves having a fair idea of how the code is going to synthesize. What is damn near impossible is figuring out what you want to synthesize to, and then guessing the shibboleth that the compiler wants in order to produce that code. Given that they haven't tackled the difficult problems like control flow, folding, resource sharing etc., it makes me hesitant to believe they've produced something phenomenal.
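For concreteness, a CRC32 is exactly the kind of design where those numbers matter: a bit-serial implementation takes 8 cycles per byte but little area, while unrolling the inner loop trades area for throughput. Here's a rough Python sketch of the reference computation (standard reflected CRC-32 with polynomial 0xEDB88320 - generic CRC code for illustration, not taken from the XLS example):

```python
# Reference bit-serial CRC-32 (reflected, polynomial 0xEDB88320) -- the same
# algorithm a one-bit-per-cycle hardware loop would implement.
CRC32_POLY = 0xEDB88320

def crc32_bitwise(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        # In hardware: either 8 clock cycles, or 8 unrolled XOR/shift stages.
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

print(hex(crc32_bitwise(b"123456789")))  # standard check value: 0xcbf43926
```

Whether the compiler gives you the 8-cycle loop or the unrolled XOR tree is precisely the kind of synthesis detail the announcement doesn't report.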
We have been targeting some Lattice FPGAs for prototyping purposes, but we've mostly been doing designs for ASIC processes, which is why details are a little sparse for FPGAs you get off the shelf; it's a priority for us to fill those in. We have some interactive demos that show FPGA synthesis stats (cell counts, generated Verilog), let you toy with the pipeline frequency, and integrate with the [IR visualizer](https://google.github.io/xls/ir_visualization/#screenshot); we'll try to open source that as soon as possible. The OSS tools (SymbiFlow) that some of our colleagues collaborate on can do synthesis in just a few seconds, so it can feel pretty cool to see these things in near-real-time.
We fold over resources in time with a sequential generator, but we still have a ways to go. We expect a bunch of problems will map nicely onto concurrent processes: they're Turing complete and nice for the compiler to reason about.
I'm a big believer that phenomenal is really effort and solving real-world pain points, integrated over time -- it's a journey! We're intending to do blog posts as we hit big milestones, so keep an eye out!
They seem to have a simulation framework for these tools that isn't just "re-use an existing simulator", and it apparently does use LLVM for codegen but that's the easy part. Actual simulation performance numbers would be really interesting to see vs actual RTL sims.
    fn decode_i_instruction(ins: u32) -> (u12, u5, u3, u5, u7) {
      let imm_11_0 = (ins >> u32:20);
      let rs1 = (ins >> u32:15) & u32:0x1F;
      let funct3 = (ins >> u32:12) & u32:0x07;
      let rd = (ins >> u32:7) & u32:0x1F;
      let opcode = ins & u32:0x7F;
      (imm_11_0 as u12, rs1 as u5, funct3 as u3, rd as u5, opcode as u7)
    }
Here's the SystemVerilog solution:

    {imm_11_0, rs1, funct3, rd, opcode} <= ins;
Obviously, in software, you can't slice data in the same way since, as far as I can tell, it's assuming all variables are a certain size and so there's no natural way of bit slicing. Frankly, combinational logic is not where I expect the most interesting differences. Sequential logic is surely more interesting.
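To make the comparison concrete, here's a rough Python sketch (my own illustration, not from the post) of the same shift-and-mask decode as the DSLX example; without sized bit-vector types you have to mask explicitly, which is exactly the "no natural bit slicing" point. The test value 0x00500093 encodes the RISC-V instruction `addi x1, x0, 5`:

```python
def decode_i_instruction(ins: int):
    """Shift-and-mask decode of a RISC-V I-type instruction, mirroring the
    DSLX example: the masks stand in for the sized-type casts (u12, u5, ...)."""
    imm_11_0 = (ins >> 20) & 0xFFF
    rs1 = (ins >> 15) & 0x1F
    funct3 = (ins >> 12) & 0x07
    rd = (ins >> 7) & 0x1F
    opcode = ins & 0x7F
    return imm_11_0, rs1, funct3, rd, opcode

# 0x00500093 encodes `addi x1, x0, 5`; opcode 19 is 0x13 (OP-IMM).
print(decode_i_instruction(0x00500093))  # (5, 0, 0, 1, 19)
```

In the SystemVerilog version the concatenation assignment does all of this in one line because every field already has a declared bit width.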
Note that there are differences though: there seems to be no type inference, there's for .. in, different array syntax, and match arms delimited by ";" instead of ",".
But it has a lot of the cool stuff from Rust: pattern matching, expression-orientedness (let ... = match { ... }), etc.
Also, other syntax is similar: the fn foo() -> Type syntax, although something similar can be achieved in C++ as well.
Often, such hardware is written using hardware description languages [1] like Verilog or VHDL. These languages are very low level and, in the opinion of some, a little clumsy to use.
XLS aims to provide a system for High-level synthesis [2]. The benefit of such systems is that you can more easily map interesting algorithms to hardware without being super low level.
[1] https://en.wikipedia.org/wiki/Hardware_description_language
[2] https://en.wikipedia.org/wiki/High-level_synthesis
Not sure what happened to it. Maybe it did not optimize things enough.
https://en.wikipedia.org/wiki/Handel-C
https://babbage.cs.qc.cuny.edu/courses/cs345/Manuals/HandelC...
Most programs are loaded into memory, and parts of those programs are moved to registers and are used to load data into other registers. That data is, in turn, sent to logic units like adders that add two registers together or comparators that compare two registers' values. The generality comes at a cost in terms of power and time but offers flexibility in return.
That is very different from something like a light switch where you flip the switch and the result continuously reflects that input within the limits of the speed of light.
If you are willing to sacrifice flexibility, translating your code into hardware gives you a device that runs the same processing on its inputs continuously at the speed of light subject to your information processing constraints (e.g. derivations of the original input still need to be calculated prior to use).
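A loose software analogy for the two models (my own sketch, not from the comment): the general-purpose machine interprets instructions step by step over registers, while the "light switch" circuit is one fixed, continuous function of its inputs:

```python
# General-purpose model: fetch/decode/execute over a register file, one
# instruction at a time -- flexible, but every step costs time and power.
def run_program(program, regs):
    for op, dst, a, b in program:
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "cmp":
            regs[dst] = int(regs[a] > regs[b])
    return regs

# Fixed-function ("light switch") model: the whole computation is a single
# function of the inputs, like a combinational circuit -- no flexibility,
# but the outputs just continuously reflect the inputs.
def fixed_circuit(x, y):
    return x + y, int(x > y)

regs = run_program([("add", "r2", "r0", "r1"), ("cmp", "r3", "r0", "r1")],
                   {"r0": 7, "r1": 3})
print((regs["r2"], regs["r3"]), fixed_circuit(7, 3))  # same results both ways
```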
Traditionally, separate languages and greater hardware knowledge requirements made custom circuits less accessible. This project brings more standard, higher level languages into the set of valid specifications for custom electronics.
The $1M question, however, is how this experience pans out as you try to squeeze out the last bit of timing margin. I don't know, but I'm eager to find out.
ADD: this parallels the situation with CUDA, where writing a first working implementation is usually easy, but by the time you have a heavily optimized version ...
https://github.com/mmastrac/oblivious-cpu/blob/master/hidecp...
I was thinking about turning it into a full language at some point, but they beat me to it (and I love the Rust syntax!).
Even with modern RTL, we have a synthesizing compiler optimizing our design within a cycle boundary, trying to manage fanouts and close timing by duplicating paths and optimizing redundant Boolean formulas. Some will even do some forms of cross-stage optimization.
If you think of XLS's starting point as "mostly structural" akin to RTL (instead of "loops where you push a button and produce a whole chip") it's really an up-leveling process, where there's a compiler layer underneath you that can assist you in exploring the design space, ideally more quickly and effectively, and trying to give you a flexible substrate to make that happen (by describing bits of functionality as much as possible in latency insensitive ways).
I like to think of it like [Advanced Chess](https://en.wikipedia.org/wiki/Advanced_chess) -- keep the human intuition but permit the use of lots of cycles for design process assist. It appears from what we've seen so far that when you have a "lifted" representation of your design such that tools can work with it well, composition and exploration become more possible, fun, and fruitful! I expect over time we'll have a mode where you still require that everything close timing in a single cycle, for when you explicitly want all the control you had / don't care so much for the assist; then you just get the benefits of the tooling / fast simulation infrastructure that works with the same program representation. It's a great space to be working in as somebody who loves compilers, tools, and systems: there's so much you could do, there's incredible opportunity!
I'm sure Google will use XLS for their internal digital design work, but I don't expect this to ever gain widespread support. (not because HLS is inherently bad, but because of the culture)
This is categorically not true. There have been repeated projects to re-invent hardware description languages. They don't fail because hardware engineers are conservative, they fail because they don't produce good enough results.
Intel has a team of hundreds of engineers working on HLS, Xilinx probably has almost as many, and there are lots of smaller companies working on their own things, like Maxeler. They haven't taken off because it's an unsolved problem to automate some of the things you do in Verilog efficiently.
Take this language for example - it cannot express any control flow. It's feed-forward only. Which essentially means it is impossible to express most of the difficult parts of the problems people solve in hardware. I hate Verilog, and I would love a better solution, but this language is like designing a software programming language that has no concept of run-time conditionals.
As a hardware designer who's never been a fan of SystemVerilog but continues to use it, I think this is inaccurate. There are two main issues that mean I currently choose SystemVerilog (though I would certainly be happy to replace it).
1. Tooling. Verilog or SystemVerilog (at least bits of it) is widely supported across the EDA ecosystem. Any new HDL thus needs to compile down to Verilog to be usable for anything serious. Most do indeed do this, but there can be a major issue with mapping the language. Any issues you get in the compiled Verilog need to be mentally mapped back to the initial language. Depending upon the HDL this can be rather hard, especially if there's serious name mangling going on.
2. New HDLs don't seem to optimize for the kinds of issues I have, and may make dealing with the issues I do have worse. Most of my career I've been working on CPUs and GPUs. Implementation results matter (so power, max frequency and silicon area), and to hit the targets you want to hit you often need to do some slightly crazy stuff. You also need a very good mental model of how the implemented design (i.e. what gates you get, where they get placed and how they're connected) is produced from the HDL, and in turn know how to alter the HDL to get a better result in gates. A typical example is dealing with timing paths: you may need to knock a few gates off a path to meet a frequency goal, which requires you to a) map the gates back to HDL constructs so you can see what bit of RTL is causing the issues, and b) do some of the slightly crazy stuff: hyper-specific optimisations that rely on a deep understanding of the micro-architecture.
New HDLs often have nice things like decent type systems and generative capabilities, but lose the low-level, easy mental mapping of RTL to gates you get with Verilog. I don't find much of my time, for instance, is spent dealing with Verilog's awful type system (including the time spent dealing with bugs that arise from it). It's frustrating, but making it better wouldn't have a transformative effect on my work.
I do spend lots of time mentally mapping gates back to RTL to then try and work out better ways to write the RTL to improve implementation results. This often comes back to, say, seeing that an input to an AND gate is very late, and realising you can make another version of that signal that won't break functional correctness 90% of the time, with a fix-up applied to deal with the other 10% of cases in some other less timing-critical part of the design (e.g. in a CPU pipeline the fix-up would be causing a replay or killing an instruction further down the pipeline). Due to the mapping issue I brought up in 1., new HDLs often make this harder. Taking a higher-level approach to the design can also make such fixes very fiddly or impossible to do without hacking up the design in a major way.
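A classic relative of this trick (my own illustration, not the commenter's exact scenario) is the carry-select adder: rather than waiting for a late carry to ripple through, compute the upper half of the sum for both possible carry-ins in parallel and let the late signal merely select between them, knocking the long carry chain off the critical path. A behavioural Python sketch:

```python
MASK16 = 0xFFFF

def carry_select_add32(a: int, b: int) -> int:
    """32-bit add split into two 16-bit halves. The upper half is computed
    speculatively for carry-in 0 and carry-in 1; the late carry signal then
    just drives a mux instead of a 16-bit ripple chain."""
    lo = (a & MASK16) + (b & MASK16)
    carry = lo >> 16                             # the "late" signal on the critical path
    hi0 = ((a >> 16) + (b >> 16)) & MASK16       # speculative: carry-in = 0
    hi1 = ((a >> 16) + (b >> 16) + 1) & MASK16   # speculative: carry-in = 1
    hi = hi1 if carry else hi0                   # cheap select when carry arrives
    return ((hi << 16) | (lo & MASK16)) & 0xFFFFFFFF

print(hex(carry_select_add32(0x0001FFFF, 0x00000001)))  # 0x20000
```

The cost, as with the commenter's fix-up trick, is extra area (two speculative copies) in exchange for a shorter timing path - the kind of trade-off that's hard to express when the HDL-to-gates mapping is opaque.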
That said, my only major experience with a CPU design not using Verilog/SystemVerilog was building a couple of CPUs for my PhD in Bluespec SystemVerilog. I kind of liked the language, but ultimately, due to 1. and 2., didn't think it really did much for me over SystemVerilog.
If you're building hardware with less tight constraints, then yes, some of the new HDLs around could work very well for you, and yes, hardware designers can be very conservative about changing their ways, but it simply isn't the case that this is the only thing holding back adoption of new HDLs.
I do need to spend some more time getting to grips with what's now available and up and coming, but I can't say I've seen anything that, for my job at least, provides a major jump over SystemVerilog.
...Are you sure?
The possibility of using a single codebase to generate both a software emulator and a hardware implementation is incredible, from a hardware preservation point of view.
(I'm sorry.)