* It took > 3 minutes to compile our code.
* DMA made a huge performance difference once we figured it out.
* Realizing that we had to be one with the clock tick took a long time. Understanding clock-synchronous programming (if that's the term) was a paradigm shift for my partner and me.
* The utter delight when we got frames streaming across the wire. The performance (though over a LAN) was silky smooth and you could tell immediately this was different than the run-of-the-mill x86 desktop program.
[0] - http://www1.cs.columbia.edu/~sedwards/classes/2009/4840/repo...
[1] https://github.com/YosysHQ
[2] https://symbiflow.github.io/
[3] https://www.chisel-lang.org/
Now that we have some nice FPGAs we can use, I think this is the next biggest hurdle.
What I am missing are two topics: timing analysis and debugging. Static timing analysis and proper timing constraints are crucial for a functional design. Without constraints, no tool can tell a slow signal toggling an LED every 10 seconds apart from a DDR3 533 MHz differential clock line.
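To make that concrete, a constraint file might distinguish those two signals like this (a minimal sketch in Vivado XDC syntax; ddr3_ck_p and status_led are made-up port names):

```tcl
# The 533 MHz DDR3 differential clock: every path it touches
# is now analyzed against a 1.876 ns period.
create_clock -period 1.876 -name ddr3_clk [get_ports ddr3_ck_p]

# The slow LED: exclude it from timing analysis entirely.
set_false_path -to [get_ports status_led]
```

Without the create_clock line, the tool has no idea the DDR3 paths need to settle in under 2 ns; without the false path, it may waste effort (or report failures) on a signal that could take milliseconds without anyone noticing.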
Debugging FPGA designs cannot be avoided. Even if a design works in the simulator, it very often fails on real hardware, and then the real fun starts. Xilinx has the Integrated Logic Analyzer (ILA, formerly ChipScope), Intel has SignalTap, and other vendors have their own similar tools. These are an FPGA designer's best friends. But they can't be fully trusted: they sometimes break the design in unexpected ways, so the designer must develop a gut feeling for what's happening. Synthesizing a design with an integrated logic analyzer takes much longer than a regular run, and debugging cycles are insanely long. Forgetting one signal can mean two hours of waiting, so add them all at the very beginning.
Writing a hardware description language is the easy part. As somebody already mentioned, everybody can count ones and zeros. The first problem every software engineer encounters is simple: there is nothing to program. FPGA design is about describing your system in these ancient languages. The second problem is the bulky toolchain. It's more than compile and debug buttons; in fact, there is a huge machine turning code into a bitstream, and its complexity naturally takes time to understand. You don't need to be a genius to be an FPGA designer, but you do need patience.
Probably because it has very limited support in open-source tooling, the same as VHDL.
[1] You can abuse some imperative paradigms to implement things like Conway's Game of Life as a systolic array - https://en.wikipedia.org/wiki/Systolic_array
module parallel_mult #(parameter NUM_OF_MULTIPLIERS = 4) (
    input  [NUM_OF_MULTIPLIERS*32-1:0] a_in,
    input  [NUM_OF_MULTIPLIERS*32-1:0] b_in,
    output reg [NUM_OF_MULTIPLIERS*64-1:0] mult_out
);
    integer i;
    reg [31:0] tmp_a, tmp_b;
    reg [63:0] tmp_mult;
    always @(*) begin
        mult_out = {(NUM_OF_MULTIPLIERS*64){1'b0}};
        for (i = 0; i < NUM_OF_MULTIPLIERS; i = i + 1) begin
            tmp_a = a_in >> (i*32);
            tmp_b = b_in >> (i*32);
            tmp_mult = tmp_a * tmp_b;
            mult_out = mult_out | (tmp_mult << (i*64));
        end
    end
endmodule
Would give you NUM_OF_MULTIPLIERS multipliers. If you wrote each multiply out, it would be more code and also wouldn't allow you to parametrize the design. For example, passing a variable by reference in one context cost me an extra 10% of logic blocks, and in another lowered it by 10%. It became a bit of a shotgun approach to optimising.
Where they can shine is if you need some odd combination of peripherals attached to a microcontroller: think of something like a uC with four UARTs or multiple separate I2C buses.
Anywhere you need a lot of parallel processing that you can guarantee won't be interrupted, such as a video processing pipeline, is also a good fit.
You might find this video helpful: https://www.youtube.com/watch?v=us2F8wAncw8
I think his design is very interesting, showing how to mix custom peripherals with picosoc so you can get very good response but also be able to program in C.
If you're building a device that's going to be mass-produced and sold, then the situation is different and using FPGAs can make sense, because you'll amortize the engineering cost for the digital logic across all the units you sell. It can be worth it if it lets you use a cheaper processor or microcontroller.
It’s kind of similar to Verilator, in that it lets you write test benches for your designs in a general-purpose programming language as opposed to an HDL. Whereas Verilator lets you write C++, cocotb is Python-based.
Both of these are probably best taken up after spending some time with an HDL, so you learn to think from a hardware perspective.
Also check out the ZipCPU blog.
That means when you write something, you should have some understanding of the hardware being inferred: is this going to give me some combinatorial logic, a register, or a RAM? It's very easy to keep the software mindset of "if it compiles, then it's good."
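As a rough sketch of that mapping (signal names are made up, and exactly what gets inferred ultimately depends on the tool):

```verilog
// Combinational logic: no clock, the output follows the inputs
always @(*) y = a & b;

// Register: an assignment on a clock edge infers a flip-flop
always @(posedge clk) q <= d;

// Block RAM: a large array with synchronous write and read
reg [7:0] mem [0:1023];
always @(posedge clk) begin
    if (we) mem[waddr] <= wdata;
    rdata <= mem[raddr];
end
```

Small changes to these patterns (an asynchronous read on the array, a missing signal in a sensitivity list) can silently change what hardware you get, which is why checking the synthesis report matters.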
Looking at the output of the tools, they'll say something like "x to y setup time: -2 ns slack". That means your desired operation can't meet the 10 ns clock period; it actually takes 12 ns for all the logic to ripple through. So now what?
You can break up the operation into two steps. Let's say the multiplication takes 8 ns, and the addition takes 4 ns. In timestep 1 you do z = mx, and pipeline c = b. Then in timestep 2 you do y = z + c. This way your operation takes two clock cycles = 20 ns total in terms of latency, but you can maintain a rate of 100 MHz.
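Sketched in Verilog (signal names and widths are purely illustrative):

```verilog
reg [31:0] z, c, y;
always @(posedge clk) begin
    // Stage 1: the 8 ns multiply, with b carried along in a
    // register so it stays aligned with the product
    z <= m * x;
    c <= b;
    // Stage 2: the 4 ns add, using the previous cycle's z and c
    y <= z + c;
end
```

Each stage now only has to settle within one 10 ns clock period, so the design meets timing at 100 MHz at the cost of one extra cycle of latency.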
Alternatively, you could choose a slower clock rate, say 75 MHz, and have a clock period of 13.333 ns. Then you would be able to meet the logic delay requirements in one cycle.
Again this is greatly simplified but it's similar to what one ends up doing in real FPGA designs. At the beginning you're usually trying to achieve maximum performance. Then later on you add more features to the FPGA, only to find that in doing so, you've caused an existing portion of the design to fail timing, so you need to twiddle things around.
https://en.wikipedia.org/wiki/Bob_Widlar#Fairchild_Semicondu...
You might enjoy those books, available fully online: https://ccrma.stanford.edu/~jos/
I know this, because I started learning and tinkering with this sort of thing a year or so ago, with no prior experience with electronics or hardware design, or a formal comp-sci education.
I had decades of programming experience already, but I think I have learned more about the fundamentals of computer science while playing with cheap FPGAs, than I have by just writing code.
It's pretty common for computer engineering students to implement a simple RISC processor (often a simplified MIPS) on an FPGA as a class project. In my experience it was a fantastic way to learn the basics of computer architecture.
And staying above physics, that's how computers work.
Introduction to Logic Circuits & Logic Design with VHDL by Brock J. LaMeres
He was my instructor in college and the text itself is extremely helpful for all things FPGA.
Your article has reduced my fear (and also, you mentioned the price of that Tiny FPGA board; at that price point, I don't mind too much if the magic smoke gets out).
The latter requires a USB Blaster dongle to program things, so the Upduino has the upper hand IMO, especially because it also has very lightweight open-source tools.
it's "combinational circuits", not "combinatorial" (that's a whole other area of math)
In most contemporary books and university courses, "combinational" is used, with a note that you should not confuse it with "combinatorics" or "combinatory logic".
That would more typically be written as 200e6, no need for the explicit multiplication when using standard float literal notation.