To put things into practical perspective: my company sells an FPGA-based solution that applies our video enhancement technology in real time to video streams up to 1080p60 (our consumer product handles HDMI in and out). It's a world-class algorithm with complex calculations, generating 3D information and saliency maps on the fly. I crammed that beast into a Cyclone IV with 40K LEs.
It's hard to translate the "System Logic Cells" metric that Xilinx uses to measure these FPGAs, but a pessimistic calculation puts it at about 1.1 million LEs. That's over 27 times the logic my real-time video enhancement algorithm uses. With just one of these FPGAs we could run our algorithm on 6 4K60 4:4:4 streams at once. That's insane.
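For what it's worth, here's the back-of-the-envelope math behind those numbers (all figures approximate, and the linear-scaling-with-pixel-rate assumption is mine):

```python
# Back-of-the-envelope math for the claims above (all figures approximate).
my_design_les = 40_000           # the Cyclone IV design
aws_fpga_les = 1_100_000         # pessimistic LE-equivalent of the AWS part
headroom = aws_fpga_les / my_design_les      # ~27.5x

# A 4K60 stream carries 4x the pixels of 1080p60, so assume (my assumption)
# the design scales roughly linearly with pixel rate.
scale = (3840 * 2160) / (1920 * 1080)        # = 4.0
streams = headroom / scale
print(round(headroom, 1), int(streams))      # 27.5 6
```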
As another estimate, my rough calculations show that each FPGA would be able to do about 7 GH/s mining Bitcoin. Not an impressive figure by today's standards, but back when FPGA mining was a thing, the best I ever got out of an FPGA was 500 MH/s per chip (on commercially viable devices).
I'm very curious what Amazon is going to charge for these instances. FPGAs of that size are incredibly expensive (5 figures each). Xilinx no doubt gave them a special deal, in exchange for the opportunity to participate in what could be a very large market. AWS has the potential to push a lot of volume for FPGAs that traditionally had very poor volume. IntelFPGA will no doubt fight exceptionally hard to win business from Azure or Google Cloud.
* Take all these estimates with a grain of salt. Most recent "advancements" in FPGA density are the result of trickier architectures. FPGAs today are still homogeneous logic, but they don't tend to be as fine-grained as they used to be. In other words, they're basically moving from RISC to CISC. So it's always up in the air how well all the logic cells can be utilized for a given algorithm.
My guess is that Amazon will have to be very careful not to price themselves out of the market for mid-range deep-learning-based cloud apps.
Wild guesstimate, but I think it'll cost more than $20/hr per instance.
Amazon is betting that they can get better pricing than anyone else. They probably can: no one else will be buying these FPGAs in the quantities Amazon will if these instances become popular (within their niche). So for medium-sized players it'll be cheaper to rent the FPGAs from Amazon, even with the AWS markup, than to buy the boards themselves. Especially for dynamic workloads, where you save money by renting instead of owning (which is generally the advantage of cloud resources).
That's my guess anyway.
Once upon a time I thought seriously about going into hardware design. I took a couple of different courses in college (over 10 years ago now... sigh) dealing with VHDL and/or Verilog and absolutely loved it. If not for a chance encounter with web programming during my co-op, my career would have been entirely different. With AWS offering this in the cloud, if it is not prohibitively expensive, I'll be looking into toying with it and hopefully discovering uses for it in my work.
How many NOT operations can this do per cycle (and per second)? I realise FPGAs aren't the most suited for this, but the raw number is useful when thinking about how much better the FPGA is compared to a GPU for simple ops.
Anyway, the FPGAs being used here are, I believe, based on a 6-LUT (6 input, 2 output). So you'd get about 1.25 million 6-LUTs to work with, and some combination of MUXes, flip-flops, distributed RAM, block RAM, DSP blocks, etc.
Supposing Xilinx isn't doing any trickery and you really can use all those LUTs freely, you'd be able to cram ~2.5 million binary NOTs into the thing (2 NOTs per LUT, since they're two-output LUTs). So 2.5 million NOTs per cycle. I don't know what speed it'd run at for such a simple operation. Their mid-range 7-series FPGAs were able to do 32-bit additions plus a little extra logic at ~450 MHz, consuming 16 LUTs per adder.
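Putting numbers on that guess (the LUT count is the estimate above; the clock rate is my optimistic assumption, borrowing the adder figure):

```python
# Putting numbers on the NOT-per-second guess above.
# Assumption (mine): the design closes timing at the ~450 MHz
# adder speed quoted for mid-range 7-series parts.
luts = 1_250_000                 # ~6-input, 2-output LUTs on the part
nots_per_cycle = luts * 2        # one inverter per LUT output
clock_hz = 450e6
nots_per_second = nots_per_cycle * clock_hz
print(nots_per_cycle, nots_per_second)       # 2500000 1.125e+15
```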
HDK here: https://github.com/aws/aws-fpga
(I work for AWS)
I hope the growing popularity of FPGAs for general-purpose computing will help push the vendors to open up bitstreams and invest in open-source design tools.
[background: many years of writing VHDL specifically for FPGAs, using various dev boards and custom boards]
To the best of our knowledge, state-of-the-art performance for forward propagation of CNNs on FPGAs was achieved by a team at Microsoft. Ovtcharov et al. have reported a throughput of 134 images/second on the ImageNet 1K dataset [28], which amounts to roughly 3x the throughput of the next closest competitor, while operating at 25 W on a Stratix V D5 [30]. This performance is projected to increase by using top-of-the-line FPGAs, with an estimated throughput of roughly 233 images/second while consuming roughly the same power on an Arria 10 GX1150. This is compared to high-performing GPU implementations (Caffe + cuDNN), which achieve 500-824 images/second while consuming 235 W. Interestingly, this was achieved using Microsoft-designed FPGA boards and servers, an experimental project which integrates FPGAs into datacenter applications.
It doesn't look like there's much AWS-proprietary stuff here, though we'd have to wait for the SDK to be opened properly to be sure. I imagine it's mostly making everything prepackaged and easily consumable, maybe some extra IP cores for common stuff, and lots of examples. If you're already using Vivado, I imagine using F1 won't introduce any major changes to what you expect.
"This AMI includes a set of developer tools that you can use in the AWS Cloud at no charge. You write your FPGA code using VHDL or Verilog and then compile, simulate, and verify it using tools from the Xilinx Vivado Design Suite (you can also use third-party simulators, higher-level language compilers, graphical programming tools, and FPGA IP libraries)."
So basically, buying a copy of Vivado is the minimum. There aren't any open source tools that directly output Xilinx FPGA bitstreams that I know of.
Wow. An app store for FPGA IPs and the infrastructure to enable anyone to use it. That's really cool.
I see people making video transcoder instances on day 1, and MPEG LA bankrupting Amazoners with lawsuits on day 2.
Only if they distribute it through Amazon. Just put the code up in a torrent; anyone can run it without MPEG LA knowing.
But I also think this is FPGAs for the Rest of Us. Suddenly, FPGAs are available without having to buy some development board from Xilinx, install a toolchain, use said (shitty) toolchain ...
Me, I was thinking of FPGAs as being something I'd use down the road a few years, eventually, etc. But instead, I'm looking at this right now. This morning. Waiting for the damn 404 to go away on:
https://github.com/aws/aws-fpga
This reduces the barrier to entry. It also reduces the transaction cost (h/t Ronald Coase).
The thing is, people always seem to invest heavily in dedicated hardware when using FPGAs. I'll be interested to see what people actually end up using this service for.
via http://www.bittware.com/xilinx/product/xupp3r/
Thanks OP
Pretty interesting read. Also, kudos to AWS!
It's more an issue of being able to reproduce an existing build later on. You can't delegate ownership of the toolchain to the "cloud" (read: somebody else's computer) if you think you'll ever need to maintain the design in the future.
I do think the issue with cloud is the concern over IP. There are not a lot of EDA vendors, so the chances that your competitor is using the same EDA vendor are pretty high. I think companies are pretty wary of using a cloud-hosted service where you could literally be running simulations on the same machines as your competitors. Can you imagine some cloud/hosting snafu resulting in your codebase being accessible to your competitors?
EDA companies also sell ASIC/FPGA IP and VIP (verification IP), so there's a pretty clear conflict of interest if they have access to your IP. If you're really paranoid, imagine the EDA vendors themselves picking through your IP and repackaging/reselling it to other customers (encrypted, of course, so you can't readily identify the source code).
You do, however, potentially expose your source code to Amazon. But possibly not, if you do your design/testing on EDA tools under your control and then deploy FPGA build packages to the F1 instances for hardware testing.
If anyone can suggest links or books, please do.
Thank you!
To get some basic ideas, I always recommend the book Code by Charles Petzold: https://www.amazon.com/Code-Language-Computer-Hardware-Softw...
It walks you through everything from the transistor to the operating system.
(Apparently I need to add that I work for AWS on every message so yes I work for AWS)
I'm surprised; where did you get the idea of using C to program FPGAs? Are you thinking of SystemC or OpenCL? (They're vastly different from each other.)
I'm really surprised a sibling comment recommended the Code book. It's really meant to be a layman's read about tech. It's a great book, but it won't teach you to program FPGAs.
[1]: https://www.amazon.com/Digital-Design-Introduction-Verilog-H...
One key difference to keep in mind for digital design is that everything happens in parallel unless explicitly serialized, which is the opposite of the software development most people know.
You can start with the EDA Playground tutorial and practice with HDLBits, while going through a book (e.g., Harris & Harris) alongside for examples, exercises, and best practices.
Similarly to a sibling thread, I'd also go with a free and open source flow, IceStorm (for the cheaply available iCE40 FPGAs): http://www.clifford.at/icestorm/
You can follow-up from the aforementioned tutorial and continue testing the designs on an iCE40 board -- starting here: http://hackaday.com/2015/08/19/learning-verilog-on-a-25-fpga...
Here are some really great presentations about it (slides & videos) by the creator (which can also serve in part as a general introduction):
- http://www.clifford.at/papers/2015/icestorm-flow/
- http://www.clifford.at/papers/2015/yosys-icestorm-etc/
Have fun!
A C programmer will just spend a lot of time learning why the things they already know how to do are not useful.
It was a very pleasant surprise! The JVM world usually does not have a great interface to the heterogeneous world. I think it would yield tremendous benefits: FPGA-accelerated matrix multiplication, sorting, and graph operations all sound very appealing.
And then, as you mentioned, there's the possibility of JITting things. HTTP header parsing ends up on the FPGA, which routes things to a message queue an actor can read. Or FPGA-based actors; does that make sense?
----
I have been unable to follow this development at all, however. Do you have any news about this project? I've been looking for a blog, a github or a mailing list, but can't find any.
These Xilinx 16nm Virtex FPGAs are beasts, but Altera has some compelling choices as well. Perhaps some of the hardened IP in the Xilinx parts tipped the scales, such as the H.265 encode/decode, 100G EMAC, or PCIe Gen 4?
OK, here's a concrete question: I have a vector of 64 floats. I want to multiply it with a matrix of size 64xN, where N is on the order of 1 billion. How fast can I do this multiplication, and find the top K elements of the resulting N-dimensional array?
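For a baseline before reaching for an FPGA, here's the same workload on a CPU with NumPy, scaled down from N = 1e9 (the sizes and variable names here are mine):

```python
import numpy as np

# y = x @ M, then take the indices of the top-K entries of y.
# Scaled down to N = 100,000 so it runs in a second; the question asks N ~ 1e9.
rng = np.random.default_rng(0)
N, K = 100_000, 10
x = rng.standard_normal(64).astype(np.float32)
M = rng.standard_normal((64, N)).astype(np.float32)

y = x @ M                                    # 64*N multiply-accumulates
topk = np.argpartition(y, -K)[-K:]           # O(N) partial select, unsorted
topk = topk[np.argsort(y[topk])[::-1]]       # sort only the K winners
print(topk.shape)                            # (10,)
```

Back of the envelope: the full-size problem is 64 × 1e9 ≈ 6.4e10 multiply-accumulates, but it also streams 256 GB of float32 weights per pass. So on any device, FPGA or GPU, the runtime is dominated by memory bandwidth rather than arithmetic, unless the matrix can be held in on-chip or on-board memory.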
Basically, you can create a custom "CPU" for your particular workflow. Imagine the GPU didn't exist and you couldn't multiply vectors of floats in parallel on your CPU. You could use an FPGA to write something that multiplies a vector of floats in parallel without developing a GPU. It would probably not be as fast as a GPU or the equivalent CPU, but it would be faster than doing it serially.
Another way to put it: you can build a GPU out of an FPGA, but not vice versa.
Hopefully, by browsing that list, you can see how FPGAs aren't really directly comparable to something like a GPU.
I don't see this resulting in a large-scale FPGA movement anytime soon, since industry and academia are heavily experienced with GPUs. The software and libraries for GPUs, like CUDA, TensorFlow, and other open-source libraries, are very mature and optimized for GPUs. Equivalent libraries will have to be written in Verilog (for one, I've been hoping to be part of this movement for some time now, so I'd love it if anyone can point me to anything going on).
There are some major to minor hurdles. Although some of them might not seem like much[0], here they are:
1. Until now, deep learning/machine learning researchers have been okay with learning the software stack around GPUs, and there are widespread tutorials on how to get started, etc. Verilog/VHDL is a whole different ball game and a very different thought process. (I will address using OpenCL later.)
2. The toolchain is not open source and not really hackable. Although that may not seem important here, since you're writing gates from scratch, there will be problems with licensing and bugs fixed at a snail's pace (if ever) until there is a performant open-source toolchain (if ever, though I have hope in the community). You'll have to learn to give up with a customer service rep when you try to get help, unlike open-source libraries, where you head to GitHub's issues page and get help quickly from the main devs.
3. Although this move will make getting into the game a lot easier, it still doesn't change the fact that people want control over their devices; it will take time for people to realize they have to start buying FPGAs for their data centers and use them in production, which has to happen sometime soon. Using AWS's services won't be cost-effective for long-term usage, just like GPU instances (I don't know how the spot instance situation is going to look with the FPGA instances).
This comes with its own slew of software problems, and good luck trying to understand what's breaking what, given the much slower compilation times and terribly unhelpful debugging messages.
4. OpenCL to FPGA is a mess. Only a handful of FPGAs support OpenCL, which has led to little to no open-source development around OpenCL with FPGAs in mind. And no, the OpenCL libraries written for GPUs cannot be used on FPGAs; it's more likely a from-scratch rewrite, and a LOT of tweaking has to be done to get them to work. OpenCL to FPGA is not as seamless as one might think and is riddled with problems. This will, again, take time and energy from people familiar with FPGAs, who have been largely outside the OSS movement.
Although I might come off as pessimistic, I'm largely hopeful for the future of the FPGA space. This move isn't great news just because it lowers the barrier, but because it introduces a chip that will be much more popular; now there is a single chip for libraries to focus their support on, compared to before, when each dev had a different board. So you'll have to get familiar with this one: the Virtex UltraScale+ XCVU9P [1].
Also, what might be interesting to you: Microsoft is doing a LOT of research on this.
I think all of the articles on MS's use of FPGAs can explain better than I can in this comment.
Some links to get you started: MS's blog post: http://blogs.microsoft.com/next/2016/10/17/the_moonshot_that...
Papers: https://www.microsoft.com/en-us/research/publication/acceler...
Media outlet links: https://www.top500.org/news/microsoft-goes-all-in-for-fpgas-... https://www.wired.com/2016/09/microsoft-bets-future-chip-rep...
I'd suggest starting with the Wired article or MS's blog post. Exciting stuff.
[0]: Remember that academia moves at a much slower pace than your average developer in adjusting to the latest and greatest software. The reason CUDA is still so popular, although it is closed source and only works on Nvidia's GPUs, is that it got into the game first and wooed researchers with performance. Although OpenCL is comparably performant (with some rare exceptions), I still see CUDA regarded as the de facto language to learn in the GPGPU space.
[1]: https://www.xilinx.com/support/documentation/selection-guide...
Maybe someone will finally find the Triple DES key used at Adobe for password hashing.
The possibilities are endless :)
So much easier than buying hardware. Deep learning sometimes works similarly: for many use cases, it's easier to play with on AWS with hourly billing than to buy hardware.
> 64 GiB of ECC-protected memory on a 288-bit wide bus (four DDR4 channels).
> Dedicated PCIe x16 interface to the CPU.
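As an aside, the 288-bit figure is consistent with four 72-bit ECC channels (64 data + 8 ECC bits each). A quick sanity check, where the speed grade is my assumption since AWS doesn't state it:

```python
# Check the quoted 288-bit bus width and estimate peak DRAM bandwidth.
# Assumption (mine): DDR4-2133, a common speed grade; AWS doesn't state it.
channels = 4
data_bits, ecc_bits = 64, 8
bus_width = channels * (data_bits + ecc_bits)
print(bus_width)                             # 288, matching the quote

transfers_per_sec = 2133e6                   # DDR4-2133: 2133 MT/s
peak_gb_per_sec = channels * (data_bits // 8) * transfers_per_sec / 1e9
print(round(peak_gb_per_sec, 1))             # 68.3
```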
Does anyone know whether this is likely to be a plug-in card? And can I buy one to plug into a local machine for testing?
Even better, maybe Amazon (and others getting into this space, like Intel and Microsoft) will put their weight behind an open-source VHDL/Verilog simulator. A few exist, but they are pretty slow and way behind the curve in language support. Heck, maybe they can drive adoption of one of the up-and-coming HDLs like Chisel, or create an even better one. A guy can dream...
What's the use of a simulator when you can spin up an AWS instance and run your program on a real FPGA?
That being said, you are far from alone as an FPGA developer in skipping simulation and going straight to hardware. Tools like Xilinx's ChipScope help with the visibility problem in real hardware, too.
It's now called QuestaSim, I believe. But are you sure it can't handle simulating large designs? If so, what is the full-featured software from Mentor that can?
> Heck, maybe they can drive adoption of one of the up-and-coming HDL's like chisel
Chisel isn't a full-blown HDL from what I understand; it's only a DSL that compiles to Verilog. In other words, you'd still need a Verilog simulator to actually run your design.
What could YOU use this for professionally?
(I certainly always wanted to play around with an FPGA for fun...)
A fast JPEG encoder/decoder would be useful as well.
But seriously, I'm open to ideas for technologies that you or anyone else needs implemented for these instances. Would make an interesting side business for me.
EDIT: I should point out that I'm an experienced "full-stack" engineer when it comes to FPGAs. I've implemented both the FPGA code and the software that drives it. None of this software-developed-by-"hardware guys" garbage.
I've been planning a NIC that directly serves web apps via HDL for a while now...
For JPEG the GPU instances might be better.
Let this day be known as the beginning of the end of general-purpose compute infrastructure for internet-scale services.
The analogue of the higher-level object file or assembly language would be a netlist: essentially a digital representation of a schematic. The HDL is transformed into a netlist, then the netlist is optimized and its components converted from generics to device-specific components, then placement and routing are determined, and finally a 'bit' file is generated for actually configuring the FPGA. This process can take several hours for a large design.
I'm not sure I agree with them that this is the right path forward (but they're smart and know their stuff, so I'm probably wrong), but it's absolutely for real.