The Nvidia DGX-1 Deep Learning Supercomputer in a Box (opens in new tab)

(nvidia.com)

215 pointsdphidt10y ago98 comments

98 comments

Just for some perspective, a little over 10 years ago, this $130k turnkey installation would sit at #1 in TOP500, easily beating out hundred-million-dollar initiatives like NEC's Earth Simulator and IBM's BlueGene/L: http://www.top500.org/lists/2005/06/ (170 TFLOPS vs. 137 TFLOPS)

At the other end, even a single GTX 960 would make it onto the list, placing in the 200s.

trsohmers10y ago

The 170 TFLOPs number that NVIDIA gives out is for FP16, while the Top 10 list gives its number for for FP64. The P100 that makes up this NVIDIA box gives about 5.3TFLOPs per card, or a total of 42.4TFLOPs for the whole box.

Sure, you can say that deep learning doesn't need FP64, but it is REALLY unfair to compare this to anything on the TOP500 list, especially when you consider the fact that this is not balanced in terms of memory size or bandwidth (in relation to the number of FLOPs) when you compare it to any real supercomputer class system.

0x07c010y ago

Was thinking the same, but look at the memory bandwidth of this thing, 720GB/sec.* That is the number you should look at, and that is a sweet number... Also the NVLink tech looks nice for multi-GPU/heterogeneous computing (I guess the latency is the most important, but no idea how that is ). Do's some one know if the Xeons are connected to the GPUs with NVLink or are they on PCIE ? (I know the new POWER chips have NVlink but haven’t read that Intel supports it.)

*http://www.anandtech.com/show/10222/nvidia-announces-tesla-p...

1 more reply

jlebar10y ago

Still, how many of these boxes are we talking about to match the performance of the #1 from the top 500 in 2005? 10? 20? That's still under $3m for 20, which is pretty impressive to me.

1 more reply

doyoulikeworms10y ago

Out of curiosity, what are some problems/solutions that require FP64?

1 more reply

lern_too_spel10y ago

That is 170 TFLOPS Rpeak (theoretical performance assuming you could find a workload that doesn't need to wait for data movement) at half precision vs. 137 TFLOPS Rmax (usable performance on a dummy linear algebra problem) at double precision. No, it would not top the list.

KKKKkkkk110y ago

Theoretical peak flops rate is useless for indicating performance nowadays. There are new benchmarks that take memory and network performance into account such as HPCG and HPGMG. On these benchmarks, throughput-oriented machines such as the ones Nvidia sells do not look good at all.

ssh4210y ago

already mentioned 16FP 170 TFLOPS (that is 64 FP 42.5TFLOPS) of DGX-1. There is also issue of GPU vs CPU: basically you couldn't directly compare these operations on same scale. You could easily drop 100x of your GPU performance at bad case scenario. Basic idea of GPU that you could possible gain sometimes extra 1000s times performance

aconz210y ago

Check out the specs here: http://images.nvidia.com/content/technologies/deep-learning/...

though I'm most curious about what motherboard is in there to support NVLink and NVHS.

Good overview of Pascal here: https://devblogs.nvidia.com/parallelforall/inside-pascal/

1 question: will we see NVLink become an open standard for use in/with other coprocessors?

1 gripe: they give relative performance data as compared to a CPU -- of course its faster than a CPU

dgacmu10y ago

You mean you're not surprised that a machine with 8 GPUs, apparently costing $129k USD (from comment below), can outperform a single CPU? :)

(Of course, a better metric is that it's getting ~56x the performance at probably ~10x the TDP, but that's not surprising for a GPU with the current state of deep learning code.)

To their credit, the thermal and power engineering needed to get that dense a compute deployment is challenging. (bt, dt, have the corpses of power supplies to show for it.) But the price means that it's going to be limited to hyper-dense HPC deployments by companies that don't have the resources to engineer their own for substantially less money, such as Facebook's Big Sur design: https://code.facebook.com/posts/1687861518126048/facebook-to... . And, of course, the academics and hobbyists will continue to use consumer GPUs , which give much better performance/$ but aren't nearly as HPC-friendly.

aconz210y ago

To be fair, they are comparing it to a dual-socket CPU; which is twice as fair as comparing to a single!!

What I was getting more at was: I want to know the relative performance compared to another 8 Tesla box. I know comparing apples isn't good marketing, but c'mon.

1 more reply

astrodust10y ago

$129K buys you a lot of dual 22-core servers.

3 more replies

virtuallynathan10y ago

It looks like it uses a separate daughterboard that houses the GPUs + NVLink, connected to the main motherboard using quad Infiniband EDR (400Gbps) + RDMA. http://images.anandtech.com/doci/10225/SSP_85.JPG

pinewurst10y ago

The diagram is confusing, but the GPUs are connected to the NVLink matrix which is connected to the motherboard via the PLX PCIe switches. The quad IB/dual 10GbE are separate IO attached to the motherboard.

https://devblogs.nvidia.com/parallelforall/inside-pascal/

1 more reply

phelm10y ago

I am looking forward to OpenCL catching up with CUDA in maturity and adoption, so that NVidia's monopoly in Silicon for deep learning will come to an end.

blt10y ago

I'm at GPU Technology Conference, where this computer was announced this morning. The amount of "wood behind the arrow" NVidia has for AI is insane. Even though the current demographic of GPU development is full of HPC simulations, physics, graphics... it's obvious that their biggest thrust is in machine learning. I don't think OpenCL can compete with this amount of money and enthusiasm. NVidia is rich and their engineers are very good. Some big changes would need to happen before OpenCL catches up to CUDA.

dharma110y ago

I think it's just really bad management from AMD. Took them ages to wake up, and now they have what looks like a relatively small team on their Boltzmann initiative. Remains to be seen what happens to it.

How much do you think it would really cost to develop an OpenCL equivalent of CuDNN (even a stripped down version, just fast)? I know AMD are struggling but we are talking about allocating a handful of talented engineers

pjmlp10y ago

For that to happen OpenCL has to be at the same level as CUDA in language support and tooling.

Having C only wasn't a good idea. NVidia was quite clever in giving first class treatment to C++, Fortran and any compiler vendor that wished to target PTX.

Also the visual debugging tools are quite good.

Khronos apparently needed to be hit hard to realise that not everyone wants to be stuck with C for HPC in the 21st century.

Also although Apple is the creator of OpenCL, they don't seem to give much love to it.

Then you have Google caring about it's Renderscript dialect, which doesn't help to the overall uptake in OpenCL.

There isn't a monopoly, rather vendors that lacked the perception to appeal to the developers wanted to have as tooling and performance.

Anyone is free to go use OpenCL, use C or a language with a compiler with a C target, do printf debugging and feel free.

Are any vendors already doing SPIR support?

bgalbraith10y ago

What monopoly? You totally have a choice, it's just that NVIDIA made a large bet on GPGPU and it is paying off for them. You don't see AMD heavily pushing their cards for compute purposes or developing computational developer relations.

RussianCow10y ago

You often don't have a choice because a large amount of GPGPU software is written using CUDA, which is Nvidia-specific.

1 more reply

jlebar10y ago

clang now has a mostly-working CUDA frontend (disclaimer, I work on it). And it has an AMD GPU backend (whether this is in a good state I don't know). I don't expect that putting these pieces together would be a huge project.

DeepYogurt10y ago

Me too. I really want to see some benchmarks between cuda code and opencl code generated from cuda with AMDs compiler. Actually if anyone has a geforce/tesla get on this!

Robadob10y ago

I haven't seen any recent benchmarks, but ones from 2011 all seemed to show CUDA and OpenCL on open footing in terms of performance when optimised properly.[1][2] CUDA simply had better library support, and a more well defined and uniform architecture to target. Whereas OpenCL is likely to require more programming to fill in the gaps for library support, and different optimisations depending on the architecture you wish to target. I'm guessing since then, the CUDA compiler may have improved somewhat in terms of optimisation based on some micro-benchmarking research I was looking the at the other day.

There's also Intel's MIC to consider now to, although that has a vastly different architecture to GPU. Again performance was similar between MIC and GPU in 2013[3], each performing better where their architecture was more suited, GPUs were capable of providing double the bandwidth for random access data.

In terms of AMD vs NVIDIA, I've not looked into it, I doubt AMD has anything to really compete with NVIDIAs current GPU accelerated compute lines. However again there was always that distinction (re bitcoin?) that AMD cards have better integer arithmetic and NVIDIA better float arithmetic.

Disclaimer: I use CUDA in my research, never tried OpenCL.

[1] http://arxiv.org/abs/1005.2581

[2] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=604719...

[3] http://arxiv.org/abs/1311.0378

1 more reply

nightski10y ago

What hardware could OpenCL even run on that would come remotely close to what this system has to offer?

olympus10y ago

OpenCL runs on Nvidia GPUs, so you could do an apples to apples comparison on this system.

2 more replies

nl10y ago

The problem is that no one[1] uses OpenCL because the performance isn't there. There is little sign of that changing, too.

pjmlp10y ago

Another reasons are language support and tooling.

CUDA had Fortran and C++ since day one and thanks to PTX was quite easy to add support for other languages.

Whereas OpenCL was stuck on "C only" model from Khronos, which forced everyone to use C or generate C code and be constrained to the device drivers.

This has been seen as such a big issue that SPIR and C++ SPIR got introduced with OpenCL 2.0.

Another very important one is debugging support. Last time I checked no one had visual tooling at the same level as NVidia's one.

badminton110y ago

Costs $129,000 and needs 3.2 kilowatts to run.

dogma113810y ago

3.2KW isn't that insane for a server, you can buy high end desktop PSU's of 1.6KW (I'm running a 1200W one) if you are using multiple GPU's, a high end CPU, 32-64GB of memory and loads of storage coupled with overclocking and the substantial cooling required it's not that hard to get to around 1KW power consumption on a high end gaming rig these days.

olympus10y ago

To me, $129k isn't surprising since it is only going to be bought by researchers with big budgets. Small-timers will still build 3x GTX980 systems for under $5k.

3.2 KILOwatts sounded insane to me, but I suppose you'll have your own server rack to put it in if you can afford to buy one of these.

TkTech10y ago

3.2kw isn't that insane considering what you're getting out of it. A coffee pot is 1kw, a toaster is 1.2kw, an electric broiler is 3.6kw. Running costs would be a very tiny part of any budget. Ends up being $9.216/day assuming peak costs, peak usage, and 24h operation.

astrodust10y ago

If that sounds insane, you're going to lose your mind when you realize how many KILOwatts your oven uses.

3.2KW is less than a dishwasher.

2 more replies

Fomite10y ago

"To me, $129k isn't surprising since it is only going to be bought by researchers with big budgets" Yeah, this is essentially "Big chunk of a computational researcher's startup budget" or an infrastructure grant.

jra10110y ago

More detail on the GPUs in the system:

https://devblogs.nvidia.com/parallelforall/inside-pascal/

Robadob10y ago

Have they published a copy of the video of the autonomous car trained with unsupervised learning from the keynote anywhere?

I'd love to show it to my father.

madengr10y ago

Note the P100 is 20 Tflops for half precision (16 bit). For general purpose GPU (I use them for EM simulation) I assume one would want 32-bit, which is 10 Tflops. But still looks much much better for 64-bit computations than the previous generation

pareci10y ago

Curious. Why do you post here when every other comment is random posturing?

madengr10y ago

They were touting 20 Tflops, but that's only for FP16, which isn't useful for many engineering computations that use GPU. I already can hit 2 Tflop F32 with two K20. It's a nice improvement over what I have now, but nothing astronomical.

sp33210y ago

Wow, I didn't realize they were shipping HBM2 already. 720GB/s - with only 16GB of RAM, you can read it all in 22 milliseconds!

DeepYogurt10y ago

They're not. All that was mentioned in the talk was that this chip is coming soon.

sp33210y ago

There's a big green "Order Now" button about 1/3 of the way down the page.

1 more reply

nickpeterson10y ago

I have to wonder about intel and their Xeon Phi range. Last I checked they were supposed to launch a followup late last year that never manifested. Now we're 4 months in 2016 and still no new phi's.

Couple that with the fact that they want you to use their compilers (extremely expensive), on a specialized system that can support the card, and you get a platform that nobody other than supercomputer companies can reasonably use. Meanwhile any developer who want to try something with cuda can drop $200 dollars on a GPU and go, then scale accordingly. I think intel somewhat acknowledged this by having a firesale on phi cards and dev licenses last year but it was only for a passively cooled model (really only works well in servers, not workstations).

Intel do this:

  - Offer a $200-400 XEON PHI CARD
  - Include whatever compiler needed to use it with the card
  - Make this easily buyable
  - Contribute ports of Cuda-based frameworks over to Xeon Phi

I feel like they could do this pretty easily, even if it lost money, it's pennies compared to what they're going to lose if nvidia keeps trumping them on machine learning. They need to give dev's the tooling and financial incentive to write something for Phi instead of cuda, right now it completely doesn't exist and frameworks basically use Cuda by default.

If you're AMD, do the same thing but replace the phrase Xeon Phi with Radeon/Firepro

manav10y ago

$129k for this machine. In the keynote its interesting that they mentioned the product line being: "Tesla M40 for hyperscale, K80 for multi-app HPC, P100 for scales very high, and DGX-1 for the early adopters".

The GP100/P100 with the 16nm process probably gives a considerable performance/power advantage over the Tesla... but this gives me the feeling that we may not see consumer or workstation-level Pascal boards for a while.

svensken10y ago

I was wondering about this too, the way they plugged old K80's at the end for non-deep-learning applications. Either they're clever about keeping multiple product lines alive (more profits!) or it's a big cop-out (they're hiding something about P100 that makes it a bad choice for GPGPU - maybe price?)

blakes10y ago

$129k seem extremely fair for what you get actually, in my experience.

Coding_Cat10y ago

Wait, how many chips did they cram in there that they're getting 170 TFlops. Even at a very generous 10 TFLOP per chip that would be 17 chips.

krasin10y ago

NVIDIA Tesla P100 has 21 TeraFLOPS of FP16 performance by their words. So they got 8 chips there.

jsheard10y ago

Yep, they showed a diagram of how it fits together: http://i.imgur.com/xk1daFG.jpg

2 more replies

Coding_Cat10y ago

Ah, half-floats. That explains it. Still pretty high but realistic at least.

intrasight10y ago

Is also fun to contemplate that in about five years you'll likely be able to buy one of these on eBay for about $10K.

dougmany10y ago

This announcement reminds me of the part of Outliers that spelled out how Bill Gates and others became who they are because they had access to very expensive equipment before anyone else did (and spent 10K hours on it).

AndrewKemendo10y ago

How does this compare to some of the systems provided by cloud providers? Seems like requiring an on-site capability is a hurdle for integration if you already have your data on a cloud provider.

[1] https://aws.amazon.com/machine-learning/ [2] https://azure.microsoft.com/en-us/services/machine-learning/

jdcarter10y ago

I would argue that this box is probably targeted at cloud providers. The Nvidia GRID boards are similar--they're not for consumers, but for GPU/Gaming-as-a-service providers.

ansible10y ago

The unified memory architecture with the Pascal GP100 is pretty sweet. That will make it easier to work with large data sets.

visarga10y ago

It's good to see powerful machine learning hardware come out. Much of the progress in ML has come from hardware speedup. It will empower the next years of research.

bpires10y ago

I wonder how much faster the new Tesla P100 is compared to the Tesla K40 in training neural networks. The K40s were the best available GPUs for training deep neural networks.

aperrien10y ago

Does anyone know if the Pascal architecture is built using stacked cores? Or is this one of those applications where thermal problems keep that technique from being used?

wmf10y ago

No, the Pascal GPU itself is not stacked. Die stacking makes almost no sense for processors.

pmorici10y ago

Anyone have any idea of how the GPUs in this machine compare to the GPUs in their high end gaming products?

0x07c010y ago

Tesla's has more Double Float cores compared to gaming cards.

nshm10y ago

Looks like a research in machine learning will only be done in huge corporations. You'll need an amount of funding comparable to LHC.

Time to use better models like kernel ensembles, maybe they are not that accurate, but they are easier to train on a single CPU.

Houshalter10y ago

You can already do deep learning on cheap consumer hardware. And $100k is expensive, but it's nowhere near LHC levels.

Fomite10y ago

This price point is extremely accessible to most major research universities as well.

caycep10y ago

Does that mean Pascal release is just around the corner?!?

-unreformed box builder

chm10y ago

Any idea how much this costs?

dazzeruk10y ago

The Nvidia slides had it at $129,000 a pop

rckclmbr10y ago

Wow, cheap, great for startups!

2 more replies

dharma110y ago

I'll take 5

dharma110y ago

was hoping they would announce Pascal GTX's. Oh well. Computex I guess

agumonkey10y ago

What a peculiar pascaline.

j / k navigate · click thread line to collapse

98 comments

mortenjorck10y ago

At the other end, even a single GTX 960 would make it onto the list, placing in the 200s.

trsohmers10y ago

0x07c010y ago

*http://www.anandtech.com/show/10222/nvidia-announces-tesla-p...

1 more reply

jlebar10y ago

Still, how many of these boxes are we talking about to match the performance of the #1 from the top 500 in 2005? 10? 20? That's still under $3m for 20, which is pretty impressive to me.

1 more reply

doyoulikeworms10y ago

Out of curiosity, what are some problems/solutions that require FP64?

1 more reply

lern_too_spel10y ago

KKKKkkkk110y ago

ssh4210y ago

aconz210y ago

Check out the specs here: http://images.nvidia.com/content/technologies/deep-learning/...

though I'm most curious about what motherboard is in there to support NVLink and NVHS.

Good overview of Pascal here: https://devblogs.nvidia.com/parallelforall/inside-pascal/

1 question: will we see NVLink become an open standard for use in/with other coprocessors?

1 gripe: they give relative performance data as compared to a CPU -- of course its faster than a CPU

dgacmu10y ago

You mean you're not surprised that a machine with 8 GPUs, apparently costing $129k USD (from comment below), can outperform a single CPU? :)

(Of course, a better metric is that it's getting ~56x the performance at probably ~10x the TDP, but that's not surprising for a GPU with the current state of deep learning code.)

aconz210y ago

To be fair, they are comparing it to a dual-socket CPU; which is twice as fair as comparing to a single!!

What I was getting more at was: I want to know the relative performance compared to another 8 Tesla box. I know comparing apples isn't good marketing, but c'mon.

1 more reply

astrodust10y ago

$129K buys you a lot of dual 22-core servers.

3 more replies

virtuallynathan10y ago

pinewurst10y ago

https://devblogs.nvidia.com/parallelforall/inside-pascal/

1 more reply

phelm10y ago

I am looking forward to OpenCL catching up with CUDA in maturity and adoption, so that NVidia's monopoly in Silicon for deep learning will come to an end.

blt10y ago

dharma110y ago

pjmlp10y ago

For that to happen OpenCL has to be at the same level as CUDA in language support and tooling.

Having C only wasn't a good idea. NVidia was quite clever in giving first class treatment to C++, Fortran and any compiler vendor that wished to target PTX.

Also the visual debugging tools are quite good.

Khronos apparently needed to be hit hard to realise that not everyone wants to be stuck with C for HPC in the 21st century.

Also although Apple is the creator of OpenCL, they don't seem to give much love to it.

Then you have Google caring about it's Renderscript dialect, which doesn't help to the overall uptake in OpenCL.

There isn't a monopoly, rather vendors that lacked the perception to appeal to the developers wanted to have as tooling and performance.

Anyone is free to go use OpenCL, use C or a language with a compiler with a C target, do printf debugging and feel free.

Are any vendors already doing SPIR support?

bgalbraith10y ago

RussianCow10y ago

You often don't have a choice because a large amount of GPGPU software is written using CUDA, which is Nvidia-specific.

1 more reply

jlebar10y ago

DeepYogurt10y ago

Me too. I really want to see some benchmarks between cuda code and opencl code generated from cuda with AMDs compiler. Actually if anyone has a geforce/tesla get on this!

Robadob10y ago

Disclaimer: I use CUDA in my research, never tried OpenCL.

[1] http://arxiv.org/abs/1005.2581

[2] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=604719...

[3] http://arxiv.org/abs/1311.0378

1 more reply

nightski10y ago

What hardware could OpenCL even run on that would come remotely close to what this system has to offer?

olympus10y ago

OpenCL runs on Nvidia GPUs, so you could do an apples to apples comparison on this system.

2 more replies

nl10y ago

The problem is that no one[1] uses OpenCL because the performance isn't there. There is little sign of that changing, too.

pjmlp10y ago

Another reasons are language support and tooling.

CUDA had Fortran and C++ since day one and thanks to PTX was quite easy to add support for other languages.

Whereas OpenCL was stuck on "C only" model from Khronos, which forced everyone to use C or generate C code and be constrained to the device drivers.

This has been seen as such a big issue that SPIR and C++ SPIR got introduced with OpenCL 2.0.

Another very important one is debugging support. Last time I checked no one had visual tooling at the same level as NVidia's one.

badminton110y ago

Costs $129,000 and needs 3.2 kilowatts to run.

dogma113810y ago

olympus10y ago

To me, $129k isn't surprising since it is only going to be bought by researchers with big budgets. Small-timers will still build 3x GTX980 systems for under $5k.

3.2 KILOwatts sounded insane to me, but I suppose you'll have your own server rack to put it in if you can afford to buy one of these.

TkTech10y ago

astrodust10y ago

If that sounds insane, you're going to lose your mind when you realize how many KILOwatts your oven uses.

3.2KW is less than a dishwasher.

2 more replies

Fomite10y ago

jra10110y ago

More detail on the GPUs in the system:

https://devblogs.nvidia.com/parallelforall/inside-pascal/

Robadob10y ago

Have they published a copy of the video of the autonomous car trained with unsupervised learning from the keynote anywhere?

I'd love to show it to my father.

madengr10y ago

pareci10y ago

Curious. Why do you post here when every other comment is random posturing?

madengr10y ago

sp33210y ago

Wow, I didn't realize they were shipping HBM2 already. 720GB/s - with only 16GB of RAM, you can read it all in 22 milliseconds!

DeepYogurt10y ago

They're not. All that was mentioned in the talk was that this chip is coming soon.

sp33210y ago

There's a big green "Order Now" button about 1/3 of the way down the page.

1 more reply

nickpeterson10y ago

I have to wonder about intel and their Xeon Phi range. Last I checked they were supposed to launch a followup late last year that never manifested. Now we're 4 months in 2016 and still no new phi's.

Intel do this:

  - Offer a $200-400 XEON PHI CARD
  - Include whatever compiler needed to use it with the card
  - Make this easily buyable
  - Contribute ports of Cuda-based frameworks over to Xeon Phi

If you're AMD, do the same thing but replace the phrase Xeon Phi with Radeon/Firepro

manav10y ago

svensken10y ago

blakes10y ago

$129k seem extremely fair for what you get actually, in my experience.

Coding_Cat10y ago

Wait, how many chips did they cram in there that they're getting 170 TFlops. Even at a very generous 10 TFLOP per chip that would be 17 chips.

krasin10y ago

NVIDIA Tesla P100 has 21 TeraFLOPS of FP16 performance by their words. So they got 8 chips there.

jsheard10y ago

Yep, they showed a diagram of how it fits together: http://i.imgur.com/xk1daFG.jpg

2 more replies

Coding_Cat10y ago

Ah, half-floats. That explains it. Still pretty high but realistic at least.

intrasight10y ago

Is also fun to contemplate that in about five years you'll likely be able to buy one of these on eBay for about $10K.

dougmany10y ago

AndrewKemendo10y ago

How does this compare to some of the systems provided by cloud providers? Seems like requiring an on-site capability is a hurdle for integration if you already have your data on a cloud provider.

[1] https://aws.amazon.com/machine-learning/ [2] https://azure.microsoft.com/en-us/services/machine-learning/

jdcarter10y ago

I would argue that this box is probably targeted at cloud providers. The Nvidia GRID boards are similar--they're not for consumers, but for GPU/Gaming-as-a-service providers.

ansible10y ago

The unified memory architecture with the Pascal GP100 is pretty sweet. That will make it easier to work with large data sets.

visarga10y ago

It's good to see powerful machine learning hardware come out. Much of the progress in ML has come from hardware speedup. It will empower the next years of research.

bpires10y ago

I wonder how much faster the new Tesla P100 is compared to the Tesla K40 in training neural networks. The K40s were the best available GPUs for training deep neural networks.

aperrien10y ago

Does anyone know if the Pascal architecture is built using stacked cores? Or is this one of those applications where thermal problems keep that technique from being used?

wmf10y ago

No, the Pascal GPU itself is not stacked. Die stacking makes almost no sense for processors.

pmorici10y ago

Anyone have any idea of how the GPUs in this machine compare to the GPUs in their high end gaming products?

0x07c010y ago

Tesla's has more Double Float cores compared to gaming cards.

nshm10y ago

Looks like a research in machine learning will only be done in huge corporations. You'll need an amount of funding comparable to LHC.

Time to use better models like kernel ensembles, maybe they are not that accurate, but they are easier to train on a single CPU.

Houshalter10y ago

You can already do deep learning on cheap consumer hardware. And $100k is expensive, but it's nowhere near LHC levels.

Fomite10y ago

This price point is extremely accessible to most major research universities as well.

caycep10y ago

Does that mean Pascal release is just around the corner?!?

-unreformed box builder

chm10y ago

Any idea how much this costs?

dazzeruk10y ago

The Nvidia slides had it at $129,000 a pop

rckclmbr10y ago

Wow, cheap, great for startups!

2 more replies

dharma110y ago

I'll take 5

dharma110y ago

was hoping they would announce Pascal GTX's. Oh well. Computex I guess

agumonkey10y ago

What a peculiar pascaline.

j / k navigate · click thread line to collapse