undefined | Better HN

0 pointsdnautics7y ago0 comments

Yeah that's kind of what I'm referring to but the default array typing in flux.ml doesn't encode tensor dimensionality in the type system. If it did (which it very easily could in julia) you wouldn't wind up with a situation where your learning task halts in the middle of a training run, which can happen in flux.ml

0 comments

ninjin7y ago

Due to the way that code composition works in Julia, there is no real “default” array for Flux. Rather, you can lift in any array type that you like. The GPU arrays are an excellent example of this, Flux “knows” nearly nothing about GPUs (apart from a few convenience functions), yet works perfectly when using a GPU array type. So there is nothing stopping you from lifting in say StaticArrays [1] which carries the sizes in the type or NamedArrays [2] where dimensions have explicit names – the latter being superior in practice to the former in my opinion, or perhaps someone is up for marrying the two?

[1]: https://github.com/JuliaArrays/StaticArrays.jl

[2]: https://github.com/davidavdav/NamedArrays.jl

In brief, it is not the duty of the automatic differentiation package to favour a specific array type – it just works for all of them, which is something that I find fairly magical with Julia.

dnauticsOP7y ago

1) It is not the duty of AD to favor an array type, but flux is an ML library. When you do something like Chain() or Dense() or LSTM() in flux, which is very obviously an ML tensor operation, it SHOULD pick reasonable, fixed (or variable!) tensor dimension. This is maybe not so easy, but it should be doable. Likewise, I wish Flux had "batch" and "minibatch" types that had specifiable dimensions so that if you try to hook up to data to layers of the wrong shape it gives an early warning.

2) StaticArrays would be a good starting point, but the point of it is to optimize Arrays by unrolling for loops and triggering SIMD (IIRC) and there are performance penalties when your arrays get really large, which they do, in ML. Something LIKE the staticarrays typesystem but without the overoptimization would be welcome.

3) (kind of tangential) I have beef with how GPU is handled as GPUArray in julia. It really should be handled as a worker node using the ClusterManagers-type semantic; you should be async sending tasks to the GPU as if it were a remote agent (which it kind of is, due to PCI bus bandwidth and latency bottlenecks) and waiting for the result to come back as a Future.

byt1437y ago

Regarding 3, can you make an issue or discourse post for discussion?

j / k navigate · click thread line to collapse