Kernels, a free hosted Jupyter notebook environment with GPUs (kaggle.com)

95 points | benhamner | 7y ago | 12 comments

minimaxir | 7y ago

The timing of this submission appears to follow the Colaboratory submission with similar functionality just removed: https://news.ycombinator.com/item?id=17692263

Notably, both are owned by Google. The free-GPU notebook market is catching on.

jl2718 | 7y ago

Hey Ben, are these going to support arbitrary CUDA?

jl2718 | 7y ago

Also, does Kaggle have any IP rights to code written in a kernel?

benhamner (OP) | 7y ago

Code made public on Kaggle Kernels is currently required to be under an Apache 2.0 license.

Beyond that, for private work the non-legalese TL;DR is no, at least not beyond what's required for us to operate the service. I'll refer you to https://www.kaggle.com/terms, and copy one relevant section below. If your work is in the context of making submissions to a specific machine learning competition, then that competition may have bespoke exceptions to this as well (which would be detailed in the rules of the competition).

"For all User Submissions, you grant Kaggle a license to translate, modify (for technical purposes, for example making sure your content is viewable on an iPhone as well as a computer) and reproduce and otherwise act with respect to such User Submissions, in each case to enable us to operate the Services, as described in more detail below. You acknowledge and agree that Kaggle, in performing the required technical steps to provide the Services to our users (including you), may need to make changes to your User Submissions to conform and adapt those User Submissions to the technical requirements of communication networks, devices, services, or media, and the licenses you grant under these Terms include the rights to do so. You also agree that all of the licenses you grant under these Terms are royalty-free, perpetual, irrevocable, and worldwide. These are licenses only — your ownership in User Submissions is not affected."

benhamner (OP) | 7y ago

At the moment, we're focused on providing great support for the Python and R analytics/machine learning ecosystems. We'll likely expand this in the future, and in the meantime it's possible to hack through many other use cases we don't formally support well.

mlthoughts2018 | 7y ago

How do you handle custom environment requirements, whether it’s Python version, library version, or more complex things in the environment that some code might run on?

Basically, suppose I wanted everything that I could define in a Docker container to be available “as the environment” in which the notebook is running. How do I do that?

I ask because I’ve started to see an alarming proliferation of “notebook as a service” platforms that don’t offer that type of full environment spec, if they offer any configuration of the run time environment at all.

I’ve taught probability and data science at university level and worked in machine learning in a variety of businesses too, and I’d say for literally all use cases, from the quickest little pure-pedagogy prototype of a canned Keras model to a heavily customized use case with custom-compiled TensorFlow, different data assets for testing vs ad hoc exploration vs deployment, etc., the absolute minimum needed before anything can be said to offer “reproducibility” is a complete specification of the run time environment and artifacts.

The trend of convincing people that a little “poke around with scripts in a managed environment” offering is value-additive is dangerous. It is very similar to MATLAB’s approach of entwining all data exploration with the atrocious development habits that the console environment facilitates (and of specifically targeting university students with free licenses, a drug dealer model that gets engineers hooked on MATLAB’s workflow and then pressures employers into buying and standardizing on abjectly bad MATLAB products).

Any time I meet young data scientists I always try to encourage them to avoid junk like that. It’s vital to begin experiments with fully reproducible artifacts like thick archive files or containers, and to structure code into meaningful reproducible units even for your first ad hoc explorations, and to absolutely always avoid linear scripting as an exploratory technique (it is terrible and ineffective for such a task).

Kaggle Kernels seems like a cool idea, so long as the programmer must fully define artifacts that describe the complete entirety of the run time environment, and nobody is sold on the Kool Aid of just linear scripting in some other managed environment.

Each kernel, for example, could link back to a GitHub repo containing a Dockerfile and build scripts that define the precise environment the notebook runs in. Now that’s reproducible.
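As a rough illustration of what "specifying the run time environment" can mean at the smallest scale, here is a minimal Python sketch that records the interpreter version and every installed package version into a JSON manifest. It covers far less than a Dockerfile would (no system libraries, no CUDA, no data assets), and the names `capture_environment`, `write_manifest`, and `environment_manifest.json` are hypothetical, chosen only for this example. Nothing here reflects how Kaggle Kernels actually manages environments.

```python
import json
import platform
from importlib import metadata


def capture_environment():
    """Return a dict describing the interpreter and installed packages.

    Hypothetical helper: a partial environment record, not a full
    container-level specification.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def write_manifest(path="environment_manifest.json"):
    """Write the captured environment to a JSON file and return it."""
    manifest = capture_environment()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
    return manifest
```

Calling `write_manifest()` as the first cell of a notebook and committing the resulting file next to the code would at least let a reader compare their own environment against the one the notebook ran in.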

