Notably, both are owned by Google. The free-GPU notebook market is catching on.
Beyond that, for private work the non-legalese TL;DR is no, at least not beyond what's required for us to operate the service. I'll refer you to https://www.kaggle.com/terms, and copy one relevant section below. If your work is in the context of making submissions to a specific machine learning competition, then that competition may have bespoke exceptions to this as well (which would be detailed in the rules of the competition).
"For all User Submissions, you grant Kaggle a license to translate, modify (for technical purposes, for example making sure your content is viewable on an iPhone as well as a computer) and reproduce and otherwise act with respect to such User Submissions, in each case to enable us to operate the Services, as described in more detail below. You acknowledge and agree that Kaggle, in performing the required technical steps to provide the Services to our users (including you), may need to make changes to your User Submissions to conform and adapt those User Submissions to the technical requirements of communication networks, devices, services, or media, and the licenses you grant under these Terms include the rights to do so. You also agree that all of the licenses you grant under these Terms are royalty-free, perpetual, irrevocable, and worldwide. These are licenses only — your ownership in User Submissions is not affected."
Basically, suppose I wanted everything that I could define in a Docker container to be available “as the environment” in which the notebook is running. How do I do that?
I ask because I’ve started to see an alarming proliferation of “notebook as a service” platforms that don’t offer that type of full environment spec, if they offer any configuration of the run time environment at all.
I’ve taught probability and data science at university level and worked in machine learning in a variety of businesses too, and I’d say for literally all use cases, from the quickest little pure-pedagogy prototype of a canned Keras model to a heavily customized use case with custom-compiled TensorFlow, different data assets for testing vs ad hoc exploration vs deployment, etc., the absolutely minimum thing needed before anything can be said to offer “reproducibility” is complete specification of the run time environment and artifacts.
The trend to convince people that a little “poke around with scripts in a managed environment” offering is value-additive is dangerous, very similar to MATLAB’s approach to entwine all data exploration with the atrocious development havits that are facilitated by the console environment (and to specifically target university students with free licenses, to use a drug dealer model to get engineers hooked on MATLAB’s workflow model and use that to leverage employers to oblige by buying and standardizing on abjectly bad MATLAB products).
Any time I meet young data scientists I always try to encourage them to avoid junk like that. It’s vital to begin experiments with fully reproducible artifacts like thick archive files or containers, and to structure code into meaningful reproducible units even for your first ad hoc explorations, and to absolutely always avoid linear scripting as an exploratory technique (it is terrible and ineffective for such a task).
Kaggle Kernels seems like a cool idea, so long as the programmer must fully define artifacts that describe the complete entirety of the run time environment, and nobody is sold on the Kool Aid of just linear scripting in some other managed environment.
Each kernel for example could have a link back to a GitHub repo containing a Dockerfile and build scripts for what defined the precise environment the notebook is running in. Now that’s reproducible.