Definitely excited to check this out; thanks Jeremy and Sylvain!
Personally, I think these work great because I can easily add cells to inspect data or try experiments as I'm reading, which helps me understand what's going on.
I'm not really interested in ML, but I take the most popular available courses to keep up, and I really liked how fastai doesn't just teach you ready-made, well-known models, but also how to compose differentiable building blocks to design NNs yourself.
But fast.ai has TWO co-founders, and somehow Rachel doesn't seem to get any credit in these discussions (not for the book specifically; I'm talking about the overall enterprise). Not quite sure why; a lot of the content on the website is written by her, and it's clear she adds a lot of value to the endeavor as a whole.
She created and taught the NLP and Computational Linear Algebra courses, and has written most of the material on the fast.ai blog, and of course (as noted) co-founded fast.ai. Overall, I'd agree that she doesn't get as much credit as she deserves. That's perhaps partly due to her increasing focus on ethics issues, which aren't generally discussed much on HN (sadly).
I also would say that Sylvain Gugger doesn't get as much credit as he should -- he has been an equal partner with me in creating the book and fastai library.
(I discussed this response with Rachel prior to posting it.)
Polish does not work this way. Source: I am Polish. Perhaps jph00 meant Turkish. Issue filed.
[1] https://github.com/fastai/fastbook/blob/master/10_nlp.ipynb
But for the book I mentioned Polish because of this paper: https://arxiv.org/abs/1810.10222 . As you say, though, the word "agglutinative" isn't technically correct here. I'm actually not sure what the right word is for languages that have lots of big compounds with no spaces. (Which is the key issue here, and the reason we need subword tokenization techniques.)
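To illustrate why compounds without spaces push you toward subword tokenization, here's a toy greedy longest-match tokenizer. This is just a sketch, not fastai's actual implementation; real systems learn their vocabulary from data (e.g. BPE or SentencePiece), and the German compound and vocabulary entries here are invented for the example.

```python
def subword_tokenize(word, vocab):
    """Split `word` into the longest vocabulary pieces, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining prefix first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Fall back to a single character for out-of-vocabulary spans.
            pieces.append(word[i])
            i += 1
    return pieces

# Hypothetical subword vocabulary; a word-level tokenizer would treat the
# whole compound as one out-of-vocabulary token.
vocab = {"haus", "boot", "fahrt"}
print(subword_tokenize("hausbootfahrt", vocab))  # → ['haus', 'boot', 'fahrt']
```

A word-level tokenizer would need every possible compound in its vocabulary, which is hopeless for languages that form compounds freely; subword pieces sidestep that.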
https://en.wikipedia.org/wiki/Synthetic_language
There's a spectrum between synthetic and analytic languages ( https://en.wikipedia.org/wiki/Synthetic_language#Synthetic_a... ) and those closer to the synthetic end are the ones giving you trouble.
Polish would be a subtype of synthetic called fusional/inflected, which means morphemes need to be adjusted to fit together; agglutinative languages are those that mainly use agglutination, where morphemes are stuck together as-is:
https://en.wikipedia.org/wiki/Agglutinative_language
Since it's a spectrum / categorization based on features, all languages will show these features to various degrees. E.g. the famous "anti|dis|establish|ment|ari|an|ism" in English and "anty|samo|u|bez|przedmiot|owia|nie" as a similar example in Polish (both from https://pl.wikipedia.org/wiki/Aglutynacyjno%C5%9B%C4%87 ), or the more humble "houseboat" or "bitwise".
There are also polysynthetic languages, which is the name for the extreme end of this spectrum, but there are no familiar examples of these (Mayan languages, Ainu, Inuit, and Aleut are the only ones I recognize from those mentioned on Wikipedia).
Side note: IMHO, you are exaggerating the ability of Polish to form long compounds. Dissecting the "Bezbarwne zielone idee wściekle śpią" example from https://arxiv.org/pdf/1810.10222.pdf#page=3 reveals no words longer than 4 morphemes:
bez-BARW-n-e ZIEL-on-e IDE-e WŚCIEK-l-e ŚP-ią, where I put word roots in uppercase and bound morphemes in lowercase.
The longest sequences of morphemes (for a loose definition of morpheme) I can think of are conditional mood of verbs with double prefixes like po-wy-CHODZI-ł-y-by-ście. However, the sequences of bound morphemes in those forms, which may look complex to you, form a finite-state language that admits just a few sequences.
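To make the finite-state claim concrete, the admissible sequences for this particular verb can be sketched as a regular expression. This is a toy illustration built only from the morphemes in the example above; real Polish morphotactics is of course richer than this pattern.

```python
import re

# Toy finite-state pattern for the bound-morpheme sequences discussed above,
# using only the morphemes from "po-wy-CHODZI-ł-y-by-ście" (hyphens kept as
# morpheme separators for readability). Optional groups mirror the fact that
# only a few orderings are admissible.
pattern = re.compile(r"(po-)?(wy-)?CHODZI-ł(-y)?(-by)?(-ście)?")

print(bool(pattern.fullmatch("po-wy-CHODZI-ł-y-by-ście")))  # the example form
print(bool(pattern.fullmatch("by-CHODZI-po-ł")))            # reordering rejected
```

The point is that the pattern accepts only a handful of strings, which is what makes these forms look more complex than they really are.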
Megszentségteleníthetetlenségeskedéseitekért for example.
There are quite a lot of languages that do though: https://en.wikipedia.org/wiki/Agglutinative_language
Minor point: a requirements.txt file or something similar would be convenient for getting started quickly.
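For example, such a file might look like the following (the package names are illustrative guesses, since the repo's actual dependencies aren't listed here):

```
fastai
jupyter
```

Then a single `pip install -r requirements.txt` would set a reader up to run the notebooks.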
We didn't expect the draft to get this much attention, frankly!
That's not how the GPL works.
What is Fastai? Why do I need it?
Something as basic as an elevator speech that introduces your product in your github page and book intro can mean 10x or 100x more sales.
If you force people to go search for it themselves, you have already lost most of them.
This author writes as if you already knew everything about fastai, but if you did, you wouldn't need this book in the first place.
It happens a lot to technical writers: they have spent years thinking about a topic, so they can't put themselves in the shoes of someone who hasn't.
By contrast, Jeremy and team have proven that "build it and they will come" is not dead. They built high quality courses and quickly became authoritative with no marketing and with full transparency and openness in everything they do.
This book draft looks great. Everyone else is talking about "democratising AI" - this is actually doing it.
They don't have access to compute power (GPUs) or the bandwidth to download datasets of hundreds of gigabytes, which they'll find right there, so this should help them: they won't need powerful machines or have to worry about experiment tracking.
We also have a Publish option that turns a notebook into an application in one click, with a form for the training parameters generated automatically behind the scenes, so they can write scripts and instrument model training.
The fast.ai course will also help current or future members, and other students. It's important for us to make it even easier for people to enter the field.
Most DL researchers I know also have a pretty good knowledge of available libraries and make it a habit to check them pretty often.
Few people here need to look it up; the basic promotion has already been done very effectively for the target audience. Besides, the intro chapter explains very clearly what the book is about and who it's for.
If you want to use code in the book under a non-GPL license, then you could just buy the book when it comes out. That doesn't seem like an unreasonable burden.
PS: none of this has anything to do with O'Reilly or their lawyers.