Skip to content

Top New Best Ask Show Jobs

Show HN: Ecco – See what your NLP language model is “thinking” | Better HN

Show HN: Ecco – See what your NLP language model is “thinking” (opens in new tab)

(eccox.io)

185 pointsjalammar5y ago34 comments

34 comments

jalammarOP5y ago

Hi HN,

Author here. I had been fascinated with Andrej Karpathy's article (https://karpathy.github.io/2015/05/21/rnn-effectiveness/) -- especially where it shows neurons being activated in response to brackets and indentation.

I built Ecco to enable examining neurons inside Transformer-based language models.

You can use Ecco to simply interact with a language model and see its output token by token(as it's built on the awesome Hugging Face transformers package). But more interestingly you can use it to examine neuron activations. The article explains more: https://jalammar.github.io/explaining-transformers/

I have a couple more visualizations I'd like to add in the future. It's open source, so feel free to help me improve it.

ninjin5y ago

I can not thank you enough for your “The Illustrated Transformer” [1] that I have directed two cohorts of MSc students to – it is a true gem of an article. A few years ago my group made an interface to visualise contextual word representations [2] that looked like a primordial soup ancestor to your most recent article (no screenshots though, sadly). I hope putting these together brings you as much joy as it does to your fans in academia and education like myself reading it. Despite Chris Ohla’s effort with Distill, I still think we lack a good way to give the amount of credit efforts like yours deserve.

[1]: https://jalammar.github.io/illustrated-transformer

[2]: https://github.com/uclnlp/muppetshow

I also want to make an additional "Thank You" note for the author on the lovely "The Illustrated Word2Vec" [0]. I wish every concept Machine Learning or otherwise would follow such a framework.

[0] https://jalammar.github.io/illustrated-word2vec/

jalammarOP5y ago

I'd love to look at your group's visualizations! Is it a private repo? because the link doesn't open up. It never stops to blow my mind that we can represent words and concepts in vectors of numbers.

Thanks for your kind words! It's a labor of passion, honestly. And while in previous years it was a nights-and-weekends project, I have recently been giving it my entire time and focus -- which is why I'm able to dip my toes more heavily into R&D like Ecco and the "Explaining Transformers" article.

airstrike5y ago

I just want to say I absolutely love the name and logo. Brings back some fond memories of an incredibly hard game from once upon a time...

Having said that, IANAL, but I find it unlikely that the use of a dolphin and the word Ecco together are not trademarked, so you may want to check on that before someone bugs you about it

cmrdsprklpny5y ago

"Ecco the Dolphin" is a game for Sega consoles. https://en.wikipedia.org/wiki/Ecco_the_Dolphin

Grimm15y ago

This is fantastic, I used your earlier transformers article to first get a real grasp on the architecture. I hope you expand this to accommodate other modes of attention outside of transformers paradigm as well!

jalammarOP5y ago

Wonderful! Thanks!

I am curious about those recent O(L) attention transformers (see slide 106 of http://gabrielilharco.com/publications/EMNLP_2020_Tutorial__...). If these methods are converging towards a new self-attention mechanism, I'd love to try illustrating that.

What other attention modes are you referring to? Did something in particular catch your attention?

GistNoesis5y ago

Interesting. The non-negative matrix factorization on the first level kinda highlight some semantic groupings : paragraph, verbs, auxiliaries, commas, pronouns, nominal propositions.

I tried to look at higher level layers, and the grouping were indeed of higher level : for example at level 4 there was a grouping which highlighted for any punctuation (and not just comma). The grouping were also qualifying more : for example ("would deliberately" whereas at lower level it was just would).

But it's not as clear as I had hoped it would be. I hoped it would somehow highlight grouping of higher and higher size, that could nicely map to the equivalent of a parse-tree.

The problem I have with this kind of visualizations, is that they often require interpretation. Also, they don't tell me if the structure was really present by the neural network but was just not apparent because the prism of the Non-negative Matrix Factorization hid it.

For my own networks, instead of visualizing, I like to quantify things a little more. I give the neural network some additional layers, and I try to make the neural network produce the visualization directly. I give it some examples of what I'd like the visualization to look like, and jointly train/fine-tune the neural network so that it solve simultaneously his original task, and the production of the visualization which is then easier to inspect.

Depending on how many additional layers I had to add, and depending on where they were added, and depending on how accurate (measured by a Loss Function!) the network prediction are, I can better infer how it's working internally, and whether or not the network is really doing the work or if it is taking some mental shortcuts.

For example in my Colorify [1] browser extension, which aims to reduce the cognitive load of reading, I use neural networks to predict simultaneously visualizations of sentence-grouping, linguistic features, and even the parse-tree.

[1] https://addons.mozilla.org/en-US/firefox/addon/colorify/

Linguist here. I give, what's a "nominal preposition"? (I glanced at the web page and at the paper, but neither "nominal" nor "preposition" shows up in either.)

GistNoesis5y ago

Sorry, I was meaning nominal group, and I did an anglicism from french "nominal proposition", which means "noun clause".

jalammarOP5y ago

Interesting. Thanks for sharing your notes on the higher layers. Allow me to repost that to the discussion board on github.

I do get your point on interpretation. This work is just a starting point. I'm curious to arrive at ways to automatically select the appropriate number of factors for a specific sequence. Kind of like the elbow method for K-means clustering.

Helping people understand "what the ai is thinking" is really important when you are trying to get organizations to adopt the technology. Great work.

Exactly and maybe we can "lobotomize" sections of the models that replicate unwanted bias in the training data.

Der_Einzige5y ago

This work is awesome!

Are there theoretical reason to choose NMF over other dimensionality reduction algorithms, e.g. UMAP?

Is it easy to add other DR algorithms? I may submit a PR adding those in if it is...

jalammarOP5y ago

I actually started with PCA. But NMF proved more understandable since negative dimensions in PCA are hard to interpret. I didn't consider UMAP, but would be interested to see how it performs here.

It should be easy, yeah. for NMF, the activations vector is reshaped from (layers, neurons, token position) down into (layers/neurons, token position). And we present that to sklearn's NMF model. I would assume UMAP would operate on that same matrix. That matrix is called 'merged_act' and is located here: https://github.com/jalammar/ecco/blob/1e957a4c1c9bd49c203993...

anfal_alatawi5y ago

Thank you, Jay! I appreciate the addition of the colab notebooks with code examples. I can't wait to play around with this and investigate how language models _speak_.

jalammarOP5y ago

Thanks! Please let me know if you have any feedback!

yowlingcat5y ago

Wow, love the NNMF visualization. Like all great visualizations, it does a very good job of showing and not telling me what's going on. More of this, please. One question: how does this kind of thing line up with what people describe as "explainable AI?"

gfody5y ago

It's not explainable until all these weights are between unambiguous concepts in a knowledge base rather than plain text tokens that must be interpreted. For some reason we gave up on symbolic AI in the 70's and decided making machines write poetry is where the money's at.

jalammarOP5y ago

These are AI explanation methods. They belong to the toolbox which would include LIME, Shapley values...etc. Input saliency is a gradient-based explanation method.

blackbear_5y ago

Any examples of novel insights obtained with this method?

jalammarOP5y ago

What I found most fascinating is identifying neuron firing patterns corresponding to linguistic properties: e.g. groups of neurons that fire in response to verbs, or pronounds.

Scroll down to "Factorizing Activations of a Single Layer" in https://jalammar.github.io/explaining-transformers/ to see those.

The figure above it, titled 'Explorable: Ten Activation Factors of XML' shows how neuron firing patterns in response to XML -- opening tags, closing tags, and even indentation.

It's still fresh, but I'm keen to see what other people uncover in their examinations (or what shortfalls/areas of improvement there are for such a method).

amrrs5y ago

It's also mentioned in this video https://youtu.be/gJPMXgvnX4Y?t=429

ZeroCool2u5y ago

Fantastic work. This is the kind of stuff we need to get these models actually adopted and integrated into non-tech organizations.

NMF for factorizing activations is brilliant!

pizza5y ago

One small step on the path towards solid-state intelligence

khalidlafi5y ago

looks great!

jalammarOP5y ago

Thank you!

j / k navigate · click thread line to collapse