The Landscape of Machine Learning on ArXiv

(lmcinnes.github.io)

31 points · lmcinnes · 2y ago · 10 comments

BenoitP · 2y ago

Pretty interesting!

A big surprise for me was finding the explainability cluster quite far from the causality one. I guess it stems from a cultural difference: causality is mainly the purview of statistics (with Pearl at the helm), with a strong medical-sciences focus, whereas explainability is more of a reaction to the algorithms used in industry (trees, GLMs, neural networks, etc.), which you deploy for performance and only afterwards care about the why.

pyromaker · 2y ago

This is cool! I always find these visualizations helpful, though they do get quite big sometimes.

[shameless plug]

I created a personalized newsletter for arXiv (which I plan to expand to other sources) so you can receive the latest research papers on your topics. You can also filter by keywords (e.g. only give me LLM- or RAG-related papers).

https://app.scholars.io

lbeckman314 · 2y ago

Beautiful!

https://github.com/TutteInstitute/datamapplot

esafak · 2y ago

Thank you! What about the frontend? This is palpably faster than other graph libraries.

lbeckman314 · 2y ago

Interesting! Looks like it's bundled as a single HTML file (and then hosted on GitHub Pages). [0]

From the DataMapPlot Docs [1]:

"While the interactive plots work well in-line in a notebook, it is often useful to be able to send plots to others. The save method allows you to save the plot to a single HTML file that will embed (compressed) copies of the data. You can then share the HTML as required."

[0] https://github.com/lmcinnes/datamapplot_examples/blob/master...

[1] https://datamapplot.readthedocs.io/en/latest/interactive_int...

rdedev · 2y ago

I created something like this for my data engineering class project: a temporal visualization of citation networks. It was fun to see domains like computer vision and NLP start out seemingly separate and then become closely coupled as time went on.

fleischhauf · 2y ago

I suppose the underlying modality does not matter too much if it is about learning per se.

HanClinto · 2y ago

What is being used to build the data map -- how does one project these document vectors into 2D space?

BenoitP · 2y ago

OP is also the author of the popular dimensionality reduction algorithm UMAP.

I guess the pipeline was: embed the documents with an LLM (or even a plain old word2vec average over the abstract might do), then reduce that to 2 dimensions with UMAP using a cosine similarity metric.

I have no idea about colors and local cluster naming though. Maybe that's handcrafted.
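To make the guessed pipeline a bit more concrete, here is a minimal sketch of the "average the word vectors, compare by cosine" step. Everything in it is an assumption for illustration: the word vectors are made-up toy values standing in for word2vec output, `embed` and `cosine_distance` are hypothetical helper names, and nothing here is confirmed to be what OP actually did.

```python
import math

# Hypothetical toy word vectors standing in for real word2vec output.
# The values are invented purely for illustration.
WORD_VECTORS = {
    "causal": [0.9, 0.1, 0.0],
    "inference": [0.8, 0.2, 0.1],
    "neural": [0.1, 0.9, 0.2],
    "network": [0.0, 0.8, 0.3],
    "explanation": [0.3, 0.7, 0.1],
}


def embed(text):
    """Embed a text as the average of its known word vectors (unknown words are skipped)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in text")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the kind of metric a reducer would be asked to preserve."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    causal = embed("causal inference")
    nets = embed("neural network")
    explain = embed("neural explanation")
    # Documents sharing vocabulary end up closer than unrelated ones.
    print(cosine_distance(nets, explain) < cosine_distance(nets, causal))
```

With real embeddings stacked into a matrix `X`, the reduction step in umap-learn would be along the lines of `umap.UMAP(n_components=2, metric="cosine").fit_transform(X)`, which tries to keep documents that are close under this distance close together on the 2D map.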

dkural · 2y ago

Please add bioRxiv if you can; there are so many life-sciences-related ML papers there.
