I've used them to do things like characterise users based on follow/follower patterns, but there are many more applications.
In the past I've had great success with Facebook Research's StarSpace[1].
As a general rule of thumb, it is important that your graph has enough redundancy in it, i.e. the more relations, the better. Also, bear in mind these models do not support multi-modality: literals such as numbers, strings, geo coordinates, and timestamps are simply treated as entities. In most cases it is probably better to filter literals out before generating the embeddings.
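A rough sketch of what filtering literals out might look like before training. This is not AmpliGraph's API, just a heuristic over a plain triple list; the `is_literal` checks and the example triples are illustrative assumptions:

```python
# Hypothetical sketch: drop triples whose object looks like a literal
# (number, date, quoted string) before training graph embeddings.
import re

def is_literal(value):
    """Heuristic check for common literal forms (assumption:
    entities are bare identifiers, literals are raw values)."""
    if re.fullmatch(r"-?\d+(\.\d+)?", value):        # numbers
        return True
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}.*", value):  # ISO dates
        return True
    if value.startswith('"'):                        # quoted strings
        return True
    return False

triples = [
    ("alice", "knows", "bob"),
    ("alice", "age", "34"),
    ("bob", "born_on", "1985-02-11"),
    ("bob", "works_at", "acme_corp"),
]

# Keep only entity-to-entity relations.
clean = [(s, p, o) for s, p, o in triples if not is_literal(o)]
```

In practice you would tune the checks to however your graph serialises literals (RDF datatypes make this much easier, for example).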
I think the main use-case is plugging in an existing knowledge graph, and it filling in the gaps, correct?
Can I augment this with really high-quality embeddings for the nodes, that were learned over auxiliary unlabelled text?
What are other ways I can augment the data set?
Is this useful only when there are many edge-types, or is it also good when there are very few?
It looks promising, I just couldn't immediately grok when I should look to this library.
I used graph embeddings as input to a classifier to classify people when follower/followee information was easy to gather but text wasn't.
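As a sketch of that setup: once each user has an embedding vector, classification is a standard supervised problem. The vectors and labels below are synthetic stand-ins for real graph embeddings, and the nearest-centroid classifier is deliberately minimal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these 50-dim vectors came from a graph-embedding model
# trained on the follower/followee graph (synthetic stand-ins here).
X = rng.normal(size=(200, 50))
y = (X[:, 0] > 0).astype(int)  # toy label, for illustration only

# Minimal nearest-centroid classifier over the embedding space.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(v):
    return int(np.argmin(np.linalg.norm(centroids - v, axis=1)))

preds = np.array([predict(v) for v in X])
accuracy = (preds == y).mean()
```

In a real pipeline you would of course use a held-out test set and a stronger classifier (logistic regression, gradient boosting, etc.) on top of the embeddings.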
Basically anything that can be represented as a graph can be used. There is some interesting work being done using code syntax trees as input which uses a very similar approach. See code2vec[2]
I'm not aware of any way to transfer text embeddings into graph embeddings, but you could concatenate them and use them together (I've done this before), or apply some dimension reduction, or set up a multi-task learning scheme and try to learn a combined representation.
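A rough sketch of the concatenation and dimension-reduction options. The matrices are synthetic stand-ins for real graph and text embeddings of the same entities, and the PCA here is just plain SVD in numpy:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins: 100-dim graph embeddings and 300-dim text embeddings
# for the same 500 entities, row-aligned (values are synthetic).
graph_emb = rng.normal(size=(500, 100))
text_emb = rng.normal(size=(500, 300))

# Option 1: simple concatenation.
combined = np.hstack([graph_emb, text_emb])  # shape (500, 400)

# Option 2: reduce the concatenated vectors with PCA via SVD.
centered = combined - combined.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ Vt[:128].T              # shape (500, 128)
```

One thing to watch with concatenation: if the two embedding spaces have very different scales, it's worth normalising each block before stacking so one doesn't dominate downstream models.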
I'm not aware of the scalability limits for this particular library, but Facebook Research's pytorch-biggraph[3] (released 2 days ago) scales to trillions of edges and billions of nodes.
[1] https://github.com/facebookresearch/StarSpace
[2] https://arxiv.org/abs/1803.09473
[3] https://ai.facebook.com/blog/open-sourcing-pytorch-biggraph-...
> the main use-case is plugging in an existing knowledge graph, and it filling in the gaps

Correct. That is known as Link Prediction. There are other machine learning tasks you can do, though: for example you can generate embeddings and then cluster them. Or you can use embeddings to see if distinct entities are indeed the same.
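For the "are these distinct entities actually the same?" idea, a sketch of what that looks like with embedding similarity. The embeddings below are synthetic (one is constructed as a near-duplicate of another), and the 0.95 cosine threshold is just an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic embeddings; "acme_inc" is a near-duplicate of
# "acme_corporation" by construction.
emb = {
    "acme_corporation": rng.normal(size=64),
    "globex": rng.normal(size=64),
}
emb["acme_inc"] = emb["acme_corporation"] + 0.01 * rng.normal(size=64)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Flag pairs with nearly identical embeddings as duplicate candidates.
names = list(emb)
dupes = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if cosine(emb[a], emb[b]) > 0.95
]
```

With real knowledge-graph embeddings you'd use this as a candidate generator and confirm matches with attribute comparison, since high similarity alone can also mean "plays the same role in the graph".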
> Can I augment this with really high-quality embeddings for the nodes, that were learned over auxiliary unlabelled text?

I know there are a handful of papers in the literature that do that, but we have not implemented any of them yet in AmpliGraph. Examples:
* Xie, Ruobing, et al. "Representation Learning of Knowledge Graphs with Entity Descriptions." AAAI 2016.

* Xu, Jiacheng, et al. "Knowledge Graph Representation with Jointly Structural and Textual Encoding." arXiv preprint arXiv:1611.08661 (2016).

* Han, Xu, Zhiyuan Liu, and Maosong Sun. "Joint Representation Learning of Text and Knowledge for Knowledge Graph Completion." arXiv preprint arXiv:1611.04125 (2016).
> What are other ways I can augment the data set?

I would first try a dataset with no literals (no strings, no numbers, no geo coordinates), as these are treated as entities for now. I suggest generating embeddings on your current graph first, and measuring their predictive power using http://docs.ampligraph.org/en/1.0.1/generated/ampligraph.eva... Merging in additional datasets would be another option, to get more data to work with.
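To make "measuring predictive power" concrete: link prediction is usually evaluated by ranking the true entity of each test triple against a set of corrupted candidates, then reporting rank-based metrics. This is a generic numpy sketch on synthetic scores, not AmpliGraph's actual API:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy setup: for each of 5 test triples, the model scored the true
# object against 100 candidates (scores are synthetic; the true
# candidate sits in column 0 and gets a boost so the "model" is good).
n_test, n_candidates = 5, 100
scores = rng.normal(size=(n_test, n_candidates))
true_idx = np.zeros(n_test, dtype=int)
scores[:, 0] += 3.0

# Rank of the true candidate (1 = best) under descending score.
true_scores = scores[np.arange(n_test), true_idx]
ranks = (scores > true_scores[:, None]).sum(axis=1) + 1

mrr = (1.0 / ranks).mean()        # mean reciprocal rank
hits_at_10 = (ranks <= 10).mean() # fraction ranked in the top 10
```

If the MRR and Hits@10 on your current graph are already decent, extra data may buy you less than you'd expect; if they're poor, that's a signal to merge in more relations.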
> Is this useful only when there are many edge-types, or is it also good when there are very few?
It is also useful when there are only a few.
Let us know how you like it, and if you need assistance, we have a public Slack channel - happy to answer any questions! https://join.slack.com/t/ampligraph/shared_invite/enQtNTc2NT...