Here's the history of the Paris example: https://en.wikipedia.org/wiki/University_of_Paris where there was one university, then many, then fewer. Answering "what university is referred to by X" depends on why you want to know; there are multiple possible answers. Again it's not the weirdest case, but it's a good clear example of some of the issues.
There's a company called Merck, and another company called Merck. One Merck is called Merck in the US but MSD outside of it. The other Merck is called Merck outside the US and EMD inside it. Technically the first is Merck & Co, which used to be part of the other Merck but later wasn't, and the naming split comes from trademark disputes, which aren't even all resolved yet.
This is an area where I think LLMs actually have space to step in. We have tried perfectly modelling everything, so that computers, which have no ability to manage ambiguity, can answer some questions. We have tried barely modelling anything and letting humans figure out the rest, but humans are typically pretty poor at crafting the code, and that has its own issues. We ended up largely settling on spending a bunch of human time modelling some things, then having other humans build tooling around those models to answer specific questions by writing code, while a third set of people get to actually ask the questions.
LLMs can manage ambiguity, and they can also do the more technical, code-based things. Historically we haven't really had anything that could manage ambiguity like this for arbitrary tasks without lots of expensive human time.
I am now wondering if anyone has done a graph db where the edges are embedding vectors rather than strict terms.
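To make the idea concrete, here's a minimal toy sketch of what that might look like: edges carry a vector instead of a fixed relation label, and traversal matches a query vector against edge vectors by cosine similarity. Everything here is hypothetical (the class name, the 3-d vectors, the 0.8 threshold); real relation embeddings would come from a text encoder, and a real system would use an index rather than a linear scan.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class EmbeddingGraph:
    """Toy graph where each edge carries an embedding vector
    rather than a strict relation term."""

    def __init__(self):
        self.edges = []  # list of (src, dst, vector)

    def add_edge(self, src, dst, vector):
        self.edges.append((src, dst, vector))

    def follow(self, src, relation_vector, threshold=0.8):
        # Return neighbors whose edge vector is close enough
        # to the fuzzy query relation.
        return [dst for s, dst, v in self.edges
                if s == src and cosine(v, relation_vector) >= threshold]

g = EmbeddingGraph()
# Hypothetical 3-d "relation embeddings" for illustration only.
g.add_edge("Merck & Co", "MSD", [0.9, 0.1, 0.0])         # ~ "also known as"
g.add_edge("Merck KGaA", "EMD", [0.85, 0.15, 0.05])      # ~ "also known as"
g.add_edge("Merck & Co", "Kenilworth", [0.0, 0.2, 0.9])  # ~ "headquartered in"

# A fuzzy "also known as"-ish query matches the first edge but not the third.
print(g.follow("Merck & Co", [0.88, 0.12, 0.02]))  # → ['MSD']
```

The appeal would be that a query doesn't have to name an exact edge type up front; near-synonymous relations can still match, which is exactly the kind of ambiguity strict schemas struggle with.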