If you are not too familiar with "semantic search," think of it as RAG.
To enable semantic search over text, images, and other media, we need to pre-process our data before sending it to the database.
My framework works as a middle layer between your app and existing databases, and all of that processing happens there.
The key differentiator is its MongoDB-compatible NoSQL interface: you interact with this RAG engine exactly as you would with MongoDB, while benefiting from powerful vector search and document-augmentation capabilities.
It's open source: https://github.com/onenodehq/onenode
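As a rough sketch of what the MongoDB-style usage could feel like, here is a toy in-memory stand-in (the `Collection` class and `semantic_find` method are invented names, not the actual API, and real vector search is faked with keyword overlap):

```python
# Toy stand-in for the described MongoDB-compatible interface. A real
# engine would embed documents on insert and run vector search on query;
# here "semantic" matching is faked with word overlap for illustration.

class Collection:
    def __init__(self):
        self._docs = []

    def insert_one(self, doc: dict) -> None:
        # In the real middle layer, this is where the document's text
        # would be embedded and indexed before hitting the database.
        self._docs.append(doc)

    def semantic_find(self, query: str, limit: int = 3) -> list[dict]:
        # Placeholder for vector search: score docs by shared words.
        q = set(query.lower().split())
        scored = [(len(q & set(d["text"].lower().split())), d) for d in self._docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for score, d in scored[:limit] if score > 0]

articles = Collection()
articles.insert_one({"_id": 1, "text": "How to cook pasta at home"})
articles.insert_one({"_id": 2, "text": "A guide to training neural networks"})
print(articles.semantic_find("cooking pasta"))
```

The point is the shape of the calls: inserts and queries look like ordinary collection operations, with the embedding work hidden behind them.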
If there were a pocket-sized piece of hardware that hosted large open-source LLMs and that you could connect to offline, wouldn't that be helpful?
The benefits:
- You can run large open-source LLMs without using up your PC's or smartphone's compute
- You can protect your privacy
- You can use a high-performance LLM offline
I know there have been many attempts at this, but I don't know of any that actually work.
If you know any, please comment below! Thanks!
Traditionally, text and other media such as images, video, and audio are treated separately, requiring developers to set up pipelines to handle multiple modalities in their apps.
Moreover, when they need semantic search over their multimodal data, they also have to set up an embedding pipeline with a vector DB.
I want to build an abstracted document DB that natively accepts multiple modalities inside JSON documents and indexes them for semantic search automatically.
Please check out the docs and let me know your thoughts.
https://docs.capydb.com
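To make the idea concrete, here is a rough sketch of what a multimodal JSON document might look like (the `@type` markers and their names are made up for illustration, not the actual schema):

```python
import json

# Illustrative shape of one JSON document mixing modalities. The made-up
# type markers ("emb_text", "emb_image") signal which fields the DB
# should embed and index automatically in the background.
doc = {
    "title": "Golden Gate at sunset",
    "body": {"@type": "emb_text", "text": "A long description of the bridge"},
    "photo": {"@type": "emb_image", "url": "https://example.com/bridge.jpg"},
    "tags": ["travel", "san-francisco"],
}

# The app just inserts this document; the DB detects the typed fields
# and builds the semantic index without a separate pipeline.
print(json.dumps(doc, indent=2))
```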
So... for an AI agent to be truly self-evolving, it has to have access to modify ITSELF, not only the outside world that it interacts with. This means that it has to be able to modify its source code by itself.
To do this, the most straightforward way is to give the AI a whole server to run on, with the ability to scan its own source code, modify it, and reboot the server to "update" its version. If things go well, this should show us something interesting.
https://www.capydb.com
If you don't buy upvotes, you'll be wiped off the top list and won't even be seen by users.
You can search around and find websites that sell 50 upvotes for $399.
Setting ethical considerations aside for the moment, I think it could be interesting to see what happens if we give a self-evolving AI agent a whole server with full permission to modify it. To avoid shutting itself down while installing new dependencies, it could create a copy of its server and, after a successful installation, safely terminate the older version.
It could possibly even generate revenue to pay its own server and database fees (very optimistically, though).
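The "copy, install, then terminate the old version" loop could be sketched roughly like this (all paths, filenames, and function names are invented; a real agent would need sandboxing, health checks, and rollback on failure):

```python
import os
import shutil
import subprocess
import sys

def spawn_new_version(current_dir: str, new_dir: str) -> subprocess.Popen:
    """Copy the agent's code tree and boot the copy as a separate process.
    The new copy can install its own (self-modified) dependencies without
    taking down the version that is currently serving."""
    shutil.copytree(current_dir, new_dir)
    # "agent.py" is a hypothetical entry point for the copied agent.
    return subprocess.Popen([sys.executable, os.path.join(new_dir, "agent.py")])

def handover(new_proc: subprocess.Popen) -> None:
    """Only after the new version is confirmed alive does the old one exit."""
    if new_proc.poll() is None:  # still running, i.e. it didn't crash on boot
        sys.exit(0)              # the old version terminates itself
```

This is essentially a crude blue/green deployment where the agent itself plays the role of the deploy tool.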
What do you think?
This is one of the main reasons people complain about LangChain's complexity despite its intention of making development easier.
What we ended up building internally for our own front end is a more database-oriented solution. We basically implemented the embedding and indexing functionality directly on top of MongoDB and Pinecone, making it look like MongoDB with a built-in async embedding pipeline. We chose this because we wanted to keep a database-like syntax that is as powerful as LangChain.
What do you guys think? And what is your solution for your apps?
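A minimal sketch of the "MongoDB with a built-in async embedding pipeline" idea, using an in-memory queue instead of real MongoDB/Pinecone clients (the `embed` function is a fake placeholder for an embedding model):

```python
import asyncio

async def embed(text: str) -> list[float]:
    # Placeholder: a real pipeline would call an embedding model here.
    await asyncio.sleep(0)              # stand-in for network latency
    return [float(len(text)), 0.0]      # fake two-dimensional vector

class EmbeddingCollection:
    """Looks like a normal insert API, but embeds in the background."""

    def __init__(self):
        self.docs: dict[int, dict] = {}
        self.vectors: dict[int, list[float]] = {}
        self._tasks: list[asyncio.Task] = []

    def insert_one(self, doc: dict) -> None:
        self.docs[doc["_id"]] = doc     # the write is immediate, like MongoDB
        # Embedding and vector indexing happen asynchronously, off the hot path.
        self._tasks.append(asyncio.ensure_future(self._index(doc)))

    async def _index(self, doc: dict) -> None:
        self.vectors[doc["_id"]] = await embed(doc["text"])

    async def flush(self) -> None:
        await asyncio.gather(*self._tasks)

async def main() -> None:
    col = EmbeddingCollection()
    col.insert_one({"_id": 1, "text": "hello world"})
    await col.flush()
    print(col.vectors[1])

asyncio.run(main())
```

The design choice is that the caller only ever sees `insert_one`; the vector side effects are an implementation detail of the collection.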
What do you think about "AI-Computer Interface"?
Let me explain a bit. The "human-computer interface" (HCI) dates back to around WWII. Since then it has evolved enormously, and today we have very sophisticated, intuitive interfaces.
It's the AI era, and there are many projects and products trying to automate tasks that humans currently do. However, the main approach is to adapt LLM-based products to the existing HCI, which was designed for humans, not LLMs. This approach does work with some workarounds, such as taking screenshots to understand which actions are available.
Example project: https://github.com/OpenInterpreter/open-interpreter
I wonder whether creating UIs specifically for LLMs/AIs, instead of adapting LLMs to human-friendly UIs, could enable more efficient automation.
Do you know of any such projects or products?
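As a toy illustration of what an "AI-computer interface" might look like: instead of pixels and screenshots, an app could expose its available actions as a machine-readable schema that a model reads and calls directly (every name below is invented):

```python
# Instead of rendering buttons for a human, the app publishes its actions
# as structured data an LLM can consume directly, so no screenshots or
# click simulation are needed.
interface = {
    "app": "mail-client",
    "actions": [
        {
            "name": "send_email",
            "params": {"to": "string", "subject": "string", "body": "string"},
        },
        {"name": "list_inbox", "params": {"limit": "int"}},
    ],
}

def dispatch(call: dict) -> str:
    """Toy dispatcher: the model emits {'name': ..., 'args': ...} and the
    app executes it, instead of the model clicking through a human UI."""
    names = {a["name"] for a in interface["actions"]}
    if call["name"] not in names:
        return "unknown action"
    return f"executed {call['name']}"

print(dispatch({"name": "list_inbox", "args": {"limit": 10}}))
```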
Happy holidays!
Frameworks like LangChain and LlamaIndex offered inspiration, but their documentation often felt overwhelming due to multiple tools and versions, which added complexity instead of clarity.
After experimenting, I settled on a hybrid approach: using MongoDB as the primary database to manage RAG resources and Pinecone as a plugin for vector search. This combination offered the flexibility and performance I needed for smooth integration.
To help others facing similar challenges, I’ve made this setup available to developers. I’d love to hear your thoughts!
Documentation: https://docs.onenode.ai
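To make the hybrid setup concrete, here is the query flow sketched with in-memory dicts standing in for the MongoDB and Pinecone clients (the `fake_embed` function is a placeholder, not a real embedding model):

```python
# In-memory stand-ins: `documents` plays MongoDB (the source of truth),
# `vector_index` plays Pinecone (ids and vectors only, no payloads).
documents = {
    "a1": {"text": "pasta recipes", "author": "kim"},
    "b2": {"text": "gpu benchmarks", "author": "lee"},
}

def fake_embed(text: str) -> list[float]:
    # Placeholder: a real setup would call an embedding model here.
    return [float(sum(map(ord, text)) % 97), 1.0]

vector_index = {doc_id: fake_embed(d["text"]) for doc_id, d in documents.items()}

def search(query: str, top_k: int = 1) -> list[dict]:
    """Embed the query, get nearest ids from the vector store, then hydrate
    the full documents from the primary DB by id."""
    q = fake_embed(query)
    dist = lambda v: sum((a - b) ** 2 for a, b in zip(q, v))
    ids = sorted(vector_index, key=lambda i: dist(vector_index[i]))[:top_k]
    return [documents[i] for i in ids]

print(search("pasta recipes"))
```

The split keeps each store doing what it is good at: the vector index only resolves ids, and the document database remains the single source of truth for the payloads.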