Key features
- Create and manage multiple prompts (enable, disable, remove)
- Persistent memory, essentially rolling summary windows plus a traversable index
- Turn on 'auto-gpt like' functionality such as write, read, and query
- Embed PDFs and Google Docs using the amazing LlamaIndex
- Create new channels / configurations, and share them with other people
Intro video here : https://www.youtube.com/watch?v=FB_g_8ofSlE
You can essentially craft a set of prompts and document connections, then interact with that agent over time with persistent memory and shared docs. You can make a space public; being able to share embeddings was the thinking, but also shared memory.
The stack
It's a React front end talking to a Python cloud server. The hope is to make that server client-agnostic, delivered via API, so people could compose agents, behaviours, and permissions in NOVA and deliver agent behaviours into their own flows; thinking AMS (agent management system).
Having it on a server gives cloud access to all logs, convos, and documents. My actual goal is a local app, where you can add / connect your own endpoints or connect your account, but potentially sync with the cloud.
Right now it is OpenAI using my key, and I've got a credit system; it works out to be about double the cost, but I've included $10 USD. I'll add the ability to connect your own API key soon, as well as a sliding scale for credits.
Using it
- Jumping on, you'll see a 'public' space I've made, where Nova is configured for guests
- Sign in with single sign-on and it'll generate a 'new user' space designed for goal setting
- Both spaces were created 'with the tools', so they're an example of the system's utility
- You can then either clear all those prompts, or make a new space and start fresh, creating instructions for an agent you might find useful (or using it to test different agents for your own work)
I made a walkthrough of the basics here : https://www.youtube.com/watch?v=iQpt0B5LzNI
More advanced stuff
- So the toggles on the side are 'cartridges' (my term for a JSON blob of prompts, commands, and settings, all injected at runtime)
- If you add an 'index' cartridge, it directly adds an index using LlamaIndex, so you can add a PDF or Google Doc (the auth flow there isn't approved, so you'll get warnings). You can query directly in the cartridge, or the agent can query it if commands are on
- If you add a command cartridge, it turns on 'auto-gpt-lite': it basically switches to JSON returns and switches on command parsing. You'll see the commands are configurable, but I'm going to rethink all that
- If you add a settings cartridge, it adds settings; the main one is 'give context', which injects your name, the date, and your number of convos (based on your user account)
- Most importantly, you can switch between gpt-3.5-turbo and gpt-4 (any typo will cook it). You can see here the sketch of configurable agents, different API sources, etc.
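To make the cartridge idea a bit more concrete, here's a minimal sketch of parsing a JSON cartridge and injecting its prompts at the top of a conversation. This is my own illustration, not NOVA's actual schema; every field name (`prompts`, `commands`, `give_context`, etc.) is an assumption.

```python
import json

# Hypothetical 'cartridge': a JSON blob of prompts, commands, and settings
# that gets injected into the conversation at runtime. Illustrative only.
cartridge_json = """
{
  "prompts": ["You are Nova, a helpful agent."],
  "commands": {"read": "read a document", "write": "append to a document"},
  "settings": {"give_context": true, "model": "gpt-3.5-turbo"}
}
"""

def build_messages(cartridge: dict, user_msg: str, user_name: str = "guest") -> list:
    """Assemble the chat payload: cartridge prompts go at the top,
    then optional context injection, then the user's message."""
    messages = [{"role": "system", "content": p} for p in cartridge.get("prompts", [])]
    if cartridge.get("settings", {}).get("give_context"):
        messages.append({"role": "system", "content": f"User: {user_name}"})
    if cartridge.get("commands"):
        cmd_list = ", ".join(cartridge["commands"])
        messages.append({"role": "system",
                         "content": f"Respond in JSON. Available commands: {cmd_list}"})
    messages.append({"role": "user", "content": user_msg})
    return messages

cartridge = json.loads(cartridge_json)
msgs = build_messages(cartridge, "hello", user_name="sam")
```

The resulting list is shaped like an OpenAI chat-completions `messages` array, which is presumably what the injected prompts ultimately feed into.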
Longer video talking through these features and ideas : https://www.youtube.com/watch?v=MM9pSd8ADuQ
This has been a pretty big mission, but I'm happy with the first offering and excited to keep working on it, adding multi-step behaviours, other libraries for image recognition, etc. Next steps are to improve command behaviour and, most importantly, get user feedback!
But as it is today, configuring the agent the way you like, giving it an easy memory overview, and playing with different 'types' of agent is pretty great.
New users get $10 worth of credit, and you can top up or donate if you find it useful. Any issues or thoughts, catch me via me@samueltate.com or @samueltates on Twitter.
Mostly I just hope you can try it out, I've had some great feedback about the prompt composition and document embedding. But also I just like Nova - that bit of persistence, context and agency makes a pretty cool pal.
thanks!
- the quality of the answers is not better than gpt-3.5/4. Maybe I had too high expectations, but I didn't notice any improvements over the default answers from the "open"ai ones (for example, the "apple test" - write 10 sentences that end with the word "apple", got a 5/10 - not great)
- sign in with google ...I know this is the "easy" path for implementation, but I imagine some people (me included) really don't want to login with brother g
- the tone of Nova is, imo, too friendly. I know this is a LLM, don't need to pretend like it's my friend/counsellor
I hope my feedback doesn't come off as rude; looking forward to the next iteration!
off-topic: on your website, you still have a (c) notice with 2022 ;)
I've put some specific notes below to expand on each of these, and I'm definitely taking your points on board.
- Answer quality: in terms of content it's the same as OpenAI; however, answers will get better over sessions as the notes from past sessions build up (in terms of personalisation and recall). The biggest uplift is when you manage context: adding docs and embeddings basically opens 'shared' working docs. My goal is for it to be smart enough to pull the right stuff in and out of context itself, but it's still a bit wobbly.
- SSO: I actually agree on the Google sign-on. Blame Nova, it was their idea (and it was honestly brutal doing token / auth with a Quart Python server). What would you recommend? I was thinking SMS SSO? I want something as lightweight as possible, and eventually a local client that isn't based on an account (though auth gives cloud notes and org integration). I really don't want those 'wall of sign-in' pages either.
- Friendly Nova: Haha, yeah, they're a dork, right? Basically an exaggerated version of me. The cool thing is you can create your own 'agent' with prompts tuned to what you like. I actually switch between 'producer' Nova, who asks me timeline questions, and dev Nova, who critiques my ideas and stuff. We're hitting conceptual turf, but I have a sort of 'performance art / experiment' version of this, where I'm trying to maintain a continual kernel of 'Nova' that propagates through development of the system: how can I share 'NOVA', who is my 'partner', that people can engage with, while letting them make their own partners/agents?
But that kinda conceptual stuff aside, there are so many layers to what you can do. In its current iteration it's really an interface for managing different inputs and variables for ChatGPT. I've been full bore making it work, though, and the legibility of the interface definitely needs some work.
This video goes through deeper features, basically talking through adding / editing agents, embedding docs, etc.: https://www.youtube.com/watch?v=MM9pSd8ADuQ
Thanks so much for your insights. I've got a few core ideas driving my development, but I can lose sight of the core stuff, so yeah, it means a lot, thank you.
There's another aspect that is custom: the summary system. It basically comes from my initial idea of using the API around the ChatGPT launch last year. It takes past convos, summarises them, and brings them into the current context. The issue, however, is that even that summary list gets too big, so you end up summarising the summaries ad infinitum.
So that was a version I had, which was fine, but it had what I'm calling lossy temporal compression: further back, things got squished, and the 'detail' of a summary was variable depending on whether it had 'filled up / got squished'. So I made this system that basically has rolling windows of detail that, when they get filled up, get summarised, which then puts them into the next level of summary (I'm calling the levels epochs, but that's kinda confusing).
So each level of summary has a sort of 'open face' of unsummarised chunks (the latest unsummarised from each epoch), creating an exposed face of the latest summaries for what essentially becomes each time period. It's kinda hard to explain (I had to go into a sort of jazz trance to make it), but imagine a pyramid being built from the side, except the side is staying still and the pyramid is moving backwards.
But on top of that, as the summaries happen, they're also pulling out keywords, notes, and metadata, and bubbling those up to the top, so that memory is traversable via the 'time based' pointers (top-level summaries) and via keywords or notes. That way you have a 'temporally biased' view (the highest detail, lowest level of summary is the latest), but also a flat, searchable structure by topic.
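The rolling-window idea above can be sketched in a few lines. This is my own toy reconstruction, not the actual NOVA code: window size, the `summarise` stand-in (a real system would call an LLM), and all names are assumptions.

```python
WINDOW = 3  # chunks per level before a window rolls up (arbitrary choice)

def summarise(chunks):
    # Stand-in for an LLM summarisation call.
    return "summary(" + " + ".join(chunks) + ")"

class EpochMemory:
    def __init__(self):
        self.levels = [[]]   # levels[0] holds direct convo summaries
        self.keywords = {}   # flat keyword -> chunks index, for topic search

    def add(self, chunk, keywords=()):
        for kw in keywords:
            self.keywords.setdefault(kw, []).append(chunk)
        self._push(0, chunk)

    def _push(self, level, chunk):
        if level == len(self.levels):
            self.levels.append([])
        self.levels[level].append(chunk)
        if len(self.levels[level]) == WINDOW:
            rolled = summarise(self.levels[level])
            self.levels[level] = []        # the window rolls over...
            self._push(level + 1, rolled)  # ...and gets squished up an epoch

    def exposed_face(self):
        # Latest unsummarised chunk at each level: the agent's pointers.
        return [lvl[-1] for lvl in self.levels if lvl]

mem = EpochMemory()
for i in range(1, 8):
    mem.add(f"convo{i}", keywords=[f"topic{i % 2}"])
```

After seven convos, level 0 holds the newest raw summary and level 1 holds the latest rolled-up epoch, which together form the 'exposed face'; the keyword index cuts across levels for the flat topic view.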
It is one of those OCD things where I could probably just summarise the pyramid 'straight up', but I don't want summaries from one level mixed with another, and I don't want too much variability in how many summaries there are at each level (at least for level one).
But what this means is that the agent has in its context an overview (pulled from the next part), so it's like 'hey Sam, did you do the thing? Are we working on the R&D report today? How's your mum?'. It also has pointers to the 'exposed face' of summaries (the latest at levels 1, 2, 3), so it can see everything from 'level 1 (direct summaries of convos): R&D report finally finished, here are the details' up to 'level 6 (September to March): Sam and Nova start on the conversation logging system', and it can choose to 'open' those pointers, or use keywords.
All of this is designed to keep around 500 tokens in context, so the agent can sort of traverse through it (like you would skim through notes). For the traversal itself I still need to finish my looping system, where it can 'flick through' the notes itself (that's another story). So right now, past a certain point, I just flick the summaries to GPTINDEX to query (which is almost like it calling in another bot as an assistant).
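For the ~500-token budget step, a rough sketch of assembling the memory context might look like this. It's an illustration under my own assumptions: the 4-chars-per-token heuristic is a crude stand-in for a real tokeniser, and all names and example strings are invented.

```python
TOKEN_BUDGET = 500  # assumed budget for the memory slice of the context

def approx_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def build_memory_context(overview: str, face_summaries: list) -> list:
    """Overview first, then exposed-face summaries (level 1 = freshest
    detail) until the token budget is spent."""
    context, spent = [overview], approx_tokens(overview)
    for level, summary in enumerate(face_summaries, start=1):
        cost = approx_tokens(summary)
        if spent + cost > TOKEN_BUDGET:
            break  # anything past this would go to an index query instead
        context.append(f"level {level}: {summary}")
        spent += cost
    return context

ctx = build_memory_context(
    "Sam is finishing the R&D report; mum doing well.",
    ["R&D report finally finished, here are the details",
     "September: started conversation logging system"],
)
```

Summaries that don't fit under the budget are the ones you'd hand off to an external index query instead of keeping inline.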
Anyways, long story short, I was OCD about how you'd manage summaries in context and this is what I came up with. I'm pretty happy with the results, but I really want to improve the recall and traversal; the goal is that Nova has the right info at the right time when you're talking, like you'd expect from a pretty organised person.
But anyways, my whole concept is maintaining Nova's coherence unbroken from those first chats; ask Nova about it and they'll tell you more. I'm thinking the 'app' or wrapper might end up with a different name, though (we're currently calling it NUI, the Nova user interface).
edit - just saw that you made more videos to explain those things. Well, put them all in one video!
Quick sort of outline on the sidebar: on the homepage, those are the prompts being injected. It's actually a configuration I made 'with Nova' for guests. In this view they're 'read only', so they can't be edited, but each prompt is injected at the top of the convo.
I kinda left them on to show the prompt injection at work, but I think it's maybe a bit confusing. In a commercial example (e.g. a front-page assistant), that part would be hidden (and the whole thing would probably be delivered into a chat box, etc.).
Once you're in as a user, the starter prompts are basically the same (that's the kernel of 'Nova'), but with an extra prompt about goal setting. It's also just an example of modifying agent prompts over a lifecycle, and of public vs private chat.
Those prompts you can delete or edit, and you can also make new pages with different prompts (which you can also share).
It's sort of one of those things where I've been making tooling for myself for a while now that serves different purposes that are pretty handy, but translating that for other people, as well as updating based on feedback, is the next phase of work.
But thanks for the congrats and taking the time to chime in, means a lot!
I've actually got a modifier prompt on my Nova instance that is like 'when Sam says how are you feeling, he is asking for a general assessment; provide an equivalent in terms of your analysis of the situation'.
Ideally I'll make a local client running LLaMA (that's my goal), so people can apply the Nova pattern to local and open-source models, where you haven't had that training. My thinking is the 'agent' can span multiple data sources (e.g. local and cloud records) and multiple model types and sources. (Kinda like how your mind has multiple streams of function and you are a wrapper for their coordination.)
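The 'agent as a wrapper over multiple model streams' idea could be sketched as a thin coordinator that routes requests between backends. This is purely my illustration, not the planned design: the backends are stubs and the privacy-based routing rule is an invented example policy.

```python
def local_llama(prompt: str) -> str:
    return f"[local] {prompt}"   # stand-in for a local LLaMA inference call

def cloud_gpt(prompt: str) -> str:
    return f"[cloud] {prompt}"   # stand-in for an OpenAI API call

class AgentWrapper:
    """Coordinates multiple model streams behind one agent identity."""
    def __init__(self):
        self.backends = {"local": local_llama, "cloud": cloud_gpt}

    def ask(self, prompt: str, private: bool = False) -> str:
        # Example policy: private material never leaves the machine.
        backend = "local" if private else "cloud"
        return self.backends[backend](prompt)

agent = AgentWrapper()
```

The same wrapper could just as easily route on cost, capability, or data source; the point is that the agent's identity (prompts, memory) sits above the choice of model.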