The extension is public if you want to try it on your own repos: https://marketplace.visualstudio.com/items?itemName=CodeViz....
Will and I started CodeViz because we wanted more intuitive representations of software. During our time at Tesla, we encountered a common problem: software engineers spend very little time actually typing code. Most development time is spent navigating convoluted files and building a mental map for each task. At the same time, whiteboard sessions were proof that code could be expressed intuitively.
We started with autogenerated technical documentation. Of course, long markdown docs are not a good solution for long files of code. We realized we needed diagrams that (a) help grasp large quantities of code and (b) can be filtered according to the developer’s task. So, we built a graph-based VS Code extension. It generates diagrams directly within VS Code, illustrating connections between functions and providing overviews of system architecture. These visualizations update as code changes.
CodeViz appears as a side panel in VS Code with two views:
(1) Call graph: as you click on functions, we show a chain of upstream and downstream references. You can navigate your codebase using the call stack and see, in one view, everywhere your functions are called. We generate this call graph using the language servers developers have already installed in VS Code.
(2) Architecture diagram: we create a C4 diagram of your system, so you can see a top-level view of your codebase and click into the component layer. We were surprised to find that a small fraction of code can generate a very accurate representation of the system. We detect these important files, then use LLMs to build nested architecture diagrams at the container and component level.
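To make the "upstream and downstream references" idea concrete, here is a minimal sketch, not CodeViz's actual implementation. It assumes the caller/callee pairs have already been collected (for example from a language server); the function names and the `CALLS` list are made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical (caller, callee) pairs; a real tool would collect these
# from a language server rather than hard-coding them.
CALLS = [
    ("main", "load_config"),
    ("main", "run_server"),
    ("run_server", "handle_request"),
    ("handle_request", "query_db"),
    ("cron_job", "query_db"),
]


def build_index(calls):
    """Index the edges in both directions so either walk is a plain lookup."""
    callers = defaultdict(set)  # callee -> functions that call it
    callees = defaultdict(set)  # caller -> functions it calls
    for caller, callee in calls:
        callees[caller].add(callee)
        callers[callee].add(caller)
    return callers, callees


def reachable(start, neighbors):
    """Breadth-first walk: everything reachable from start, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        for nxt in neighbors.get(fn, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def call_chain(fn, calls):
    """Return the transitive callers (upstream) and callees (downstream) of fn."""
    callers, callees = build_index(calls)
    return {
        "upstream": reachable(fn, callers),
        "downstream": reachable(fn, callees),
    }
```

With the sample edges, `call_chain("handle_request", CALLS)` gives `run_server` and `main` as upstream and `query_db` as downstream.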
Developers are mainly using our extension to navigate spaghetti code, onboard new devs, and interpret open source repos. We're still figuring out our pricing. Currently, we offer basic features for free, with a paid tier for more resource-intensive tools like detailed architecture diagrams. Open to suggestions on this approach.
CodeViz is in active development and our main focus over the next couple of days is to make the call graph much easier to view and navigate. We're continuously working to make it better, so your honest feedback, suggestions, and wishes would be very helpful. Looking forward to hearing any and all thoughts, whether about the current extension, general problem space, or something else!
I tried this with a bunch of small open-source repos and it works great! I imagine using an LLM might be a hard no for some people/enterprises - any plans to use stand-alone licenses with small local models? It seems like for what the LLM is doing here (if I understand it right, help label the modules in natural language and perhaps help organize them into this hierarchy/modules) you don't necessarily need a SoTA model, right?
Also, this could be coming from LLMs, but I see that the visualizations are more biased towards terminology used in web-dev? (for example, one of my robot-related repos was organized into front-end, back-end, etc., which I guess is kinda right but not exactly lol). It would be nice to see an interactive visualization where I can iterate on the initial viz with information I know, e.g., I drag and drop a module or rename it and then you probably do another pass with this feedback and LLMs and update my overall visualization with more domain-specific labels and partitions?
Edit: Exploring CodeViz on a few more repos, and it seems like you have a set of hardcoded labels for the highest hierarchy in the architecture diagram? (so far, I've only seen Users, Databases, Backend, Frontend, and Shared Components). I am guessing this is something passed on in the prompts? It'd be nice to allow users to define their own set of labels/partitions at one or more levels and then try to create an architecture visualization that fits into these labels/constraints (although I am guessing at some point you have to be wary of hallucinations?)
The top-level categorizations are indeed fixed; however, the nodes themselves can be arbitrary. We've found this helps with grouping and organization while still allowing for the flexibility required to accommodate different systems. I'm curious, are there any categories missing here that could be added?
Currently, we categorize by: Frontend (UI/UX elements), Backend (API/Business/Data Access), DB (persistent storage), External Services (backends maintained outside the codebase), Shared Components (internally maintained libraries and helpers).
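As a rough illustration of "fixed top-level categories, arbitrary nodes", the structure could be modeled like this. It is only a sketch: the `Category` values come from the list above, while the `Node` class, `group_by_category`, and the robotics-style node names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Category(Enum):
    """The fixed top-level buckets listed above."""
    FRONTEND = "Frontend"
    BACKEND = "Backend"
    DB = "DB"
    EXTERNAL_SERVICES = "External Services"
    SHARED_COMPONENTS = "Shared Components"


@dataclass
class Node:
    """An arbitrary, free-form node that must sit under one fixed category."""
    name: str
    category: Category
    children: List["Node"] = field(default_factory=list)


def group_by_category(nodes):
    """Bucket node names by category; every category appears, even if empty."""
    groups = {category: [] for category in Category}
    for node in nodes:
        groups[node.category].append(node.name)
    return groups


# Hypothetical nodes for a robotics-style repo.
nodes = [
    Node("Teleop UI", Category.FRONTEND),
    Node("Motion Planner", Category.BACKEND),
    Node("Sensor Drivers", Category.SHARED_COMPONENTS),
]
groups = group_by_category(nodes)
```

The node names are unconstrained, but a repo that doesn't think in web-dev terms still gets squeezed into these five buckets, which is the bias raised above.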
There is some hard coded bias for web dev. Diagram modification is definitely high on our todo, and we've been finding ways to reduce pre-defined structure in our prompts to LLMs so they work with broader tech stacks. When we sell licenses to teams, we do some manual checks for accuracy and detail, which helps us improve the public extension.
What's the name of the robotics repo? And any preference for modifying the diagram directly vs instructing changes by text?
Tried it on the github.com/pulumi/pulumi codebase and I get 5 blocks and that's it. Seems nice but I'm not going to pay 20 bucks a month to view one layer deeper.
I cloned pulumi and exported both layers of the CodeViz diagrams into mermaid format: https://github.com/EdisonLabs-Inc/Pulumi-Diagrams/tree/main. I'm not familiar with the repo so let me know if anything looks off. Hope you find these useful!
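For anyone curious what a two-layer diagram looks like as Mermaid text, here is a small sketch of one way to serialize containers and their components into a `flowchart` with `subgraph` blocks. This is not the exporter CodeViz uses, and the container and component names are placeholders, not anything taken from the Pulumi repo.

```python
def to_mermaid(containers, edges):
    """Render containers (subgraphs) and their components as Mermaid flowchart text.

    containers: {container_id: (title, {component_id: label})}
    edges: list of (source_component_id, target_component_id)
    """
    lines = ["flowchart TD"]
    for container_id, (title, components) in containers.items():
        lines.append(f'    subgraph {container_id}["{title}"]')
        for component_id, label in components.items():
            lines.append(f'        {component_id}["{label}"]')
        lines.append("    end")
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Placeholder system: two containers, three components.
containers = {
    "backend": ("Backend", {"api": "API Handlers", "store": "Data Access"}),
    "db": ("DB", {"pg": "Postgres"}),
}
edges = [("api", "store"), ("store", "pg")]
diagram = to_mermaid(containers, edges)
```

The resulting text can be pasted into any Mermaid renderer, including a GitHub markdown file inside a `mermaid` code fence.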
The examples in your youtube video look good. I'm curious how they're generated. "We were surprised to find that a small fraction of code can generate a very accurate representation of the system." is a surprising statement to me. It's not been my experience that the code can reveal an accurate representation of human-understandable architecture beyond the call graph. The backend system generated from OpenHands (in your video) also looks pretty different from their own architecture diagram in their README: https://github.com/All-Hands-AI/OpenHands/tree/main/openhand... . How do you reconcile what an LLM says an architecture looks like with what maintainers prescribe? Is there a way to give feedback to it? (similar to pthangeda's comment on customization)
I wish there was a way to point this at a repo to test its efficacy. Though I understand that that'd be prohibitively expensive to do for free on the landing page.
I'm also curious how you guys distinguish yourself from https://docs.codesee.io/docs/review-maps-for-visual-studio-c... . They tried this for a few years but shut down recently (https://www.linkedin.com/posts/shaneak_update-codesee-has-be...)
In terms of generating architecture diagrams, we follow the C4 model, with top-level nodes defined as separately deployable units of software, and lower-level component nodes being a set of functions wrapped behind a common interface. As the product develops, we'd like to include a way for feedback/fine-tuning, but ideally the definition of an architecture diagram would be rigorous enough that there is no ambiguity. This is what we're aiming for. If you'd like to try it out on a specific repo, you can always use our extension for further analysis.
You're right to notice the similarity with CodeSee. Ultimately we're looking to focus on improving the developer experience without needing to leave the IDE. The idea is that CodeViz can replace or augment search and directory tree by providing a more intuitive interface for navigation!
VSCode extension marketplace doesn't have the best security rails or reputation for security, and with this being closed source, just personally, installing and running it on my machine isn't something I'm comfortable doing.
> The idea is that CodeViz can replace or augment search and directory tree by providing a more intuitive interface for navigation!
That to me is a different goal than the one in your post (maybe it's just phrasing or I didn't understand the OP correctly), and is something I'd be excited to have!
> ideally the definition of an architecture diagram would be rigorous enough that there is no ambiguity
Rigor is a big "if" in software ;). See: UML's attempt. C4 is some very loose guidelines. IIRC, a big part of its attraction is the lack of rigor/formal standards.
Anyway, best of luck! Feel free to reach out if you'd like to chat diagrams
This is genius
- We are allowed to use our own API key to get your extension to talk directly with our LLM of choice. I understand you are using Claude right now, but we are on OpenAI & Copilot; since we already went through the hoops of "accepting that our codebase will be sent to an LLM", we are trying to control the level of exposure. OpenAI is already in, Claude isn't; and you definitely won't be, so having our codebase go through your infra is a no go. Local LLM would be incredible, but it might not be technically achievable yet.
- The price is too steep. I get way more value out of Copilot than I get from CodeViz, which is similarly priced. I understand you have LLM costs at the moment, but as the point above argues: it does not have to be that way. I would find it way more "fair" to have to insert my LLM API key (even for the free version and/or trial; right now you are losing money on that) and have a separate cost for the extension, possibly with a "lifelong for current version + 1 year of upgrade" license. Let me pay the LLM monthly depending on my actual usage, and pay you to build a very nice prompt engineering system.
> The price is too steep. I get way more value out of Copilot than I get from CodeViz which is similarly priced.
Copilot is a great product and inexpensive, I do understand where you're coming from. Personally, I spend most of my time reading code and gathering context vs typing code. We hope to provide much more than $19/mo in value, so the subscription is a good signal for whether we're making something useful for you. We also have more features for team licenses that we haven't publicized.
The reduced subscription price for using your own API key does make sense though. If you send me an email or message me on Discord, I'd be happy to chat this over. This goes for anyone with a strong opinion on the matter.
what is the price? After looking at the website or the marketplace, I can't see any pricing at all (except the "free" on the marketplace page)
> I understand you are using Claude right now
On the marketplace page it says "We do use LLMs and sections of code to prompt Claude/OpenAI." under "Data & Privacy"
We only use Anthropic at the moment, but we left OpenAI in the Data & Privacy in case we change models (not in the foreseeable future)
Also, not being able to click past the first layer in the free version is a bit too limited in my opinion.
Sounds promising, but having to fill out my payment details for a trial version is a hard no for me.
Happy to pay a one time fee and the ability to use my local LLM
We just had a major incident because someone accidentally uploaded our codebase via a VS Code extension: lawyers, everything involved. I expect that non-local AI tools will be banned soon.
Think about it, what 3rd party tool would you let scrape your whole codebase and send it to their server?
So we must trust both Anthropic's and your infrastructure with our code.
I agree that a local LLM is the way to go.
Personally, I spend a small fraction of my time actually typing code and much more time gathering context & building a mental map. CodeViz speeds up the mental map part, so we hope to deliver much more value than $19/mo
Let me know if there's anything missing from CodeViz that would change your mind!
It was fun to look at various sites' front-end architecture.
I started to do complexity analysis as well, because, why not. But I got busy with actual work.
I wonder if I can dig it back from somewhere.
On a different note: I’m wondering, do you have any function calling built in? Is there room for combining LLM output with LSP calls?
In those terms CodeViz will provide a form of simplified class diagrams and a function call mapping?
Will there be sequence diagrams in the future?
¹ https://www.uml.org/ ² It has fallen out of fashion, since many people found it too heavy, and a lot of people have never heard of it.
The info in sequence diagrams is useful - maybe we can show this using a hierarchical layout & edge labels
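One way this could work, purely as a sketch of the idea and not something CodeViz does today: take an ordered trace of calls, label each edge with the step numbers at which it fires, and give each function a layer by its depth from the entry point. The `TRACE` data and function names are invented for the example.

```python
from collections import deque

# Hypothetical ordered trace of (caller, callee) calls for one request.
TRACE = [
    ("client", "api"),
    ("api", "auth"),
    ("api", "db"),
    ("api", "auth"),
]


def edge_labels(trace):
    """Map each edge to the 1-based steps at which it occurs in the trace."""
    labels = {}
    for step, edge in enumerate(trace, start=1):
        labels.setdefault(edge, []).append(step)
    return labels


def layer_of(trace, root):
    """Assign each function a layer: its shortest hop count from root."""
    children = {}
    for caller, callee in trace:
        children.setdefault(caller, []).append(callee)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        fn = queue.popleft()
        for nxt in children.get(fn, []):
            if nxt not in depth:
                depth[nxt] = depth[fn] + 1
                queue.append(nxt)
    return depth
```

Here the `api -> auth` edge would carry the label `[2, 4]`, so the ordering information from a sequence diagram survives in a layered graph without needing a separate diagram type.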
After installing it, it doesn't even let you see the first thing without signing up for a trial period and even then doesn't mention a cost.
I understand you just launched it, but I look at the free trial and think "what else am I going to find out...", uninstalled for now but hope you find traction.
Maybe I was supposed to single click one to find out more? I double clicked for what it's worth, and got the message to upgrade. But given that it hadn't done anything useful yet I of course had no reason to proceed. Just early feedback, I'm sure you'll get it all worked out.
But the navigation? Not easy to understand how to use it. Is there a video to help us see what we can do and how to do it?
I have been forcing my bot to give me Mermaid diagrams, swim diagrams, markup tables of schemas, code, logic etc...
I like where you guys are going, but what I think would be really fun - would be a Node Based diagram logic, where the boxes that you show in the diagram are Code-geometry-Nodes - and could be connected with code blocks as such.
Watch @HarryBlends videos on Geometry Nodes in Blender for Inspiration:
https://www.youtube.com/@harryblends
https://www.youtube.com/watch?v=a-4oCHe-hDE
These are the best graphic/node based visuals for describing structured relationships in maths I've ever seen.
To give you some CyberPunk FutureVision of what your outputs could be like -- if it turned all the code nodes into atomic code legos, then rather than drawing the diagram from the code, I could use the diagram to create the code.
--
WRT "...some code goes to anthropic..." while answering another, seems like you guys would do well to know these guys:
https://news.ycombinator.com/item?id=41381498
As well as these guys:
In designing CodeViz we were inspired by the Maya hypershade, which closely resembles the diagram-based blender tool that you shared.
https://help.autodesk.com/view/MAYAUL/2024/ENU/?guid=GUID-22...
These examples show how taking a diagram-based approach to software development can abstract away complexity with minimal loss of control over the end result. I love your image of "atomic code legos," and these legos can still always be edited at the level of code when needed.
And yes, if CodeViz can generate architecture diagrams from code, the inverse can and will be possible: generating code from architecture diagrams.
I've been wanting to have a GPT directly inside Blender to talk Geometry Nodes, because I want to tie geometry nodes to external data, which runs as Python inside Blender and draws the object geometry that suitably shows/diagrams out the nodes of my game. The game, which I am slowly piecing together, is 'The Oligarchs': an updated Illuminati-style game, but with updates using AI to create nodes directly from Oligarch IRL files, such as their SEC filings and the Panama Papers, which all the tools on HN are suited to creating. I went to school for Softimage & Alias|WAVEFRONT (which became MAYA) Animation in 1995 :-)
So I like your DNA.
I want to unpack the relationships of the Oligarch, programmatically, with hexagonal nodes, similar to this [0], but driven by a Node-based-python-blocks-GraphQL-hierarchy. And I am slowly learning how to get GPTBots to spit out the appropriate elements for me to get there.
[0] - https://www.youtube.com/watch?v=vSr6yUBs8tY
(I've posted a bunch of disjointed information on this on HN, more specifically about how to compartmentalize GPT responses and code, how to drive them to write code using a style guide, and how to gather data using structured rules for how the outputs need to be presented.)
EDIT:
I wanted to share with you: building an app with Claude, where I tell it to "give me a ps1 that sets a FastAPI directory structure, creates the venv, touches the correct files, give me a readme and follow the best practice for FastAPI from [this github repo from netflix]"
https://i.imgur.com/7YOjJf8.png
https://i.imgur.com/KecrvfZ.png
https://i.imgur.com/tKYsmb9.png
https://i.imgur.com/nCGOfSU.png
https://i.imgur.com/ayDrXZA.png
Etc -- I always make it diagram. Now I can throw a bunch of blocks in a directory and tell it to grab the components from the directory and build [THIS INTENT].app for me.