Based on modern metrics for code quality, almost nobody will realize that they're looking at bad code. I've seen a lot of horrible codebases which looked pretty good superficially: good linting, consistent naming, functional programming, static typing, etc. But architecturally they're shockingly bad; they're designed such that you need to refactor constantly, there's no clear business layer, and business logic traverses every component, including the supposedly generic ones.
With bad code, any business requirement change requires a deep refactoring... And people will be like "so glad we use TypeScript so that I don't accidentally forget to update a reference across the 20 different files this refactoring touches." Newsflash: your tiny business requirement change requires you to update 20+ files because your code sucks! Sure, TypeScript helps in this case, but type safety should be the least of your concerns. If code is well architected, complex abstractions don't generally end up stretching across more than one or two files.
There's a reason we say "leaky abstraction": if a complex abstraction leaks through many file boundaries, it's an abstraction and it's leaky!
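To make the contrast concrete, here's a sketch of "leaky" vs. "contained" using a hypothetical discount rule (all names made up for illustration):

```python
# Leaky: the business rule is re-derived wherever it's needed, so changing
# "10% off orders over $100" means hunting through every caller.
def checkout_total_leaky(cents: int) -> int:
    return cents * 9 // 10 if cents > 10000 else cents

def invoice_total_leaky(cents: int) -> int:
    return cents * 9 // 10 if cents > 10000 else cents  # duplicated rule

# Contained: one business-layer function owns the rule; callers depend only
# on its signature, so a requirement change touches one file.
def apply_discount(cents: int) -> int:
    """Business rule: 10% off orders over $100. Change it here, once."""
    return cents * 9 // 10 if cents > 10000 else cents

def checkout_total(cents: int) -> int:
    return apply_discount(cents)

def invoice_total(cents: int) -> int:
    return apply_discount(cents)
```

Trivial on purpose: the point is that in the leaky version a one-line requirement change fans out to N call sites, and in the contained one it doesn't.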
I wonder if the main problem was all the min-maxing of interview patterns that rewarded algorithm puzzle solvers from the 2010s onwards.
People applied for software engineering jobs because they wanted to play with tech, not because they wanted to solve product problems (which should have a direct correlation with revenue impact)
Then you have the ego-boosting blog post era, where everyone wanted to explain how they used Kafka and DDD and functional programming to solve a problem. If you start reading some of those posts, you'll understand that the underlying problem was actually not well understood (especially the big picture).
This led developers down a wild goose chase (willingly), where they ended up burning through tons of engineering time which arguably could have been better spent understanding the domain.
This is not the case for everyone, but the exceptions are few.
It makes me wonder if the incentives are misaligned, and engineering contributing to revenue ends up not translating to hard cash, promos and bonuses.
In this new AI era, you can see the craftsman-style devs going full luddite mode, IMO due to what I've mentioned above. As a craftsman-style dev myself, I can only set up the same async job queue pattern so many times. I'm actually enjoying the rubber-ducking with the AI more and more, mostly for digging into the domain and potential approaches for simplification (or even product refinement).
Painful for me because I excel at architecture. My puzzle-solving skills are actually good too, but unfortunately, not under time constraints! Sometimes I feel like there's been an industry-wide conspiracy against the software architect archetype!
Ever since I first learned to code at a young age, I wanted to be a software architect, and I was shocked to learn that this skill was rarely appreciated in the industry. I became convinced that the software developer role had become a kind of 'bullshit job' to meet the needs of the reserve bank's job-creation agenda.
I suppose the silver lining is that at least now LLMs have a bias towards puzzle-solving and so lead most codebases astray... This increases my value as a software architect or 'craftsman' in your words.
I think you make a good argument there, and you can extrapolate it to almost every aspect of society. From the moment you start school, everything is geared towards measuring thinking speed... We've been using thinking speed as the definition of intelligence... You know who else, besides high-IQ individuals, is good at thinking fast? LLMs!
It's kind of interesting and fitting though that the AI agents we invented have the same biases as the humans at the top of our organizations!
I feel like the whole "there is only one kind of intelligence" belief which was pervasive in big tech has been thoroughly debunked by now.
This is a naive metric since it's satisfied by putting the entire code base into a single file.
Part of the reason that business requirement changes to modern web dev code bases require changes to so many files is because web devs are encouraged to restrict the scope of any one file as much as possible.
I can't tell if you're complaining about that specifically or if you think it's possible to have both a highly modularized code base & still restrict business requirement changes to only a couple files.
If the latter, then I'd love to know guidelines for doing so.
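One guideline that seems to square the two is to modularize by feature ("vertical slices") rather than by technical layer. A hypothetical sketch, with all names invented: everything the orders feature needs lives in one small module behind one narrow public function, so an orders requirement change stays in one place.

```python
# orders.py -- validation, business rule, and persistence call for one
# feature, behind a single narrow public function.
_DB: dict[int, dict] = {}  # stand-in for a real data store

def place_order(order_id: int, items: list[str]) -> dict:
    if not items:
        raise ValueError("order must contain at least one item")
    order = {"id": order_id, "items": items, "status": "placed"}
    _DB[order_id] = order  # persistence detail stays inside the slice
    return order

# Other features import only place_order, never _DB or the internal
# shape of the order beyond what place_order returns.
```

The codebase stays highly modular (many small feature modules), but a change to how orders work edits one file, not a controller file plus a service file plus a repository file plus a DTO file.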
I said 90% in my comment but that's from my professional experience which is probably biased towards complex projects where maintainability is more important.
What does this program accomplish? How does it accomplish it? Walk me through the boot sequence. Where does it do ABC?
I work in a company where I frequently interact with adjacent teams' code bases. When working on a ticket that touches another system, I'll typically tell it what I'm working on and ask it to point me to areas in the code that are responsible for that capability and which tests exercise that code. This is a great head start for me. I then start "in the ball park".
I would not recommend having it make diagrams for you. I don't know what it is, but LLMs just aren't great at converting information into diagram form. I've had one explain, quite impressively, parts of code, and when I ask it to turn that into a diagram it comes up short. Must be low on training data expressing itself in that medium. It's an okay way to get the syntax for a diagram started, however.
I wish you an auspicious time in your new role!
I built my own node graph utility to do this for my code, after using Unreal's blueprints for the first time. Once it clicked for me that the two are different views of the same codebase, I was in love. It's so much easier for me to reason about node graphs, and so much easier for me to write code as plain text (with an IDE/language server). I share your wish that there were a more general utility for it, so I could use it for languages other than js/ts.
Anyway, great job on this!
https://en.wikipedia.org/wiki/Doxygen#/media/File:Doxygen-1....
This kind of approach might be what (finally) unlocks visual programming?
I feel like most good programmers are like good chess players. They don't need to see the board (code). But for inputting the code transformation into the system this might be a good programmer's chessboard.
Though to make it work concretely for arbitrary codebases I feel like a coding agent behind the scenes is 100% required.
As a bonus, porting Doom to it should be "trivial".
A specific type or area of developers, I'd say. There are many types and not all of them require understanding sizeable code bases to do their work well.
How would you do it today?
But telling people that isn't helpful. I try at the beginning to give a more step-by-step account of how I would get into understanding the code base if I didn't already know these kinds of shortcuts. (I'm not sure I could write those down; they're just know-how and heuristics, like how a missing ; can take much longer to spot when you're starting to code than after you've been programming for a while.)
There should be more writing and discussion in this area, for several reasons. The simplest is that we're curious about how others do this. But it's also an interesting topic, IMHO, because layers of abstraction--code designed to run other code--can be difficult to talk about, because the referents get messy. How do you rhetorically step through layers of abstraction?
When I have a codebase I don't know or haven't touched in some time and there's a bug, the first step is to reproduce it, then set a breakpoint early on somewhere, grab some coffee, and spend some time stepping through it looking at state until I know what's happening. From there it's usually kind of obvious.
Why would one need a graph view to learn a codebase when you can just slap a red dot next to the route and step a few times?
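A minimal sketch of that workflow, with a hypothetical parse_price helper standing in for the unfamiliar code:

```python
def parse_price(raw: str) -> int:
    """Convert a price string like '$1,234' or '$19.99' to integer cents."""
    cleaned = raw.replace("$", "").replace(",", "")
    dollars, _, frac = cleaned.partition(".")
    return int(dollars) * 100 + int(frac.ljust(2, "0") or "0")

def reproduce_bug():
    # 1. Reproduce the bug deterministically with the failing input.
    raw = "$19.99"
    # 2. Set the breakpoint early (the "red dot"), then step with n/s and
    #    inspect state with p <expr> in pdb until the cause is obvious:
    # breakpoint()
    return parse_price(raw)
```

(The breakpoint() call is commented out so the sketch runs straight through; uncomment it to drop into pdb at that point.)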
https://heyes-jones.com/externalsort/treeofwinners.html
Take this example. I can step through the algorithm, view the data structure and see a narration of each step.
A debugger is useful for debugging edge cases but it is very difficult to learn a complex system by stepping through it.
And for huge git repos I always like to generate a Gource animation to understand how the repo grew, when big rearrangements and refactors happened, what the most active parts of the codebase are, etc.
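Something like the following, run from inside the repo (flags from Gource's documentation; check `gource --help` for your version):

```shell
# Replay the repo's commit history as an animation; slow the clock down
# and skip over idle periods so big refactors stand out.
gource --seconds-per-day 0.2 --auto-skip-seconds 1 --title "repo history" .
```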
Think of an on-caller who wants to quickly pinpoint a problem. Visualization could help one understand the nature of the problem before reading the code. Then you could select a part of the visualization and ask the computer to tell you what that part does, if there are any recent changes to it, etc.
How is it different from regular code browser/indexers?
I'd add one more technique that's worked well for me: trace a single request from HTTP endpoint to database and back. In a FastAPI app, that means starting at the route handler, following the dependency injection chain, seeing how the ORM/query layer works, and understanding the response serialization. You touch every layer of the stack by following one real path instead of trying to understand the whole codebase at once.
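A stripped-down sketch of the layers that one trace passes through, with plain functions standing in for FastAPI's machinery (all names hypothetical):

```python
FAKE_DB = {42: {"id": 42, "name": "Ada"}}

def get_db():                       # the dependency-injection chain
    return FAKE_DB

def query_user(db, user_id: int):   # the ORM/query layer
    return db.get(user_id)

def serialize(user) -> dict:        # response serialization
    return {"id": user["id"], "name": user["name"]}

def get_user_handler(user_id: int) -> dict:  # the route handler
    db = get_db()
    user = query_user(db, user_id)
    if user is None:
        return {"error": "not found"}
    return serialize(user)
```

Stepping through one call like get_user_handler(42) touches every layer in order, which is the whole point of the technique: one real path, end to end.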
Visualizers are nice for the "big picture" but they rarely help you understand why the code works the way it does. The why is in the git history and the closed issues, not in a dependency graph.
I created Intraview for VS Code, Cursor, etc., which makes it easy to create code tours with your coding agent by simply saying, "Create a tour that helps me understand how to get started with this repository."
It has other features, but it was designed for the problem of getting into new code bases, and it allows tours to be saved in the repo as flat JSON files. You can re-open or share tours with new folks, and if the code changes the system tells you how to ask your agent to update the tour.
Just a thought.