Codegen was engineered backwards from real-world, large-scale codebase analysis and refactoring work we performed on multi-million-line enterprise codebases. It provides a scriptable interface to a powerful, multi-language language server built on Tree-sitter.
We realized that many code transformation tasks that impact large teams (refactors, enforcing patterns, analyzing control flow) are fundamentally programmatic operations. Yet existing tools like LibCST and jscodeshift often require you to think in terms of ASTs and parser internals rather than the high-level changes you want to make.
Therefore, we built Codegen to match how developers actually think about code changes:
# Move a symbol to a new file
# Handles imports, references, dependencies
function.move_to_file("new_file.py")
# Rename across the codebase
class_def.rename("NewName") # Updates all usages, preserves formatting
# Analyze call patterns
for usage in function.usages:
    print(f"Used in {usage.file.name}")
Codegen handles the edge cases automatically: updating imports, preserving dependencies, maintaining references, and resolving naming conflicts. You focus on intent, we handle the details.

Under the hood, Codegen performs static analysis to build a rich graph representation of your code. This enables:
- Versatile and comprehensive operations
- Built-in visualization capabilities
- Blazing fast execution of large-scale refactors
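
To make the idea of a code graph concrete, here is a minimal sketch of a usage graph, written with Python's standard-library ast module rather than Tree-sitter. It is not Codegen's implementation: the sample source, the build_usage_graph helper, and the name-based matching are all assumptions made for illustration, and a real tool would also resolve imports, methods, and cross-file references.

```python
import ast
from collections import defaultdict

# Hypothetical sample module, used only for this illustration.
SOURCE = '''
def helper(x):
    return x + 1

def compute(values):
    return [helper(v) for v in values]

def main():
    print(compute([1, 2, 3]))
    helper(0)
'''


def build_usage_graph(source: str) -> dict[str, set[str]]:
    """Map each called name to the top-level functions that call it.

    Matching is purely by name within a single file; this is a
    simplification, not how a full language server resolves symbols.
    """
    tree = ast.parse(source)
    usages = defaultdict(set)
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        # Walk the function body and record every direct call by name.
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                usages[inner.func.id].add(node.name)
    return dict(usages)


graph = build_usage_graph(SOURCE)
for caller in sorted(graph["helper"]):
    print(f"helper is used in {caller}")
```

Once usages are stored as edges like this, questions such as "who calls this function?" become dictionary lookups instead of repeated tree traversals, which is the property that makes graph-backed refactors cheap to query.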
We've seen a wide variety of advanced code manipulation programs emerge, including:
- Mining codebases for LLM pre-training data
- Analyzing security vulnerabilities
- Large-scale API migrations
- Enforcing code patterns
We're excited to share this with the community and look forward to your feedback. Give it a spin and let us know what you think!
uv tool install codegen
codegen notebook --demo
Docs: https://docs.codegen.com
GitHub: https://github.com/codegen-sh/codegen-sdk
Community: https://community.codegen.com

Let us know if you have any questions or interesting use cases you'd like to explore.