I noticed in the GitHub repo that they mention it is only around 60% reliable even on their own training data, but the image shown on the front page feels pretty misleading. I made 10 images that were very similar in complexity to the examples shown, and even after running it around 50 times on each image, not a single one worked correctly. In the rare cases where it produced something, the output was completely wrong.
This seems pretty misleading in its current state and definitely needs more work.
In the limitations section they are quite direct about it: "images used .. are mostly isometric and noise-free CAD-images" and "limited CAD vocabulary used in this study needs to be extended by including more sophisticated CAD tokens such as revolve operation, edge operation (e.g., fillets/chamfers)". It currently supports only a series of pads (which can be subtractive). Many simple beginner exercises of seemingly similar complexity don't fit those constraints. Some of the parts shown could be created more naturally using a wider set of tools, but technically can be created using only pads.
So even if you tried with a clean screenshot of a simple 3D model, it will likely fail if the camera settings aren't right, and it will fail if the model can't be represented as a series of pad operations. Anything containing spheres, cones, nearly all lathe parts, or fillets that can't be included in 2D sketches will fail. In theory arbitrary extrusion angles are supported, but all the examples showed only axis-aligned parts.
That said, I wouldn't be surprised if it failed even if your attempts had respected the exact limitations, not just matched the complexity.
Assuming the images in the Google Drive are the training data, "mostly isometric and noise-free CAD images" is a bit of an understatement. All of them are at very specific angles and in a single style: a specific solid gray infill color for "images", and white infill with weird shading around the perimeter for "sketches". Both have a white background and pixelated, non-antialiased 1px lines. There's no reason to expect it to be capable of processing anything that doesn't have exactly the same visual style. For a practical product that would have to be solved, but that wasn't really the point of this research. All the style transfer papers have shown that it's a more or less solvable problem, but for a paper exploring which model architecture and 3D representation could work best for AI CAD, it seems like an unnecessary distraction that would only bloat the training cost and time. The most annoying part is that it makes it hard to test the model with your own inputs.
So I would really appreciate a good AI/LLM tool that I can feed my sketch and parameters to, and that can save me hours of searching the web and watching tutorials on how to extrude a circle along a curve.
BTW, many existing AI tools work really well with OpenSCAD, so if you want a parametrized model that can be made out of simple shapes, I highly recommend this flow.
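To give a rough idea of what "a parametrized model made out of simple shapes" looks like in practice, here is a minimal sketch. It is only an illustration: the function name `plate_scad` and the plate-with-a-hole part are made up for this example, and Python is used just to write out the OpenSCAD text, which relies only on the basic `cube`, `cylinder`, `translate` and `difference` primitives.

```python
def plate_scad(width=40.0, depth=20.0, thickness=3.0, hole_d=4.2):
    """Return OpenSCAD source for a flat plate with one centered through-hole.

    Hypothetical example part; all parameter names are illustrative.
    """
    return "\n".join([
        # Parameters go at the top so they are easy to tweak by hand or by an LLM.
        f"width = {width};",
        f"depth = {depth};",
        f"thickness = {thickness};",
        f"hole_d = {hole_d};",
        "",
        # The part itself: a box minus a cylinder (plain CSG).
        "difference() {",
        "  cube([width, depth, thickness]);",
        # Start the cutter 1 unit below the plate and end 1 unit above it,
        # so the hole goes cleanly through both faces.
        "  translate([width / 2, depth / 2, -1])",
        "    cylinder(h = thickness + 2, d = hole_d, $fn = 48);",
        "}",
    ])


print(plate_scad(width=60.0))
```

Pasting the printed text into OpenSCAD should give a plate you can resize by changing the four values at the top, which is about the level of complexity where this flow tends to be comfortable.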
There are mechanical engineers out there who can literally model objects nearly as fast as they can 'think' about the layout of said object. If you look at the complexity of, say, a CAD model from a real, highly complex aluminum casting section of an automotive subframe, or the living-room-sized cross-fuselage spar forging of a fighter aircraft, with hundreds of ribs and fillets and features - and compare that to the simple model you are trying to make in OpenSCAD, you should quickly realize the parallels in difficulty you are trying to express (similar to the person without knowledge of C++ or Python watching someone be able to build applications by typing code from their fingertips as if they already knew what to type...)
You are struggling for a few reasons - 1. it is a knowledge hurdle of an entire field you are trying to surmount - again, go watch someone actually model a real, complex part and watch the speed. They can do so in a tool like Solidworks, CATIA, NX, etc. at a rate that is far different, because they have experience that can honestly take even good people years to accumulate - and 2. they are using professional tools - you mention OpenSCAD like it is CAD, but it isn't. It is programmatic mesh generation, and it turns out that programmatically typing out how to generate complex things is much more difficult than the combination of a GUI and a graph-based generator that all the big CAD programs figured out starting in the 1980s. If the tools you use were really the best way to make complex models 'parameterized', then why do we design our fighter jets, Formula 1 cars, or SpaceX rockets in Dassault's CATIA or Siemens NX?
You want an LLM to take a sketch to your CAD, but what I'm saying is, there are people out there who can skip the sketch and build the CAD as fast as you can likely hand draw the first sketch. These are skills you can actually learn, but you may just be using the wrong tools and not have had the necessary practice.
Like it's purposefully built to be unusable.
I think this is possible, but the ‘trick’ would be translating your instructions in English into some kind of language that the CAD software understands.
I’m on a bunch of 3D printing forums, and everyone tries to describe what the finished product would LOOK like. They end up making PICTURES when what they really want is a STL file.
Two dimensions are easier to visualize than three, so let’s put it this way:
If you wanted to turn “English” into “a 2D image that’s dimensionally accurate”, you’d want to translate from “English” to “SVG.”
SVG is dimensionally accurate. JPG isn’t. The file format itself has no concept of “dimensions” only “pixels.”
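As a small illustration of the point, here is a sketch of writing an SVG where the numbers really are millimetres. The helper name `rect_svg` and the 5 mm margin are assumptions made up for this example; the idea is just that `width`/`height` carry a physical unit and the `viewBox` spans the same numbers, so one user unit maps to one millimetre.

```python
def rect_svg(width_mm, height_mm, margin_mm=5.0):
    """Return an SVG document containing one rectangle sized in millimetres.

    Hypothetical helper for illustration only.
    """
    total_w = width_mm + 2 * margin_mm
    total_h = height_mm + 2 * margin_mm
    return (
        # Physical page size in mm, and a viewBox covering the same range,
        # so coordinates inside the drawing can be read as millimetres.
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{total_w}mm" height="{total_h}mm" '
        f'viewBox="0 0 {total_w} {total_h}">\n'
        f'  <rect x="{margin_mm}" y="{margin_mm}" '
        f'width="{width_mm}" height="{height_mm}" '
        f'fill="none" stroke="black" stroke-width="0.2"/>\n'
        f'</svg>\n'
    )


# A 50 mm x 30 mm outline with a 5 mm margin on every side.
print(rect_svg(50.0, 30.0))
```

A JPG of the same rectangle would only record how many pixels it covers, and the real-world size would depend on whatever DPI the viewer or printer happens to assume.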
I haven't tried telling AI to "make a thing," but I'm able to get Copilot to refactor code. It's just the geometry that makes my head spin.
It will take much longer than a day for AI to get to this level, so there's not much to lose by just learning how to use the software now :)
From the outside, the hard part of designing a chair is making a blueprint. At least making a blueprint looks hard to people who've never made one. According to outsiders, the next layer of the onion is perhaps inserting reasonable constraint dimensions for similar reasons.
From the inside, as a guy who's recreationally made furniture, the hard part is judgment about joint selection and design, and experience with wood warping (all wood changes shape with the seasons; a good woodworker makes it look easy to work around and a bad one makes expensive firewood that rapidly falls apart). Another insider PoV is judgment about wood selection to get the correct balance of final finish durability and appearance. Finally, working toward the outer layers of the onion, it's time to do parametric joint design decisions... What's the ideal number and size of dovetail joints for, perhaps, a drawer?
I've seen prints of chairs before. I don't need an LLM to make one similar to the ones I've seen before and could probably make from memory (at least ones I built myself), and the library has loanable books and woodworking magazines. I do see the attraction from the outside.
Consider something like a Windsor chair. The larger the wedge in the spindles, the tighter and longer lasting the chair, until you break something trying to force them in; there's a lot of judgment and experience in designing, selecting, and installing spindles, but none of it is written down so it'll be hard to train an LLM... Tighten it until it breaks, then don't tighten it that much next time. Most super detailed plans for Windsors are for inferior machine-produced replicas which are not necessarily useful for a fine woodworker and are not exactly what craftsmen would aspire to. People who want "a cheap chair" will buy a 4-pack of folding chairs from Walmart anyway, not make a homemade Windsor-style chair.
Another somewhat more blunt example: for actual woodworkers, the "problem" with hand cut dovetails isn't knowing what they look like or how to make a diagram of one, but gaining the experience behind a hammer and chisel to push your luck while cutting them as far as possible without going too far and turning the part into scrap. One unavoidable part of woodworking is that I've turned quite a bit of wood into scrap on the last step; oh well, make another. At least I can burn scrap wood to keep warm LOL.
It's kind of like how, from outside the programming fraternity, the non-programmers think the only skills required to program are typing real fast and being very experienced at fizzbuzz during interviews. But that doesn't work IRL, from an inside-out perspective.....
The woodworking world is not exactly lacking for a library of "semi-decent" plans. An automated system to make enormous quantities of low quality unverified and untested plans would not really help the field, no.
So why $work-1 spent so much time on this was quite logical. When you have point clouds generated from crappy head mounted cameras, you get models that are very complex.
For example, if you look at a point cloud of an Ikea LACK (https://www.ikea.com/gb/en/p/lack-nest-of-tables-set-of-2-wh...) it will be massively complex. This means that when you want to perform any kind of interaction with it, it's computationally difficult (https://www.researchgate.net/publication/221064696/figure/fi...)
So an active area of research is point cloud to "CAD" model (i.e. simplified, where a LACK table would be ~40 triangles rather than 400k)
One of those ways is to say "oh, this point cloud looks like a table, let's generate a bunch of hypothesis tables and see if they fit." One way to do that is to have a model that understands parametric CAD, and can create a number of tables with parameters that can be adjusted until it fits.
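A toy version of that "adjust the parameters until it fits" loop might look like the sketch below. Everything in it is an assumption made for illustration: the "table" is reduced to a plain axis-aligned box, the point cloud is synthetic and noise-free, and the search is a simple random hill climb rather than whatever a real system would use.

```python
import math
import random


def surface_distance(point, dims):
    """Distance from a point to the surface of the box [0,w] x [0,d] x [0,h]."""
    outside = [max(-c, 0.0, c - size) for c, size in zip(point, dims)]
    if any(o > 0.0 for o in outside):
        # Point is outside the box: Euclidean distance to the box.
        return math.sqrt(sum(o * o for o in outside))
    # Point is inside (or on) the box: distance to the nearest face.
    return min(min(c, size - c) for c, size in zip(point, dims))


def mean_error(points, dims):
    """Average point-to-surface distance: lower means a better hypothesis."""
    return sum(surface_distance(p, dims) for p in points) / len(points)


def sample_box_surface(dims, n, rng):
    """Stand-in for a scanned point cloud: random points on the box faces."""
    points = []
    for _ in range(n):
        p = [rng.uniform(0.0, size) for size in dims]
        axis = rng.randrange(3)
        p[axis] = rng.choice((0.0, dims[axis]))  # snap one coordinate to a face
        points.append(tuple(p))
    return points


def fit_box(points, start=(1.0, 1.0, 1.0), iterations=500, seed=0):
    """Perturb the box parameters at random, keeping only improvements."""
    rng = random.Random(seed)
    best, best_err = tuple(start), mean_error(points, start)
    for i in range(iterations):
        step = 0.2 * (1.0 - i / iterations) + 0.005  # shrink the step over time
        candidate = tuple(max(0.01, v + rng.gauss(0.0, step)) for v in best)
        err = mean_error(points, candidate)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


# Made-up target dimensions in metres, purely for the demo.
cloud = sample_box_surface((0.55, 0.55, 0.45), 200, random.Random(42))
dims, err = fit_box(cloud)
print("fitted dims:", dims, "mean error:", err)
```

The fitted result is three numbers instead of hundreds of points, which is the whole appeal: a handful of parameters is far cheaper to store and interact with than the raw cloud.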
A perhaps easier way is to take a point cloud, get an image model trained on CAD models to draw the model as 2D imagery, then use something like this to get an actual model out.
It's not efficient, but it might work.
There are also lots of other cases, like automatic plagiarism, which are less good.
Basically leverage the randomness to create many variations, then select the most accurate variation automatically.
Terribly wasteful of time and processing power, but so is using GPU time to make pretty pictures randomly.
It's analogous to "all squares are rectangles, but not all rectangles are squares" (squares=CSG, rectangles=BREP)
CSG by itself isn't suitable for most CAD use-cases.
What is your workflow for llm integration to openscad?
Ironically the former is engineered to avoid the latter.
Every time I see one of these things, it's like whoever worked on it doesn't know how to use CAD or understand what CAD is used for.
Every 6 months, I reevaluate how well LLMs can model from scratch or can modify existing files.
It's very superficial (at least the last time I tried it). I'm guessing if/when LLMs crack visual reasoning, they might be able to do it.
I also wrote a bit about what goes into CAD apps! https://campedersen.com/tessellation
Which CAD program? I'm confused
Am I reading this right?
>Most importantly, GenCAD does not merely generate a 3D solid but also the entire CAD program.
Doesn't matter. CAD models/objects are represented by a sequence of operations on a primitive or sketch, unlike meshes, which describe only the resulting shape of objects in 3D programs like Blender.
So it's about the fact that their model outputs that hierarchy of operations: the history of development, not just the result.
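To make the "history, not just the result" distinction concrete, here is a deliberately tiny sketch. The `Pad` class and `volume` function are invented for this illustration and are not taken from GenCAD or any CAD kernel; each pad is a rectangular extrusion, and the volume calculation assumes every cut sits fully inside existing material and that cuts don't overlap each other.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pad:
    """One step of a hypothetical feature history: a rectangular extrusion."""
    width: float
    depth: float
    height: float
    subtract: bool = False  # True means the pad removes material (a cut)


def volume(history):
    """Replay the history and return the resulting volume.

    Assumes each cut lies entirely inside existing material and cuts do not
    overlap, so signed volumes can simply be summed.
    """
    total = 0.0
    for op in history:
        v = op.width * op.depth * op.height
        total += -v if op.subtract else v
    return total


# A 40 x 20 x 10 block with a 10 x 10 pocket cut all the way through.
history = [Pad(40, 20, 10), Pad(10, 10, 10, subtract=True)]
print(volume(history))  # 7000.0

# Because the steps are kept, one parameter can be changed and the part
# re-evaluated. A mesh of the final shape would not carry this information.
taller = [replace(history[0], height=12), history[1]]
print(volume(taller))  # 8600.0
```

With only a triangle mesh of the finished part, "make the base 2 mm taller" means editing vertices by hand; with the operation sequence it is a one-field change followed by a replay.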
That seems difficult enough that I have not found an open source program to load a 3D model and allow me to set the toolpaths in a UI, never mind have an LLM generate them from the model.
Actually the drawing and modeling are very much the hard part, so much so that the open source geometric kernels are decades behind the commercial ones. The computational geometry is genuinely a hard problem due to floating point errors and degenerate cases like parallel surfaces and tangent lines.
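A minimal illustration of the kind of trap meant here, using nothing but 2D line intersection. The function name `intersect` and the tolerance value are assumptions chosen for the example, not how any real kernel does it; real kernels have to deal with this across curved surfaces, not just straight lines.

```python
def intersect(p1, d1, p2, d2, eps=1e-12):
    """Intersect the 2D lines p1 + t*d1 and p2 + s*d2.

    Returns the intersection point, or None when the lines are treated as
    parallel. The relative tolerance `eps` is an arbitrary illustrative value.
    """
    det = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product of the directions
    scale = max(abs(d1[0]), abs(d1[1])) * max(abs(d2[0]), abs(d2[1]))
    if abs(det) <= eps * scale:
        return None  # parallel or numerically indistinguishable from parallel
    t = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# These two directions are mathematically identical, but 0.1 + 0.2 is not
# exactly 0.3 in binary floating point, so the determinant is tiny, not zero.
d_a = (0.1 + 0.2, 1.0)
d_b = (0.3, 1.0)
det = d_a[0] * d_b[1] - d_a[1] * d_b[0]
print(det == 0.0)  # False

# An exact `det == 0` test would divide by that tiny value and report an
# "intersection" absurdly far away. With a tolerance the lines are rejected.
print(intersect((0.0, 0.0), d_a, (1.0, 0.0), d_b))  # None

# A well-conditioned case still works: the x-axis meets the line x = 2.
print(intersect((0.0, 0.0), (1.0, 0.0), (2.0, -1.0), (0.0, 1.0)))  # (2.0, 0.0)
```

The catch is that any fixed tolerance is wrong for some input, which is a big part of why this gets solved case by case over many years rather than once.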
Once you have the geometric kernels, CAM is little more than a physically aware pathfinding optimization problem. Computationally expensive but otherwise straightforward. The kernel, on the other hand, has to be built up experimentally, tracking down every place where the math breaks down or there’s a pathological case, until you’ve got the thousands of special cases worked out.
https://arxiv.org/abs/2603.04337 https://arxiv.org/abs/2603.05607 https://arxiv.org/abs/2605.01171
For a more detailed review: https://github.com/lichengzhanguom/LLMs-CAD-Survey-Taxonomy
"These fonts are licensed under the Open Font License. You can use them in your products & projects – print or digital, commercial or otherwise."
Then they have a trained LLM that can generate KCL to either create new parts or act as an LLM assistant for changes to existing parts.
It’s neat that LLMs can do 3-D, but I wonder how much of the problem is integration.
I don't mean to come across as personally critical. From your comments it sounds possible (I am not sure, of course) that you have been having some distressing experiences. If so, we hope things improve. But please don't post these comments to HN - they aren't on topic here, and they're not an effective way to address the situation.
Re your other comment: I am sure it is a serious issue, although I don't understand it. It's just not an issue that can be solved through internet forum threads like HN.