That's fine. Dojo doesn't make this decision for you. Instead, it gives you a full range of tools so you can choose for yourself how you want this to work.
Some UI widgets (Dijits) choose to use a client-side HTML fragment, but others do not have that requirement. Some of the work on projects like xstyle and put-selector seeks to reduce this reliance further.
Dijits can be invoked either through extra markup in your base HTML page or through a JS constructor. In the HTML-templated widgets, there are attributes that bind DOM nodes to reference variables in your JS, so the widget knows where to insert content or subwidgets and where to attach DOM events. The HTML isn't stored in JS; it's stored in an HTML file that gets combined with the JS by the build tools, so you don't need to do this mixing-in process yourself. It just works.
But again, that's an optional feature that's used primarily by the widget system and some of the widgets in Dojo.
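To make the attribute binding described above concrete, here is a minimal sketch of a templated widget. The names `greeterTemplate`, `greeterProps`, `messageNode`, `greeting`, and `_onClick` are made up for illustration, and the template is inlined as a string only to keep the example self-contained; in a real project it would normally sit in its own HTML file. The `define` call is guarded so the snippet only registers the widget when an AMD loader such as Dojo's is present.

```javascript
// Template markup. In a real project this would usually live in its own
// HTML file and be loaded with the dojo/text! plugin; it is inlined here
// only so the sketch stands on its own.
var greeterTemplate =
  '<div>' +
    // attach-event: clicking the button calls the widget's _onClick method
    '<button type="button" data-dojo-attach-event="onclick:_onClick">Greet</button>' +
    // attach-point: the widget instance gets a `messageNode` property
    // that references this <span>
    '<span data-dojo-attach-point="messageNode"></span>' +
  '</div>';

// Widget members (hypothetical example widget, not part of Dojo itself).
var greeterProps = {
  templateString: greeterTemplate,
  greeting: "Hello",

  // `this.messageNode` is filled in by the template's attach-point
  _onClick: function () {
    this.messageNode.textContent = this.greeting + " from a templated widget";
  }
};

// Only register the widget when an AMD loader (such as Dojo's) is available.
if (typeof define === "function" && define.amd) {
  define([
    "dojo/_base/declare",
    "dijit/_WidgetBase",
    "dijit/_TemplatedMixin"
  ], function (declare, _WidgetBase, _TemplatedMixin) {
    return declare([_WidgetBase, _TemplatedMixin], greeterProps);
  });
}
```

Assuming this module were saved under a hypothetical id such as `my/Greeter`, it could then be created either from markup, with `data-dojo-type="my/Greeter"` on a node that the Dojo parser processes, or from JS, with something like `new Greeter({ greeting: "Hi" }).placeAt(document.body)`.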