What I'd do is extract all the deepest-level content by conventional means (XSLT or a standalone parser) and pass it to an LLM piece by piece. It's much easier for an LLM to tell whether a given string is, say, an ISO-formatted date than to identify the entire schema in one shot. You might not even need the LLM at all if you use a type-inference library and the schema isn't too exotic.
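The extraction step might look something like this sketch using Python's standard-library `xml.etree.ElementTree` (the element names and sample document are invented for illustration): walk the tree and collect a path/value pair for every attribute and every leaf text node, which gives you the small strings to classify one at a time.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<order id="A-1001">
  <placed>2024-03-15</placed>
  <items>
    <item sku="X42" qty="3">19.99</item>
  </items>
</order>"""

def leaf_values(xml_text):
    """Collect (path, value) pairs for every attribute and leaf text node."""
    root = ET.fromstring(xml_text)
    results = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        # Attributes are always leaf-level values.
        for name, value in elem.attrib.items():
            results.append((f"{here}/@{name}", value))
        children = list(elem)
        # Text only counts as a leaf value when the element has no children.
        if not children and elem.text and elem.text.strip():
            results.append((here, elem.text.strip()))
        for child in children:
            walk(child, here)

    walk(root, "")
    return results

for path, value in leaf_values(SAMPLE):
    print(path, "->", value)
```

Each `(path, value)` pair is then a self-contained question for the LLM (or a type-inference library): "what kind of value is `2024-03-15`?"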
Once you've used those results to annotate the original elements and attributes with their types, you can pass a generated, simplified XML document to the LLM: wherever the original has real data, replace it with simple values that conform to the same structure and data types. If the LLM is still confused, try giving it just the structure you've identified, with no actual data in the elements and attributes, only type annotations.
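The last fallback, structure plus type annotations with no data, can be sketched like this (again with invented element names, and a deliberately crude regex-based type guesser standing in for the per-value classification step):

```python
import re
import xml.etree.ElementTree as ET

def infer_type(value):
    """Crude per-value type guess; an LLM or a real inference
    library would stand in for this in practice."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    if re.fullmatch(r"-?\d+", value):
        return "integer"
    if re.fullmatch(r"-?\d+\.\d+", value):
        return "decimal"
    return "string"

def annotate(elem):
    """Replace every real value with its inferred type name, in place."""
    for name, value in elem.attrib.items():
        elem.set(name, infer_type(value))
    if not list(elem) and elem.text and elem.text.strip():
        elem.text = infer_type(elem.text.strip())
    for child in elem:
        annotate(child)

root = ET.fromstring(
    '<order id="A-1001"><placed>2024-03-15</placed>'
    '<item sku="X42" qty="3">19.99</item></order>'
)
annotate(root)
print(ET.tostring(root, encoding="unicode"))
```

The output keeps the exact tag and attribute structure but carries only type names, which is a much smaller, less distracting prompt for the model than the raw data.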
TL;DR: a depth-first approach, building up from the leaves, will work better than handing everything to an LLM at once. They're only clever thematic Markov chains, after all.