You would open source the training procedure and document where the data came from. If any non-open-source content is used in training, then the project can't qualify as “open source”.
But this thread is about misuse of the term as applied to the weights package. Those of us who know what open source means should stop diluting the term by applying it to these LLMs.