This problem is unfortunately quite common, even in academic papers that use this dataset, despite the dataset authors explicitly warning about it.
EDIT: There is one more issue with the UrbanSound8K folds: their difficulty varies quite a bit. So one should ideally report performance across all folds (mean/std or a boxplot) rather than for a single fold. But this is a minor issue compared to data leakage.
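For what it's worth, here is a minimal sketch of what per-fold reporting could look like in plain Python. Everything in it is a made-up stand-in: the fold IDs, the labels, and the `majority_baseline` "model" are toy placeholders, not real UrbanSound8K data or a real classifier. The only point is the loop: hold out one predefined fold at a time, train on the rest, and summarize the per-fold scores.

```python
from collections import Counter
from statistics import mean, stdev


def leave_one_fold_out(fold_ids):
    """Yield (held_out_fold, train_indices, test_indices) per predefined fold."""
    for held_out in sorted(set(fold_ids)):
        train = [i for i, f in enumerate(fold_ids) if f != held_out]
        test = [i for i, f in enumerate(fold_ids) if f == held_out]
        yield held_out, train, test


def evaluate_per_fold(fold_ids, labels, fit_predict):
    """Return {fold: accuracy}, training on all other folds each time."""
    scores = {}
    for held_out, train, test in leave_one_fold_out(fold_ids):
        predictions = fit_predict(train, test)
        correct = sum(pred == labels[i] for pred, i in zip(predictions, test))
        scores[held_out] = correct / len(test)
    return scores


# Toy stand-in data (hypothetical): one fold ID and one label per clip.
fold_ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
labels = ["dog", "dog", "siren", "dog", "dog", "siren", "dog", "dog", "dog"]


def majority_baseline(train, test):
    """Placeholder model: predict the most frequent training label."""
    most_common = Counter(labels[i] for i in train).most_common(1)[0][0]
    return [most_common] * len(test)


scores = evaluate_per_fold(fold_ids, labels, majority_baseline)
values = list(scores.values())

for fold, accuracy in scores.items():
    print(f"fold {fold}: accuracy = {accuracy:.3f}")
print(f"mean = {mean(values):.3f}, std = {stdev(values):.3f}")
```

Even in this toy case the held-out folds score differently, which is why a single-fold number can be misleading. With real data you would swap `majority_baseline` for your actual training and prediction code and keep the predefined fold assignment as is.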
PS: Nice use of the Comet.ml platform for this, collaborating online on improving the experimental setup :)