Indeed, TensorFlow likes to push everything to the GPU by default, whereas many PyTorch deep learning implementations encourage keeping the data on the CPU and feeding it to the GPU batch by batch, as needed, with a DataLoader.
There have been attempts at a PyTorch port of Parametric UMAP (https://github.com/lmcinnes/umap/issues/580), but so far nothing as good as the existing TensorFlow implementation.
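
For anyone unfamiliar with the PyTorch pattern mentioned above, here is a minimal sketch of what "feeding data from the CPU to the GPU as needed" usually looks like. It is not taken from any Parametric UMAP port. The helper name `iterate_on_device`, the toy data, and the batch size are all made up for illustration, and it assumes the dataset fits in host memory as a single tensor.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def iterate_on_device(features, batch_size=256, device=None):
    """Yield mini-batches on `device` while the full dataset stays on the CPU.

    Hypothetical helper for illustration only.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    device = torch.device(device)  # accepts both strings and torch.device

    dataset = TensorDataset(features)  # the tensor itself remains in host memory
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        # Pinned host memory only helps when copying to a CUDA device.
        pin_memory=(device.type == "cuda"),
    )
    for (batch,) in loader:
        # Only the current mini-batch is copied to the target device.
        yield batch.to(device, non_blocking=True)


# Toy data standing in for a real dataset (assumption: 1000 samples, 20 features).
features = torch.randn(1000, 20)

n_seen = 0
for batch in iterate_on_device(features, batch_size=256):
    # A real training step (forward pass, loss, backward pass) would go here.
    n_seen += batch.shape[0]
```

The point of the design is that GPU memory only ever holds one batch of inputs at a time, at the cost of a host-to-device copy per step. That is roughly the trade-off against placing the whole dataset on the GPU up front.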