I am going to put the model-related code we use in a public repo soon (it is very similar to
https://github.com/liuliu/swift-diffusion, but in NHWC format). On the ANE it will be around 25s, if it runs. DT's default only uses the GPU, and the 35s is on the GPU (yes, like you said, upscaling would take an extra 10s).