Not my area either, but I've worked on related problems. My impression is that it depends on what you are using the model
for: are you trying to generate forecasts? Produce prediction intervals? Detect anomalies? Compare against human forecasters? All models are wrong, but some are wrong in a useful way.
Also, a big caveat: the output of those models, even the basic ones, is supposed to come with prediction intervals, not just point forecasts. Libraries like statsmodels are massively guilty of making it hard to use a model the way it's supposed to be used.
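
To make the "intervals, not just points" bit concrete, here is a rough stdlib-only sketch for the simplest model I can think of, the naive (random-walk) forecast. The function name and the toy series are made up for illustration, and the interval assumes roughly normal, uncorrelated one-step errors, so treat it as an approximation rather than the right way to do it.

```python
import math
import statistics


def naive_forecast_with_interval(series, horizon, z=1.96):
    """Naive forecast (repeat the last value) with approximate prediction intervals.

    Hypothetical helper for illustration. Assumes one-step errors are roughly
    normal and uncorrelated, so the h-step standard error grows like sqrt(h).
    z=1.96 corresponds to an approximate 95% interval under that assumption.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations to estimate the error spread")

    # For the naive method, the one-step errors are just the first differences.
    errors = [b - a for a, b in zip(series, series[1:])]
    sigma = statistics.stdev(errors)

    last = series[-1]
    forecasts = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * math.sqrt(h)
        # (point forecast, lower bound, upper bound)
        forecasts.append((last, last - half_width, last + half_width))
    return forecasts


# Toy data, purely illustrative.
series = [10, 12, 11, 13, 12, 14]
forecasts = naive_forecast_with_interval(series, horizon=3)
for h, (point, lower, upper) in enumerate(forecasts, start=1):
    print(f"h={h}: {point:.1f} in [{lower:.2f}, {upper:.2f}]")
```

The point forecast is the same flat line at every horizon; the only thing that tells you how much to trust it is the interval, which widens as you look further ahead.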