Or put another way: how well does it scale horizontally to multiple machines?
We find most time series libraries to be about the same in terms of features and speed, but very few can handle large datasets well, if at all.
And, of course, thanks for sharing your library, I'll definitely try it out!
The models that work on multiple time series in Darts accept Sequence[TimeSeries] for their fit() method. These sequences can either be lists (fully in memory, the simplest option) or, when needed, a custom Sequence that, for example, lazily loads series from disk via its __getitem__() method (somewhat similar to what PyTorch Datasets do).
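To illustrate the lazy-loading idea, here is a minimal sketch of a custom Sequence. The .npy file layout and the LazySeriesSequence name are my own assumptions for the example, not part of the Darts API; in Darts you would wrap each loaded array in a TimeSeries before returning it.

```python
from collections.abc import Sequence

import numpy as np


class LazySeriesSequence(Sequence):
    """Loads one series from disk per access instead of keeping all in memory.

    Hypothetical example: each path points at a saved .npy array; a real
    Darts version would return a TimeSeries built from the loaded array.
    """

    def __init__(self, paths):
        self._paths = list(paths)

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, idx):
        # Only the requested series is read into memory here.
        return np.load(self._paths[idx])
```

You would then pass such a sequence to fit() in place of an in-memory list of series.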
If you need even more control, for instance because you have a single very long series that doesn't fit in memory, you can implement your own Darts "TrainingDataset". In that case you control exactly how your series is sliced.
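The kind of slicing such a dataset controls can be sketched as a sliding-window split of one long series into (input, target) samples. This is an illustrative stand-in, not the actual Darts TrainingDataset interface; a real implementation could also read each window lazily from disk rather than hold the whole array.

```python
import numpy as np


def sliding_windows(series, input_len, output_len):
    """Slice one long 1-D array into (input, target) training samples.

    Each sample uses `input_len` points as the model input and the
    following `output_len` points as the prediction target.
    """
    total = input_len + output_len
    samples = []
    for start in range(len(series) - total + 1):
        window = series[start:start + total]
        samples.append((window[:input_len], window[input_len:]))
    return samples
```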
Edit: I realised this only answers the first sentence of your comment ;) For now there's no mechanism for scaling to multiple machines beyond what PyTorch already offers. AFAIK it's reasonably easy to scale to multiple GPUs on one machine, but I'm not sure how it would scale across several machines; we haven't had to try that yet. (Note that even a single CPU can handle training deep learning models on tens of thousands of time series, similar to the M4 competition, in a fairly reasonable time.)
I'd be curious about the performance of these. A time series featurization library I've liked the look of but haven't used for real is catch22: https://github.com/chlubba/catch22
In particular I like catch22's methodology:
catch22 is a collection of 22 time-series features that are a high-performing subset of the over 7000 features in hctsa. Features were selected based on their classification performance across a collection of 93 real-world time-series classification problems...
This series might be useful to you. https://www.youtube.com/watch?v=ZoJ2OctrFLA&list=PLvcbYUQ5t0...
It seems that almost everywhere you look, every example deals with just one time series. However, since the methods are much more "statistical" in nature, they can actually make meaningful predictions from a single sample.
These automatic extractions are indeed very statistical in nature, but for some datasets domain insights are more valuable and give more usable features (in my opinion). I've found quite a few datasets where manual features + gradient boosted trees give better results than automated statistical methods. Often combinations give the best results :)
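As a minimal sketch of what "manual features" might look like: the specific features below are illustrative stand-ins for real domain insight, and the function name is my own.

```python
import numpy as np


def manual_features(series):
    """A few hand-crafted per-series features, as might feed a boosted tree.

    Purely illustrative; real domain features would encode knowledge
    about the process that generated the series.
    """
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]  # linear trend over the whole series
    return {
        "mean": x.mean(),
        "std": x.std(),
        "slope": slope,
        "last_minus_first": x[-1] - x[0],
    }
```

Stacking one such row per series into a table gives the input for a gradient boosted tree model (e.g. scikit-learn's GradientBoostingClassifier), and the same table can be concatenated with automatically extracted features to try the combination.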
[1]: https://github.com/unit8co/darts/
[2]: https://medium.com/unit8-machine-learning-publication/traini...
I worked with this algorithm before, so I was curious, but I can't find it in the API.