Control theory offers many more algorithms. PID is arguably simple to implement, but it is not a particularly good algorithm.
It kinda seems to me as if everybody read only the first page of a control theory textbook, decided they didn't need to read further, and based their solution on that.
PID will basically give you either large overshoots (which you will experience as overcorrecting to changes in demand) or slow adaptation to changes.
There is also the possibility that your system changes over time and your PID parameters start causing the whole controller to misbehave.
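To make that overshoot-vs-sluggishness tradeoff concrete, here is a minimal PI-loop sketch on a simple first-order plant. All dynamics and gains are invented for illustration: aggressive gains shoot past the setpoint, conservative gains crawl toward it.

```python
# Minimal illustration of the PID tradeoff: on a first-order lag plant,
# aggressive gains overshoot the setpoint while conservative gains settle
# slowly. All constants here are made up.

def simulate_pi(kp, ki, steps=200, dt=0.1, setpoint=1.0):
    y = 0.0          # plant output (e.g. utilization, temperature)
    integral = 0.0
    peak = 0.0
    for _ in range(steps):
        error = setpoint - y
        integral += error * dt
        u = kp * error + ki * integral   # PI control (D term omitted for brevity)
        # first-order plant: output chases the control signal, time constant 0.5
        y += dt * (u - y) / 0.5
        peak = max(peak, y)
    return y, peak

final_a, peak_a = simulate_pi(kp=5.0, ki=2.0)   # aggressive: overshoots
final_c, peak_c = simulate_pi(kp=0.5, ki=0.2)   # conservative: slow, no overshoot

print(f"aggressive:   peak={peak_a:.3f}")
print(f"conservative: peak={peak_c:.3f}")
```

Neither tuning is "wrong"; they just pick different ends of the same tradeoff.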
I have implemented a controller for an espresso machine's boiler water temperature. Replacing PID with a moving horizon estimator allowed me to cut the time from startup until stable temperature by at least half and eliminate any measurable over- or undershoots.
Of course the problem with applying PID to server capacity is that compute resources come in discrete chunks that are slow to bring online (‘computers’) rather than being a continuously variable resource.
Why do you think PID is an improvement here? IMHO the bang-bang approach is preferable because it behaves extremely predictably. Operationally, knowing how the system will react to extreme conditions might be more valuable than being a little more optimal.
Did you mean MPC by any chance? (which MHE is often used in tandem with)
If so, MPC is indeed a superior algorithm, but it also requires a dynamic model (LTI or state-space). Such a model may or may not be easy to identify -- it would require characterizing the dynamics of data center operations.
PID, on the other hand, while less optimal, is "model-free" (technically it has a model, i.e. its tunings can be thought of as deriving from IMC or direct synthesis, even though in practice hand tuning is common) in that it can respond to a wide variety of circumstances without knowing much a priori about the underlying dynamics. PID tunings are also amenable to optimization (products like Loop Pro are used in industry).
Due to their simplicity, PIDs are capable of operating at much higher frequencies than more complex algorithms like MPC. PIDs operate on the order of milliseconds or faster, while MPC operates on the order of seconds to minutes because it has to solve an optimization problem at every iteration, which is too slow for fast loops. In the hierarchy of control there are supervisory layers on top (RTOs), then MPCs, then PIDs. It's usually not either-or, but all together, working at different layers.
Even in industries where MPC is dominant, PID control is still ubiquitous and used alongside it, especially for local regulatory control loops. I don't have enough insight into data center ops to know if PID control is good enough there, but in my experience PID can be good enough for many applications -- most control loops in the world are essentially still PID. Not because more advanced algorithms don't exist, but because PID has the advantage of being just good enough for most purposes. (Cost is also an issue: PIDs are cheap, while licensing costs for industrial MPC software range from tens of thousands of dollars to $100k.)
This model can estimate the future temperature of the brew water based on current and past temperature measurements at various points in the system, as well as on the amount of water being pumped through it.
There are four temperature measurements being made:
- ambient temperature
- water reservoir temperature (the thermometer touches the reservoir), though I am pretty sure this one could be estimated from past operation of the boiler.
- boiler temperature (the thermometer is glued to the outside of the boiler at a selected point below the water line)
- group head temperature (the thermometer is glued to the outside of the group head at a selected point). This one could also potentially be estimated from past operation, but I tried that and it complicates my model too much.
In particular, the model is designed to calculate a single quantity: what the water temperature will be, if used for brewing coffee, at a given point in the future, assuming the heating element keeps adding a given amount of energy and the pump moves a given amount of water.
How the model is used depends on what the machine is waiting for. For example, when you want to brew coffee, it delays the start of brewing until it can achieve a stable temperature.
To get there as fast as possible (i.e. power on from cold to stable temperature), the machine heats the water at maximum power while the model is executed 50 times a second to estimate the maximum temperature the brew water would attain if we shut the power off now. The idea is to run the heater at max power for as long as possible and shut it off at the exact moment such that the heat still spreading through the system brings the brew water to exactly the desired temperature.
Mind that brew water temperature is not measured directly; I can only measure it experimentally with a modified portafilter.
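For illustration only, here is a toy version of that predict-the-peak shutoff logic. The real model described above uses four sensors plus pump flow; this sketch collapses everything into a two-lump thermal model (boiler, then brew path) with invented constants.

```python
# Toy sketch of "heat at full power, predict the peak, cut power early".
# All thermal constants are invented; only the control structure matters.

DT = 0.02       # 50 Hz control loop, as in the comment above
TARGET = 92.0   # desired brew temperature, deg C

def step(boiler, brew, power):
    """One Euler step: the heater drives the boiler lump, and heat then
    spreads from the boiler into the brew path."""
    boiler += DT * (power * 5.0 - 0.02 * (boiler - 20.0))  # heating + losses
    brew += DT * 0.3 * (boiler - brew)                     # heat spreading
    return boiler, brew

def predicted_peak(boiler, brew, horizon=1000):
    """Simulate forward with the heater OFF; return the peak brew
    temperature the residual heat will still produce."""
    peak = brew
    for _ in range(horizon):
        boiler, brew = step(boiler, brew, power=0.0)
        peak = max(peak, brew)
    return peak

boiler, brew = 20.0, 20.0
# Full power until the predicted peak reaches the target...
while predicted_peak(boiler, brew) < TARGET:
    boiler, brew = step(boiler, brew, power=1.0)
# ...then heater off: residual heat carries the brew water to the target.
peak = brew
for _ in range(5000):
    boiler, brew = step(boiler, brew, power=0.0)
    peak = max(peak, brew)
print(f"peak brew temperature: {peak:.1f} C")
```

Because the same model is used for prediction and "reality" here, the peak lands essentially on the target; in the real machine the interesting work is in making the model match the hardware.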
The parameters of the system are also being observed, and I use other filters to correct them as the system changes. For example, the system can detect the amount of boiler scale developing rather precisely, mainly through how it affects the impedance between the water in the boiler and the temperature sensor.
---
Now, I am pretty sure this is overkill for an espresso machine. I am doing it to teach myself some control theory. But the effects are real and the algorithm works like magic -- the machine starts, achieves optimal temperature in the shortest possible time, and then keeps it stable with no over- or undershoots that could affect the brew.
In today's world the chips that can run this model sell for pennies; the only real complication is the rather precise, noise-free temperature measurement this requires, plus measuring the amount of water being pumped.
Is it really so hard/complex to compute MPC that it can't be stuck inside a loop running at 1 Hz or less? I'm talking about decent generic or dedicated hardware; some tiny SoC is a different story.
In reality, we know this is not realistic. Overprovisioning results in an immediate financial cost (which can easily be modeled in $$$), but underprovisioning results in a far more complex penalty. I think it'd be important to understand these costs (along with the general shape of your traffic) before implementing a control system.
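That asymmetry can be captured in something as simple as a piecewise cost function. The constants below are invented, and the real underprovisioning penalty is, as noted, much harder to model than a single number:

```python
# Toy illustration of the cost asymmetry: overprovisioning has a simple
# linear dollar cost, while underprovisioning is penalized much more
# steeply (lost requests, SLA breaches). All constants are invented.

COST_PER_SERVER = 0.10   # $/period for each idle server
UNDER_PENALTY = 5.00     # $/period per server's worth of unserved load

def provisioning_cost(servers, demand):
    """`demand` is expressed in 'servers worth' of load."""
    if servers >= demand:
        return COST_PER_SERVER * (servers - demand)   # idle capacity
    return UNDER_PENALTY * (demand - servers)         # unserved load

print(provisioning_cost(12, 10.0))  # two idle servers: cheap
print(provisioning_cost(8, 10.0))   # two servers short: much worse
```

A controller that minimizes a symmetric error would treat those two situations as equally bad, which is exactly the wrong behavior here.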
Furthermore, it's very likely that you'll want to implement a deadzone, and almost certainly you'll want a low-pass filter, especially if you're sampling processing time significantly faster than ~30 seconds (the estimated startup time). Oh, and the usual things like integral anti-windup and hard limits so you don't bankrupt yourself.
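A sketch of what those safeguards might look like wrapped around a PI controller that outputs a replica count. The class name, gains, and limits are all made up for illustration:

```python
# PI controller with the usual safeguards: low-pass filter on the
# measurement, deadzone on the error, hard output limits, and anti-windup
# by conditional integration. Nothing here is from a real autoscaler.

class GuardedPI:
    def __init__(self, kp, ki, dt, deadzone, lo, hi, alpha):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.deadzone = deadzone   # ignore errors too small to act on
        self.lo, self.hi = lo, hi  # hard replica limits: don't go bankrupt
        self.alpha = alpha         # low-pass coefficient, 0 < alpha <= 1
        self.integral = 0.0
        self.filtered = None

    def update(self, utilization, target):
        # Low-pass filter the noisy measurement (exponential moving average).
        if self.filtered is None:
            self.filtered = utilization
        self.filtered += self.alpha * (utilization - self.filtered)

        error = self.filtered - target   # above target -> scale up
        if abs(error) < self.deadzone:
            error = 0.0                  # deadzone: don't chase noise

        u = self.kp * error + self.ki * self.integral
        clamped = min(max(u, self.lo), self.hi)
        # Anti-windup: stop accumulating while saturated, unless the
        # error would pull the output back into range.
        if u == clamped or (u > self.hi and error < 0) or (u < self.lo and error > 0):
            self.integral += error * self.dt
        return clamped

pi = GuardedPI(kp=50, ki=10, dt=1.0, deadzone=0.02, lo=1, hi=100, alpha=0.3)
for _ in range(60):                      # sustained overload saturates at hi...
    u = pi.update(utilization=0.95, target=0.6)
# ...but the frozen integral lets it back off quickly once load drops.
u_after = pi.update(utilization=0.30, target=0.6)
print(u, u_after)
```

Without the anti-windup branch, the integral would keep growing for the whole overload and the controller would stay pinned at the limit long after the load fell.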
Good day :)
I'm curious whether anything exists that's intelligent enough to be relied on for a large-scale deployment without human oversight. In general, I would think dead simple and manually controllable is a feature in the context of expenditure.
> Replacing PID with a moving horizon estimator allowed me to cut the time from startup until stable temperature by at least half and eliminate any measurable over- or undershoots.
I'm familiar with PID controllers, but haven't used MHE before. The formulas look really similar at first glance to me. Three terms and three weights? Is a slow PID controller due to poor tuning, or is MHE intrinsically better? What is the reason that MHE would adapt more quickly with less overshoot than PID? MHE appears to be three integrals instead of one, but I don't see immediately why that would be better, is there an intuitive and/or fundamental reason?
I think without a context that's a no. Some systems will want good throughput, some want low latency and require pre-scaling on some cues (time of day, day of week), some want minimal cost but do want to allow bigger bursts for a max of N minutes, etc.
Any intelligent scaling without human oversight has a good chance of either burning your money or not optimising for what you care about.
A model predictive controller would need far fewer iterations because it would actually try to predict the number of servers necessary based on some kind of model of the server farm.
Parameters for that model can even be learned/adjusted over time, automatically.
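As a toy illustration of the difference (not anyone's actual autoscaler): a model-based planner can fit a trend to recent load, predict ahead by the boot delay, and jump straight to the required server count instead of nudging and waiting. `BOOT_DELAY` and `CAPACITY` are invented parameters.

```python
# Model-based server planning sketch: fit a linear trend to recent load
# and provision for the load predicted BOOT_DELAY periods ahead, since
# servers started now only become useful then. All numbers invented.
import math

BOOT_DELAY = 3      # control periods a new server needs to come online
CAPACITY = 100.0    # requests/s one server can handle

def plan_servers(load_history):
    """Least-squares linear trend, extrapolated BOOT_DELAY periods ahead."""
    n = len(load_history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(load_history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, load_history))
             / sum((x - mean_x) ** 2 for x in xs))
    predicted = mean_y + slope * ((n - 1 + BOOT_DELAY) - mean_x)
    return max(1, math.ceil(predicted / CAPACITY))

# Load ramping 50 req/s per period: the planner provisions ahead of demand
# (current load 600 needs 6 servers; it asks for 8 to cover the ramp).
print(plan_servers([400.0, 450.0, 500.0, 550.0, 600.0]))
```

The "learned/adjusted over time" part would amount to refitting quantities like `CAPACITY` and the trend model from observed behavior.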
I chose Silvia mainly because it doesn't have its own electronics, it is all 230V AC wiring, thermostats, switches, etc.
I made myself a precise thermometer with a PT1000 probe inside the coffee puck, and I learned the temperature can vary by as much as 10-15 degrees.
I first used PID but was not satisfied with the long settling time: even when the water at the boiler is the right temperature, it still needs to pass through a huge hunk of metal that determines, to a large extent, the final temperature of the brew water.
So I built a somewhat more complex algorithm that first gets the water a little hotter in the boiler (while the grouphead is still cold) and then slowly adjusts the boiler setting as the grouphead heats up.
But there were still problems; for example, pumping cold water into the boiler threw everything into chaos. Then there is the problem that you get temperature readouts with delays and offsets.
That's when I decided to just build a more complete model of the system, one that takes not only the current state but also past states of the system into account.
To answer "why PID": basically, somebody I follow asked this question on Twitter, and I thought, yeah, why not, seems like a reasonable thing to try :)
Actually I do conclude that PID is not quite the right thing for this problem. For me the learnings from making a PID sort of work for this problem were:
1. You must use the right error function -- e.g. frequency, not time.
2. You must apply shrinkage to the error to handle the discrete number of servers.
3. You have to run the controller at a multiple of the server startup delay to avoid perturbations.
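For point 2, I'd guess "shrinkage" here means soft-thresholding the error, so the controller only acts once the error is worth a whole server; the threshold value in this sketch is hypothetical:

```python
# One possible reading of "shrinkage on the error": soft-thresholding,
# i.e. move the error toward zero by a fixed amount, clipping to zero
# when it's smaller than that. Keeps a discrete server count from
# flapping on errors too small to justify adding or removing a machine.

def shrink(error, threshold):
    if error > threshold:
        return error - threshold
    if error < -threshold:
        return error + threshold
    return 0.0

print(shrink(2.5, 1.0))   # act only on the part beyond one server's worth
print(shrink(0.6, 1.0))   # too small to justify a whole server: do nothing
```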
Also I discuss the basic assumptions that a PID controller makes that are suboptimal in the video.
PIDs are simple to implement but tricky to tune.
In your case you could probably have solved most issues by improving tuning.
> PID will basically give you either large overshoots (which you will experience as overcorrecting to changes in demand) or slow adaptation to changes.
Such a blanket statement is meaningless without a description of the system you are controlling. Those kinds of overshoots can be attributed to wrong parameters, to badly sized control elements, or even to bad measuring devices; the PID algorithm itself has nothing to do with those cases.
> I have implemented a controller for an espresso machine's boiler water temperature. Replacing PID with a moving horizon estimator allowed me to cut the time from startup until stable temperature by at least half and eliminate any measurable over- or undershoots.
Did you try a "bang-bang" controller? I would not be surprised if you got the same results with 1% of the complexity.
What do you think the relation is between the boiler water temperature and the water that actually reaches the coffee?
When the machine starts, the grouphead is cold. If you want a good brewing temperature, you need to either:
a) Wait 40 minutes until the grouphead slowly heats up and everything stabilizes.
b) Heat the water for 10 minutes, then pump a lot of water through the grouphead to heat it up, and pray everything works out.
c) Build a model that predicts the correct boiler water setting given the predicted grouphead temperature, so that when you push water through the grouphead it cools by just the right amount. Unfortunately, the correct brew temperature is 92C, so you only have a couple of degrees to work with. Still, heating the water to a higher temperature makes the grouphead heat up faster, and then you don't need to heat it as much because it will be receiving hotter water.
Commercial machines do not have this problem because they are started in the morning and only turned off for the night, and they have huge tanks of brewing water in them, so inflowing water does not affect the temperature as much.
From the abstract of [2]:
We evaluate a specific design for a resource closure operator by simulation and demonstrate that the operator achieves a near-optimal balance between cost and value without using any model of the relationship between resources and behavior. Instead, the resource operator relies upon its control of the resource to perform experiments and react to their results. These experiments allow the operator to be highly adaptive to change and unexpected contingencies.
Not my field so not sure if anything significant has been done using this in the past 10 years, or if it fizzled out.
[1]: https://www.duo.uio.no/handle/10852/8753
[2]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.304....
Could you give an example? I don't think PIDs are chosen for their optimality properties.
I'd like to see an example of when you think PID is an ideal choice; I've never found a real use case. Whenever I've hacked one into anything, I've quickly replaced it with something simpler and better (thermostats being the most obvious example).