Update: Matthew (the other cofounder) and I got Guesstimate to a stage we were happy with. After a good amount of work, several customers seemed pretty happy with it, but there weren't many obvious ways of making much more money on it, and we had worked through most of the frequently requested/obvious improvements. We're keeping it running, but it's not getting much more active development at this time.
Note that it's all open source, so if you want to host it in-house you're encouraged to do so. I'm also happy to answer questions about it or help with specific modeling concerns.
Right now I'm working at the Future of Humanity Institute on some other tools that I think will complement Guesstimate well. At a certain point it seemed like many of the next major features would make more sense as separate apps. Hopefully I'll be able to announce one of these soon.
One of the triggers for the financial crisis in '08 was that the Monte Carlo pricers assumed the various risks were much less correlated than they actually were.
For example, they largely assumed that it was unlikely for many mortgages or underlying MBS securities to default simultaneously (low correlation). This is how many AAA-rated CDO securities ended up trading at 50%+ discounts.
IMHO, any multivariate Monte Carlo analysis that doesn't show your sensitivity to correlation is essentially useless, since your answers may change completely.
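To make the correlation-sensitivity point concrete, here's a minimal sketch of a one-factor Gaussian default model (the loan counts, 2% default probability, and 10% loss threshold are toy numbers I've assumed, not from the original comment). The only thing varied is the asset correlation, and the tail probability changes dramatically:

```python
import numpy as np

rng = np.random.default_rng(0)

def pool_loss_prob(rho, n_loans=100, attach=0.10, n_sims=20_000):
    """P(pool default rate exceeds `attach`) in a one-factor Gaussian model.

    Loan i defaults when sqrt(rho)*M + sqrt(1-rho)*e_i < threshold, where M
    is a single market factor shared by every loan, so rho is the pairwise
    asset correlation. Purely illustrative numbers.
    """
    thresh = -2.054  # ≈ Φ⁻¹(0.02): each loan has a 2% marginal default prob
    M = rng.standard_normal((n_sims, 1))        # shared market factor
    e = rng.standard_normal((n_sims, n_loans))  # idiosyncratic noise
    latent = np.sqrt(rho) * M + np.sqrt(1 - rho) * e
    default_rate = (latent < thresh).mean(axis=1)
    return (default_rate > attach).mean()

for rho in (0.0, 0.3, 0.7):
    print(f"rho={rho}: P(>10% of pool defaults) = {pool_loss_prob(rho):.3f}")
```

With zero correlation the marginal default rates are unchanged but a >10% pool loss is essentially impossible; with high correlation it happens a few percent of the time, which is exactly the sensitivity the comment is warning about.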
In the second example model (https://www.getguesstimate.com/models/316), Fermi estimation for startups, you would expect many of the inputs (deals at Series A, B, C, amount raised per deal) in real life to be highly correlated with each other since they all depend on 'how well is VC in general doing right now?'
The final estimate of 'Capital Lost in Failed Cos from VC' has a range of $22B to $39B, which seems way too low. The amount of VC money lost during a crisis (like in '01) can easily be an order of magnitude more.
Guesstimate doesn't currently allow for correlations as you're probably thinking of them. However, if two nodes are both functions of a third base node, then they will both be correlated with each other. You can use this to make somewhat hacky correlations in cases where there isn't a straightforward causal relationship.
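The "both functions of a third base node" trick is easy to see in plain sampling terms. In this sketch the node names and all the numbers are hypothetical, but the mechanism is the same one Guesstimate uses:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Shared base node, e.g. "overall VC market health" (hypothetical).
market = rng.normal(1.0, 0.3, n)

# Two downstream nodes, each a function of the base plus its own noise.
series_a_deals = 1000 * market + rng.normal(0, 50, n)
avg_raise_musd = 8 * market + rng.normal(0, 1, n)

# They inherit correlation purely through the shared parent.
corr = np.corrcoef(series_a_deals, avg_raise_musd)[0, 1]
print(f"induced correlation ≈ {corr:.2f}")
```

Tuning the ratio of shared-factor variance to each node's own noise controls how strongly the two end up correlated, which is the "somewhat hacky" knob the comment describes.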
Implementing non-causal correlations in an interface like this is definitely a significant challenge. It could introduce essentially another layer to the currently 2-dimensional grid. It's probably the feature I'd most like to add, but the cost has been too high so far.
I think Guesstimate is really ideal for smaller models, or for the prototyping of larger models. However, if you are making multi-million dollar decisions with hundreds of variables and correlations, I suggest more heavyweight tools (either enterprise Excel plugins or probabilistic programming).
This rationale is why this tool shouldn't be used for anything consequential -- like business decisions that can sink your company.
Also, are we able to adjust the distribution of each variable?
EDIT: I think you can actually manually create correlations on sheet so it should be fine
Thank you!
I'd imagine an Excel plugin to do something similar would be valuable.
The first is to use Excel plugins like Oracle Crystal Ball or @Risk. These are aimed at business analysts. They're pretty expensive, but also quite powerful.
The other option is to use probabilistic programming languages. Stan and PyMC3 are probably the best right now, but hopefully some others will mature a lot in the next few years.
That said, this is a pretty small space. The main "business competitor" is probably people just using Google Sheets or Excel without distributions to make models.
Crystal Ball: https://www.oracle.com/applications/crystalball/
@Risk: https://www.palisade.com/risk/default.asp
Stan: https://mc-stan.org/
PyMC3: https://docs.pymc.io
I imagine there are a bunch of cases where the defaults would not work, like when you're trying to do error propagation (all normal distributions) or compute interval arithmetic.
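For the error-propagation case specifically, sampling with all-normal inputs recovers the usual first-order propagation formula, so a Monte Carlo tool can stand in for it. A minimal sketch with made-up measurements:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Error propagation by sampling: z = x * y with independent normal errors.
# The values 10.0 ± 0.5 and 4.0 ± 0.2 are invented for illustration.
x = rng.normal(10.0, 0.5, n)
y = rng.normal(4.0, 0.2, n)

z = x * y
mc_sigma = z.std()

# First-order ("textbook") propagation for a product:
# (σz/z)² ≈ (σx/x)² + (σy/y)²
lin_sigma = (10.0 * 4.0) * np.hypot(0.5 / 10.0, 0.2 / 4.0)

print(f"Monte Carlo σ = {mc_sigma:.3f}, linearized σ = {lin_sigma:.3f}")
```

The two agree closely here because the relative errors are small; with large relative errors or skewed inputs the sampled answer is the more trustworthy of the two.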
Is it the case that if you input a range which spans multiple orders of magnitude, then you get lognormal rather than normal?
I might not be exactly the target audience, but I would appreciate a more in-depth explanation of the math and heuristics involved.
EDIT: I found this on their blog
https://medium.com/guesstimate-blog/lognormal-normal-833bf41...
We have some documentation [here](https://docs.getguesstimate.com/), and some in the sidebar entries.
Generally, we recommend lognormal distributions for estimated parameters that can't be negative. This works when you span multiple orders of magnitude, though it's possible you may want an even more skewed distribution (which is unsupported).
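One natural way to turn a "lo to hi" range into a lognormal is to treat the range as a 90% interval and match quantiles in log space. I'm assuming this fitting rule for illustration; it mirrors the interval convention described above, but the exact rule Guesstimate uses may differ:

```python
import math
from statistics import NormalDist

def lognormal_from_ci(lo, hi, ci=0.90):
    """Fit lognormal parameters (mu, sigma) of log-space so that
    [lo, hi] is the central `ci` interval.

    In log space the interval is symmetric around mu, so mu is the
    midpoint of the logs and sigma comes from the matching normal
    quantile. Assumed fitting rule, not Guesstimate's documented one.
    """
    z = NormalDist().inv_cdf(0.5 + ci / 2)  # e.g. ≈1.645 for 90%
    mu = (math.log(lo) + math.log(hi)) / 2
    sigma = (math.log(hi) - math.log(lo)) / (2 * z)
    return mu, sigma

mu, sigma = lognormal_from_ci(5, 500)
print(f"median = {math.exp(mu):.1f}, log-space sigma = {sigma:.3f}")
```

Note the median is the geometric mean of the endpoints (here sqrt(5 × 500) = 50), which is why lognormals behave sensibly for ranges spanning multiple orders of magnitude.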
I may be able to make a much longer video introduction sometime soon.
I think if I were to start again or spend much time restructuring it, I'd probably focus a lot more on enterprise customers. That would be quite a bit of work though, so I don't have intentions of doing that soon.
Spot on, it’d need to load some data on final predictions. Or it could dump the model in a format that other software could use.
I was thinking of the same thing.
I love that it was a no-BS signup and start using. Super clean and easy. It would be great to be able to show data on GIS as well -- effectively showing the outcomes as geographic representations. I'll see if the data I was looking to work with today will work with this tool meaningfully.
You can use tools like distshaper6 to generate arbitrary distributions, then copy the samples into Guesstimate.
http://smpro.ca/pjs/distshaper/
Guesstimate doesn't yet support an input format for distributions outside of via samples.
It's also just a fancy name for generating a pseudo-random number from a Uniform distribution in [0,1], and reading off the x-axis of the CDF.
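The uniform-number-through-the-inverse-CDF recipe fits in a few lines; here's a sketch for an exponential distribution, where the CDF inverts in closed form:

```python
import math
import random

def sample_exponential(rate):
    """Inverse transform sampling for Exp(rate).

    The CDF is F(x) = 1 - exp(-rate * x). Draw u ~ Uniform(0, 1),
    set u = F(x), and solve for x: that "reads off the x-axis of
    the CDF" at height u.
    """
    u = random.random()
    return -math.log(1 - u) / rate

random.seed(0)
samples = [sample_exponential(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # should be near 1/rate = 0.5
```

For distributions without a closed-form inverse CDF (like the normal), the same idea works with a numerical inverse, which is presumably what a tool like this does under the hood.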
[1] https://en.wikipedia.org/wiki/Inverse_transform_sampling
"If 5 people show up at my house tomorrow evening, I'll hold a poker night." 10 people were invited and 4 of them RSVP yes and 2 of them RSVP no. It looks like there's a 95% chance I'm holding a poker night tomorrow.
"The X team has a monthly meeting on the 1st, never fail. They haven't decided on the location yet, just that it's on the North Side." As the team members pick possible locations, the possible locations appear more distinct until one is chosen.
=A5*A6
you could have =interest*principal
You can create them easily too -- you can name them individually, assign names from existing tables, and so on. You can have constants too; that is, they don't have to point to any cell [1]. It is a godsend when working with bigger tables with lots of formulas.

[1] https://www.ablebits.com/office-addins-blog/2017/07/11/excel...
Here's another one I did to ballpark a pension plan: https://www.getguesstimate.com/models/11133
Another thing I like is that you can do simple statistical reckoning with it. For my job, I often have to benchmark something several hundred times with or without a patch applied. It can be a bit difficult to put "on average x% faster" in context when the benchmark is noisy, but Guesstimate allows you to answer questions like "assuming somebody ran one run of this benchmark with the patch, and one run without it, what's the expected range of performance improvement that they'd see?" with the actual numbers that you get out of the benchmark: https://www.getguesstimate.com/models/11850
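The "one run each" question can be answered directly from the raw samples by resampling pairs. The run times below are synthetic stand-ins for real benchmark data (means, spreads, and sample sizes are all invented):

```python
import numpy as np

rng = np.random.default_rng(4)

# Pretend benchmark data: run times in ms, with and without a patch.
# Synthetic numbers purely for illustration.
baseline = rng.normal(100.0, 8.0, 300)
patched = rng.normal(92.0, 8.0, 300)

# "One run each" question: draw one baseline run and one patched run
# at random, and look at the spread of the single-pair speedup ratio.
pairs = rng.choice(baseline, 100_000) / rng.choice(patched, 100_000)
lo, hi = np.percentile(pairs, [5, 95])
print(f"90% of single-run comparisons fall between {lo:.2f}x and {hi:.2f}x")
```

Even with a real ~8% average speedup, the single-pair interval here dips below 1.0x, i.e. a one-run-each comparison can show the patch as a regression, which is exactly the noisy-benchmark context problem the comment describes.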
Anyway, it's a histogram, so the x-axis is split into buckets. The bar all the way on the left likely covers some range of hours from 0 up to whatever the bucket size is.