If you do need realtime, you can build it on PostgreSQL yourself, depending on your requirements, using either LISTEN/NOTIFY or logical replication. There are tradeoffs to both, but tbh if you are asking this question you probably don't want to go down that path.
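As a rough sketch of the LISTEN/NOTIFY approach (table, function, and channel names here are made up): a trigger pushes a JSON payload to any connected client that has run `LISTEN comment_changes`.

```sql
-- Fire a notification whenever a comment row changes; any session
-- that has executed `LISTEN comment_changes` receives the payload.
CREATE OR REPLACE FUNCTION notify_comment_change() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('comment_changes', row_to_json(NEW)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comments_notify
AFTER INSERT OR UPDATE ON comments
FOR EACH ROW EXECUTE FUNCTION notify_comment_change();
```

Note the payload is size-limited (under 8000 bytes by default), so for anything non-trivial you'd notify with just an id and re-fetch the row, which is one of the tradeoffs mentioned above.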
Non-realtime it's very easy to handle nested JSON in PostgreSQL but I would still avoid it like the plague unless it's user-supplied data without any real schema.
You might feel like schema-less lets you "move faster" but it's a load of horseshit that really starts to stink much sooner than you might think.
Schemas, and by extension database integrity, actually make it easier to move fast, because migrations let you ensure there are no edge conditions in stored data when upgrading your code to an extended or otherwise modified data model.
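Concretely, a migration can extend the model, backfill old rows, and enforce the new invariant in one transaction (column and table names here are hypothetical):

```sql
-- Add a field, backfill existing rows, then lock in the invariant,
-- so no code path can ever observe a half-migrated row.
BEGIN;
ALTER TABLE users ADD COLUMN display_name text;
UPDATE users SET display_name = username WHERE display_name IS NULL;
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
COMMIT;
```

With a schema-less store, that backfill becomes application code that has to tolerate both shapes of the data indefinitely.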
The other main benefit of PostgreSQL is the sheer body of resources available: apart from the other major RDBMSs (MySQL/MSSQL), it completely dwarfs what is available for other data stores. You will rarely, if ever, encounter a problem someone hasn't already solved.
You can also use the whole package if you need a firebase alternative.
Or an easy HA setup...
This has somehow gotten a lot worse. When I first started with Firebase Cloud Functions, which to be clear are amazing and simple to get up and running with compared to anything competitors offer, deploys took a few seconds. Sadly it has gotten worse and worse as time goes on.
Still though, the paradigm around cloud functions is so simple compared to the nightmare that is AWS Lambdas. When trying to explain why cloud functions are better, AWS users just stare at me like I am talking in a foreign language about some sort of mythical land of make believe.
Then the other pain point is the "joins" use case; right now we do the equivalent of fetching all comments & users, then doing an in-memory join. Ideally, we could craft a single request that just says "get the 10 latest comments on this market, plus the associated avatars" without data duplication and without doing a bunch of up-front thinking about exactly how to structure indexes.
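The in-memory join described above amounts to something like this (record shapes are hypothetical):

```typescript
interface Comment { id: string; userId: string; text: string; }
interface User { id: string; avatarUrl: string; }

// Join each comment to its poster's avatar via a Map keyed by user id,
// mirroring what a SQL JOIN would do server-side.
function joinAvatars(comments: Comment[], users: User[]) {
  const byId = new Map<string, User>(users.map(u => [u.id, u] as const));
  return comments.map(c => ({
    ...c,
    avatarUrl: byId.get(c.userId)?.avatarUrl ?? null,
  }));
}
```

This works, but it means two round trips and shipping every user record to the client, which is exactly the duplication a server-side join avoids.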
My hesitation with relational DBs comes from the mismatch between the client data model (loosely, JSON objects of pointers) and how it's represented in the DB (as rows); plus the requirement of specifying e.g. your indices up front, and the annoyance of doing migrations. I'm hopeful someone's found a graph-type solution that works really well for them!
"get the 10 latest comments on this market, plus the associated avatars" couldn’t be better suited to a relational DB. That’s a textbook use case that Postgres would be amazingly well suited to.
Also: remember with Firestore that you're paying for redundancy and availability that's entirely Google managed. Most DB offerings you run yourself are significantly more hands-on as far as recovery / backups / replication go.
Engineering time is usually more expensive than server costs when you’re a startup, so think about how much time it’d take to do it yourself before you decide to optimize your server costs over R&D costs.
Fundamentally, I'm trying to optimize for something like "developer happiness as we build out lots of new features quickly". My dream workflow would look something like: take the TypeScript types we've defined on the client, and shove that somewhere in the cloud; then later query to pull out exactly the data we need to render any particular view of our site (à la GraphQL).
AND I'd really like to not have to spend a lot of up-front time knowing exactly which indices to set up, or to figure out complicated migrations later. And I'd like to not think about hosting/managing replications, etc. Maybe that's too many asks, and I'm being too greedy! I'm just hoping that someone's solved this pain point already, and I just haven't heard about it.
1) Fetch the list of comments
2) Add a listener on the public user info for each comment poster
3) Render the comments immediately. When the user info is available, re-render with the avatar information.
The nice thing about this is that the avatar information will immediately update in real-time as soon as someone updates their avatar. Yes, with a KV-store, you need to do more reads because you can't join data (which implicitly will do a read btw), but it doesn't seem like that big of a deal to me. Immediately reflecting changes to the public user state seems nicer than the convenience of a join.
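Simplified, the pattern is: render immediately with a placeholder, subscribe to each poster's public info, and patch it in whenever it arrives or changes. This sketch fakes the realtime store's subscribe API rather than using the real Firebase SDK:

```typescript
type UserInfo = { avatarUrl: string };
type Listener = (info: UserInfo) => void;

// Minimal stand-in for a realtime KV store: subscribe() fires the
// callback now if data exists, and again on every later update.
class FakeStore {
  private data = new Map<string, UserInfo>();
  private listeners = new Map<string, Listener[]>();
  subscribe(key: string, fn: Listener) {
    this.listeners.set(key, [...(this.listeners.get(key) ?? []), fn]);
    const current = this.data.get(key);
    if (current) fn(current);
  }
  set(key: string, info: UserInfo) {
    this.data.set(key, info);
    (this.listeners.get(key) ?? []).forEach(fn => fn(info));
  }
}

// Steps 2+3 above: show placeholders right away, then swap in each
// avatar as its poster's public info streams in.
function trackAvatars(store: FakeStore, posterIds: string[]) {
  const avatars = new Map<string, string>(
    posterIds.map(id => [id, "placeholder.png"] as const),
  );
  for (const id of posterIds) {
    store.subscribe(`users/${id}`, info => avatars.set(id, info.avatarUrl));
  }
  return avatars;
}
```

The upside described above falls out of the subscription: a later `set` on a user's record updates every rendered comment with no re-fetch.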
I could probably give you some advice on how to use the datastore better (most of which would be along the lines of "don't denormalize, store foreign keys and use batch key fetches instead") but it might just be the wrong tool for the job. If you want to talk about it, contact info is on my profile.
We do very much want real-time updating, but there are okay integrations for that with RDBMSs now (e.g. Supabase). Primarily, I'm curious about some of the newer/more modern DBs, and whether anyone has had good or bad experiences with them!
> Then the other pain point is the "joins" use case;
We usually do that client side, with the aid of a web-component holding a ref to the (realtime db, not firestore) database path, and rendering its value. The payload is small as you only fetch data you use.
That works pretty well, even with long lists or grids; quotas/price on the realtime db are pretty generous.
2) Is there any element of a decision market here? Are you asking users to just predict, or to help you decide? The incentives change a lot in the latter case, since you can get cyclic dependencies -- self-fulfilling prophecies.
1) There are three insiders on this decision which would be the three cofounders of this site (including me and the OP). We wouldn't insider trade because we genuinely want to know the answer to this question. The market is also tied to what we actually do (and we're not going to lie!).
2) This is both for you to predict what we will do, and to convince us. If you propose a DB and make a case for it, you can gain in expectation because maybe there really is like a 7% chance we'd pick it. So buying up shares from 0-7% is a win for you.
Alternatively, if you think that one DB choice is much better than the others, and we would be somewhat likely to figure that out, then you might gain by buying shares in that answer.
Basically, I think the incentives are good! We're using a more rigorous mechanism — prediction markets! — to do Q&A better than Stack Overflow.
2) Right, so, the way this can go wrong in theory (not saying it will happen in practice) is if a large group of users, or a small group with a lot of points, decide to all get together and focus on one alternative regardless of how good it is. They predict that alternative really strongly and vote against / predict against all the others.
Then, again in theory, you (the site operators) look at the votes and say 'wow, this database must be much better than all the others, we'd better use it.' Then the colluders all get their predictions proven true, so it's a self-fulfilling prophecy.
Here's one research study that looked at this kind of issue, although they end up not finding a lot of manipulation: https://www.researchgate.net/publication/315529106_Manipulat...
Prediction markets are supposed to have insiders buying in. Nothing wrong with that, but they are not supposed to be self-fulfilling prophecies. Right? And certainly the prediction market itself should not be run by the insiders.
GCP Cloud SQL (Postgres) has been fantastic for me. Easy to connect to, easy to scale and not that expensive.
I have GCP Cloud Functions (golang) in front of that and they spin up, connect to the database and serve the whole request in under 1s. Hot requests are 80ms for submitting some simple data.
If you do it right, you can minimize persistent connections to the SQL backend (offload as much as possible to PubSub messages which a backend function can handle). This will keep your bills down too because you won't need as large of an instance to serve requests.
Also: a serverless Postgres offering. Not even the fancy kind. There are some great free tiers outside of GCP. I have some projects where my usage, under true pay-as-you-go pricing, would fairly come out between $0.10/month and $25/year. But GCP starts at around $9/month for a Postgres instance, and a lot of competitors leap straight to $25/month once you leave the free tier.
Depending on usage patterns the former can be cheaper for apps that don't need reserved capacity and need sporadic/occasional resource access or have unpredictable spikes with otherwise low usage.
If you mean Firestore, what about minio? It's incredibly scriptable and awesome.
If you mean Firebase, use rxdb and connect over graphql (hasura is fantastic) to your postgres database. It can be a little work to understand how the models all map into the database, but once you get it, it's magical.
Both are easy to self host. I run my entire stack of all components on dokku, so I get easy logging, backup, and can migrate to a new host in a few standard commands.