I have a screen that displays a QR code in the browser and, at the same time, opens a WebSocket connection to my backend (Spring). Once the payment is done, a webhook with the payment state (success/failure) and other order details arrives at one of my backend endpoints.
The application works fine when it runs locally or as a single instance. But since we have our own Auto Scaling group and a load balancer configured behind DNS, the connection does not always get established.
So how exactly should this be architected to scale horizontally? I don't have any database configured as of now. I have thought of using SNS with SQS, but that seems like far too much overhead. How do big companies like WhatsApp scale?
You need to include information in the event that comes back that identifies the websocket, either directly (a server id plus an id that server would understand) or indirectly (something you can look up in a table or hash to get the direct information).
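A minimal Java sketch of the two options (Java because the question mentions a Spring backend). Everything in it is hypothetical: the `EventRouting` and `Target` names, the `serverId:connectionId` string format, and the in-process map, which in a multi-instance deployment would have to live somewhere every instance can reach.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the two ways a returning event can identify a websocket.
// All names here (EventRouting, Target, the "serverId:connectionId" format) are hypothetical.
final class EventRouting {

    // Where a message must go: which server, and which connection on that server.
    record Target(String serverId, String connectionId) {}

    // Direct option: the event itself carries "serverId:connectionId".
    static Optional<Target> resolveDirect(String reference) {
        int sep = reference.indexOf(':');
        if (sep <= 0 || sep == reference.length() - 1) {
            return Optional.empty(); // malformed: missing server id or connection id
        }
        return Optional.of(new Target(reference.substring(0, sep), reference.substring(sep + 1)));
    }

    // Indirect option: the event carries an opaque key (e.g. an order id) that is
    // looked up in a table kept up to date by the websocket servers.
    private final Map<String, Target> table = new ConcurrentHashMap<>();

    void register(String key, Target target) {
        table.put(key, target);
    }

    Optional<Target> resolveIndirect(String key) {
        return Optional.ofNullable(table.get(key));
    }

    public static void main(String[] args) {
        System.out.println(resolveDirect("ws-2:conn-17"));
        EventRouting routing = new EventRouting();
        routing.register("order-42", new Target("ws-3", "conn-5"));
        System.out.println(routing.resolveIndirect("order-42"));
    }
}
```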
More likely than not, you'll want the indirect option, keyed on something that identifies the browser (a session id?), in case the browser reconnects to the websocket before the event comes in. Connectivity is fragile: clients roam across Wi-Fi access points, between Wi-Fi and LTE, or across LTE boundaries that require a different IP, and sometimes their modem resets while they wait and they get a new IP. Or something in their network path hates the world and closes idle connections on a very aggressive timeout, or closes connections after a short timeout regardless of idleness. There are lots of scenarios where a reconnection is likely.
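One way to sketch the indirect, reconnect-tolerant option: key delivery on a browser session id, rebind it on every reconnect, and hold a result that arrives while the browser is away. This is a single-server sketch under assumptions: `SessionRegistry` is a hypothetical name, a socket is modeled as a `Consumer<String>`, and across instances you would still need a lookup from session id to server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch: deliver on a browser session id rather than on one particular socket,
// so a reconnect just rebinds the session. A "socket" is modeled as a
// Consumer<String>; wiring it to a real WebSocket session is assumed, not shown.
final class SessionRegistry {
    private final Map<String, Consumer<String>> live = new HashMap<>();
    // Results that arrived while the browser had no open socket on this server.
    private final Map<String, String> pending = new HashMap<>();

    // Browser (re)connected: bind the session to the new socket and flush anything missed.
    synchronized void connect(String sessionId, Consumer<String> socket) {
        live.put(sessionId, socket);
        String missed = pending.remove(sessionId);
        if (missed != null) {
            socket.accept(missed);
        }
    }

    // Socket closed: unbind only if it is still the current socket for the session,
    // so a late close of an old socket does not clobber a newer one.
    synchronized void disconnect(String sessionId, Consumer<String> socket) {
        live.remove(sessionId, socket);
    }

    // Payment event arrived: push now, or hold it until the browser comes back.
    synchronized void deliver(String sessionId, String message) {
        Consumer<String> socket = live.get(sessionId);
        if (socket != null) {
            socket.accept(message);
        } else {
            pending.put(sessionId, message);
        }
    }

    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry();
        Consumer<String> oldSocket = m -> System.out.println("old socket: " + m);
        registry.connect("sess-1", oldSocket);
        registry.disconnect("sess-1", oldSocket); // e.g. Wi-Fi to LTE handover
        registry.deliver("sess-1", "SUCCESS");    // arrives while disconnected: buffered
        registry.connect("sess-1", m -> System.out.println("new socket: " + m));
    }
}
```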
Finally, since you asked about horizontal scaling, my best advice is to scale vertically first. It's usually simpler to manage one server that can handle 1M connections than 1k servers that can each handle 1k connections. Depending on the details, fewer but larger servers can be less expensive than many smaller ones, although that changes once your large servers start getting exotic. More than two CPU sockets is a big inflection point: you most likely want to scale horizontally rather than buy a quad-socket monster (but they exist, and so do eight-socket machines).
No database needed: have the user's websocket connect to a simple websocket server that forwards every request to a fuller API server. The API server speaks back through the NATS server, which triggers a push to the user because the websocket server is connected to NATS. This lets you scale both the API servers (as long as they ignore users/work that isn't theirs) and the websocket servers horizontally.
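The fan-out described above can be sketched in memory. `SubjectBus` only plays the role of the NATS server and is not the NATS client API; the `user.<id>` subject naming and all class names are assumptions. What it illustrates is that the API node receiving the webhook never needs to know which websocket server holds the socket.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// In-memory stand-in for the NATS fan-out. SubjectBus is NOT the NATS client
// API; the "user.<id>" subject naming is also an assumption.
final class NatsStyleFanout {

    // Plays the NATS server: delivers each published message to every subscriber
    // of the subject; with no subscribers the message is simply dropped.
    static final class SubjectBus {
        private final Map<String, List<Consumer<String>>> subs = new ConcurrentHashMap<>();

        void subscribe(String subject, Consumer<String> handler) {
            subs.computeIfAbsent(subject, s -> new CopyOnWriteArrayList<>()).add(handler);
        }

        void publish(String subject, String message) {
            subs.getOrDefault(subject, List.of()).forEach(h -> h.accept(message));
        }
    }

    // Thin websocket server: only holds sockets and listens on its users' subjects.
    static final class WebSocketNode {
        private final SubjectBus bus;
        private final Map<String, Consumer<String>> sockets = new ConcurrentHashMap<>();

        WebSocketNode(SubjectBus bus) {
            this.bus = bus;
        }

        void onConnect(String userId, Consumer<String> socket) {
            // Subscribe once per user; later reconnects just swap the socket.
            if (sockets.put(userId, socket) == null) {
                bus.subscribe("user." + userId, msg -> {
                    Consumer<String> current = sockets.get(userId);
                    if (current != null) {
                        current.accept(msg);
                    }
                });
            }
        }
    }

    // API server: any instance can take the webhook and just publish the result.
    static final class ApiNode {
        private final SubjectBus bus;

        ApiNode(SubjectBus bus) {
            this.bus = bus;
        }

        void onWebhook(String userId, String paymentState) {
            bus.publish("user." + userId, paymentState);
        }
    }

    public static void main(String[] args) {
        SubjectBus bus = new SubjectBus();
        WebSocketNode wsA = new WebSocketNode(bus);
        WebSocketNode wsB = new WebSocketNode(bus);
        wsA.onConnect("alice", m -> System.out.println("alice via wsA: " + m));
        wsB.onConnect("bob", m -> System.out.println("bob via wsB: " + m));
        new ApiNode(bus).onWebhook("bob", "SUCCESS"); // the API node never sees a socket
    }
}
```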
https://stackoverflow.com/questions/12102110/nginx-to-revers...
https://stackoverflow.com/questions/10550558/nginx-tcp-webso...
The problem here, though, is that websockets are stateful: because of the load balancer, each connection is established with one particular instance out of several. When a webhook arrives, it is a normal POST request that carries no information about which instance holds the websocket connection, so it may be routed to an instance where the connection was not established, and that instance cannot process the request.
Another approach is to save the association between server and session in a database. When a webhook comes in and the current server doesn't hold the target session, look up which server does and call an internal endpoint on that server to send the message over.
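A sketch of that lookup-and-forward approach, assuming a shared table keyed by session id. `SessionDirectory` stands in for the database and `Forwarder` for an HTTP call to a hypothetical internal endpoint on the owning server; both, like the addresses in the demo, are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Sketch of "store which server owns which session, forward if it isn't me".
// SessionDirectory stands in for the database table, and Forwarder for an HTTP
// call to a hypothetical internal endpoint on the owning server.
final class WebhookForwarding {

    interface SessionDirectory {
        void put(String sessionId, String serverAddress);

        String ownerOf(String sessionId); // null if unknown
    }

    interface Forwarder {
        void send(String ownerAddress, String sessionId, String payload);
    }

    static final class InMemoryDirectory implements SessionDirectory {
        private final Map<String, String> rows = new ConcurrentHashMap<>();

        public void put(String sessionId, String serverAddress) {
            rows.put(sessionId, serverAddress);
        }

        public String ownerOf(String sessionId) {
            return rows.get(sessionId);
        }
    }

    static final class Node {
        private final String address;
        private final SessionDirectory directory;
        private final Forwarder forwarder;
        private final Map<String, Consumer<String>> localSockets = new ConcurrentHashMap<>();

        Node(String address, SessionDirectory directory, Forwarder forwarder) {
            this.address = address;
            this.directory = directory;
            this.forwarder = forwarder;
        }

        void onWebSocketOpen(String sessionId, Consumer<String> socket) {
            localSockets.put(sessionId, socket);
            directory.put(sessionId, address); // record "this server owns this session"
        }

        // Public webhook endpoint: the load balancer may send it to any instance.
        void onWebhook(String sessionId, String payload) {
            if (deliverLocally(sessionId, payload)) {
                return;
            }
            String owner = directory.ownerOf(sessionId);
            if (owner != null && !owner.equals(address)) {
                forwarder.send(owner, sessionId, payload);
            }
            // Otherwise the session is unknown; a real service would log and/or retry.
        }

        // Internal endpoint that other instances forward to.
        void onInternalPush(String sessionId, String payload) {
            deliverLocally(sessionId, payload);
        }

        private boolean deliverLocally(String sessionId, String payload) {
            Consumer<String> socket = localSockets.get(sessionId);
            if (socket == null) {
                return false;
            }
            socket.accept(payload);
            return true;
        }
    }

    public static void main(String[] args) {
        SessionDirectory db = new InMemoryDirectory();
        Map<String, Node> cluster = new ConcurrentHashMap<>();
        // In production this would be an HTTP POST; here it calls the node directly.
        Forwarder forwarder = (owner, sid, payload) -> cluster.get(owner).onInternalPush(sid, payload);
        Node a = new Node("10.0.0.1:8080", db, forwarder);
        Node b = new Node("10.0.0.2:8080", db, forwarder);
        cluster.put("10.0.0.1:8080", a);
        cluster.put("10.0.0.2:8080", b);
        a.onWebSocketOpen("sess-9", m -> System.out.println("browser got " + m));
        b.onWebhook("sess-9", "SUCCESS"); // lands on the "wrong" node and is forwarded to a
    }
}
```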
You could also look into Redis for this. Have the server that handles the websocket subscribe to changes on a key associated with the user's payment. When a webhook is received, just update that key in Redis.
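A rough in-memory model of the Redis idea. It is not the Redis API: with real Redis the watching side would subscribe to keyspace notifications (enabled with the `notify-keyspace-events` setting), which are fire-and-forget, so the sketch also reads the current value right after subscribing in case the webhook already wrote it. The `payment:<orderId>` key naming and the class name are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory stand-in (NOT the Redis API) for "webhook SETs a key, the server
// holding the websocket is notified". Key names like "payment:<orderId>" are assumed.
final class PaymentKeyStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watchers = new HashMap<>();

    // Webhook side: store the state and notify current watchers of that key.
    synchronized void set(String key, String value) {
        values.put(key, value);
        // Iterate over a copy so a watcher may (un)watch without breaking the loop.
        for (Consumer<String> w : new ArrayList<>(watchers.getOrDefault(key, List.of()))) {
            w.accept(value);
        }
    }

    // Websocket side: watch the key. If the value was already written (the webhook
    // beat the subscription, e.g. after a reconnect), deliver it immediately.
    synchronized void watch(String key, Consumer<String> watcher) {
        watchers.computeIfAbsent(key, k -> new ArrayList<>()).add(watcher);
        String current = values.get(key);
        if (current != null) {
            watcher.accept(current);
        }
    }

    synchronized void unwatch(String key, Consumer<String> watcher) {
        List<Consumer<String>> list = watchers.get(key);
        if (list != null) {
            list.remove(watcher);
        }
    }

    public static void main(String[] args) {
        PaymentKeyStore store = new PaymentKeyStore();
        store.watch("payment:order-42", v -> System.out.println("push to browser: " + v));
        store.set("payment:order-42", "SUCCESS");
    }
}
```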
E.g., from the article: "the WebSocket transport does not have this limitation, since it relies on a single TCP connection for the whole session. Which means that if you disable the HTTP long-polling transport (which is a perfectly valid choice in 2021), you won't need sticky sessions"
The WebSocket connection still has to exist and stay active (i.e., interact with the correct node).
"Now, whenever a new Webhook response comes ... it doesn't have information regarding which instance was used earlier for making the websocket connection, it may send request to one of the instance where the connection was not established and thus our backend is not able to process this request from this particular instance."
Just include the internal IP of the server holding the websocket connection in the data you send to your billing operator, and then forward the POST request appropriately.
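A sketch of that idea, assuming the payment provider lets you attach a free-form reference or metadata field that it echoes back in the webhook (check that yours does). The `<orderId>|<host>:<port>` format, the `/internal/webhook/` path and the class name are hypothetical.

```java
import java.net.URI;
import java.util.Optional;

// Sketch: carry the websocket server's internal address through the payment
// provider and back. Assumes the provider echoes a free-form reference field in
// its webhook; the "<orderId>|<host>:<port>" format and path are hypothetical.
final class ReferenceRouting {

    record Parsed(String orderId, String internalHostPort) {}

    // Built on the instance that holds the websocket, when creating the payment/QR code.
    static String buildReference(String orderId, String internalHostPort) {
        return orderId + "|" + internalHostPort;
    }

    // Run on whichever instance receives the webhook.
    static Optional<Parsed> parse(String reference) {
        int sep = reference.lastIndexOf('|');
        if (sep <= 0 || sep == reference.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new Parsed(reference.substring(0, sep), reference.substring(sep + 1)));
    }

    // Where to re-POST the webhook body if the owner is another instance.
    static URI forwardUri(Parsed p) {
        return URI.create("http://" + p.internalHostPort() + "/internal/webhook/" + p.orderId());
    }

    public static void main(String[] args) {
        String ref = buildReference("order-42", "10.0.1.23:8080");
        parse(ref).map(ReferenceRouting::forwardUri).ifPresent(System.out::println);
    }
}
```

The receiving instance would handle the webhook itself if the parsed address is its own, and otherwise re-POST the body to `forwardUri(...)`, for example with `java.net.http.HttpClient`. Since the address round-trips through a third party, it is worth checking it against your known instance addresses before forwarding.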
Websockets don't seem to me like the ideal approach here, since the communication only goes from the server to the client (an update of a data payload once an event occurs in the backend system).
Websockets carry quite a bit of technical complexity and require significant architectural effort to make a service reliable. Ably is a company that offers websockets as a service and has some good blog articles to get you started if you're sure about this path.
With the details provided so far, I would recommend either SSE or long-polling. Their "downsides" are often exaggerated, and lots of businesses that one would assume use websockets are really just using SSE, because it is vastly simpler to reason about operationally and architecturally.
I can almost guarantee that the complexity of adding another API endpoint will be drastically less than standing up a reliable websocket infrastructure.
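To show how small that extra endpoint can be, here is a single-instance long-polling sketch built on `CompletableFuture` (Java 9+). Spring MVC controller methods can return a `CompletableFuture`, so `poll` could back a status endpoint directly; the endpoint path, class name and `PENDING` convention are assumptions, and with several instances the webhook's result would still need to be stored where every instance can read it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Single-instance sketch of a long-poll status endpoint (Java 9+).
// With several instances, the webhook's result would still have to be stored
// somewhere every instance can read; that part is assumed, not shown.
final class LongPollStatus {
    private final Map<String, CompletableFuture<String>> results = new ConcurrentHashMap<>();

    private CompletableFuture<String> slot(String orderId) {
        return results.computeIfAbsent(orderId, id -> new CompletableFuture<>());
    }

    // Called by the webhook handler.
    void complete(String orderId, String state) {
        slot(orderId).complete(state);
    }

    // Backs something like GET /payments/{orderId}/status (hypothetical path).
    // Resolves with the state as soon as it is known, or "PENDING" after the wait,
    // at which point the browser simply polls again. copy() keeps the timeout from
    // completing the shared slot itself.
    CompletableFuture<String> poll(String orderId, long wait, TimeUnit unit) {
        return slot(orderId).copy().completeOnTimeout("PENDING", wait, unit);
    }

    public static void main(String[] args) {
        LongPollStatus status = new LongPollStatus();
        CompletableFuture<String> waiting = status.poll("order-42", 30, TimeUnit.SECONDS);
        status.complete("order-42", "SUCCESS"); // webhook arrives
        System.out.println(waiting.join());
    }
}
```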