Regardless, one thing I find maddening about chess.com is the time architecture of the game. I haven't seen the underlying code, but it feels like the SERVER is tracking the time. This completely neglects transport time and latency, meaning that 1s to move isn't really a second. Playing on the mobile client is an exercise in frustration if you are playing timed games and down to the wire. Even when you aren't, your clock will jump on normal moves, and it is most obvious during the opening.
This could also be due to general poor network code as well. The number of errors I get during puzzles is also frustrating. Do they really not retry a send automatically?? <breath>
Chess.com has the brand and the names... but dang, the tech feels SO rough to me.
This is one of the many, many things but imo it's the most telling. They can't even add a clock counting down the 6 minutes to their web client.
>>I think I heard a streamer say it was for kicking out the cheaters
I can't see how it can possibly help. Maybe he meant something else?
TBH this is what I expected for all online chess. How else to reconcile the two players' differing clocks and also prevent client-side cheating?
I do recognize that fps games utilize predictive algorithms and planning to estimate future player positions but still, turn based networking with 100ms accuracy should be a solved problem
on lichess it does have an impact. lichess has a thing they call lag compensation where the server can add time to a player's clock after the server receives their move.
The goal is to make it fair for someone with high lag playing someone with low lag.
I don't know the exact cheating method used. I'll have a guess, though. What if someone spent a few seconds looking at the board before making their move, and then adding (edit: oops, subtracting) a few seconds to their clock in their response packet. The server would see the client made their move instantly based on the time in the response packet, but it took a few seconds for the server to receive the packet. i.e. lag. So it might add time to compensate for the perceived lag.
Lag compensation cheating is a frequent topic on the lichess forums.
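To make the idea concrete, here is a minimal sketch of server-side lag compensation with a per-move cap. This is not lichess's actual implementation; the names (`MoveReport`, `chargeForMove`, `maxCompMs`) and the 300 ms cap are made up for illustration. The server trusts only its own measurement of elapsed time and treats the client-reported thinking time as an unverified claim, so a doctored packet can win back at most the cap.

```typescript
// Hypothetical sketch: server-authoritative clock with capped lag compensation.
// None of these names come from lichess; the cap value is an assumption.

interface MoveReport {
  serverElapsedMs: number; // measured by the server: opponent's move sent -> this move received
  clientClaimedMs: number; // what the client says the player spent thinking (untrusted)
}

// Returns how many milliseconds to deduct from the mover's clock.
function chargeForMove(report: MoveReport, maxCompMs: number = 300): number {
  const { serverElapsedMs, clientClaimedMs } = report;
  // A claim can never be negative or exceed what the server itself measured.
  const claimed = Math.min(Math.max(clientClaimedMs, 0), serverElapsedMs);
  // The remainder is what the client attributes to network lag.
  const claimedLag = serverElapsedMs - claimed;
  // Credit back at most maxCompMs, no matter what the client claims.
  const credit = Math.min(claimedLag, maxCompMs);
  return serverElapsedMs - credit;
}

// Honest client with 150 ms of real lag: gets all of it back.
const honest = chargeForMove({ serverElapsedMs: 1200, clientClaimedMs: 1050 });
// Client that thought for 5 s but claims an instant move: only the cap is credited.
const cheater = chargeForMove({ serverElapsedMs: 5000, clientClaimedMs: 0 });
```

With a cap like this, lying about the move time still gains something, but only a bounded amount per move, which is presumably why the topic keeps coming up.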
Is there a point in preventing cheating, really? I can just make a bot...
Seems right.
If you export/download games from lichess, they use the .pgn (Portable Game Notation) format, which is a standard plain-text format circa 1993, used by pretty much everyone for describing a chess game.
Lichess follows the specification to the letter, and as it only technically allows one-second accuracy, lichess only records moves with one-second accuracy. It seems insane, but that's how they do it.
Chess.com also exports PGN files, but they add a decimal place, allowing subsecond accuracy. No one has a problem with this. There is no software which cannot handle this. But Lichess refuses to "break" the spec.
lichess PGN export example:
> 1. d3 { [%eval -0.15] [%clk 0:01:00] } 1... g6 { [%eval 0.04] [%clk 0:01:00] }
Chess.com PGN export example:
> 1. d4 {[%clk 0:02:58.6]} 1... b6 {[%clk 0:02:59.2]}
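For what it's worth, reading both variants takes only one optional group in a regex. Below is a small sketch of a `%clk` reader that accepts whole seconds as well as a decimal fraction; `parseClockCommentsMs` is a made-up helper name, not part of any PGN library, and it only looks at the `[%clk h:mm:ss(.f)]` shape shown in the two examples above.

```typescript
// Hypothetical helper: extract every [%clk h:mm:ss] or [%clk h:mm:ss.f] value
// from a PGN fragment and return the remaining time in milliseconds.
function parseClockCommentsMs(pgn: string): number[] {
  const re = /\[%clk (\d+):(\d{1,2}):(\d{1,2}(?:\.\d+)?)\]/g;
  const clocks: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(pgn)) !== null) {
    const hours = parseInt(m[1], 10);
    const minutes = parseInt(m[2], 10);
    const seconds = parseFloat(m[3]); // accepts "58" as well as "58.6"
    // Round to whole milliseconds to avoid floating-point noise.
    clocks.push(Math.round((hours * 3600 + minutes * 60 + seconds) * 1000));
  }
  return clocks;
}

const lichessSample = "1. d3 { [%eval -0.15] [%clk 0:01:00] } 1... g6 { [%eval 0.04] [%clk 0:01:00] }";
const chessComSample = "1. d4 {[%clk 0:02:58.6]} 1... b6 {[%clk 0:02:59.2]}";

const lichessClocks = parseClockCommentsMs(lichessSample);
const chessComClocks = parseClockCommentsMs(chessComSample);
```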
According to this blog post, this doesn't appear to be the case since at least 2017:
https://lichess.org/@/lichess/blog/a-better-game-clock-histo...
"Move times are now stored and displayed with a precision of one tenth of a second. The precision even goes up to one hundredth of a second, for positions where the player had less than 10 seconds left on their clock."
It's super annoying and the reason I only play blitz+ on chesscom.
[1]https://www.chess.com/forum/view/help-support/mate-in-one-qu...
chess.com confirmed the issue.
Interesting that they accumulate and periodically store game state. Unfortunately, it is not very clear where they store ongoing game state: in Redis or on the server itself. Also, the cost breakdown doesn't list a server for Redis, only for the DB.
BTW, their GitHub has a better architectural picture than the overly simplified one in the article: https://raw.githubusercontent.com/lichess-org/lila/master/pu.... Unfortunately, I'm afraid drawing something like that during an interview may not land a job at faang =(
Note that they have cost per game fairly low: $0.00027, 3,671 games per dollar.
Their cost breakdown, for ones who are curious https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk...
p.s. I'm not saying that Lichess's approach is the best or faang is the worst. Remember, lichess had 10 hours outage exactly because of the architecture chosen (single datacenter dependency). https://lichess.org/@/Lichess/blog/post-mortem-of-our-longes... . And outages like that are exactly the reasons why multi-datacenter and multi-region architectures are drilled down into faang engineers.
My point is that there are cases when this approach is legit, but a typical interview is laser focused on different things, and most probably won't appreciate the "old style" approach to the problem. I'm sure that if Thibault ever decides to land in faang he will do neither whiteboard coding nor system design.
Yet another reason to be skeptical of the quality of hiring in faang if anything.
My criticism was mostly towards the very poor metrics these companies have introduced behind hiring, albeit I can understand that given the gigantic amount of applications they get a mechanism for removing false positives is acceptable even if missing on false negatives.
And even more that it spread to companies that do not have their problems and can't afford false negatives.
Looking at second-order effects, many companies look up to FAANG for "best practices", which often includes them blindly copying their hiring practices. Without feeling or calling out any healthy skepticism, the software hiring world becomes a worse place overall.
Just a wild guess: might be intended to lower the implementation barrier for new open-source software clients on new platforms, and/or preempt them from implementing subtle logic bugs that only show up much later.
The rules of chess are a bit tedious to implement, and you can easily get tired and code an edge-case bug that's almost invisible. Lichess itself did this—it once had a logic error that affected a very tiny number (exactly 7) of games,
https://github.com/lichess-org/database/issues/23 ("Before 2015: Some games with illegal moves were recorded")
(I apologize I couldn't find the specific patch that fixed this)
Naturally, it's not possible to view this move anymore, but this game (https://lichess.org/XDQeUk6j#48) has everything up until the last legal move right before the illegal castling happened.
Also that linked game is pretty entertaining. It's not a good game, but it can be fun watching lower ranked players make moves that you'd never see in higher level games. Like, who plays Bb5+ against the Scandinavian? Amazing stuff.
(the broken code checked that the only pieces on the king's path to its new position were kings and rooks of the appropriate color)
Validating a submitted move is distinct from listing valid moves. I assumed the server would need to validate regardless of providing a list to the client.
A bit of surprise consideration … is that even common in these days of overfancy web sites.
I will have to take a look, because whatever it's doing, it works very well!
Well, except for that one major outage where everything shit the bed due to some misconfiguration of IP multicast in the datacenters, or so I was told.
So, maybe if your mission isn't life critical, you can just wrongfully assume exactly-once delivery.
[1]: https://en.wikipedia.org/wiki/Pragmatic_General_Multicast
To do that, the server needs some measure of “how long does the client think the player actually took to make a move”, to later subtract latency not attributable to actual thinking from the clock.
I tried this and not all the messages I sent arrived.
At any rate my conclusion was disappointment that if I actually want reliability, I need to implement my own ACKs anyway, meaning I'm paying a pretty high overhead for no benefit.
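For anyone wondering what "my own ACKs" amounts to, here is a minimal sketch, assuming a transport that may silently drop messages (for example across a reconnect). `AckRegistry` and its methods are hypothetical names, not an API from any library: every outgoing message gets an id, stays in a pending list until the peer acknowledges that id, and is resent on demand.

```typescript
// Hypothetical sketch of application-level ACKs: keep every unacknowledged
// message and resend it until the peer confirms its id.
type Outgoing = { id: number; payload: string };

class AckRegistry {
  private nextId = 1;
  private pending: Outgoing[] = [];

  // `transmit` is whatever actually puts bytes on the wire (assumed lossy).
  constructor(private transmit: (msg: Outgoing) => void) {}

  // Send a message and remember it until it is acknowledged.
  send(payload: string): number {
    const msg: Outgoing = { id: this.nextId++, payload };
    this.pending.push(msg);
    this.transmit(msg);
    return msg.id;
  }

  // Call when the peer acknowledges an id.
  onAck(id: number): void {
    this.pending = this.pending.filter((msg) => msg.id !== id);
  }

  // Call on a timer or right after reconnecting: resend everything unconfirmed.
  resendPending(): void {
    this.pending.forEach((msg) => this.transmit(msg));
  }

  get pendingCount(): number {
    return this.pending.length;
  }
}
```

Because a resend can arrive after the original did, the receiving side has to ignore ids it has already processed. That bookkeeping on both ends is the overhead being complained about.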
At least now there's UDP in browser with WebTransport. I haven't tried it yet, but I hear it's a lot more pleasant than the previous option WebRTC, which was so convoluted (for the "I just want a UDP socket" usecase) that very few people used it.
Saw CF had some paying solution, but was wondering about a free solution
Cloudflare has a lower latency product called Argo Smart Routing [1]. When we tried Argo in 2020, we still saw 10+ ms increased latency across the board, which is unacceptable for competitive multiplayer games. That said, Discord still uses (or used to use) Argo for voice, so there are certainly less latency-sensitive games where it would work well.
The other issue with sockets over Cloudflare (circa 2020 on the business plan) is that they get terminated liberally, on the assumption that you have a reconnection mechanism in place. I'd imagine this is acceptable for traditional WebSocket use cases, but not for games.
Services like OVH & Vultr also advertise "DDoS protection for games," but I've found these to be pretty useless in practice. We can only measure traffic that reaches our game servers, so I have no way of knowing if they're actually helping at all.
Your best bet is getting familiar with iptables and fine-tuning rules to match your game's traffic patterns. Thankfully, LLMs are pretty good at generating these rules for you nowadays if you're not already familiar with these tools. Make sure to set up something like node-exporter to be able to monitor attacks and understand where things go wrong. There have been a few other posts on HN in the past that go into more depth about game server DoS mitigation [2] [3].
I built something in the same vein for my startup (Apache 2.0 OSS, steal our code!) [4] that runs a series of load balancers in front of game servers in order to act like a mini-Cloudflare. In addition to the basics I already listed, we also have logic under the hood that (a) dynamically routes traffic to load balancers and (b) autoscales hardware based on traffic in order to absorb attacks. We're rolling out a dynamic bot attack & mitigation mechanism soon to handle more complex patterns.
[1] https://www.cloudflare.com/application-services/products/arg...
[2] https://news.ycombinator.com/item?id=35771466
What still isn't great is the ecosystem and the build-tooling compared to Rust (part of it because of the JVM). But just language-wise, it basically has all the goodies of Rust and much more. Ofc. it's easier for Scala to have that because it does not have to balance against zero-overhead abstraction like Rust does.
Still, Scala was hyped at some point (and I find the hype wasn't justified). But now the language is actually one of the best, if not the best, very-high-level languages used in production and not just in academia. It's kind of sad to see that it doesn't get more traction, but it doesn't have the marketing budget of, say, golang.
> all the goodies of Rust
Does it prevent me from using a non-thread-safe object in multiple threads? Or storing a given object which is no longer valid after the call ends?
Does it have a unified error handling culture? In Scala some prefer exceptions (with or without `using CanThrow`), some prefer the `Either` (`Result`) type.
Does it have named destructuring?
Basically, you needed a good and experienced developer from the start of a project for it to be a nice code base.
> I'm very fluent in Scala 2, but I will avoid Scala if I can, mostly to stay away from purely functional programmers.
There is the whole [Li Haoyi](http://www.lihaoyi.com/) ecosystem in Scala that is much more python-like, but nicely designed, statically typed and using immutable datastructures by default. I think it's the best you can get nowadays if you want to have immutable datastructures on the JVM. Any other option I've ever tried was way worse.
If you are fine with Java's stdlib then I guess Kotlin is the better choice.
> Does it prevent me from using a non-thread-safe object in multiple threads?
I would answer the question with yes, but maybe in a different way than you might expect. Scala prevents problems/bugs from using a non-thread-safe object in multiple threads by simply having immutability by default. Rust cannot do that (due to performance) so it has to have another way (the borrow checker). I would argue that the Scala way is better if you don't need the performance / memory-efficiency of rust and can live with garbage collection. That reduces the domains that you can use Scala for, but in exchange the code will be simpler compared to Rust code, so in those domains Scala will have the advantage but it's a minor one.
> Or storing a given object which is no longer valid after the call ends?
To this one I would say "in practice yes". Rust is better here, but when using e.g. [ZIO Scope](https://zio.dev/reference/resource/scope/) the problem doesn't really exist. You can technically still do something like that, but you would basically have to do it intentionally. So Rust has the advantage here, but it's a minor one.
> Does it have a unified error handling culture?
No, Scala has no unified culture. Maybe the situation is better in Rust, but then Rust has its own problems. [Just a few days ago I found a comment about issues caused by a hardcoded panic](https://github.com/orgs/meilisearch/discussions/532#discussi...).
> Does it have named destructuring?
Unless we are talking about two different things, yes it does. I would even argue that Scala is more powerful here, because it also supports local imports and (with Scala 3) exports. So not only can you extract fields of an object into a variable, you can also bring them into scope and alias them at the same time, and you can do the reverse too: [you can export them as well](https://docs.scala-lang.org/scala3/reference/other-new-featu...).
And 2.) most people will go with sbt; and while it has improved a lot it is still comparably slow, has some annoying bugs and so on.
Compare that to Rust - I don't think those problems exist there.
I don't understand why the author didn't just look this up in the source code. Lichess is open source and we can see exactly what this field is here, it's the average lag:
https://github.com/lichess-org/lila/blob/45b5f0cfbbf6c045ad7...
send = (t: string, d: any, o: any = {}, noRetry = false): void => {
  const msg: Partial<MsgOut> = { t };
  if (d !== undefined) {
    if (o.withLag) d.l = Math.round(this.averageLag);
    if (o.millis >= 0) d.s = Math.round(o.millis * 0.1).toString(36);
    msg.d = d;
  }
  if (o.ackable) {
    msg.d = msg.d || {}; // can't ack message without data
    this.ackable.register(t, msg.d); // adds d.a, the ack ID we expect to get back
  }
  const message = JSON.stringify(msg);
  ...
Which is calculated from how long the server takes to respond to ping messages that the client sends:

private schedulePing = (delay: number): void => {
  clearTimeout(this.pingSchedule);
  this.pingSchedule = setTimeout(this.pingNow, delay);
};
private pingNow = (): void => {
  clearTimeout(this.pingSchedule);
  clearTimeout(this.connectSchedule);
  const pingData =
    this.options.isAuth && this.pongCount % 10 == 2
      ? JSON.stringify({
          t: 'p',
          l: Math.round(0.1 * this.averageLag),
        })
      : 'null';
  try {
    this.ws!.send(pingData);
    this.lastPingTime = performance.now();
  } catch (e) {
    this.debug(e, true);
  }
  this.scheduleConnect();
};
private computePingDelay = (): number => this.options.pingDelay + (this.options.idle ? 1000 : 0);
private pong = (): void => {
  clearTimeout(this.connectSchedule);
  this.schedulePing(this.computePingDelay());
  const currentLag = Math.min(performance.now() - this.lastPingTime, 10000);
  this.pongCount++;
  // Average first 4 pings, then switch to decaying average.
  const mix = this.pongCount > 4 ? 0.1 : 1 / this.pongCount;
  this.averageLag += mix * (currentLag - this.averageLag);
  pubsub.emit('socket.lag', this.averageLag);
  this.updateStats(currentLag);
};

Yes the instance of chess is finite but the problem of computing moves is inherently in NP.
The key is that just because a problem is in NP, it doesn't mean that it's difficult to solve the instances with small parameters.
See the famous coloring, SAT, or any other equal NP problem...
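Going back to the lichess client code quoted earlier in the thread: the two lag-related pieces are easier to follow outside the class. The sketch below re-creates them as free functions. `averageLag`, `encodeMoveTime` and `decodeMoveTime` are names made up here; the decoder is simply the inverse of the `toString(36)` call, not something taken from the lichess server.

```typescript
// Re-creation of the averaging rule from the quoted `pong` handler:
// a plain mean over the first 4 samples, then an exponential moving
// average that gives each new sample a weight of 0.1.
function averageLag(samplesMs: number[]): number {
  let average = 0;
  let count = 0;
  for (let i = 0; i < samplesMs.length; i++) {
    const current = Math.min(samplesMs[i], 10000); // same 10 s clamp as the original
    count++;
    const mix = count > 4 ? 0.1 : 1 / count;
    average += mix * (current - average);
  }
  return average;
}

// Same encoding as `d.s` in the quoted `send`: centiseconds, written in base 36.
function encodeMoveTime(millis: number): string {
  return Math.round(millis * 0.1).toString(36);
}

// Assumed inverse of the encoding above (precision is limited to 10 ms).
function decodeMoveTime(encoded: string): number {
  return parseInt(encoded, 36) * 10;
}

const steady = averageLag([100, 100, 100, 100]);          // plain mean phase
const afterSpike = averageLag([100, 100, 100, 100, 200]); // one spike moves the average by only 10%
const roundTrip = decodeMoveTime(encodeMoveTime(1500));
```

The small weight means a single slow pong barely moves the reported lag, while a sustained change shows up after a dozen or so pings.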
Edit: also includes move count but not repetition.
https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notati...
Even though nowadays I hardly have time to play, I'm still happy to support such a delightfully honorable and usable(!) open-source project.
It's a weird trend. Altruism truly does not exist
(I donated btw) (Probably more than you) (But who's counting)
There are many aspects in which they are not the best.
Ad-free, compute intensive, non-CRUD, massively scaled, complex cheat moderation, infinite puzzles/analysis, educational (studies/tactics/openings explorer), etc. All this for free. I'm curious what's the best website in your opinion
What is the point of responding with any legitimate criticism when any potentially negative sentiment however mild, upfront, expressing disagreement, gets downvoted to the point where the mechanics of the website squelches the person and silences them (by purposeful intent).
Can you ever have any legitimate intelligent conversation after a participant has been harmed and effectively silenced in this way?
When you cannot speak freely, there can be no intelligent communications raising the bar objectively. The opposite occurs, and anything provided, even seemingly rational conversation falls after such a threat or action of violence, all conversation then falls into the gutter as a result of the added coercive cost imposed. You may contend that its not violence, but it meets the WHO definition for such which properly accounts for psychological torture and coercion (of which this is a common form).
It should go without saying, but you cannot have any intelligent conversation when those who embrace totalitarian methods prevent you from speaking (and yes these meet the criteria).
At the point this happens, regardless of valid criticism, or pointing out errors in methodology, it all dies on the vine, the communication is clear; you will be punished for disagreeing. That destructive behavior inevitably leads to ruin.
This is fairly basic stuff, in order to think and be intelligent, one must be able to risk being offensive. In order to learn something new, one must risk being offended.
When neither are possible because you or someone else muzzles any conversation expressing disagreement or corrosively add cost, even under such modest terms as here, the fallout is silent, yet devastating.
It might not seem like much, but the light goes out of the world as those with intelligence withdraw their support, and the natural consequences which were held at bay by these people, albeit slow moving, become inevitable.
Best of luck to you. There is only the possibility of harm by continuing any discussion under these circumstances.
I'd suggest remembering this when you start wondering, "where have all the intelligent and competent people gone?".
Silence doesn't indicate agreement. It is indicative of the best and brightest no longer contributing to the same systems that seek to destroy or enslave them.
Redundancy, scalability, decoupling, resilience, best possible handling of errors, cost optimization, etc. may be more important at the scale Netflix operates at.