From what I remember, the way I approached it was to have the client step the game world forward every frame, based on the current state of all the players.
So for example, if a player is going left, move the player left every time the game steps forward one frame.
Then, when a message arrives from the server with the game state as of some frame, set everything to the server's positions and step your game world forward from that frame to now.
So a simplified example: we have one object in our game that has an x position and a delta x (dx) that moves it each frame.
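As a rough sketch of that idea (not the exact code I used), something like the following would do it. The `Player` class, the `step` and `apply_server_state` functions, and the player name are all made up for illustration, and movement is reduced to a single x coordinate.

```python
from dataclasses import dataclass


@dataclass
class Player:
    x: float
    dx: float  # movement applied on every frame (negative = moving left)


def step(players: dict[str, Player]) -> None:
    """Advance every player by one frame using its current movement."""
    for player in players.values():
        player.x += player.dx


def apply_server_state(players, server_players, server_frame, local_frame):
    """Snap to the server's positions, then re-simulate up to the local frame."""
    players.clear()
    for name, (x, dx) in server_players.items():
        players[name] = Player(x, dx)
    # The server state is older than "now", so catch up frame by frame.
    for _ in range(local_frame - server_frame):
        step(players)


players = {"alice": Player(x=50, dx=-2)}  # alice is moving left
for _ in range(3):
    step(players)  # frames 1 to 3, predicted locally

# The server reports the state as of frame 1, while we are already on frame 3.
apply_server_state(players, {"alice": (47, -2)}, server_frame=1, local_frame=3)
print(players["alice"].x)  # 47 stepped forward two frames
```

The client never waits for the server here: it keeps predicting locally and only corrects itself when a message turns up.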
Local frame  x     Server message       Notes
0            ____  no message           no real idea of what the game state is
1            ____  no message
2            ____  no message
3            ____  no message
4            ____  no message
5            100   Frame=5 x=100 dx=5   we can start displaying where things are
6            105   no message
7            110   no message
8            115   no message
9            120   no message
10           125   no message
11           126   Frame=7 x=110 dx=4   (x = 110 + 4 * 4)
So we set our local state to what the server says and then step forward by 4 frames to get what we think the current state should be.
12           130   no message
13           134   Frame=6 x=105 dx=5
We ignore this message from the server as it's out of order (we've already received frame 7)
14           138   no message
15           142   Frame=15 x=142 dx=4
No need to do any stepping forward from this server state as we are in sync.
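If it helps, here is a small sketch that replays the frames above and should give the same x values. The `Client` class and its method names are hypothetical, and a server message is assumed to be a simple `(frame, x, dx)` tuple.

```python
class Client:
    def __init__(self):
        self.frame = 0
        self.x = None  # unknown until the first server message arrives
        self.dx = 0
        self.last_server_frame = -1

    def tick(self, message=None):
        """Run one local frame; `message` is (frame, x, dx) or None."""
        if self.x is not None:
            self.x += self.dx  # local prediction for this frame
        if message is not None:
            self.receive(*message)  # may overwrite the prediction
        shown = self.x
        self.frame += 1
        return shown

    def receive(self, server_frame, x, dx):
        # Drop messages that are older than one we have already applied.
        if server_frame <= self.last_server_frame:
            return
        self.last_server_frame = server_frame
        self.x, self.dx = x, dx
        # Step forward from the server's frame to the current local frame.
        for _ in range(self.frame - server_frame):
            self.x += self.dx


# Server messages keyed by the local frame on which they arrive.
messages = {5: (5, 100, 5), 11: (7, 110, 4), 13: (6, 105, 5), 15: (15, 142, 4)}

client = Client()
history = [client.tick(messages.get(frame)) for frame in range(16)]
print(history)
```

Frames 0 to 4 come out as `None` because nothing is known yet, and the message that arrives on frame 13 is dropped because frame 7 has already been applied.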
One thing to be careful of: normally you can't just say "it's 4 frames, so I multiply everything by 4" to get the new positions. With physics engines and more complex calculations, you probably want to step the world forward frame by frame to get a better simulation of where everything is.
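To show why, here is a toy case where multiplying goes wrong. The wall at x = 120 and the bounce rule are invented purely for this illustration. Once the object can hit something, stepping frame by frame and multiplying by the frame count give different answers.

```python
WALL = 120  # hypothetical wall on the right; the object bounces off it


def step(x, dx):
    """Advance one frame, reflecting off the wall if we pass it."""
    x += dx
    if x > WALL:
        x = 2 * WALL - x  # mirror the overshoot back inside
        dx = -dx          # and reverse direction
    return x, dx


def fast_forward(x, dx, frames):
    """Catch up by simulating each missed frame in turn."""
    for _ in range(frames):
        x, dx = step(x, dx)
    return x, dx


naive = 110 + 4 * 4                  # just multiply: ends up past the wall
stepped = fast_forward(110, 4, 4)    # simulate: bounces on the third frame
print(naive, stepped)
```

The multiplied version puts the object at 126, on the wrong side of the wall, while the stepped version has it at 114 and heading back the other way.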
Hope that all makes sense.