I finished a project with a wonderful group of people, and part of it involved working directly (though unofficially) with WhatsApp. Through that I think I have gotten a good idea of how it works (I could tell from metrics when their daily deployments happened… if that is enough confidence). The design is brilliant and stupid all at once, but importantly it is not something I would have designed, so I thought it might be nice to share how I think it works. Before we start, to be clear: I have no proof of any of this (except for a few thousand hours of working on it) and I never spoke to anyone from Meta… this could all be wrong.
Getting to the first tick
Let’s say Alice is sending a message to Bob. The message goes from her device to the WhatsApp server, where the metadata on the message is checked and the message is ultimately stored. The message itself is encrypted, so even if you could access the database of messages it would be useless to you, as only Bob can decrypt it. The metadata, however, is not encrypted. It includes information like who is involved in the conversation, the message type (for example, image, voice note, edit or just a text message) and version info (and you can get rejected for incorrect version info). I assume the metadata is stored encrypted at rest on WhatsApp’s side, but if you could use their tools, it would be available to you.
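To make that split between readable metadata and unreadable content concrete, here is a minimal sketch of how I picture it. Every name in it (`Envelope`, `Server`, the version string, the list of types) is made up for illustration; I have no idea what the real structures look like.

```python
from dataclasses import dataclass

# Hypothetical values: the versions and types the real server accepts are unknown to me.
SUPPORTED_VERSIONS = {"2.24"}
KNOWN_TYPES = {"text", "image", "voice_note", "edit"}


@dataclass(frozen=True)
class Envelope:
    sender: str          # plaintext metadata: who is involved
    recipient: str
    message_type: str    # plaintext metadata: text, image, voice note, edit
    client_version: str  # plaintext metadata: version info
    ciphertext: bytes    # opaque to the server; only the recipient can decrypt it


class Server:
    def __init__(self) -> None:
        # Messages waiting for delivery, keyed by recipient.
        self.pending: dict[str, list[Envelope]] = {}

    def accept(self, env: Envelope) -> bool:
        """Check the metadata only, then store the still-encrypted message."""
        if env.client_version not in SUPPORTED_VERSIONS:
            return False  # rejected for incorrect version info
        if env.message_type not in KNOWN_TYPES:
            return False
        self.pending.setdefault(env.recipient, []).append(env)
        return True  # in this sketch, this is where the first tick would happen
```

The point of the sketch is that `accept` never touches `ciphertext`: everything the server decides, it decides from the metadata.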
The second tick
WhatsApp now uses the established connection with Bob’s device and pushes the message to it. The message and any attachments are now on Bob’s device, which responds to WhatsApp, and that response triggers the second tick being sent to Alice. At this point, WhatsApp removes the message from their servers. This is why you cannot restore your messages from WhatsApp unless you have a backup: they do not keep them. This is so brilliant because it lowers disk space usage on the WhatsApp side, and it reduces the risk from a breach or from someone asking WhatsApp for messages, because they do not have them.
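As a rough sketch of that store-until-delivered behaviour (my guess at the shape of it, with a hypothetical `Relay` class and method names, not anything I have seen from WhatsApp):

```python
class Relay:
    """Store-and-forward: keep a message only until the recipient acknowledges it."""

    def __init__(self) -> None:
        # message id -> (sender, recipient, ciphertext)
        self.stored: dict[str, tuple[str, str, bytes]] = {}
        # (user to notify, message id, tick count)
        self.ticks: list[tuple[str, str, int]] = []

    def submit(self, msg_id: str, sender: str, recipient: str, ciphertext: bytes) -> None:
        # The server has the message: first tick for the sender.
        self.stored[msg_id] = (sender, recipient, ciphertext)
        self.ticks.append((sender, msg_id, 1))

    def acknowledge(self, msg_id: str) -> None:
        # The recipient's device confirmed receipt: drop the message
        # from the server and send the sender the second tick.
        sender, _recipient, _ciphertext = self.stored.pop(msg_id)
        self.ticks.append((sender, msg_id, 2))
```

After `acknowledge`, nothing about the message body remains in `stored`, which is the whole trick: there is nothing left to restore, breach or hand over.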
WhatsApp Web can restore your messages
If you have ever used WhatsApp Web, you know that when you connect to it you see all your existing conversations and messages; it is live info! Yet, I just told you that WhatsApp does not have this info on their servers… so where do these come from?
They come from the device! This blew my mind, but when WhatsApp Web starts, it uses the WhatsApp servers to establish a proxy connection to the device. The device then does an export of its data and WhatsApp Web imports that, and that is how you get the data.
(I do think Signal works the same way, because if you run a transfer to a new device, it needs to connect to the original device to ingest the messages.)
What did surprise me, and is the stupidest thing, is that the export format for Android and iOS is different! In other words, what you get (and how many messages) depends on what device you have and what version of WhatsApp you have on that device. This means that WhatsApp Web needs to support multiple import formats because there is no standardisation.
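Supporting multiple import formats usually ends up looking something like the sketch below: one parser per source, all normalising to the same internal shape. Both formats here are completely invented for the example (JSON for "android", tab-separated lines for "ios"); the real export formats look nothing like this and I am not describing them.

```python
import json


def import_android(blob: bytes) -> list[dict]:
    # Invented format: a JSON list of {"chat": ..., "body": ...} objects.
    return [{"chat": m["chat"], "body": m["body"]} for m in json.loads(blob)]


def import_ios(blob: bytes) -> list[dict]:
    # Invented format: one message per line, "chat<TAB>body".
    messages = []
    for line in blob.decode("utf-8").splitlines():
        chat, body = line.split("\t", 1)
        messages.append({"chat": chat, "body": body})
    return messages


# One importer per platform; a real client would likely key on app version too.
IMPORTERS = {"android": import_android, "ios": import_ios}


def import_history(platform: str, blob: bytes) -> list[dict]:
    """Pick the parser for the exporting device and normalise its output."""
    try:
        importer = IMPORTERS[platform]
    except KeyError:
        raise ValueError(f"no importer for platform {platform!r}") from None
    return importer(blob)
```

Every new export variant means another entry in `IMPORTERS`, which is exactly the maintenance cost you take on when the formats are not standardised.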
Once the import is done, the source device can be turned off and everything will still be available, which tells us that WhatsApp Web is storing all your messages in your browser (likely encrypted at rest).
End-to-End encryption
One of the most common questions I got when I talked about my work was “Isn’t WhatsApp end-to-end encrypted?”, and it is. There is transport-level encryption, like TLS in your browser, and on top of that the message is encrypted in a way that only the recipient can decrypt. But once the message is delivered, it is in plain text for a bit while it is shown to the reader. It is also encrypted for storage in a way that the recipient can decrypt. This means that end-to-end encryption prevents a man-in-the-middle attack but does not prevent a man-at-the-end attack. A man-at-the-end attack is where you either use something that works directly with the WhatsApp app (accessibility tools are a great legitimate example of this) or you build your own WhatsApp client, which lets you do anything you want.
FAQ
- How does this work with multiple devices? WhatsApp stores the message on the server until it is delivered to all devices. This is likely why there is a limit on how many connected devices you can have at once, which is 5 at the time of writing… if you add a sixth, one of the others will be disconnected.
- Are groups any different? No, they are not. Again, the message seems to remain on the WhatsApp server until each member of the group has received it.
- Do messages ever die on the server? My gut says yes, since there is some wording in WhatsApp about logging in every 30 days, but I never tested this. It would make sense for them to expire.
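Putting those three answers together, here is how I imagine the bookkeeping could work: the server tracks which devices still need a message, deletes it when the last one confirms, and gives up after a while. The class and field names are hypothetical, and the 30-day retention is just my untested gut feeling from above turned into a constant.

```python
from dataclasses import dataclass, field

# Assumption: 30 days, based only on my untested guess about message expiry.
RETENTION_SECONDS = 30 * 24 * 60 * 60


@dataclass
class PendingMessage:
    ciphertext: bytes
    stored_at: float  # seconds, on whatever clock the caller uses
    waiting_on: set[str] = field(default_factory=set)  # devices yet to confirm


class FanOutStore:
    def __init__(self) -> None:
        self.messages: dict[str, PendingMessage] = {}

    def store(self, msg_id: str, ciphertext: bytes, devices: list[str], now: float) -> None:
        # Devices can be one user's linked devices or every group member's device.
        self.messages[msg_id] = PendingMessage(ciphertext, now, set(devices))

    def acknowledge(self, msg_id: str, device: str) -> None:
        msg = self.messages.get(msg_id)
        if msg is None:
            return  # already fully delivered or expired
        msg.waiting_on.discard(device)
        if not msg.waiting_on:
            del self.messages[msg_id]  # last device confirmed: nothing left to keep

    def expire(self, now: float) -> list[str]:
        """Drop messages older than the retention window; return their ids."""
        expired = [
            msg_id
            for msg_id, msg in self.messages.items()
            if now - msg.stored_at > RETENTION_SECONDS
        ]
        for msg_id in expired:
            del self.messages[msg_id]
        return expired
```

Seen this way, a cap on linked devices makes sense: every extra device is one more entry in `waiting_on` that can keep a message sitting on the server.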