This article gets into technical details about internet security.
To use WhatsApp Web you must scan a QR-code with your mobile phone. What would happen if someone standing behind you scanned the QR-code with his own mobile phone? In that case, instead of seeing your contacts you would see his. Although this appears benign, it can be used to con you into revealing sensitive information.
Think about it. What if the con man created in his own WhatsApp account the same contacts as you have? He doesn’t even need to know their phone numbers. He can register the same names under new phone numbers. He can then write on behalf of one of the pretend contacts, asking you for some sensitive info.
For instance, he could pretend to be your wife and ask you for your home wifi password. Equipped with that, he can stand under your house window and infiltrate your home network.
What I have described is just one variant of the con called shoulder surfing, which itself is under the umbrella of more general cyberattacks termed “social engineering”. So, how can we design a security system that is resilient to it? Imagine that the con man can see either directly, or through a hidden camera, all of the following: your keyboard, your phone display, and your desktop display. (I have described such cameras in the article on password shoulder surfing.)
He can also perform the same actions as you with his own mobile phone, in particular scanning a QR-code shown on your monitor. The one thing he can’t do is type on your desktop keyboard or physically operate your phone.
In such an unsecured environment, it would not be safe to display a symmetric encryption key as a QR-code, because the shoulder surfer could steal it by scanning your monitor from afar. Instead, private information should be exchanged using a Diffie-Hellman (DH) scheme. In this case, the QR-code can display the DH public part of the web-page side, and the mobile phone can send its own DH public part through a server tunnel.
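The exchange can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the prime P, the base G, and the function names are assumptions chosen for readability, and the 127-bit prime is far too small for real use. A real system would rely on a vetted library and a standard group or curve such as X25519.

```python
import hashlib
import secrets

# Toy parameters, for illustration only (assumption): 2**127 - 1 is prime,
# but much too small to be secure.
P = 2**127 - 1
G = 3


def make_keypair():
    """Return (private, public) for one side of the exchange."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(own_private, peer_public):
    """Hash the DH shared secret into a 32-byte key."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


# The webpage shows web_pub in the QR-code; the phone sends phone_pub
# through the server tunnel. Neither private value ever leaves its device.
web_priv, web_pub = make_keypair()
phone_priv, phone_pub = make_keypair()

web_key = shared_key(web_priv, phone_pub)
phone_key = shared_key(phone_priv, web_pub)
```

Both sides end up with the same key, while an onlooker who scans the QR-code learns only web_pub, which is not enough to compute it.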
However, how would the web page know that the DH data it has received belongs to the legitimate mobile phone, and not to the shoulder surfer’s mobile phone? Here the physical differentiator between the legitimate user and the shoulder surfer comes to the rescue: the legitimate user must type on the keyboard a checksum of the data that his phone sent.
This kind of checksum is called a Short Authentication String (SAS) and should be easy to communicate. The mobile phone should display the SAS and the user must type it on the desktop keyboard.
Notice that the shoulder surfer can also see the SAS checksum. Because the checksum is short, with enough trials he could generate a DH part that has the same checksum, and then send it to the desktop webpage too. The solution is for the webpage to accept only the first DH data package that it receives and to ignore any additional incoming data. If the user can’t type the correct checksum, the web page resets and asks the user to repeat all actions from the beginning.
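The "first package wins, wrong checksum resets" rule might look like the following on the webpage side. The class and method names are hypothetical, and the six-character hex checksum is an assumption standing in for whatever SAS format is chosen.

```python
import hashlib


def short_checksum(package: bytes) -> str:
    """First six hex characters of SHA-256 (length is an assumption)."""
    return hashlib.sha256(package).hexdigest()[:6]


class PairingSession:
    """Webpage-side state for a single pairing attempt."""

    def __init__(self):
        self.package = None

    def on_package(self, package: bytes) -> bool:
        """Keep the first DH package; ignore every later one."""
        if self.package is not None:
            return False
        self.package = package
        return True

    def on_typed_checksum(self, typed: str) -> bool:
        """Accept a matching checksum; otherwise reset the attempt."""
        ok = self.package is not None and typed == short_checksum(self.package)
        if not ok:
            # A real page would also generate a fresh DH key and QR-code here.
            self.package = None
        return ok
```

A late package from the onlooker is simply dropped, so a collision he finds after seeing the SAS on the victim's phone arrives too late to matter.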
The SAS checksum could be merely the first several characters of a SHA-256 hash of the DH part that the phone sent, but an improvement would be to convert it to digits, which are much easier to copy and type. One way to do this is to convert the hexadecimal number given by the first few bytes of the hash into decimal. (For instance, the byte value 2B in hexadecimal is 2*16 + 11, or 43 in decimal.)
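As a rough sketch of that conversion, the function below hashes the DH public part, reads the first few bytes as one big-endian number, and prints it in decimal. The function name and the choice of three bytes are assumptions; the article does not fix a length.

```python
import hashlib


def sas_digits(dh_public: bytes, num_bytes: int = 3) -> str:
    """Turn the first `num_bytes` of a SHA-256 hash into decimal digits."""
    digest = hashlib.sha256(dh_public).digest()
    value = int.from_bytes(digest[:num_bytes], "big")
    # Pad with leading zeros so every SAS has the same length.
    width = len(str(2 ** (8 * num_bytes) - 1))
    return str(value).zfill(width)


# The single-byte case from the text: hexadecimal 2B is decimal 43.
example = int.from_bytes(bytes([0x2B]), "big")
```

With three bytes the SAS is always eight digits long. Fewer bytes are easier to type but also easier for an attacker to collide with, which is why the ordering rules around the SAS matter.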
Furthermore, the SAS checksum should not be displayed on the mobile device before the webpage has received the DH package from it. Otherwise, the onlooker can try to construct his own DH package with the same SAS and deliver it to the webpage before the legitimate one arrives. So how can the mobile device know that the webpage has received its DH package? It can’t trust the server’s response, because the server operator may be colluding with the shoulder surfer.
Consequently, the solution is for the webpage to prove to the mobile device that it was able to generate the shared secret. One way to do this is to have the webpage sign the fixed string “received” with HMAC, using the shared secret as the key, and then deliver the signature to the mobile device through the server.
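A minimal sketch of this confirmation step, using Python's standard hmac module. The function names are hypothetical, SHA-256 as the HMAC hash is an assumption, and shared_key stands for whatever key both sides derived from the DH exchange.

```python
import hashlib
import hmac


def confirmation_tag(shared_key: bytes) -> bytes:
    """Webpage side: HMAC over the fixed string, keyed with the shared secret."""
    return hmac.new(shared_key, b"received", hashlib.sha256).digest()


def phone_accepts(shared_key: bytes, tag: bytes) -> bool:
    """Phone side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(confirmation_tag(shared_key), tag)
```

Only when phone_accepts returns True would the phone go on to display the SAS. A colluding server cannot forge the tag, because it never sees the shared secret.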
However, another solution now becomes possible. Instead of using HMAC, the webpage can generate a random One-Time Password (OTP), encrypt it with a key derived from the shared secret, and send it to the mobile device through the server. Upon receipt, the phone decrypts the random number and displays it. Doing it this way removes the requirement that the webpage ignore any additional DH packages it receives, and simplifies implementation. If the encryption keys do not match, the operator of the physical keyboard will not type the right OTP code.
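The OTP variant could be sketched as follows. Python's standard library has no authenticated cipher, so this sketch substitutes an XOR against an HMAC-derived pad plus an HMAC tag. That substitution is an assumption made only to keep the example self-contained; a real implementation would use an AEAD such as AES-GCM from a vetted library. All function names and the six-digit length are hypothetical.

```python
import hashlib
import hmac
import secrets


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive separate encryption and MAC keys from the shared key."""
    return hmac.new(key, label, hashlib.sha256).digest()


def new_otp(digits: int = 6) -> str:
    """Random decimal code, zero-padded to a fixed length."""
    return str(secrets.randbelow(10 ** digits)).zfill(digits)


def seal_otp(key: bytes, otp: str):
    """Webpage side: encrypt the OTP and authenticate the result."""
    plaintext = otp.encode("ascii")
    if len(plaintext) > 32:
        raise ValueError("OTP longer than the 32-byte pad")
    nonce = secrets.token_bytes(16)
    pad = hmac.new(_subkey(key, b"enc"), nonce, hashlib.sha256).digest()
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, pad))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext,
                   hashlib.sha256).digest()
    return nonce, ciphertext, tag


def open_otp(key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes):
    """Phone side: return the OTP, or None if the keys do not match."""
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext,
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None
    pad = hmac.new(_subkey(key, b"enc"), nonce, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(ciphertext, pad)).decode("ascii")
```

In this sketch a phone holding the wrong key gets None and shows nothing, rather than a wrong code. Either way the person at the keyboard cannot type the OTP the webpage expects.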
So, do we have the perfect solution? Not quite. In both variants we have a user experience (UX) problem. If the webpage responds to the onlooker’s mobile device, and not to the legitimate user’s mobile device, our legitimate user will think that there is a bug in the app. He will be waiting for the mobile app to display a code, but nothing shows up. At this point the shoulder-surfing con man can lend a helping hand and offer to type the code on the physical desktop keyboard. Alternatively, the user may leave the workstation to seek help from someone in the room, giving the shoulder surfer the chance to type the code.
To solve the UX problem of waiting for the code to show up, we can calibrate a timeout. If the mobile app does not receive a timely response from the webpage, it must show a warning. Similarly, if the webpage doesn’t see that the user is typing in the code, it should restart the whole pairing process.
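On the phone side, the timeout logic might be as simple as the sketch below. The five-second value, the class, and the action names are all hypothetical; the right timeout has to be calibrated against real network latency.

```python
import time

# Hypothetical value: the calibration of the timeout is left open.
RESPONSE_TIMEOUT_S = 5.0


class Deadline:
    """One-shot timer; `clock` is injectable so the logic can be tested."""

    def __init__(self, timeout_s=RESPONSE_TIMEOUT_S, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def expired(self) -> bool:
        return self._clock() >= self._expires_at


def phone_next_action(response, deadline: Deadline) -> str:
    """Decide what the phone does while waiting for the webpage's reply."""
    if response is not None:
        return "show_code"
    if deadline.expired():
        return "warn_and_restart"
    return "keep_waiting"
```

The webpage can use the same Deadline for the mirror-image check: if no code has been typed before it expires, the page restarts the pairing process.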
A timeout in network communication can happen for the legitimate reason of a lost internet connection. It is unlikely, however, for this to happen between the two legs of communication. The first leg, in which the mobile phone sends the DH package to the webpage, is what causes the webpage to prompt the user to type in the code, provided it succeeds. If there is a network problem at this initial stage, there is no “social engineering” vulnerability. The webpage then swiftly responds and sends either the HMAC or the encrypted OTP in the second leg of communication. The chance of legitimately losing internet connectivity in that short interval is low, but it is not impossible for the con man to orchestrate such a loss by quickly turning off the local Wi-Fi router.
The attacker’s ability to cause a network timeout is the reason why both sides, the mobile device and the webpage, must restart the device pairing process if something is off.
In conclusion, it is possible to defend against shoulder surfing, even if the onlooker can see your keyboard, your phone and your screen in real time, but it does take careful software engineering.