Which layer for encryption? Which for VPNs?

As more people work from home, VPNs are in peak season. Recently I discussed with a network engineer the advantages and disadvantages of IPsec VPNs versus TLS-based VPNs.

With this discussion in mind I started to gather some of the relevant information, summarize and structure it to provide input on the following questions:

Which is the best layer for encryption?
Which layer and technology should be used for virtual private networks?

There is a very good and comprehensive overview from Klaus Schmeh in his book “Kryptografie” (cryptography). Unfortunately, this 900-page encyclopedia is only available in German, so I hope that the information here, which in parts was taken from that book, will also benefit readers who cannot read it due to the language barrier.

Signing only on layer 7

As you might know, a digital signature is very similar to asymmetric encryption, because the hash of the to-be-signed entity is encrypted with the signer’s private key. Someone who knows the signer’s public key can then verify the signature and check if the document was sent by the real person. Also, the receiver will have a proof that the sender indeed sent this document and the sender cannot claim anything else (non-repudiation). In a discussion about the best layer for encryption, we should therefore not forget digital signatures.

Obviously, you would want to have the whole document signed and not just chunks of it. For this reason, signing only makes sense on layer 7: only on that layer can you ensure that data that was sent in packets over the network will be completely reassembled again. What’s more: Even if you did sign individual packets, probably no one would ever store these packets for long (unless someone saves everything as a pcap…).
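The hash-then-sign idea can be sketched in a few lines of Python. This is a toy illustration only: it uses textbook RSA without padding, and the key values (P, Q, E) as well as the function names sign and verify are my own assumptions for the example, not part of any library. Real systems should use a vetted cryptography library instead.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes.
# Illustration only: no padding, key far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def document_hash(document: bytes) -> int:
    """Hash the WHOLE document and map the digest into the RSA modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """'Encrypt' the hash with the private key."""
    return pow(document_hash(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    return pow(signature, E, N) == document_hash(document)


signature = sign(b"contract v1")
print(verify(b"contract v1", signature))  # same document
print(verify(b"contract v2", signature))  # tampered document
```

The point of the sketch is that the signature is computed over the complete, reassembled document, which is something only the application layer has in hand.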

Encryption possible on “all” layers

Now, when it comes to encryption, it is a different story. There are different advantages and disadvantages on each respective layer when chosen for encryption.

As a rule of thumb you can say: The lower the layer chosen for encryption, the better you can encrypt everything that goes beyond the bare minimum need-to-know for any intermediary. This is also known as the principle of minimum disclosure. The reason is simple: If you start your encryption on a lower layer, all layers above will automatically be part of the encryption and remain unreadable up to the point where data traverses over to the next layer.
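To make the rule of thumb a bit more tangible, here is a deliberately simplified model in Python. The layer list and the function names are assumptions made for this sketch; real protocols have more nuances (for example, a tunnel adds a second, outer header).

```python
# Simplified stack: (OSI layer, what an eavesdropper learns from that header).
STACK = [
    (2, "MAC addresses"),
    (3, "IP addresses"),
    (4, "ports"),
    (7, "application protocol metadata"),
]


def visible_headers(encryption_layer: int) -> list[str]:
    """Headers at or below the encrypting layer stay readable on the path."""
    return [name for layer, name in STACK if layer <= encryption_layer]


def hidden_headers(encryption_layer: int) -> list[str]:
    """Everything above the encrypting layer is covered by the encryption."""
    return [name for layer, name in STACK if layer > encryption_layer]


for layer in (2, 3, 4, 7):
    print(layer, "visible:", visible_headers(layer), "hidden:", hidden_headers(layer))
```

The lower the encrypting layer, the shorter the list of visible headers becomes, which is exactly the minimum disclosure argument.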

On the other hand, you gain flexibility when choosing a higher layer for encryption, with layer 7 being the most flexible one. It’s also easier to maintain end-to-end encryption: While layer 2 encryption ends abruptly at a layer 3 router, you can ensure data protection all the way to the destination when choosing encryption to be performed on the application layer (layer 7). Let’s have a closer look at the different options.

Encryption on layer 7

As already stated, this layer is your go-to option when flexibility is the priority. Any logic can be implemented in software, as long as both sender and receiver support it. The receiver does not need to decrypt data at the moment of reception, but can simply store it in encrypted form, or even forward it without having had a look. This is especially important for e-mail gateways: In most cases you would not want to have your gateway already decrypt all mail and then send it in cleartext (or re-encrypted) to the intended receiver. Instead, you would want to have your mail provider’s or your domain’s e-mail gateway receive any mail destined to your account and simply forward it without taking a look first. Now, sure enough, most e-mails are not encrypted anyway, but that’s a different story…

Apart from the disadvantage that both the sender’s and the receiver’s software need to support the chosen form of encryption, it should also be mentioned that the principle of minimum disclosure is not fulfilled: Encryption on layer 7 means that all information on lower layers remains in the clear. IP addresses (layer 3), for example, allow an attacker to understand who is communicating with whom, and ports (layer 4) can even reveal which software is involved in the information exchange (e.g. TeamViewer usually uses destination port 5938).

Encryption on layer 4

While in theory the ISO OSI model distinguishes 7 different layers, in practice there is very little distinction between layers 5 to 7. Often people will simply talk about “layer 5+” and count all of them as “application layer”. This is then similar to the internet protocol suite. Going below the application layers has the advantage that for the running program encryption happens transparently: The application itself does not have to support encryption in any way, it is simply handled “below”.

TLS is actually an interesting protocol, because when trying to fit it into the OSI model, it lies somewhere between layer 4 and 5. While there are different opinions on where to locate that protocol exactly, it is clear that it remains above layer 3. This means that the IP address of a packet will remain visible after encryption because it is required for routing. The TCP header, including the port numbers, also stays in the clear, because TLS runs on top of TCP and only protects the data carried inside the connection.
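For a sense of how TLS is layered on top of an existing transport connection, here is a minimal sketch using Python’s standard ssl module. The helper names make_client_context and open_tls_connection are my own, and the host passed in is whatever server you want to reach; nothing here is specific to a VPN product.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS settings: verify certificates, refuse old versions."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a plain TCP connection first, then wrap TLS around it."""
    ctx = make_client_context()
    raw = socket.create_connection((host, port), timeout=10)
    return ctx.wrap_socket(raw, server_hostname=host)


ctx = make_client_context()
print(ctx.minimum_version, ctx.verify_mode)
```

Note the order in open_tls_connection: the TCP connection to an address is established before any TLS handshake happens, so the addressing needed to reach the server cannot be part of what TLS encrypts.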

One disadvantage is that TLS only works with TCP (layer 4), but many important protocols are running on UDP: SNMP, DNS and DHCP are three of the most well-known examples. There is a UDP-enabled alternative for TLS which is called DTLS. It does not turn UDP into TCP: application data is still sent in UDP-style “fire and forget” fashion, but DTLS adds what the handshake needs to survive an unreliable transport, such as sequence numbers, retransmission timers for handshake messages and replay detection. Since we are dealing with two popular protocols on layer 4 (TCP and UDP), securing this layer also implies touching two protocols, which can be seen as a disadvantage.

This is different for layer 3, which we will look at next.

Encryption on layer 3

When implementing encryption on layer 3, we are touching only one protocol, IP, instead of the two protocols on layer 4, which can be seen as an advantage. Another point to consider is that attacks on the IP layer can be pretty dangerous: IP spoofing, source routing, ICMP probing and others are relevant techniques that an attacker could use to gain a foothold in your network. Only protection on layer 3 itself can effectively counter these attacks; choosing a layer above will not help.

Using IPsec is a popular option to protect data exchange on layer 3. IPsec comes more or less “bundled” with IKE for the initial key exchange. The combination of several not exactly straightforward protocols makes encryption on this layer rather complex to implement. Moreover, the IP address cannot be encrypted on this layer, either – because without it routing will not work.

This last issue can be circumvented when going down to layer 2.

Encryption on layer 2

Finally, if you go down all the way to layer 2, IP addresses are not a concern anymore – simply because IP is not yet in place at layer 2. However, on layer 2 you are simply hopping from switch to switch, there is no end-to-end encryption. Whenever traffic is hitting a router (layer 3), traffic will be unencrypted and readable.

Encryption at this layer mainly makes sense for infrastructure providers who can fully decide by themselves how to handle encryption among their own devices. Since the whole network on this layer is most likely managed by the same provider, they enjoy a lot of flexibility here. For example, they might decide to keep a wired connection without encryption and only employ it on wireless routes where eavesdropping is more likely.

From a protocol-perspective, encryption on layer 2 will probably extend PPP. For the initial key exchange, CHAP (or the Microsoft implementation MS-CHAP) or EAP can be used. Afterwards, data can be exchanged using ECP (or the Microsoft implementation MPPE). When it comes to VPNs, PPTP & L2TP are popular choices.

WEP, WPA and Bluetooth are also located on layer 2, with their own implementations for encryption.

Encryption on layer 1

Encryption on this layer is very similar to layer 2. Since layer 1 uses checksums for error detection and correction anyway, it might seem unnecessary to add encryption on top of them. However, on critical interfaces – e.g. for ISDN, GSM and UMTS – encryption is indeed in use to make man-in-the-middle attacks and eavesdropping more difficult.

IPsec vs TLS

For the “usual” company (i.e. not an ISP or internet backbone maintainer), IPsec on layer 3 and TLS on layer “4.5” are pretty much the two practical options to choose from. As I wrote in the introduction to this post, your decision will not only depend on specific needs (after all, both solutions provide strong encryption), but also will be influenced by your current infrastructure and by how tech-savvy both your own IT team, as well as your end users are. This is especially true when building up VPNs, as we will discuss later.

Independent of specific implementations like VPNs, just looking at the two solutions themselves, you can say: IPsec is more complex, since it is basically a bundle of ESP and AH for the data exchange and IKE for the initial key exchange. In the process of building this bundle, the IETF had to reconcile very different interests of several parties. Well-known cryptographers Bruce Schneier and Niels Ferguson in December 2003 described IPsec as:

“a disappointment–our primary complaint is with its complexity–it is the best IP security protocol available at the moment.”

Compared to TLS, IPsec is more difficult to understand, implement and maintain, but it is also more flexible.

One remark on IPsec being a “layer 3 encryption”: Since IKE is a “layer 5+” protocol, IPsec implementations are actually not pure layer 3 solutions. While this breaks with the ISO OSI layered paradigm of staying within your layer, it is not relevant in practice. Moreover, as we said before, TLS itself cannot be clearly put into one specific layer either, so if anything, this would be a “tie” between the two technologies.

However, since TLS is in fact “somewhere in between” the layers 4 and 5, it does make integration in existing infrastructures easier. From a firewall perspective, ports used by (D)TLS are standard ports (443 TCP / UDP) that will be opened by default anyway, while IPsec will most likely require adjustment of your firewall rules: It uses port 500 UDP for IKE, and protocol – not port! – 50 (ESP) and 51 (AH) for IPsec.
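The difference between matching on a port and matching on an IP protocol number can be sketched as a tiny classifier. The function name classify and the returned labels are assumptions for this example; the numbers themselves are the ones mentioned above plus the standard protocol numbers for TCP (6) and UDP (17).

```python
from typing import Optional

# IP protocol numbers (the "protocol" field of the IP header, not ports).
IPPROTO_TCP, IPPROTO_UDP, IPPROTO_ESP, IPPROTO_AH = 6, 17, 50, 51


def classify(ip_protocol: int, dst_port: Optional[int] = None) -> str:
    """Roughly label traffic the way a firewall rule would have to match it."""
    if ip_protocol == IPPROTO_ESP:
        return "IPsec ESP"          # no port at all: ESP sits directly on IP
    if ip_protocol == IPPROTO_AH:
        return "IPsec AH"           # same for AH
    if ip_protocol == IPPROTO_UDP and dst_port == 500:
        return "IKE"
    if ip_protocol == IPPROTO_TCP and dst_port == 443:
        return "TLS"
    if ip_protocol == IPPROTO_UDP and dst_port == 443:
        return "DTLS"
    return "other"


print(classify(50))
print(classify(17, 500))
print(classify(6, 443))
```

A firewall that only knows “allow TCP/UDP port X” rules has no way to express the first two cases, which is why IPsec usually needs explicit rule changes.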

TLS as higher-layer encryption protocol can also be adjusted more closely to your application’s specific needs and ensure end-to-end encryption. On the other hand, sometimes you might only want to encrypt the parts between two routers instead of all the way to the client machine, so this is then a scenario where IPsec would come into play.

IPsec is usually a good idea if all involved endpoints are homogeneous and managed by one central entity, for example a company’s IT department. With IPsec encryption on layer 3, intranet applications will be used remotely just as they are used internally, which is obviously very convenient for the end user. On the other hand, if you are dealing with a very heterogeneous endpoint infrastructure, providing a TLS-enabled web portal might be the best option. This is for example the case for a B2C business like a bank, where the clients’ computers differ greatly from each other and should still be able to access the bank’s online banking services.

IPsec VPN

Now, when talking specifically about VPNs, we need to be aware that IPsec does not automatically imply that communication will be encrypted. To understand this let’s have a quick look at how IPsec will be set up.

As for any encrypted connection where session keys are used and no pre-shared keys are in place, you will first need a key exchange. As written before, IPsec uses IKE for this. The protocol relies completely on ISAKMP messages for communication. ISAKMP is basically a toolkit developed by the NSA to build network protocols that support encryption.

IKE will create so-called “security associations” (SA). An SA defines the different parameters for the two VPN endpoints that will be used to set up the VPN tunnel. There are five parameters that need to be negotiated and that can be remembered by their initial letters: H.A.G.L.E. Since saying “HAGLE” sounds a lot like “haggle”, which obviously is a synonym for “negotiate”, I find it quite a good mnemonic.

Hash: Which hashing algorithm should be used (e.g. MD5, SHA)
Authentication: Should parties authenticate using asymmetric cryptography or with a pre-shared key?
Group: Which Diffie-Hellman (DH) Group to use for the key exchange (e.g. DH2, DH19, DH24)? Each group defines the “size” of the primes to choose and whether DH is calculated with modular exponentiation or with elliptic curve cryptography (ECC). Both have security implications.
Lifetime: How long should the SA be valid before it needs to be re-negotiated? Since site-to-site VPNs are often up for several years, it is important to re-negotiate the SAs to avoid successful brute force attacks.
Encryption: Which algorithm should be used to encrypt the VPN tunnel (AES, DES, 3DES)?
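The five H.A.G.L.E. parameters map naturally onto a small data structure. The following Python sketch is a simplification under my own assumptions: the class name Proposal, the function negotiate and the “first exact match wins” rule are illustrative and do not reproduce the actual IKE message exchange (which, for instance, handles lifetimes more flexibly).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposal:
    hash: str            # H: e.g. "sha256"
    authentication: str  # A: e.g. "psk" or "rsa-sig"
    group: int           # G: Diffie-Hellman group number
    lifetime_s: int      # L: SA lifetime in seconds
    encryption: str      # E: e.g. "aes256"


def negotiate(initiator: list[Proposal], responder: list[Proposal]) -> Optional[Proposal]:
    """Return the initiator's first proposal that the responder also accepts."""
    accepted = set(responder)
    for proposal in initiator:  # initiator's order expresses preference
        if proposal in accepted:
            return proposal
    return None                 # no common proposal: the tunnel cannot come up


strong = Proposal("sha256", "psk", 19, 28800, "aes256")
legacy = Proposal("md5", "psk", 2, 86400, "3des")
print(negotiate([strong, legacy], [legacy, strong]))
```

A mismatch in any single field makes the proposals different, which mirrors the common experience that a VPN tunnel fails to establish because one side was configured with a different hash or DH group.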

These settings are negotiated in a first phase using slower asymmetric cryptography. With this (bidirectional) “ISAKMP-SA” – including authentication and an initial exchanged key pair – in place, we can then start with the second phase. That will build up further (unidirectional) SAs using faster symmetric cryptography for the upcoming data exchange.
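The Diffie-Hellman exchange that the “Group” parameter refers to can be sketched with plain modular exponentiation. The prime and generator below are toy values chosen by me for readability, not one of the standardized DH groups, and the function names are assumptions as well.

```python
import secrets

# Toy parameters: 2**127 - 1 is a (Mersenne) prime, but far too small for real
# use. Standardized groups use much larger primes or elliptic curves.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Pick a random private value and derive the public value G^private mod P."""
    private = secrets.randbelow(P - 3) + 2  # in the range 2 .. P-2
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    """Both sides arrive at G^(a*b) mod P without ever sending it."""
    return pow(peer_public, own_private, P)


a_private, a_public = keypair()  # gateway A
b_private, b_public = keypair()  # gateway B
print(shared_secret(a_private, b_public) == shared_secret(b_private, a_public))
```

Only the public values cross the network; the resulting shared secret is what the symmetric keys for the later data exchange can be derived from.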

After this two-phased initiation, IPsec will then start with the actual data exchange. There are 2 protocols, each with 2 modes that can be used:

Protocol 1: Encapsulating Security Payload (ESP)
- For confidentiality (encryption), authenticity (shared key for the HMAC) and integrity (HMAC)
Protocol 2: Authentication Header (AH)
- For authenticity and integrity

Both protocols come with 2 modes: Tunnel Mode (wrapping your actual IP packet into another “tunnel packet”) and Transport Mode (keeping the original header, i.e. not creating a second packet). The following is the description for these modes when used in combination with ESP. In real-world situations, AH does not play an important role anyway because of the lack of confidentiality (no encryption).

Tunnel Mode (description for ESP): Choosing this option with ESP lets you maintain “minimum disclosure”, because you basically put an IP packet inside another IP packet. This means that the “inner” packet containing your actual message and the target’s IP can be completely encrypted. Only the header of the outer packet, which contains the routing information to the target’s VPN gateway, will be visible to anybody listening. The real target’s address is part of the inner packet’s header and does not need to be routable via the internet. ESP with tunnel mode is what people understand by “Site-to-Site IPsec VPN”. It is also by far the most popular choice for IPsec VPNs.
Transport Mode: Here, the header of your packet will remain unencrypted, only the message part will be encrypted. This allows routing to a specific target workstation without the need for a VPN gateway in between. ESP with transport mode is hence what people understand by “End-to-End IPsec VPN”.
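The difference between the two modes comes down to which addresses end up outside the encrypted part. The following Python sketch only models the packet layout, it does not encrypt anything; the class and function names and all addresses (taken from documentation and private ranges) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class IPPacket:
    src: str
    dst: str
    payload: str


def esp_transport(packet: IPPacket) -> dict:
    """Transport mode: original header stays readable, payload is protected."""
    return {"visible": (packet.src, packet.dst), "encrypted": [packet.payload]}


def esp_tunnel(packet: IPPacket, gw_src: str, gw_dst: str) -> dict:
    """Tunnel mode: the whole inner packet is protected behind a new outer header."""
    return {
        "visible": (gw_src, gw_dst),  # only the two VPN gateways
        "encrypted": [packet.src, packet.dst, packet.payload],
    }


inner = IPPacket("10.1.0.5", "10.2.0.9", "GET /payroll")
print(esp_transport(inner))
print(esp_tunnel(inner, "192.0.2.1", "198.51.100.1"))
```

In the tunnel case the private 10.x addresses never appear on the wire in readable form, which is why they do not need to be routable on the internet.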

So, why do we have AH at all, if ESP also provides authenticity and integrity but additionally confidentiality?

Well, AH does provide one thing that ESP does not and this is: Its HMAC, i.e. its integrity check spans across the whole IP packet, including the outer IP header. This means that manipulations of the outer IP header can be spotted. ESP does not include the outermost header into its integrity check and manipulations would not immediately be noticed. Then again, thanks to the encrypted channel in ESP, changing just the outer layer would not bring any practical benefit to an attacker. Of course, a packet could be stolen or rerouted, but this is also the case for AH and with an ESP packet the contents at least could not be read.

Moreover, since the AH HMAC contains the outer header, NAT traversal is only possible with ESP. Due to the fact that NAT changes the outer header, it will invalidate the HMAC for AH. For all these reasons, AH is mostly obsolete nowadays – but it does have a feature that ESP does not have. In this YouTube video there is an easy-to-understand whiteboard that shows the difference.
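Why NAT invalidates AH but not ESP can be shown with a small HMAC experiment. This is a simplified sketch: real AH covers the immutable fields of the IP header rather than a simple string of addresses, and the key, addresses and function names below are placeholders I made up.

```python
import hashlib
import hmac

KEY = b"shared-integrity-key"  # placeholder for the key negotiated via IKE


def ah_style_check(src: str, dst: str, payload: bytes) -> bytes:
    """AH-like: the integrity value covers the addresses AND the payload."""
    data = src.encode() + b"|" + dst.encode() + b"|" + payload
    return hmac.new(KEY, data, hashlib.sha256).digest()


def esp_style_check(payload: bytes) -> bytes:
    """ESP-like: the integrity value covers the payload, not the outer header."""
    return hmac.new(KEY, payload, hashlib.sha256).digest()


payload = b"some protected data"
sent_ah = ah_style_check("10.0.0.5", "203.0.113.9", payload)
sent_esp = esp_style_check(payload)

# A NAT device rewrites the source address on the way.
received_ah = ah_style_check("198.51.100.1", "203.0.113.9", payload)
received_esp = esp_style_check(payload)

ah_ok = hmac.compare_digest(sent_ah, received_ah)
esp_ok = hmac.compare_digest(sent_esp, received_esp)
print("AH check passes after NAT: ", ah_ok)
print("ESP check passes after NAT:", esp_ok)
```

The same property that lets AH detect a manipulated outer header is what makes it incompatible with a device that legitimately rewrites that header.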

TLS VPN

When using TLS for your VPN, you get one major advantage, and that is ease of integration: Almost all firewalls in your infrastructure will allow the TLS default port 443, so very little adjustment is needed to set things up. Another advantage is that TLS is above the IP layer, so it does not care about IP addresses. As a consequence, NAT will not be an issue here. Since TLS is quite a well-known protocol used in many situations, I will not go into the details of its inner workings.

Where things get complicated is that there is no clear definition of what exactly is meant by a “TLS VPN”. In certain implementations TLS can be used to protect layer 2 VPNs. This however has the issue that PPP on layer 2 uses PPTP and L2TP for its VPN. L2TP runs over UDP and PPTP carries its data in GRE rather than TCP, but TLS needs TCP. While DTLS could be used (as discussed above), many vendors use proprietary tunnelling protocols that might come with their own issues for implementation and security.

As you can see, it’s not immediately clear where to implement encryption for your network traffic. There are many different options, and each comes with advantages and disadvantages. I hope this overview helps in understanding these alternatives a bit better and maybe also supports decision making.
