Wired for sound: how SIP won the VoIP protocol wars
Update: We are in the final stretch of the 2019 winter break, which means most Ars home-office phones can stay quiet for a few more days. As such, we are resurfacing some classics from the archives. The latest is this look at how SIP (Session Initiation Protocol) won the VoIP protocol wars. This story first appeared on December 8, 2009, and it appears unchanged below.
As an industry grows, it is common to find multiple solutions that all try to meet similar requirements. The proposed standards then go through a phase of selection, and over time some become more dominant than others. These days, the Session Initiation Protocol (SIP) is clearly one of the dominant VoIP protocols, but that did not happen overnight. In this article, the first in a series of in-depth articles exploring SIP and VoIP, we will look at the most important factors that led to this result.
A brief history of VoIP
Let’s go back to 1995, in the days before Google, IM, and even broadband. Mobile phones were large and bulky, Microsoft had developed a new Windows interface with a “Start” button, and Netscape had the most popular web browser. The growth of the internet and data networks led many to realize that it was possible to use the new networks for voice communication while significantly reducing the associated costs. The first commercial Internet VoIP solution came from a company called VocalTec; with its software, two people could talk to each other over the internet. You could use a 28.8K or 33.6K modem to dial a local ISP and talk to friends, even if they lived far away. I remember trying this software, and the sound was definitely below acceptable quality. (It often sounded like you were trying to talk while immersed in a swimming pool.) However, the software successfully connected two people and brought real-time voice calls to a network with limited bandwidth.
It was immediately clear to the first VoIP implementers that there are fundamental differences between the telephone network and the data network. One is the design for exchanging messages. The telephone system uses circuit switching, where a circuit is the full path between two endpoints. It is therefore possible to guarantee a single path for all messages in a single communication. The data network works with packets: different hops along the way help route each packet to its final destination, and the path can change from one packet to the next. Because of this structure, the data network cannot guarantee that the packets of a single session will follow the same path. VoIP therefore required some innovations before it could really get off the ground.
To make a call, you need a VoIP signaling protocol. The term “signaling” comes from the world of circuit-switched telephony. In that system, signals are sent from one end to the other to set up communication and allow us to talk over great distances. The role of a signaling protocol is to define the way these messages are structured and the rules with which we can start, configure and end a conversation. It is worth pointing out that signaling messages do not contain the voice that someone hears (the media of the call). A signaling message can carry information about the media streams and their attributes, but the speech itself in a voice call is not part of the signaling. If you are looking for a very high-level description, think of signaling as the messages a device sends when you dial a number or hang up the phone.
So the race was on to create a new signaling protocol. Some of the resulting protocol specifications could be implemented by anyone, while others were proprietary solutions developed by a single vendor. And that race is still not completely over, because we constantly see new proposals that try to convince everyone that there is a better way to do things. A VoIP signaling protocol must show how it integrates with the data network; this includes aspects such as defining a method for locating the communication devices, specifying server behavior, introducing new services, and designing for security.
SIP protocol design
SIP is an Internet Engineering Task Force (IETF) protocol and as such is designed as an open internet protocol. The first standard was published in 1999 as RFC 2543, although the earliest drafts date from 1996. Some of the definitions were later revised in 2002 by RFC 3261.
Let’s look at a simple SIP request:
INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f
To: Jon Stokes
From: Gilad
Call-ID: [email protected]
CSeq: 59164 INVITE
Contact: sip:[email protected]
Max-Forwards: 70
SIP is text based. Note that the addresses look very much like e-mail addresses. Although SIP can support telephone numbers, the basic idea is that addresses do not have to be telephone numbers, just as you would not expect your e-mail address to look like your home or work address. For comparison, an HTTP message can look like the following (partial) example:
GET /reviews/ HTTP/1.1
Host: arstechnica.com
User-Agent: Gecko / Firefox / 3.5.5
As you can see, SIP is quite similar to HTTP. The first line is the request line, which contains the type of request (GET in HTTP and INVITE in SIP for these examples) and the intended address, while the following lines are headers with additional information. Likewise, responses in SIP are very similar to HTTP responses. The idea is to borrow the structure of one of the most popular internet protocols and make it easier for software developers and network administrators to work with SIP.
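To make the shared structure concrete, here is a minimal Python sketch that splits a request into its request line and headers. It is only an illustration, not a complete parser for either protocol: the function name `parse_request` and the sample addresses are hypothetical, and it ignores details such as repeated headers, folded lines, and message bodies.

```python
def parse_request(raw: str):
    """Split a text request into (method, target, version, headers).

    Simplified sketch: repeated headers overwrite each other and
    message bodies are ignored.
    """
    lines = raw.strip().splitlines()
    # The request line has three space-separated parts in both protocols.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, target, version, headers


# Hypothetical sample messages; the addresses are placeholders.
SIP_REQUEST = """\
INVITE sip:bob@example.org SIP/2.0
Via: SIP/2.0/UDP host.example.net;branch=z9hG4bK8uf35f
CSeq: 59164 INVITE
Max-Forwards: 70
"""

HTTP_REQUEST = """\
GET /reviews/ HTTP/1.1
Host: arstechnica.com
"""

sip = parse_request(SIP_REQUEST)
http = parse_request(HTTP_REQUEST)
print(sip[0], sip[1], sip[2])
print(http[0], http[1], http[2])
```

The same function handles both messages, which is the point of the comparison: the method, the target address, and the protocol version sit on the first line, and everything after it is a list of name and value pairs.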
These attempts to make SIP as simple as HTTP have worked out to some extent, but the requirements SIP addresses are more complex than those of HTTP, so the protocol is more complex as well. For example, it is a basic requirement in SIP to support two-way symmetrical communication, whereas the typical HTTP scenario is a client making requests to a server and the server sending back a response. Still, even without prior HTTP knowledge, learning this message structure is a very simple task.
For those who are wondering, the above SIP example is the first packet you might send when calling Jon Stokes, Ars Technica’s deputy editor, from a SIP phone. I will not comment on the technical details of the message content because that is the subject of a separate article.
Reuse and keep it simple
Another important factor in the design of SIP was the decision to reuse existing internet standards as much as possible. Address location uses DNS, user authentication uses HTTP digest authentication, describing the call's media streams uses the Session Description Protocol (SDP), encryption uses TLS and, where applicable, users exchange information as XML. This integration helped further establish SIP as part of the world of internet protocols, and vendors were able to reuse existing implementations in their SIP applications. On the other hand, in some cases the IETF had to make additional definitions in other protocols to meet SIP's needs.
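As an illustration of that reuse, the digest calculation a SIP client performs is the same one defined for HTTP. The Python sketch below computes the basic MD5 digest response in its simplest form, without the optional "qop" extension. The function names and all credential values are hypothetical placeholders, and a real client would take the realm and nonce from the server's challenge.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text as a lowercase hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute a digest response in the basic form (no qop).

    The password is never sent; only this hash travels in the request.
    """
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical values for illustration only.
response = digest_response(
    username="alice",
    realm="example.org",
    password="secret",
    method="REGISTER",
    uri="sip:example.org",
    nonce="abc123",
)
print(response)
```

The only SIP-specific parts are the method name and the `sip:` address; the hashing itself is unchanged, which is why an existing HTTP digest implementation can often be reused as-is.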
The design of SIP also emphasizes keeping the servers along the calling path, especially the proxies, as simple as possible. SIP proxies route the messages between the calling parties. The proxies defined in the standard are not aware of the call state; they work at the transaction level and can even be completely stateless. This helps with scalability, because fewer devices can handle more calls. To achieve this, the protocol itself was separated into layers, a common practice that programmers use to break down a complex system. This design helps to further simplify SIP and makes it easier to implement. Keeping state to a minimum sometimes forced limitations (and later some changes to the protocol), but these by-products were kept to a minimum.
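A rough Python sketch can show why a proxy does not need to remember anything about a call in order to forward a request. Everything the proxy needs is in the message itself: it adds its own Via header on top and lowers the Max-Forwards hop counter. This is a simplified illustration, not a compliant proxy; the function name `forward_stateless`, the host names, and the way the branch value is derived here are assumptions made for the example.

```python
import hashlib


def forward_stateless(raw: str, proxy_host: str) -> str:
    """Return the request as a stateless proxy might forward it."""
    lines = raw.strip().splitlines()
    request_line, headers = lines[0], lines[1:]

    # Derive the branch from the incoming message, so a retransmitted
    # request gets the same value without the proxy storing anything.
    top_via = next(h for h in headers if h.lower().startswith("via:"))
    branch = "z9hG4bK" + hashlib.md5(top_via.encode("utf-8")).hexdigest()[:10]
    new_via = f"Via: SIP/2.0/UDP {proxy_host};branch={branch}"

    out = []
    via_added = False
    for header in headers:
        name, _, value = header.partition(":")
        key = name.strip().lower()
        if key == "max-forwards":
            hops = int(value)
            if hops <= 0:
                raise ValueError("hop limit reached, do not forward")
            header = f"Max-Forwards: {hops - 1}"
        if key == "via" and not via_added:
            out.append(new_via)  # our Via goes above the existing ones
            via_added = True
        out.append(header)
    return "\r\n".join([request_line] + out) + "\r\n\r\n"


# Hypothetical request; the addresses are placeholders.
REQUEST = """\
INVITE sip:bob@example.org SIP/2.0
Via: SIP/2.0/UDP host.example.net;branch=z9hG4bK8uf35f
CSeq: 59164 INVITE
Max-Forwards: 70
"""

forwarded = forward_stateless(REQUEST, "proxy.example.com")
print(forwarded)
```

Because the output depends only on the input message, any number of identical proxies could handle the same call interchangeably, which is the scalability benefit described above.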
Finally, and perhaps most importantly, SIP was not built merely as a replacement for the telephone system. It allows extensions, and it depends on them to offer additional services that go beyond simple calls. For example, you can use SIP to maintain user presence information in an IM client and to set up IM sessions. Another extension allows the transfer of a call to a third party, something that was simply not defined by the basic specification. This is possible because SIP supplies the necessary basic building blocks and restricts them only when necessary. SIP defines the concept of a “dialog”, which is a two-way communication, but does not limit dialogs to calls. Two-way communication also includes setting your IM status and receiving updates from your IM friends. Extensions can also easily define new request or response types and new headers if needed.