In this article, Alexey Andrushchenko, an experienced Full-Stack developer, will reveal some of the features of using WebRTC and consider the advantages and disadvantages of this technology.
WebRTC (Web Real-Time Communication) is a standard that describes the real-time transmission of streamed audio, video, and other content between browsers (without installing plugins or other extensions) or other applications that support it. This technology turns the browser into a video conferencing terminal. To start communicating, the user simply opens the conference's web page.
Consider the operation of the technology using the example of a call between two subscribers through a browser:
WebRTC codecs can be divided into mandatory (browsers that implement this technology must support them) and optional (not included in the standard, but added by some browsers).
To compress audio traffic in WebRTC, mandatory codecs (Opus and G.711) and additional ones (G.722, iLBC, iSAC) are used.
⮞ Opus
Opus is an audio codec with low encoding latency (from 2.5 ms to 60 ms), variable bitrate support, and high compression, which makes it well suited to audio streaming over networks with variable bandwidth. It is the main audio codec for WebRTC. Opus is a hybrid that combines the strengths of two codecs: SILK (voice compression and reduction of speech distortion) and CELT (general audio coding). The codec is freely available, so developers who use it do not need to pay royalties to copyright holders. On a number of parameters it surpasses popular codecs such as MP3, Vorbis, and AAC LC at low bitrates, and it reproduces sound closer to the original than AMR-WB and Speex.
⮞ G.711
G.711 is an obsolete high bit rate (64 kbps) voice codec that is most commonly used in traditional telephony systems. The main advantage is the minimal computational load due to the use of lightweight compression algorithms. The codec has a low level of compression of voice signals and does not introduce additional audio delay during communication between users.
G.711 is supported by a large number of devices. Systems that use this codec are easier to use than those based on other audio codecs (G.723, G.726, G.728, etc.). In terms of quality, G.711 received a score of 4.2 in MOS testing (a score of 4-5 is the highest and means good quality, similar to the quality of voice traffic in ISDN and even higher).
⮞ G.722
G.722 is an ITU-T standard adopted in 1988 that is now royalty-free. It operates at 48, 56, and 64 kbps and, as a wideband codec, delivers better speech quality than G.711 at a comparable bitrate. Like G.711, it is obsolete. It is supported in Chrome, Safari, and Firefox.
⮞ iLBC
iLBC (internet Low Bitrate Codec) is an open-source narrowband speech codec available in Chrome and Safari. Because it compresses the stream heavily, using this codec increases the load on the processor.
⮞ iSAC
iSAC (internet Speech Audio Codec) is a wideband speech audio codec, formerly proprietary, which is currently part of the WebRTC project, but is not required to be used. Supported in Chrome and Safari. The implementation for WebRTC uses an adaptive bitrate from 10 to 52 kbps with a sampling rate of 32 kHz.
Choosing a video codec for WebRTC took developers several years; in the end, VP8 and H.264 were included in the standard. Some browsers also implement optional video codecs (H.265, VP9, AV1).
⮞ VP8
VP8 is a free video codec with an open license, offering fast decoding of the video stream and increased resilience to frame loss. The codec is versatile and easy to implement on hardware platforms, so developers of video conferencing systems often use it in their products. It is compatible with Chrome, Edge, Firefox, and Safari (12.1+).
⮞ H.264
H.264 is a paid video codec that became known much earlier than VP8. It offers a high degree of compression of the video stream while maintaining high video quality. Its widespread use in hardware video conferencing systems motivated its inclusion in the WebRTC standard. It is compatible with Chrome (52+), Edge, Firefox (deprecated for Android 68+), and Safari.
⮞ VP9
VP9 is an open and free video compression standard developed in 2012 by Google. It builds on the ideas embodied in VP8 and was later developed further in AV1. It is compatible with Chrome (48+) and Firefox.
⮞ H.265
H.265 is a paid video codec that is the successor to H.264, providing the same visual quality at half the bitrate. This is achieved with more efficient compression algorithms. This codec currently competes with the free AV1.
⮞ AV1
AV1 is an open-source video compression codec designed specifically for delivering video over the Internet. Supported in Chrome (70+) and Firefox (67+).
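The split between mandatory and optional codecs matters when two endpoints negotiate: an optional codec can be used only if both sides support it, while a mandatory one is always a safe fallback. The sketch below illustrates that idea under simple assumptions. The helper `pickCommonCodec` and the capability lists are hypothetical and are not part of any WebRTC API.

```typescript
// Hypothetical helper: return the first codec in the local preference order
// that the remote side also supports (case-insensitive comparison).
function pickCommonCodec(
  localPreference: string[],
  remoteSupported: string[],
): string | undefined {
  const remoteLower = remoteSupported.map((codec) => codec.toLowerCase());
  return localPreference.find(
    (codec) => remoteLower.indexOf(codec.toLowerCase()) !== -1,
  );
}

// Illustrative capability lists, not real browser output.
// The local side prefers newer optional codecs but also lists the mandatory ones.
const localVideoCodecs = ["AV1", "VP9", "VP8", "H264"];
const remoteVideoCodecs = ["H264", "VP8"];

// AV1 and VP9 are skipped because the remote side lacks them.
const chosenVideoCodec = pickCommonCodec(localVideoCodecs, remoteVideoCodecs);
console.log(chosenVideoCodec);
```

In a real application the browser performs this negotiation itself through the SDP offer/answer exchange; the function only mirrors the outcome.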
WebRTC does not provide a way for browsers to find each other. A browser can generate all the necessary meta-information about itself, but how does it learn that another browser exists, and how are the two connected?
A WebRTC signaling server is a server that manages the connections between peers. It is used only for signaling: it helps one peer find another in the network, negotiate the connection itself, reset the connection if needed, and close it down.
WebRTC does not specify a signaling protocol, so you need to develop one yourself or use a ready-made solution. The transport for the signaling protocol is not specified either. You can use HTTP, WebSocket, or a data channel. WebSocket is commonly used because it is based on a persistent connection and can transmit data in near real time.
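Since the signaling protocol is left to the developer, here is a minimal in-memory sketch of what a signaling relay might do. The message shape (`SignalMessage`) and the `SignalingHub` class are assumptions made for illustration. A real server would deliver messages over a transport such as WebSocket instead of calling a function directly.

```typescript
// Assumed message shape; WebRTC itself does not define one.
type SignalType = "offer" | "answer" | "candidate" | "bye";

interface SignalMessage {
  type: SignalType;
  from: string;
  to: string;
  payload: string; // e.g. an SDP blob or a serialized ICE candidate
}

type Deliver = (message: SignalMessage) => void;

class SignalingHub {
  private peers = new Map<string, Deliver>();

  // Register a peer. In a real server `deliver` would write to its socket.
  join(peerId: string, deliver: Deliver): void {
    this.peers.set(peerId, deliver);
  }

  leave(peerId: string): void {
    this.peers.delete(peerId);
  }

  // Forward a message to its addressee; returns false if the peer is unknown.
  relay(message: SignalMessage): boolean {
    const deliver = this.peers.get(message.to);
    if (deliver === undefined) {
      return false;
    }
    deliver(message);
    return true;
  }
}

const signalingHub = new SignalingHub();
const bobInbox: SignalMessage[] = [];
signalingHub.join("alice", () => {});
signalingHub.join("bob", (message) => {
  bobInbox.push(message);
});

const offerDelivered = signalingHub.relay({
  type: "offer",
  from: "alice",
  to: "bob",
  payload: "<sdp offer placeholder>",
});
const strayDelivered = signalingHub.relay({
  type: "bye",
  from: "alice",
  to: "carol", // never joined, so nothing is delivered
  payload: "",
});
console.log(offerDelivered, strayDelivered, bobInbox.length);
```

The hub never inspects the payload. It only routes messages between peers, which is all that signaling requires.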
We cannot get the names and characteristics of the cameras until the connection is established. If more than one camera is installed on the client system, as is common on mobile devices, we can only offer the user a choice of Camera 1 or Camera 2 and cannot show the cameras' actual names (for example “Logitech”, “Front Camera”, “FullHD Camera”).
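The fallback naming described above can be sketched as a small pure function. The `DeviceEntry` interface is a local stand-in that copies only a few fields of the browser's `MediaDeviceInfo`. In a browser the list would typically come from `navigator.mediaDevices.enumerateDevices()`. The function name `cameraChoices` is hypothetical.

```typescript
// Local stand-in for the MediaDeviceInfo fields used here.
interface DeviceEntry {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string; // empty string when the browser withholds the name
}

// Build the list of camera names to show to the user.
// Cameras without a label get a generic "Camera N" placeholder.
function cameraChoices(devices: DeviceEntry[]): string[] {
  return devices
    .filter((device) => device.kind === "videoinput")
    .map((device, index) =>
      device.label !== "" ? device.label : `Camera ${index + 1}`,
    );
}

// Illustrative device list with the labels withheld.
const unnamedDevices: DeviceEntry[] = [
  { deviceId: "a1", kind: "videoinput", label: "" },
  { deviceId: "m1", kind: "audioinput", label: "" },
  { deviceId: "a2", kind: "videoinput", label: "" },
];
console.log(cameraChoices(unnamedDevices));
```

The numbering follows the position among the cameras only, so microphones and speakers in the list do not shift it.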
If the client connects a new device during the session, the web application will not be informed about this until the user refreshes the page. That is, if you have already opened the conference page and then connected a new USB camera, the application will not know about it.
For security reasons, the browser does not provide direct access to camera drivers.
Therefore, we cannot force a particular camera mode by choosing the resolution, frame rate, and so on.
We also cannot do video post-processing, adjust brightness, mirror the video, or change other things that are usually part of the camera driver settings.
There is also no single standard solution for desktop sharing. You may have seen that in video conferencing applications, starting desktop sharing often creates an additional conference participant that streams the desktop or application window you selected. The problems we face when working with cameras (the inability to show the camera name and to get the characteristics of the device) also apply to working with monitors when broadcasting the desktop.
The API for generating SDP is asynchronous, so there may be situations when the parameters of the media stream described in the incoming SDP packet do not correspond to what the client actually sends.
There are two SDP formats: Plan B, used by Chromium-based browsers, and Unified Plan, used by Firefox.
Plan B has all media streams in the same format. If we do not use an external media server, then there is the possibility that some conference participants will not understand the format of our media stream and will not be able to display it.
Unified Plan allows you to select a codec for each media stream. For example, you can encode the desktop broadcast with one codec and the camera broadcast with another.
You can teach the signaling server to translate one SDP format into the other, but this increases the load on the server.
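To make the idea of a codec per media stream more concrete, the sketch below reads an SDP fragment and lists the codecs announced in each media section. It relies only on the standard `m=` and `a=rtpmap:` lines. The function name `listCodecs` and the shortened sample SDP are illustrative assumptions, and a real session description contains many more lines.

```typescript
interface MediaSection {
  kind: string; // "audio", "video", ...
  codecs: string[]; // codec names taken from a=rtpmap lines
}

// Collect codec names per media section of an SDP text.
function listCodecs(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      // "m=video 9 UDP/TLS/RTP/SAVPF 96 98" -> the kind is the first token
      sections.push({ kind: line.slice(2).split(" ")[0], codecs: [] });
    } else if (line.startsWith("a=rtpmap:") && sections.length > 0) {
      // "a=rtpmap:96 VP8/90000" -> the codec name precedes the first "/"
      const match = /^a=rtpmap:\d+ ([^/]+)\//.exec(line);
      if (match) {
        sections[sections.length - 1].codecs.push(match[1]);
      }
    }
  }
  return sections;
}

// Shortened, illustrative SDP fragment (not a complete session description).
const sampleSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 98",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:98 VP9/90000",
].join("\r\n");

const parsedSections = listCodecs(sampleSdp);
console.log(JSON.stringify(parsedSections));
```

A signaling server that translates between SDP formats would need this kind of parsing for every message, which is where the extra server load comes from.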
What if there is NAT, so that computers appear on the Internet under one shared IP address but know each other internally by different addresses? The ICE (Interactive Connectivity Establishment) framework comes to the rescue. It describes how to traverse NAT and how to establish a connection through it.
This framework uses a STUN server, a special server that a client can query to find out its external IP address. In the process of establishing a P2P connection, each client must send a request to the STUN server to learn its external IP address, generate additional information (an IceCandidate), and exchange this IceCandidate with the other side through the signaling mechanism. The clients then know each other's correct IP addresses and can establish a P2P connection. However, there are more complex cases, for example when a computer is hidden behind double NAT. In this case, the ICE framework requires the use of a TURN server.
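Each IceCandidate carries a `typ` field that tells how the address was obtained: `host` for a local address, `srflx` for an address learned through STUN, and `relay` for an address allocated on a TURN server. The sketch below extracts that field from a candidate line. The function name `candidateType` and the sample lines (which use documentation-only IP ranges) are illustrative assumptions.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

const KNOWN_CANDIDATE_TYPES: CandidateType[] = ["host", "srflx", "prflx", "relay"];

// Return the value that follows the "typ" token, if it is a known type.
function candidateType(candidateLine: string): CandidateType | undefined {
  const tokens = candidateLine.trim().split(/\s+/);
  const typIndex = tokens.indexOf("typ");
  if (typIndex === -1) {
    return undefined;
  }
  const value = tokens[typIndex + 1];
  return KNOWN_CANDIDATE_TYPES.find((known) => known === value);
}

// Illustrative candidate lines, one per connection scenario.
const sampleCandidates = [
  "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.168.1.10 rport 54321",
  "candidate:3 1 udp 41885439 198.51.100.20 3478 typ relay raddr 203.0.113.7 rport 54321",
];

const sampleCandidateTypes = sampleCandidates.map((line) => candidateType(line));
console.log(sampleCandidateTypes);
```

Logging candidate types this way is a simple method of checking which of the scenarios a particular call ended up using.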
A TURN server is a special server that turns the client-to-client (P2P) connection into a client-server-client connection; that is, it acts as a relay. The good news for developers is that regardless of which of the three scenarios applies (a local network, a connection through STUN, or a relay through TURN), the API is identical. We simply specify the configuration of the STUN and TURN servers at the beginning, indicate how to access them, and after that the technology does everything for us under the hood.
Here we face some more difficulties. The first is the need to have STUN and TURN servers, with the corresponding cost of their support and maintenance. Although a TURN server is a simple proxy that does not process video, it must have a high-speed Internet connection in order to relay real-time media streams to all conference participants.
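The server configuration mentioned above is a plain object. The sketch below builds one whose shape mirrors the `iceServers` list accepted by the browser's `RTCPeerConnection` constructor. The interfaces are local stand-ins, the helper `buildPeerConfig` is hypothetical, and the `example.org` URLs and credentials are placeholders rather than real servers.

```typescript
// Local stand-ins mirroring the shape of the browser's ICE configuration.
interface IceServerEntry {
  urls: string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServerEntry[];
}

interface TurnAccess {
  url: string;
  username: string;
  credential: string;
}

// Hypothetical helper: always include STUN, add TURN only when provided.
function buildPeerConfig(stunUrl: string, turn?: TurnAccess): PeerConfig {
  const iceServers: IceServerEntry[] = [{ urls: [stunUrl] }];
  if (turn !== undefined) {
    iceServers.push({
      urls: [turn.url],
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Placeholder addresses and credentials, for illustration only.
const peerConfig = buildPeerConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-secret",
});
console.log(JSON.stringify(peerConfig));
```

In a browser, an object of this shape would be passed as `new RTCPeerConnection(peerConfig)`. After that, the choice between a direct, STUN-assisted, or relayed path is made by the ICE machinery.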
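To get a feeling for the bandwidth requirement, consider a rough worst-case estimate. It assumes a full-mesh conference in which every stream goes through the TURN relay and every participant sends one stream of equal bitrate to each of the others. These assumptions, the function name `relayLoadMbps`, and the 2 Mbps figure are illustrative and are not measurements.

```typescript
// Worst-case inbound load on a TURN relay, in Mbps, for a full mesh in which
// all traffic is relayed. The outbound load is the same, because every
// relayed stream is received once and forwarded once.
function relayLoadMbps(participants: number, streamMbps: number): number {
  if (participants < 2) {
    return 0;
  }
  // Each participant sends one copy of its stream to every other participant.
  const relayedStreams = participants * (participants - 1);
  return relayedStreams * streamMbps;
}

// Assumed bitrate of 2 Mbps per video stream.
const loadForFour = relayLoadMbps(4, 2); // 4 * 3 streams * 2 Mbps
const loadForEight = relayLoadMbps(8, 2); // 8 * 7 streams * 2 Mbps
console.log(loadForFour, loadForEight);
```

The load grows roughly with the square of the number of participants, which is why relayed mesh conferences become expensive quickly.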
To date, WebRTC is the second most popular video communication protocol after the proprietary Zoom protocol, ahead of all other standard (H.323 and SIP) and proprietary (Microsoft Teams and Cisco Webex) protocols.
WebRTC technology has had a strong impact on the development of the video conferencing market. After the release of the first browsers with WebRTC support in 2013, the potential number of video conferencing terminals around the world immediately increased by 1 billion devices. In fact, each browser has become a videoconferencing terminal with basic capabilities for participating in videoconferencing.
Use in specialized solutions
The use of various JavaScript libraries and cloud service APIs with WebRTC support makes it easy to add video support to any web projects. In the past, real-time data transmission required developers to learn how the protocols worked and to use the work of other companies, which most often required additional licensing, which increased costs. Already, WebRTC is actively used for organizing video contact centers, holding webinars, etc.
WebRTC and HTML5 were a death blow for Flash, which was already well past its best years. In 2017, Adobe announced that Flash would be retired at the end of 2020; the leading browsers have since removed support for it, and the technology has finally disappeared from the market.
Google Meet is a video and audio calling service with built-in messaging, released by Google in 2017. Chromium-based browsers (Google Chrome and others) include many hidden, undocumented WebRTC features that periodically appear first in Meet (as they did in its predecessor, Hangouts). This was the case with screen capture, background blur, and support for hardware encoding on some platforms.
Jitsi Meet is an open-source app released by 8x8. Jitsi is based on the Simulcast architecture, which means unstable operation on weak communication channels and high connection speed requirements on the server side. It allows you to conduct web conferences only in a browser and does not have full-fledged client applications for collaboration. Conferences with a maximum of 75 participants are supported (up to 35 with high call quality). To fully use Jitsi in a corporate environment, you need to develop and install additional software yourself.
BigBlueButton is free video conferencing software. The developers place special emphasis on distance education (with functions such as an interactive whiteboard, content display, and surveys). It supports web conferences with up to 100 participants.
Contrary to popular belief, Zoom does not use WebRTC technology to transmit and decode media data. This was done to save server resources. On the browser side, other web technologies are involved - low-level WebAssembly and WebSocket. When using such non-standard approaches for transmitting a video stream, some participants may experience problems with picture quality.
WebRTC is revolutionizing real-time communication by allowing voice, video, and data to be exchanged directly in browsers without the need for plugins. Its ease of integration and broad compatibility make it ideal for modern communications solutions. Ready to implement WebRTC in your project? Contact Moravio today to find out how our team can provide a fully customized communications platform that meets your business needs.