Standards for Videoconferencing

Release Date: 2004-12-15 Author: Sun Mingjun, Sun Zhibin

For videoconferencing, networking protocols and their standardization have always played a key role, whatever the stage of development. The protocols cover every part of videoconferencing system networking, from the overall architecture design to the definition of a single parameter. Only when the relevant protocols are standardized can products be built, services interoperate, and the videoconferencing field develop.

    A review of the development of videoconferencing shows that conferencing standards have changed along with communication networks. The standards developed by the International Telecommunication Union (ITU) have evolved from H.320, based on circuit switching, to the connection-oriented H.321, based on packet switching, and then to the connectionless H.323, also based on packet switching. H.323 is now finding more and more applications.

    Meanwhile, owing to the flexibility and freedom of packet-switched networks, more standards organizations are paying attention to the research and development of videoconferencing protocols. For example, the Session Initiation Protocol (SIP), developed by the Internet Engineering Task Force (IETF), is now a hot topic.

1 H.320 Protocol
H.320, proposed by ITU-T in 1990, was the first videoconferencing protocol and is currently the most mature one. It supports the Integrated Services Digital Network (ISDN), E1 and T1, working at 64 kb/s to 2 Mb/s. By 2000, nearly all videoconferencing systems from different vendors supported H.320, and even many IP-based videoconferencing systems now support it. As the first protocol suite developed for a specific network, H.320 specifies the main technical model of ISDN-based videoconferencing systems, providing a basic guarantee of interoperability between videoconferencing services and of interconnection between products from different vendors. H.320 is a milestone in the history of videoconferencing and laid the foundation for its development and popularization.
An H.320 system mainly consists of Multipoint Control Units (MCUs) and terminal equipment. Although it was a complete system when it was introduced, its constraints and problems have since become apparent because it is based on circuit switching.


    (1) Impacts Related to the Connection Modes of Circuit Switching
    Circuit switching is connection-oriented, with a stable transmission rate, a short and stable delay and a low bit error ratio, so the quality of the videoconferencing service is easy to ensure. However, a circuit-switched connection is fixed, which can make line usage relatively expensive for users. An E1 leased line or ISDN 2B+D is usually used for user access: the leased line ensures stable, high-quality video signals at the cost of low network utilization, while ISDN is sensitive to line quality and demands strict line synchronization. Both therefore limit the scale and expandability of videoconferencing.
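
    The fixed rates of the two access options can be illustrated with a little arithmetic. The sketch below is not from the original article: the constants are the usual ISDN and E1 figures (64 kb/s bearer channels, a 16 kb/s D channel on a 2B+D line, 32 time slots in an E1 frame), and the function names are hypothetical.

```python
# Illustrative arithmetic only. The constants are the commonly quoted
# ISDN/E1 figures, not values taken from the article.
B_CHANNEL_KBPS = 64       # one ISDN bearer (B) channel
D_CHANNEL_BRI_KBPS = 16   # signaling (D) channel of a basic-rate 2B+D line
E1_TIMESLOTS = 32         # 64 kb/s time slots in one E1 frame


def bri_rates():
    """Return (bearer_kbps, total_kbps) for an ISDN 2B+D access line."""
    bearer = 2 * B_CHANNEL_KBPS
    return bearer, bearer + D_CHANNEL_BRI_KBPS


def e1_line_rate_kbps():
    """Gross line rate of an E1 leased line."""
    return E1_TIMESLOTS * B_CHANNEL_KBPS


def channels_needed(call_kbps):
    """64 kb/s channels occupied by a call of the given rate (rounded up)."""
    return -(-call_kbps // B_CHANNEL_KBPS)


if __name__ == "__main__":
    print(bri_rates())            # bearer and total rate of 2B+D
    print(e1_line_rate_kbps())    # E1 gross rate
    print(channels_needed(384))   # channels reserved for a 384 kb/s call
```

    Because every channel is reserved for the whole call whether or not media is flowing, the circuit is paid for even when idle, which is the low utilization the paragraph above describes.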

    (2) Impacts Related to the Networking Modes of Circuit Switching
    Since a circuit-switched system has no virtual circuit connections, all information exchanged between a terminal and the videoconferencing system (audio, video, data and signaling) travels over a single circuit. A star topology with a single MCU, or with two-level cascaded MCUs, is adopted accordingly. The former uses one MCU and multiple conference terminals to form a star network. When the number of terminals exceeds the number of ports on an MCU, that MCU cannot connect all of them, so cascaded MCUs are used to form a client/server structure that expands network capacity. The stability of such a network is poor, however, because each client MCU has a fixed connection to the server MCU.

 

    As for the network topology, this single star structure makes the whole network unstable once something goes wrong with the server MCU. Three-level cascaded MCUs cannot be employed in this system because of the limited length of the BAS code. Even if a three-level cascade could be implemented in some special way, the requirements for minimum delay and synchronization would not be met; in practice, the network cannot be expanded further.
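
    The capacity ceiling of the star and two-level cascaded topologies can be sketched as follows. This is a simplified model under stated assumptions, not a description of any real MCU: every MCU is assumed to have the same number of ports, each client MCU is assumed to spend exactly one port on its fixed link to the server MCU, and all function names are hypothetical.

```python
def single_mcu_capacity(ports):
    """Star topology: one MCU, one terminal per port."""
    return ports


def needs_cascade(terminals, ports):
    """True when a single MCU runs out of ports for the conference."""
    return terminals > ports


def two_level_capacity(ports, client_mcus=None):
    """Terminals reachable with one server MCU and its client MCUs.

    Assumption: each client MCU uses one of its own ports, and one port on
    the server MCU, for the fixed client-to-server link.
    """
    if client_mcus is None:
        client_mcus = ports          # server ports all used for client MCUs
    if not 0 <= client_mcus <= ports:
        raise ValueError("client MCUs cannot exceed the server MCU's ports")
    direct_terminals = ports - client_mcus        # left on the server MCU
    cascaded_terminals = client_mcus * (ports - 1)
    return direct_terminals + cascaded_terminals


if __name__ == "__main__":
    print(single_mcu_capacity(16))
    print(two_level_capacity(16))
    print(two_level_capacity(16, client_mcus=2))
```

    With a third level ruled out, the two-level figure is the hard upper bound in this model, and every cascaded terminal depends on the single server MCU.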

    In sum, because H.320 is built on circuit switching, it can only support videoconferencing with a limited number of terminals and cannot be expanded to support interactive multimedia communications.

2 H.323 Protocol
With the development of packet switching, ATM was strongly promoted for a period of time, and from 1995 ITU-T issued two ATM-based protocols, H.310 and H.321. As time went on, however, the prospect of "ATM to the desktop" became less and less likely, and the use of H.310 and H.321 declined with it.

    In 1996, ITU-T introduced H.322 and H.323 for LANs. After four years of development, the fourth version of H.323, released in 2000, gained the support of most videoconferencing manufacturers. H.323 is not a single protocol but a complete protocol suite for real-time multimedia applications in the IP environment. It provides strict and thorough specifications for call establishment, call management, media transmission formats and other aspects.

    H.323 makes it possible to develop many multimedia applications that are independent of the underlying transport network, such as videoconferencing, multimedia monitoring, multimedia production scheduling, remote enterprise training and education, multimedia call centers, online IP telephony, online IP fax, online Video on Demand (VoD) and video broadcasting. H.323 can also be used to integrate different services into a single transport network. By adopting TCP/IP technology, an H.323 system can greatly reduce user terminal costs and leased line charges, and achieves a very high performance-cost ratio while offering the same performance and more functions.

    H.323 is essentially an open standard system designed with both the traditional PSTN call flow and the characteristics of IP networks in mind, and it represents the main development trend of telecom multimedia services. It successfully absorbs experience in networking, interconnection and operation, so an H.323 system can interconnect and interwork with the PSTN, narrowband video service networks and other data networks. VoIP operators with H.323 systems can largely inherit the management and operation models of traditional operators, which is quite important for VoIP network construction.

    Though H.323 has succeeded in VoIP services, it has not yet been applied successfully to videoconferencing system networking and operation. While H.323 inherits the networking, interconnection and operation experience of the H.320 system, it also retains many weaknesses that H.320 owes to the limitations of circuit-switched networks, such as small capacity and poor scalability.

    In China, H.323 VoIP and video service networks are generally nationwide networks that require multi-layer, multi-domain networking, cover hundreds of cities and carry hundreds of millions of minutes of traffic per month, so they place high demands on scalability and stability. H.320 technology is not up to establishing such a nationwide network, and neither is H.323, because it inherits too much from H.320.

    In the traditional conference system, audio, video, data and signaling are all exchanged over one circuit, so the MCU exists as a single device in which the control module and the media module, although functionally independent of each other, are physically combined. This is the main shortcoming of the H.320 system. In an H.323 system, media and signaling use two separate channels, yet the system still operates in the old H.320 way and so cannot take advantage of packet switching.

 

    In fact, the multipoint controller is mainly in charge of conference control, so it handles a modest amount of information and can control many terminals in software. The multipoint processor, by contrast, is mainly in charge of media handling and must deal with a large number of media flows, so hardware is necessary. This hardware should be placed as close to the user end as possible, and the number of terminals it handles may vary. An H.323 videoconferencing system built this way not only meets the demands of a layered telecom network but also accords with the current understanding of the Next Generation Network (NGN), i.e., the separation of control and media. ITU is promoting research on using H.248 for communication between the multipoint controller and the multipoint processor once they are separated.
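
    The separation described above can be sketched in a few lines. This is a toy model under stated assumptions, not an H.323 or H.248 implementation: the class and method names are hypothetical, the call from controller to processor merely stands in for whatever control protocol links them, and "closeness" is reduced to matching a site name.

```python
from dataclasses import dataclass, field


@dataclass
class MultipointProcessor:
    """Media-plane node (hardware in the article's model), placed near users."""
    site: str
    max_streams: int
    streams: set = field(default_factory=set)

    def add_stream(self, terminal_id):
        """Accept a terminal's media flow if capacity allows."""
        if len(self.streams) >= self.max_streams:
            return False
        self.streams.add(terminal_id)
        return True


class MultipointController:
    """Control-plane node: software only, never touches media itself."""

    def __init__(self, processors):
        self.processors = processors
        self.placement = {}  # terminal id -> site of the processor serving it

    def join(self, terminal_id, site):
        # Try processors at the terminal's own site first, then any other.
        ordered = sorted(self.processors, key=lambda mp: mp.site != site)
        for mp in ordered:
            if mp.add_stream(terminal_id):
                self.placement[terminal_id] = mp.site
                return mp.site
        raise RuntimeError("no multipoint processor has free capacity")


if __name__ == "__main__":
    mc = MultipointController([
        MultipointProcessor("beijing", max_streams=1),
        MultipointProcessor("shanghai", max_streams=2),
    ])
    print(mc.join("t1", "shanghai"))
    print(mc.join("t2", "beijing"))
    print(mc.join("t3", "beijing"))   # local processor is full, falls back
```

    In this model, one controller can serve many processors, and capacity grows by adding processors near the users rather than by enlarging a single combined MCU.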

3 SIP Protocol
SIP, the Session Initiation Protocol, first emerged in Internet applications in 1996 and was proposed by the Multiparty Multimedia Session Control (MMUSIC) Working Group of the IETF. In June 2002, the SIP Working Group of the IETF released RFC 3261, an updated specification of the SIP framework and mechanisms, on which present applications are all based. SIP defines the signaling mechanism for multimedia communication and conferencing, and draws on other IETF protocols including HTTP, SDP, MIME, RTP and RTCP. Like the WWW, SIP makes use of the Internet architecture, offers services through intelligent SIP terminals, and can implement dynamic networking by using Uniform Resource Identifiers (URIs). By logical function, a SIP system can be divided into four parts: the user agent, the SIP proxy server, the redirect server and the SIP registrar server.

    A SIP terminal usually includes a User Agent Client (UAC) and a User Agent Server (UAS). The UAC initiates requests, while the UAS accepts and answers them. The registration and location of a SIP terminal may make use of network resources such as the registrar, proxy server and redirect server, and its name and address may be obtained through the location server, Dynamic Host Configuration Protocol (DHCP) server, E.164 Number Mapping (ENUM) server, Telephony Routing over IP (TRIP) server and Domain Name System (DNS) server. Strictly speaking, SIP is a signaling standard that supports real-time multimedia applications. Its text-based encoding gives it good flexibility, scalability and cross-platform compatibility, especially in point-to-point application environments.
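
    The text-based encoding and the UAC/UAS roles can be illustrated with a minimal sketch. It follows the general request and response layout of RFC 3261 but is deliberately incomplete: the addresses, tag and branch values are made up, there is no SDP body or transport, and a real UAS would also add its own tag to the To header. All function names are hypothetical.

```python
CRLF = "\r\n"


def build_invite(caller, callee, call_id, host):
    """UAC side: compose a bare INVITE request as plain text."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host};branch=z9hG4bK-demo",  # made-up branch
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=demo",                # made-up tag
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    return CRLF.join(lines) + CRLF + CRLF


def parse_request(text):
    """Split a request into method, request URI and a header dictionary."""
    head = text.split(CRLF + CRLF, 1)[0]
    start_line, *header_lines = head.split(CRLF)
    method, uri, _version = start_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, uri, headers


def build_ok(request_text):
    """UAS side: answer with 200 OK, echoing the dialog-identifying headers."""
    _, _, headers = parse_request(request_text)
    echoed = ["Via", "To", "From", "Call-ID", "CSeq"]
    lines = ["SIP/2.0 200 OK"]
    lines += [f"{name}: {headers[name]}" for name in echoed]
    lines.append("Content-Length: 0")
    return CRLF.join(lines) + CRLF + CRLF


if __name__ == "__main__":
    invite = build_invite("alice@example.com", "bob@example.org",
                          "abc123", "client.example.com")
    print(invite)
    print(build_ok(invite))
```

    Since the whole exchange is readable text, such messages can be built and inspected with ordinary string handling on any platform, which is the flexibility the paragraph above attributes to SIP.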

    SIP is designed as a general protocol for initiating and terminating sessions, so it does not completely define the frameworks of videoconferencing and data conferencing systems. IP telephony and multimedia communication are just two instances of SIP applications; SIP also has other applications with quite simple session processes.

4 Development of Networking Protocols
From H.320 to H.323 and then to SIP, the development track of videoconferencing networking protocols is very clear: from closed to open, and from "elite" to "popular". In the era of circuit switching, videoconferencing was a high-end service unknown to most people, but face-to-face communication is now available through free software such as MSN. The development of networking protocols keeps making communication more convenient and less expensive. Except in stand-alone applications, videoconferencing networking protocols should be regarded as part of the overall network architecture. In fact, the videoconferencing service is an indispensable part of the Next Generation Network (NGN).

    Two different strategies have always existed for developing the NGN architecture. One emphasizes intelligent terminals and edge equipment and simplifies network deployment: since the driving force of point-to-point multimedia services lies in the terminals and edge equipment, this ensures the rapid development and prosperity of NGN services. The success of the Internet has proved how important this strategy is to the growth of multimedia services. The other strategy favors simple terminals and edge equipment together with an intelligent network infrastructure, holding that only simple, unified terminals and edge equipment can properly support large-scale operation, management and control. The voice service provided by the PSTN has shown that the business model guided by this strategy is reliable.

    To some extent, ITU H.323 and IETF SIP are the technologies that support the first strategy, while the softswitch system based on the MGCP and H.248 protocols implements the second.

    Obviously, a real NGN needs both diversified intelligent services and large-scale system operation and management. As for videoconferencing services, however, both H.323 and SIP are based on intelligent terminals, and even H.320 terminals are far more intelligent than PSTN telephones. In fact, ITU is promoting the convergence of H.248 with H.323, which shows that there is no conflict between intelligent network equipment and intelligent terminals. On the one hand, intelligent terminals ensure service growth; on the other hand, the intelligent network makes large-scale network operation, management and control possible.

    At present, for videoconferencing services, the H.323 + H.248 networking model can both meet existing demands and suit their evolution towards NGN. However, it is very difficult for H.323 to become a true videoconferencing standard, since many problems with firewall traversal, expandability and mobility support remain unsolved.

    SIP-based videoconferencing has gained the support of many videoconferencing system vendors, who have declared to varying degrees that their equipment supports SIP. One reason is that they are closely concerned with videoconferencing networking technologies; another is that a SIP system is simple and easy to develop. In fact, owing to its simplicity and high efficiency, SIP has advantages in controlling services other than IP telephony. It can be flexibly combined with other PC applications to develop more creative services. The good expandability of the SIP system therefore makes it an effective supplement to the H.323 system. As for videoconferencing services, SIP does not face the problems that H.323 does (e.g., firewall traversal and expandability), but it cannot be regarded as a networking standard for setting up a complete videoconferencing network, because it abandons all the ideas of H.320. In addition, since different services have different networking requirements, the bottom-up research approach of the IETF may be a hidden barrier to the development of multiple SIP services.

    Though it is currently impossible to tell which protocol will become the networking protocol for next-generation videoconferencing, its development must certainly follow that of NGN, so some points are already clear. First, the terminal should be intelligent and flexible enough to support multiple services. Second, the service network should support large-scale operation and management, offering a reliable guarantee for services. Third, good expandability is necessary, for example, easy support for equipment mobility on demand and for launching new supplementary services. Last, QoS must be guaranteed: any attempt to gain profit at the cost of QoS will ultimately lose more.

    Clearly, neither H.323 nor SIP can meet all the above demands on its own. Fortunately, the two standards and their related protocols are now learning from each other. It is reasonable to expect that they will coexist and flourish together for a long time. It is likely that, under different circumstances and through different practices, the two systems will converge and give rise to a new protocol better suited to the future development of videoconferencing services.

Manuscript received: 2004-09-17