Cloud-network convergence is a deep transformation of network architecture driven by service requirements and technology innovations. It involves both the cloud and the network: cloud computing services need the support of strong network capabilities, and network resources need to be optimized based on the concepts of cloud computing. As cloud computing services are adopted more widely, the network infrastructure needs to better accommodate the demands of cloud computing applications to ensure network flexibility, intelligence and maintainability.
Telecom Cloud Network's Requirements
A telecom cloud network has its own features, and its requirements for cloud-network convergence differ considerably from those of an IT network.
Diverse VMs with Different Communication Modes
The telecom cloud needs to support different communication modes simultaneously and therefore has complicated network functions.
A signaling plane NE, such as vPCRF, carries little traffic and consumes little bandwidth, but it has strong control functions and uses a large share of virtual machine (VM) CPU resources. OVS ports on the VM are recommended for its communication. A user plane NE, such as vFW, handles heavy traffic, demands high bandwidth and is latency-sensitive, so SR-IOV deployment is recommended. A hybrid NE, such as vGW (the virtualized form of PGW), combines the features of a signaling plane NE and a user plane NE, so different port types need to be deployed on the same VM; that is, SR-IOV ports and OVS ports coexist on one VM.
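The port recommendations above can be written down as a small lookup. This is only an illustrative sketch: the names Plane, NetworkElement and recommended_ports are hypothetical and do not correspond to any orchestrator API.

```python
from dataclasses import dataclass
from enum import Enum


class Plane(Enum):
    SIGNALING = "signaling"
    USER = "user"
    HYBRID = "hybrid"


# Port types recommended in the text for each kind of NE.
RECOMMENDED_PORTS = {
    Plane.SIGNALING: ("ovs",),        # low bandwidth, CPU-heavy control functions
    Plane.USER: ("sriov",),           # high bandwidth, latency-sensitive
    Plane.HYBRID: ("ovs", "sriov"),   # both port types coexist on one VM
}


@dataclass(frozen=True)
class NetworkElement:
    name: str
    plane: Plane


def recommended_ports(ne: NetworkElement) -> tuple:
    """Return the port types the text recommends for this NE's plane."""
    return RECOMMENDED_PORTS[ne.plane]


if __name__ == "__main__":
    for ne in (
        NetworkElement("vPCRF", Plane.SIGNALING),
        NetworkElement("vFW", Plane.USER),
        NetworkElement("vGW", Plane.HYBRID),
    ):
        print(ne.name, recommended_ports(ne))
```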
To meet special operator requirements, some NEs need to be deployed on physical servers, so bare-metal server communication must be supported. Other NEs need to be deployed in containers, so container network communication must be supported as well.
High Requirements for Reliability and Disaster Recovery
Unlike IT NEs, telecom NEs have strict requirements for reliability and disaster recovery and need to achieve carrier-class 99.999% reliability. This requires the telecom cloud network to provide reliability and disaster recovery protection at all levels, with at least 1+1 active/standby redundancy for devices including servers, network cards, switches, switch links and gateways.
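To put the 99.999% figure in perspective, the arithmetic can be sketched as follows. The per-unit availability of 0.999 is a hypothetical value chosen for illustration, and the redundancy formula assumes independent failures and an instantaneous, perfect switchover, which real deployments only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year for a given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def redundant_availability(unit_availability: float, units: int = 2) -> float:
    """Availability of `units` parallel units (1+1 when units == 2).

    Idealized assumption: failures are independent and switchover is
    instantaneous, so the group fails only if every unit fails.
    """
    return 1.0 - (1.0 - unit_availability) ** units


if __name__ == "__main__":
    print(f"five nines: {annual_downtime_minutes(0.99999):.2f} min/year")
    # Hypothetical single-device availability of 99.9%.
    print(f"1+1 pair of 0.999 units: {redundant_availability(0.999):.6f}")
```

Five nines leaves roughly 5.26 minutes of downtime per year, which is why a single non-redundant device at any level is hard to reconcile with the target.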
In addition, the telecom cloud network needs to provide capabilities for highly efficient backup & recovery and remote disaster recovery. For the virtual layer, it needs to support automated network adjustments for VM rebirth and self-healing.
Strict Requirements for Fault Detection and Switchover
The telecom cloud network has very strict requirements for fault detection and device switchover time. For example, the service interruption caused by a network device version upgrade or an active/standby switchover should not exceed 50 ms. Network devices are required to enable the BFD (Bidirectional Forwarding Detection) mechanism for fast bidirectional link detection, so that when a fault occurs the network can promptly switch the service to the standby device and link. An IT cloud does not have these features.
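A rough way to check BFD timers against the 50 ms limit is sketched below. The detection-time rule follows BFD asynchronous mode (the remote detect multiplier times the agreed transmit interval); the timer values and the 15 ms switchover time are hypothetical examples, not recommendations from this article.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: float,
                          remote_desired_min_tx_ms: float) -> float:
    """Detection time in BFD asynchronous mode.

    The agreed transmit interval of the remote system is the greater of
    the local Required Min RX Interval and the remote Desired Min TX
    Interval; the session is declared down after `remote_detect_mult`
    such intervals pass without a received packet.
    """
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval


def fits_interruption_budget(detection_ms: float,
                             switchover_ms: float,
                             budget_ms: float = 50.0) -> bool:
    """True if detection plus switchover stays within the budget."""
    return detection_ms + switchover_ms <= budget_ms


if __name__ == "__main__":
    detection = bfd_detection_time_ms(3, 10, 10)   # 3 x 10 ms
    # The 15 ms switchover time is an assumed value for illustration.
    print(detection, fits_interruption_budget(detection, switchover_ms=15))
```

With 10 ms intervals and a multiplier of 3, detection alone takes 30 ms, which leaves at most 20 ms for the switchover itself.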
High Network Security Requirements
Ultra-high security requirements of telecom NEs extend from the physical network layer to the virtual network layer.
In general, strict network isolation is applied to a physical network. Some operators require multi-layer isolation. For example, the service network, storage network and management network are isolated at the first layer and firewalls are used to protect inter-network communication.
A service network is further divided into an exposed zone, a DMZ zone and a core zone. Different types of firewalls are used between different zones to implement layered protection. The management domain and storage domain are further divided into different network planes, and mutual access between these planes is strictly prohibited.
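The layered isolation rules can be condensed into a single decision function. This is a minimal sketch: Endpoint and access_decision are hypothetical names, the plane names are invented, and allowing direct access within the same zone or plane is an assumption the text does not state explicitly.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    network: str   # "service", "storage" or "management"
    segment: str   # zone (service network) or plane (storage/management)


def access_decision(src: Endpoint, dst: Endpoint) -> str:
    """Return "allow", "firewall" or "deny" for traffic from src to dst."""
    if src.network != dst.network:
        # First layer: inter-network traffic is protected by firewalls.
        return "firewall"
    if src.segment == dst.segment:
        # Assumption: same zone or plane may communicate directly.
        return "allow"
    if src.network == "service":
        # Exposed, DMZ and core zones are separated by firewalls.
        return "firewall"
    # Planes inside the management or storage domain must not reach each other.
    return "deny"


if __name__ == "__main__":
    print(access_decision(Endpoint("service", "exposed"), Endpoint("service", "dmz")))
    print(access_decision(Endpoint("management", "plane-a"), Endpoint("management", "plane-b")))
```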
Special Requirements for Traffic Detection
A cloudified telecom network inherits the monitoring requirements of a traditional telecom network. In addition, to realize automated O&M and network analysis, the telecom cloud requires real-time collection and analysis of signaling and data in the network. Driven by these two requirements, the telecom cloud network should support precise and automated data distribution and collection. In particular, traffic collection policies should be adjusted automatically when VM terminals are migrated or reborn.
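The idea of a collection policy that follows a VM can be sketched as below. All names (CollectionPolicy, PolicyManager, on_vm_moved) are hypothetical; a real system would receive migration and rebirth events from the cloud platform and reprogram the switches accordingly.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CollectionPolicy:
    vm_id: str
    collector: str   # destination that receives the mirrored traffic
    host: str        # server currently hosting the VM


@dataclass
class PolicyManager:
    policies: Dict[str, CollectionPolicy] = field(default_factory=dict)

    def add(self, policy: CollectionPolicy) -> None:
        self.policies[policy.vm_id] = policy

    def on_vm_moved(self, vm_id: str, new_host: str) -> Optional[CollectionPolicy]:
        """Re-anchor the policy after a migration or rebirth.

        Returns the updated policy, or None if the VM has no policy.
        """
        policy = self.policies.get(vm_id)
        if policy is None:
            return None
        policy.host = new_host   # collector stays the same, only the tap point moves
        return policy


if __name__ == "__main__":
    manager = PolicyManager()
    manager.add(CollectionPolicy("vm-1", "probe-a", "host-1"))
    print(manager.on_vm_moved("vm-1", "host-2"))
```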
High Requirements for Network Service Quality
Various communication modes coexist in the telecom cloud network, and different communication planes have different requirements for network service quality. In general, the signaling plane has higher QoS requirements than the media plane. Moreover, the media plane has different service levels for communication, such as VoIP and internet services. Therefore, the telecom network needs to support QoS mechanisms at the transport layer and assign different service classes to different QoS levels, so that high-priority services are transported first when the network is congested.
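One simple scheduling discipline that transports high-priority services first is strict-priority queuing, sketched below. The class names and their ordering (signaling, then VoIP, then internet) are assumptions made for illustration; the article only states that signaling ranks above media and that media services have different levels.

```python
import heapq
import itertools

# Hypothetical class-to-priority mapping; a lower number is served first.
PRIORITY = {"signaling": 0, "voip": 1, "internet": 2}


class StrictPriorityQueue:
    """Always dequeues the highest-priority packet that is waiting."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # keeps FIFO order within one class

    def enqueue(self, service_class: str, packet: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[service_class], next(self._seq), packet))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = StrictPriorityQueue()
    queue.enqueue("internet", "web-1")
    queue.enqueue("voip", "rtp-1")
    queue.enqueue("signaling", "sig-1")
    while queue:
        print(queue.dequeue())
```

Strict priority can starve the lowest class under sustained congestion, so practical designs often combine it with weighted scheduling for the lower classes.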
High Requirements for Bandwidth
The media plane of the telecom cloud has route-type NEs, such as vGW and vBRAS. As growing 5G services and the copper-to-fiber transition bring a traffic explosion, the bandwidth requirements of these NEs increase exponentially. Meanwhile, these NEs are deployed on servers, and their traffic is finally transported to the backbone network or the internet by the telecom cloud network. This puts high bandwidth requirements on the telecom cloud network. Especially when there are service function chains, the same traffic traverses the DC of the telecom cloud two to six times, which multiplies the traffic volume and brings huge challenges to the forwarding capability of the telecom cloud network.
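The multiplication effect of service function chains is simple arithmetic, shown here as a sketch. The 100 Gbps ingress figure is a hypothetical example, and the function name is illustrative only.

```python
def internal_dc_load_gbps(ingress_gbps: float, traversals: int) -> float:
    """Traffic the DC fabric must forward when each flow crosses it
    `traversals` times because of service function chaining."""
    if traversals < 1:
        raise ValueError("traffic crosses the DC at least once")
    return ingress_gbps * traversals


if __name__ == "__main__":
    ingress = 100.0  # hypothetical ingress traffic in Gbps
    for n in (2, 6):  # the range of traversals mentioned in the text
        print(n, "traversals:", internal_dc_load_gbps(ingress, n), "Gbps")
```

Under this assumption, 100 Gbps of external traffic translates into 200 to 600 Gbps of forwarding load inside the DC.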
Recommended Telecom Cloud Network Architecture
As a scenario of cloud-network convergence, the telecom cloud network has the above-mentioned unique features and requirements in addition to the general features of a cloud network. Based on these features, we recommend the network architecture shown in Fig. 1.
The telecom cloud network architecture is vertically divided into Border Leaf (also known as DC GW), Spine and Leaf layers. Firewalls and external networks, as special network services, are connected to the Border Leaf, and physical servers are connected to the Leaf switches. The switches use virtual switch cluster (VSC) and equal-cost multi-path (ECMP) to provide high reliability. The network is horizontally divided into computing, storage and management domains. To meet some special requirements, certain out-of-band management domains are separated from the management domain. To guarantee security, communication between the management, storage and computing domains needs to be filtered and protected by the management firewall.
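ECMP spreads flows over equal-cost next hops by hashing packet header fields, and it contributes to reliability because flows can be rehashed over the remaining paths when one fails. The sketch below illustrates the principle only: SHA-256 stands in for the hardware hash of a real switch, and the spine names and addresses are made up.

```python
import hashlib
from typing import Sequence, Tuple

# 5-tuple: source IP, destination IP, protocol, source port, destination port
Flow = Tuple[str, str, int, int, int]


def ecmp_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Pick a next hop by hashing the flow's 5-tuple.

    All packets of one flow map to the same path, which avoids
    reordering within the flow.
    """
    if not next_hops:
        raise ValueError("no next hop available")
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    spines = ["spine-1", "spine-2", "spine-3", "spine-4"]
    flow = ("10.0.0.1", "10.0.1.9", 17, 40000, 4789)
    chosen = ecmp_next_hop(flow, spines)
    print("selected:", chosen)
    # If the selected spine fails, the flow is rehashed over the survivors.
    survivors = [s for s in spines if s != chosen]
    print("after failure:", ecmp_next_hop(flow, survivors))
```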
In the computing domain, VNFs are installed on VMs, bare-metal servers or containers. Communication among VNFs and between a VNF and an external network is carried by an SDN-controlled overlay network. With VxLAN encapsulation, the virtual network is adjusted dynamically to meet VNF communication requirements as VMs are created, destroyed, scaled up or down, reborn and self-healed. Based on the security level of VNFs, the network is divided into an exposed zone, a DMZ zone and a core zone. VNFs in the exposed zone are mainly route-type NEs for external communication, such as the vGW in a core network. In the DMZ zone, it is recommended to deploy VNFs that are not directly exposed to the outside but need to communicate with VNFs in the exposed zone, such as vMME. The core zone has the highest security level and mainly hosts core subscription data NEs (e.g. HSS). Communication between zones with different security levels needs to be protected by the firewall.
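For readers unfamiliar with the encapsulation, the 8-byte VxLAN header that carries the virtual network identifier (VNI) can be built and parsed as in the sketch below. The layout follows the VXLAN specification (RFC 7348); the helper names and the example VNI of 5000 are illustrative, and the outer Ethernet, IP and UDP headers are omitted.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination UDP port
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VxLAN header for a 24-bit VNI.

    Layout: flags (8 bits), reserved (24 bits), VNI (24 bits), reserved (8 bits).
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VxLAN header, checking the I flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


if __name__ == "__main__":
    header = vxlan_header(5000)
    print(header.hex(), parse_vni(header))
```

The 24-bit VNI allows about 16 million isolated virtual networks, which is what lets the SDN controller create and remove per-VNF segments without touching the physical underlay.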
The storage domain mainly hosts storage servers or disk arrays that provide cloud storage for VNFs to use on demand. The management domain mainly hosts cloud control nodes and SDN controllers, which implement coordinated control and orchestration of the cloud and the network.
The industry has reached a consensus on using segment routing (SR) as the technology for evolving the IP backbone network, mobile backhaul, MAN and DCI underlay network outside the DC. VxLAN technology is implemented within the DC to provide overlay virtual networks. The DC can be considered as an L2 VPN domain, and a VNF as a client device of the vDC L2 VPN. In particular, vRouter-type VNFs such as vGW or vBRAS are part of a WAN, and SR path orchestration for the WAN should include these VNFs.
The industry is considering using the same technology to achieve unified evolution of the internal and external networks of the vDC. ZTE recommends using SR over VxLAN (Fig. 2) to achieve interconnection and unified orchestration of the external and internal overlay networks of the DC. This enables unified coordination and control for service networks, minimizes the impact on the internal physical networks of the DC, and avoids excessive dependence on the underlying network functions.
Due to the telecom cloud network's special architecture and requirements, NFV/SDN deployments in the network will vary. Operators and equipment vendors need to develop specialized solutions and functions.
Telecom cloud virtualization technologies keep evolving. Smart NIC offload and container technologies are rapidly maturing and are increasingly applied to virtualization. In-band network telemetry, intelligent traffic collection and analysis, and network self-test and self-healing technologies are also developing rapidly, providing operators with new ideas for intelligent O&M. On the other hand, we must realize that network cloudification cannot be accomplished in a single step; it requires more trials and explorations.
Telecom cloud network, high requirements, diverse VMs, reliability and disaster recovery, fault detection and switchover, traffic detection, network QoS