iFLOW: Enabling Flow-Based Insight and Precise Fault Diagnosis

Release Date: 2021-09-22  Author: Zhang Junhui

With the gradual commercial use of 5G, customers require higher service quality for new 5G services, and fast fault location and recovery are becoming increasingly critical to ensure service reliability. To make 5G network O&M precise, ZTE's insight Flow (iFLOW) solution provides multi-level in-depth perception and accurate fault diagnosis. 

Current Network O&M
Traditional network monitoring tools rely on out-of-band or low-precision in-band measurement for network O&M. Because this data is neither real-time nor accurate enough, it offers limited guidance for network fault diagnosis. The problems include:

- No route visibility: Dynamic IP routes of wireless base stations and core networks cannot be monitored or analyzed, and changes to service flow paths and their root causes cannot be perceived or traced.

- Unmeasurable service SLA and low measurement accuracy: Network quality is detected indirectly by sending simulated packets, which may not follow the same path as the real service and therefore cannot fully reflect the service-level SLA. The service packet loss detection accuracy is only 0.1%, so packet loss that affects services such as AR/VR may go undetected.

- No real-time fault perception and a poor, passive O&M experience: With traditional minute-level information collection, the transport network cannot perceive network changes in real time and only responds passively after a service has failed or its quality has deteriorated.

- Difficult fault localization & delimitation and no self-healing capability: Faults are hard to delimit accurately because hop-by-hop detection cannot be performed and the service status during the fault period cannot be reconstructed. Teams on the wireless side, network side, and core network side therefore have to cooperate to locate a fault, which can take several days or even weeks.

iFLOW Solution
To address the low-precision network status perception and slow fault localization & recovery of traditional network O&M, the iFLOW solution (Fig. 1) enables in-depth perception and precise fault diagnosis at three service layers. First, at the service routing layer, VPN route information and status are monitored in real time through the BGP monitoring protocol (BMP) for precise analysis. Second, at the service path layer, the path computation element (PCE) uniformly computes and optimizes the LSP paths of the entire network and precisely controls service paths for rapid service self-healing. Finally, at the service forwarding layer, in-band OAM (IOAM) accurately monitors service flows and analyzes and diagnoses their performance (delay, packet loss, and jitter).

Precise Network Insight
The iFLOW solution supports in-depth network perception and automatic monitoring through end-to-end multidimensional visualization and accurate service parameter collection and analysis. 
iFLOW collects VPN routes through the intelligent management and control system ZENIC ONE, presents the routes of the whole network in real time, and monitors and counts their changes, including peer up/down events, incremental route advertisement and withdrawal, status reports, statistics, path mirroring, top-N routes and timestamps. In addition, it displays the service quality and traffic information of the base station, signaling plane and data plane in multiple dimensions, helping users quickly assess the service quality of the network.
It also carries out security analysis by monitoring VPN prefix route information and status, analyzes the corresponding paths through route drilling to quickly learn about the path information of service flows, and accurately locates the path adjustment and root cause by tracing the historical paths.
The solution offers end-to-end and hop-by-hop measurement at the IP service level, identifies the base station, tunnel and NE that the service passes through, and rapidly reconstructs the real-time service path. Combined with active monitoring, network-wide status data and big data analysis, it can perceive and handle potential faults in advance, allowing fast service self-healing and ensuring network transport quality.
Intelligent Fault Localization & Delimitation 
The iFLOW solution collects statistics on the number of real service packets through the GTP tunnel, identifies the packet feature fields through the SCTP signaling, and precisely locates faults in real time through path restoration, hop-by-hop detection and SLA analysis. 
It can rapidly delimit transport and wireless faults based on E2E detection at the base station flow level, or quickly locate fault points based on hop-by-hop detection at the flow level. It can also trace the history and root cause of service flow deterioration by backtracking the historical paths and performance of service flows.

When a fault point is accurately located, the controller's all-network data and multi-constrained path algorithm can be combined to compute a TE path that meets the service SLA requirement and bypasses the fault point, and switch service flows to a new path for their fast self-healing. 
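The path recomputation described above can be illustrated with a small constrained shortest-path search. This is only a sketch under stated assumptions, not the controller's actual algorithm: the function name `compute_te_path`, the link tuple layout `(node_a, node_b, cost, delay_ms)` and the single delay constraint are all hypothetical. It looks for the lowest-cost path whose accumulated delay stays within the SLA bound while skipping every link that touches a failed node.

```python
import heapq

def compute_te_path(links, src, dst, max_delay_ms, failed_nodes=frozenset()):
    """Lowest-cost path from src to dst with total delay <= max_delay_ms,
    avoiding failed nodes. Returns (cost, delay_ms, path) or None."""
    # Build an undirected adjacency list, dropping links that touch a failed node.
    graph = {}
    for a, b, cost, delay_ms in links:
        if a in failed_nodes or b in failed_nodes:
            continue
        graph.setdefault(a, []).append((b, cost, delay_ms))
        graph.setdefault(b, []).append((a, cost, delay_ms))

    best_delay = {}                   # smallest delay already expanded per node
    heap = [(0, 0, src, (src,))]      # (cost, delay, node, path so far)
    while heap:
        cost, delay, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, delay, list(path)
        # Labels pop in non-decreasing cost order, so a label that is not
        # faster than one already expanded at this node is dominated.
        if delay >= best_delay.get(node, float("inf")):
            continue
        best_delay[node] = delay
        for nxt, link_cost, link_delay in graph.get(node, []):
            if nxt in path:
                continue              # no loops
            new_delay = delay + link_delay
            if new_delay <= max_delay_ms:
                heapq.heappush(heap, (cost + link_cost, new_delay, nxt, path + (nxt,)))
    return None

# Hypothetical topology: (node_a, node_b, cost, delay_ms)
LINKS = [("A", "B", 1, 5), ("B", "D", 1, 5),
         ("A", "C", 2, 3), ("C", "D", 2, 3),
         ("A", "D", 10, 1)]

print(compute_te_path(LINKS, "A", "D", max_delay_ms=10))
print(compute_te_path(LINKS, "A", "D", max_delay_ms=10, failed_nodes={"B"}))
```

With node B marked as failed, the search falls back to the path through C. A production PCE would typically handle more constraints (bandwidth, affinity, disjointness) than this single delay bound.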

iFLOW Application
Targeting the pain points of existing network O&M, the iFLOW solution accurately presents the network information that customers care about most in three ways (route insight, service performance insight and fast fault insight), and enables quick service self-healing in case of a failure.
Precise Route Insight and Security Analysis
The iFLOW monitors VPN route information and status in real time through BMP, and precisely analyzes service routes. 
When a route is added abnormally, the BMP can be used to monitor the changes of VPN prefix route information and status and perform security analysis. Another function of the BMP is to supervise the IP routes of the whole network and detect the address conflict, so as to find the base station IP address planning error and provide timely warnings. 
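As an illustration of the address conflict check, the sketch below scans a set of collected routes and reports prefixes that overlap but are advertised by different origins. It is a minimal example under assumptions: the `(prefix, origin)` record layout, the function name `find_address_conflicts` and the base station names are hypothetical, and treating every overlap between different origins as a conflict is a simplification (real networks may carry legitimate aggregate routes).

```python
import ipaddress
from itertools import combinations

def find_address_conflicts(routes):
    """routes: iterable of (prefix, origin) pairs, e.g. taken from collected
    route updates. Returns (prefix_a, origin_a, prefix_b, origin_b) tuples for
    overlapping prefixes that are advertised by different origins."""
    parsed = sorted(
        {(ipaddress.ip_network(prefix), origin) for prefix, origin in routes},
        key=lambda r: (r[0].version, int(r[0].network_address), r[0].prefixlen, r[1]),
    )
    conflicts = []
    for (net_a, origin_a), (net_b, origin_b) in combinations(parsed, 2):
        same_family = net_a.version == net_b.version
        if same_family and origin_a != origin_b and net_a.overlaps(net_b):
            conflicts.append((str(net_a), origin_a, str(net_b), origin_b))
    return conflicts

# Hypothetical base station routes: gNB-1 and gNB-2 were planned with the same subnet.
ROUTES = [("10.1.1.0/30", "gNB-1"),
          ("10.1.1.0/30", "gNB-2"),
          ("10.1.2.0/30", "gNB-3")]

for conflict in find_address_conflicts(ROUTES):
    print("address conflict:", conflict)
```

The pairwise comparison is quadratic in the number of routes, which is fine for a sketch; a network-wide check would more likely use a prefix tree.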
With network cloudification and cloud-network synergy, the BMP can be used to further monitor the VNF changes and status in the DC cloud and raise the end-to-end intelligent O&M capability of cloud-network synergy. 
Precise Service Performance Insight
IOAM is used to monitor the performance of real service flows (delay, packet loss and jitter) precisely, with hop-by-hop analysis and rapid diagnosis.
Through the controller, the user subscribes to the statistical data of the NEs on the service flow path and enables the calculation of packet loss and delay. The controller determines which NEs to configure and subscribe to from the service configuration or from auxiliary means such as the trace function of the SR tunnel. After the subscription, the devices report their statistical data, and the controller performs the calculation and presents the results to the user. Finally, the big data platform provides visualized analysis of historical performance data.
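The controller-side calculation can be sketched as follows. The example assumes, hypothetically, that each NE on the path reports a packet count and a timestamp for the same flow and measurement interval; the field names (`ne`, `packets`, `timestamp_us`), the NE names and the function `per_hop_sla` are illustrative and not part of any documented report format. Loss on a segment is the difference between the upstream and downstream counters, and delay is the difference between their timestamps.

```python
def per_hop_sla(reports):
    """reports: per-NE statistics for one flow and one interval, ordered
    along the service path. Returns one result per adjacent NE pair."""
    results = []
    for up, down in zip(reports, reports[1:]):
        lost = up["packets"] - down["packets"]
        results.append({
            "segment": (up["ne"], down["ne"]),
            "lost": lost,
            "loss_ratio": lost / up["packets"] if up["packets"] else 0.0,
            "delay_us": down["timestamp_us"] - up["timestamp_us"],
        })
    return results

def worst_segment(results):
    """Segment with the highest loss ratio, i.e. the most likely fault point."""
    return max(results, key=lambda r: r["loss_ratio"])

# Hypothetical reports from three NEs on the path of one base station flow.
REPORTS = [
    {"ne": "CSG", "packets": 100000, "timestamp_us": 0},
    {"ne": "ASG", "packets": 100000, "timestamp_us": 120},
    {"ne": "RSG", "packets": 99990, "timestamp_us": 400},
]

for row in per_hop_sla(REPORTS):
    print(row)
print("suspect segment:", worst_segment(per_hop_sla(REPORTS))["segment"])
```

Because the counters come from the real service packets at each hop, even a loss of 10 packets in 100,000 shows up on the specific segment where it occurred.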
Precise Fault Insight for Quick Diagnosis
The solution uses the source IP, destination IP and color ID, or the tunnel policy configuration, to drill down to the corresponding slice and tunnel, SR policy or VPN. It then analyzes the SLA attributes and connectivity of the associated path or SR policy, such as the nodes and links traversed, hop count, cost/metric, bandwidth and E2E delay.
When a network fault causes path adjustment and SLA deterioration (e.g. increased delay), the affected services and routes are identified through the color ID and destination address of the SR policy and displayed visually. The bandwidth, delay, jitter, and packet loss of the nodes and links on the path are then analyzed to find the causes affecting the service SLA and precisely locate the fault source.
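The lookup by color ID and destination address can be pictured as a simple keyed join. The sketch below is illustrative only: the dictionary layouts, the field names `delay_ms` and `sla_delay_ms`, the service names and the function `find_affected_services` are assumptions made for this example. SR policies are keyed by `(color, endpoint)`, each service maps to one such key, and a service counts as affected when the measured delay of its policy exceeds the SLA bound.

```python
def find_affected_services(policies, services):
    """policies: {(color, endpoint): {"path": [...], "delay_ms": x, "sla_delay_ms": y}}
    services: {service_name: (color, endpoint)}
    Returns the sorted names of services riding a policy that violates its SLA."""
    degraded = {
        key for key, policy in policies.items()
        if policy["delay_ms"] > policy["sla_delay_ms"]
    }
    return sorted(name for name, key in services.items() if key in degraded)

# Hypothetical state after a fault moved the color-100 policy to a longer path.
POLICIES = {
    (100, "10.0.0.9"): {"path": ["A", "C", "E", "D"], "delay_ms": 12.0, "sla_delay_ms": 10.0},
    (200, "10.0.0.9"): {"path": ["A", "B", "D"], "delay_ms": 4.0, "sla_delay_ms": 10.0},
}
SERVICES = {
    "gNB-17-eMBB": (100, "10.0.0.9"),
    "gNB-17-voice": (200, "10.0.0.9"),
    "gNB-21-eMBB": (100, "10.0.0.9"),
}

print(find_affected_services(POLICIES, SERVICES))
```

From the returned list, the path stored with each degraded policy gives the nodes and links whose bandwidth, delay, jitter and packet loss should be examined next.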
After the fault source is located, the service can be rapidly diagnosed with tools such as ping, trace, TWAMP and IOAM, together with configuration checks. The specific fault cause and location (e.g. node, link, port or queue) are analyzed, and the diagnosis results are presented along with troubleshooting suggestions.
Fast Service Self-Healing Based on Flow
When service performance deteriorates, the controller can quickly locate the fault cause using the accurate network insight provided by iFLOW and automatically compute a new path through the SLA performance algorithm. The service flow is then diverted to a new path that meets the SLA requirements, which results in fast service self-healing, less manual intervention and a better user experience.

The iFLOW solution employs multiple innovative technologies to enhance the real-time monitoring and analysis of global service routes and allows in-depth perception at multiple layers. Combined with historical information backtracking and recovery, it can fulfill rapid and accurate fault localization and root cause analysis, thus strengthening the control of network information by the O&M personnel, significantly shortening the time of network fault processing, and effectively improving customer service quality.