What is the Future of O&M?

Release Date: 2017-07-25 By Wang Rui, Liu Junjie

 

 

The Transformation Trend

Carriers need to respond to market demand for network transformation. Customers consistently ask for better services at lower prices. The development of SDN/NFV gives carriers the opportunity to build intelligent, automated networks. With continuous innovation in the telecom industry, new technologies and concepts such as big data analysis, artificial intelligence (AI), centralized policy-driven management, and user experience management have become a focus of O&M.
With the cloudification of networks, resources can be orchestrated and scheduled flexibly, and network elements can be virtualized and run on virtual machines. The cloud platform aggregates computing, network, and storage resources and manages them in a unified way, that is, as infrastructure as code (IaC). Through cloudification, NFV decouples software from hardware and network functions from dedicated hardware. It allows flexible resource sharing and fast service creation and deployment, and supports automatic deployment, elastic scaling, fault isolation, and self-healing based on actual service demands. Judging from telecom service trends, progress in standards, and service trials by major global operators, cloudification, digitization, automation, and intelligentization technologies have bright prospects. But they also face new challenges.
According to a report published by Technology Business Research (TBR) in 2016, the main challenges to NFV O&M include: adapting process methods and procedures for hybrid physical and virtual systems; deepening models for proactive network assurance; developing new SLAs that feature real-time, contextual, and location-aware assurance methods; creating traffic visibility between and within physical and virtual networks; and navigating a more complex multivendor environment.
In response to these trends and challenges, ZTE has put forward vMaster, its next-generation cloud-based O&M solution. The solution focuses on automation and user experience while also taking carriers' own experience into consideration. It aims to improve O&M efficiency and quality.

 

Cloud-Based Intelligence

The vMaster global information (GI) system provides cloud-based centralized management. Its unified information model management encompasses policy, service, and resource models and enables network-wide, multi-vendor, end-to-end management. Its unified portal management provides a centralized view of alarms, performance, and resources.
The vMaster global assurance (GA) system is an open service assurance management platform that helps carriers centrally manage multi-vendor and SDN/NFV networks. It provides centralized alarming, unified performance management, a unified O&M process, and unified portal management so that network O&M can be carried out quickly, flexibly, efficiently, and at low cost.
All the functional components of the vMaster solution are built as microservices and run in virtual PaaS environments for easy scalability and agile, flexible deployment. Through centralized network management, the vMaster solution gives carriers complete analysis and visibility of end-to-end faults and performance, along with a single set of data that saves resources and facilitates correlation analysis. It helps carriers achieve closed-loop management from orchestration to service assurance by applying the O&M, orchestration, and scheduling policies held in the policy center.
Through AI-based big data analysis, the vMaster solution performs intelligent analysis. It adjusts policies in real time, monitors their effectiveness, and continuously optimizes them. Working from big data collected from a large number of historical events, the AI analyzes rules and forms policies. Driven by events, the policies automatically optimize network settings (including automatic scale-out, scale-in, and bandwidth scheduling). The O&M method thus evolves from decision-making based on analyzing and computing real-time data to proactive network maintenance, and from planning to prediction. Through empirical analysis, the AI learns from operators' instructions and improves its future actions. This can help carriers automate network configuration and monitoring, reduce operating expenditure, and improve network usage and maintenance.
In an O&M scenario, alarm filtering and root cause analysis are the core tasks of alarm handling. Because networks are complex and the volume of alarm data is large, traditional O&M needs a lot of manpower to identify the root causes of failures. Introducing AI addresses this problem. Historical alarms are analyzed, and root cause rules are identified through machine learning. Alarm recovery is used to verify that the rules are effective, and manual inspection after the unsupervised machine learning step ensures that the rules are accurate.
Working with a carrier, ZTE collected 10 million historical alarms from the carrier's network and performed alarm filtering and alarm correlation analysis through unsupervised machine learning. A high-performance big data cluster of several servers initially needed a few hours to generate 100 rules, many of them invalid. After the algorithm parameters were bounded and continuously tuned, it needed only 10 minutes to generate 62 valid alarm correlation patterns. The patterns cover many fields, including the bearer network, core network, and wireless access. Traditionally, six experts must work for several weeks to create alarm correlation patterns, and developing cross-field alarm correlation is very difficult. ZTE's next-generation O&M system reduces the time to 10 minutes and can be quickly introduced into an existing network to greatly improve efficiency.
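The article does not say which unsupervised algorithm was used. As a rough illustration of the idea only, the sketch below mines pairwise alarm correlation rules from historical alarms by counting how often two alarm types fall in the same time window. The function name `mine_alarm_rules`, the alarm type names, the fixed 60-second window, and the support and confidence thresholds are all assumptions made for this example, not part of vMaster.

```python
from collections import Counter
from itertools import combinations


def mine_alarm_rules(alarms, window_s=60, min_support=2, min_confidence=0.8):
    """Mine simple pairwise correlation rules from historical alarms.

    alarms: iterable of (timestamp_seconds, alarm_type) tuples.
    Returns a sorted list of (antecedent, consequent, confidence) tuples,
    read as "when antecedent appears, consequent appears in the same window".
    This is co-occurrence counting, a stand-in for a real mining algorithm.
    """
    # Bucket alarms into fixed-size time windows (a deliberate simplification).
    windows = {}
    for ts, kind in alarms:
        windows.setdefault(int(ts // window_s), set()).add(kind)

    # Count windows containing each alarm type and each unordered pair.
    single = Counter()
    pair = Counter()
    for kinds in windows.values():
        single.update(kinds)
        pair.update(combinations(sorted(kinds), 2))

    rules = []
    for (a, b), n in pair.items():
        if n < min_support:
            continue  # too rare to trust; such pairs resemble the "invalid rules"
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = n / single[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, round(confidence, 2)))
    return sorted(rules)


if __name__ == "__main__":
    history = [
        (0, "LINK_DOWN"), (5, "BGP_PEER_LOST"),
        (130, "LINK_DOWN"), (140, "BGP_PEER_LOST"),
        (300, "BGP_PEER_LOST"),
        (400, "FAN_FAIL"),
    ]
    for rule in mine_alarm_rules(history):
        print(rule)
```

Tightening `min_support` and `min_confidence` plays the role of the parameter tuning described above: loose thresholds yield many rules of doubtful value, while tuned thresholds leave a smaller set for experts to inspect.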
Video quality on mobile phones has gradually been upgraded to HD, and the improvement in video quality increases network traffic. When the next-generation O&M system detects an increased forwarding load on the pre-deployed virtualized network elements and predicts that their I/O throughput will exceed the baseline, its policy center automatically determines that a scale-out is needed and instructs the orchestration system to perform it. Within a time window after sending the instruction, the O&M system checks whether the KPIs of the virtualized network elements have returned to an acceptable range. If the scale-out has not achieved this, the system continues to scale out and instructs technicians to check the scaling policies manually.
An automatic closed loop can recover services automatically but cannot resolve hardware-related faults automatically. When a hardware fault occurs during O&M, vMaster identifies the fault and assigns a trouble ticket to technicians. Automated ticketing and resource scheduling, together with designed policies such as O&M paths, can be combined with AI to improve the efficiency of this manually controlled closed loop.
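The scale-out loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the vMaster policy center: `predict_next` uses naive linear extrapolation in place of a real forecaster, `scale_out` and `read_kpi` are hypothetical callbacks standing in for the orchestration system and for the KPI check made after the time window, and the retry limit of three is arbitrary.

```python
def predict_next(samples):
    """Naive forecast: extend the trend of the last two throughput samples."""
    if len(samples) < 2:
        return samples[-1]
    return samples[-1] + (samples[-1] - samples[-2])


def closed_loop_scale_out(samples, baseline, scale_out, read_kpi, max_attempts=3):
    """Run one pass of a predictive scale-out loop.

    samples:   recent throughput measurements, oldest first.
    baseline:  threshold the KPI must stay at or below.
    scale_out: hypothetical callback that asks the orchestrator to add capacity.
    read_kpi:  hypothetical callback returning the KPI after the check window.
    Returns (attempts_made, escalated); escalated=True means technicians
    should review the scaling policies manually.
    """
    # Act only when the forecast crosses the baseline.
    if predict_next(samples) <= baseline:
        return 0, False

    for attempt in range(1, max_attempts + 1):
        scale_out()
        # Verify the effect of the instruction before deciding what to do next.
        if read_kpi() <= baseline:
            return attempt, False

    # KPIs never came back into range: hand over to a human.
    return max_attempts, True


if __name__ == "__main__":
    instances = [1]
    total_load = 180.0  # illustrative total I/O load shared across instances

    def scale_out():
        instances[0] += 1

    def read_kpi():
        return total_load / instances[0]

    print(closed_loop_scale_out([70, 85], 80, scale_out, read_kpi))
```

A production loop would wait for the time window between `scale_out` and `read_kpi` and would keep the escalation path separate from the retry logic; both are collapsed here to keep the control flow visible.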

 

The Future Road

The age of intelligence arrived when AlphaGo beat a human player. AI will play a greater role in more and more fields. In the telecom field, the development of cloud, SDN/NFV, and 5G is accelerating, and we can foresee that the related O&M will also be optimized for greater cost savings and efficiency. AI has already begun to show results in O&M. ZTE's next-generation O&M system will further improve the O&M experience and deliver more value to users. ZTE is actively participating in the development of 5G standards and related technologies. As new technologies such as cloudification and SDN/NFV drive the rapid development of new types of networks, ZTE will continue to take the lead in this wave of technology.