Large AI Model-Based Solution for Intelligent Optical Network O&M

Release Date: 2026-03-26 By Wang Zhenyu

With the rapid expansion of OTN networks and their increasing architectural complexity, traditional optical networks urgently need AI technologies to address the cost and efficiency challenges posed by fault prediction and prevention and by complicated, repetitive O&M operations. Applying AI improves O&M quality and drives the evolution of network technologies.

To meet the digital and intelligent transformation requirements of optical network O&M, introducing large AI models to deliver intelligent capabilities across the full lifecycle—planning, construction, maintenance, optimization, and operation—has become a widely recognized approach to accelerating the evolution towards higher-level autonomous optical networks.

This article focuses on the new intelligent O&M paradigm empowered by large AI models for optical networks and presents two typical application solutions developed by ZTE: intelligent network fault diagnosis and network traffic analysis & prediction.

Intelligent Network Fault Diagnosis

Optical network architectures are becoming increasingly complex, with diverse fault symptoms and a large number of alarms, making rapid root cause localization difficult. Delimiting and locating equipment and line faults remains challenging, and locating fiber line faults requires additional equipment. Traditional manual analysis involves multiple steps, is time-consuming, and suffers from low accuracy.

To address these network O&M issues, ZTE has launched a network fault diagnosis system based on large AI models, as shown in Fig. 1. This system implements AI-based root cause analysis of network alarms and large model-based automated fault diagnosis. In addition, knowledge graph technology is introduced to enhance fault diagnosis accuracy and improve the operational efficiency of the network management system (NMS) through natural language interaction.

AI-Based Root Cause Analysis of Network Alarms

For OTN networks, the system performs comprehensive root cause analysis of hardware faults, including faults in equipment boards and optical modules, as well as potential network fiber faults such as fiber breaks, fiber deterioration, and co-cable routing. Unlike traditional manual expert analysis, the next-generation intelligent system preprocesses massive volumes of alarms and uses small-model AI algorithms to identify alarm root causes quickly and accurately.

The procedure consists of three main steps. First, raw alarms are processed by applying high-frequency filtering and expert-defined rules. Then, massive alarms are automatically aggregated based on a time-space correlation clustering algorithm. Finally, the correlated alarms are analyzed using a fault propagation graph and a graph neural network algorithm to identify the root cause alarms.
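The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the alarm tuple layout, the `INFO` code, the 60-second window, and the sample topology are all hypothetical, and step 3 uses a simple "no alarmed upstream node" rule as a stand-in for the graph neural network described in the text.

```python
from collections import defaultdict

# Hypothetical alarm record: (timestamp_s, node, alarm_code).


def filter_alarms(alarms, ignored_codes=frozenset({"INFO"})):
    """Step 1: apply an expert rule (ignored codes) and collapse repeated alarms."""
    seen, kept = set(), []
    for ts, node, code in sorted(alarms):
        if code in ignored_codes or (node, code) in seen:
            continue  # drop rule-excluded codes and high-frequency repeats
        seen.add((node, code))
        kept.append((ts, node, code))
    return kept


def cluster_alarms(alarms, links, window_s=60):
    """Step 2: group alarms that are close in time and on the same or adjacent nodes."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    clusters = []
    for alarm in alarms:  # expects alarms sorted by timestamp
        ts, node, _ = alarm
        for cluster in clusters:
            if any(ts - t <= window_s and (node == n or node in neighbors[n])
                   for t, n, _ in cluster):
                cluster.append(alarm)
                break
        else:
            clusters.append([alarm])
    return clusters


def pick_root(cluster, propagation):
    """Step 3 (stand-in for the GNN): keep alarms with no alarmed upstream node.

    `propagation` holds directed (cause_node, effect_node) edges of an
    assumed fault propagation graph.
    """
    alarmed = {node for _, node, _ in cluster}
    downstream = {dst for src, dst in propagation if src in alarmed}
    return [a for a in cluster if a[1] not in downstream]


if __name__ == "__main__":
    raw = [(0, "A", "LOS"), (1, "A", "INFO"), (2, "B", "AIS"),
           (3, "B", "AIS"), (4, "C", "AIS"), (500, "D", "LOF")]
    links = [("A", "B"), ("B", "C"), ("C", "D")]
    propagation = [("A", "B"), ("B", "C")]
    for cluster in cluster_alarms(filter_alarms(raw), links):
        print(pick_root(cluster, propagation))
```

In this toy data the loss-of-signal alarm on node A is reported as the root of the first cluster, and the isolated alarm on node D forms its own cluster. A learned model would replace `pick_root` while the filtering and clustering stages stay the same.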

With the introduction of AI analysis, the system achieves over 90% accuracy in root cause alarm identification, significantly improving work order dispatch accuracy, reducing the number of work orders, and ultimately enhancing O&M efficiency.

Large Model-Based Automated Fault Diagnosis

Traditional fault diagnosis is performed manually using tools, and diagnosing a single fault can take a long time. In complicated fault scenarios, the process relies heavily on the experience of O&M personnel, making it difficult to ensure timely fault recovery. Leveraging large model technologies, the system intelligently generates diagnosis solutions and implements automated scheduling, eliminating the reliance on operator expertise for fault diagnosis and significantly enhancing O&M efficiency.

Using the powerful natural language processing and knowledge inference capabilities of ZTE's Nebula Telecom Large Model, the system can accurately identify fault symptoms described in natural language input or fault work orders, and generate corresponding diagnosis solutions. With the orchestration and scheduling capabilities of the large model, the system automatically performs alarm analysis, fault location, solution generation, and recovery execution through internal API invocation, based on the generated diagnosis solution. This capability reduces the average fault diagnosis time from hours to less than five minutes. Moreover, based on the continuous learning capability of the large model, the solution can better adapt to different network environments and fault scenarios, further improving both the efficiency and accuracy of fault diagnosis.

Knowledge Graph for Improved Diagnosis Accuracy

Knowledge graph technology is introduced into the fault diagnosis system to build knowledge graphs from fault O&M information and knowledge, including resource information and troubleshooting knowledge. Resource information is structured into a resource knowledge graph, and troubleshooting knowledge into a fault knowledge graph. The system then performs inference on the resource knowledge graph, incorporates relevant knowledge from the fault knowledge graph, and applies the existing rule library to generate the fault diagnosis result.
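As a rough illustration of how the two graphs could work together, the sketch below stores each graph as a list of (subject, relation, object) triples and joins them for one alarm. All entity names, relation names, causes, and checks are hypothetical examples, and the rule library mentioned above is left out for brevity.

```python
# Resource knowledge graph: where things are and what they carry (hypothetical data).
RESOURCE_KG = [
    ("port-1", "part_of", "board-7"),
    ("board-7", "installed_in", "ne-3"),
    ("svc-100", "carried_by", "port-1"),
    ("svc-200", "carried_by", "port-1"),
]

# Fault knowledge graph: symptoms, candidate causes, and checks (hypothetical data).
FAULT_KG = [
    ("LOS", "may_be_caused_by", "fiber break"),
    ("LOS", "may_be_caused_by", "optical module failure"),
    ("fiber break", "verified_by", "OTDR test"),
    ("optical module failure", "verified_by", "module replacement test"),
]


def objects(kg, subject, relation):
    """All objects o such that (subject, relation, o) is in the graph."""
    return [o for s, r, o in kg if s == subject and r == relation]


def subjects(kg, relation, obj):
    """All subjects s such that (s, relation, obj) is in the graph."""
    return [s for s, r, o in kg if r == relation and o == obj]


def diagnose(alarm_code, port):
    """Combine resource context with fault knowledge for one alarm on one port."""
    boards = objects(RESOURCE_KG, port, "part_of")
    candidates = [
        (cause, check)
        for cause in objects(FAULT_KG, alarm_code, "may_be_caused_by")
        for check in objects(FAULT_KG, cause, "verified_by")
    ]
    return {
        "port": port,
        "board": boards[0] if boards else None,
        "affected_services": subjects(RESOURCE_KG, "carried_by", port),
        "candidates": candidates,
    }


if __name__ == "__main__":
    print(diagnose("LOS", "port-1"))
```

The resource graph answers "what is affected", the fault graph answers "what could be wrong and how to confirm it", and the combined result is the kind of structured context a large model or rule engine could then reason over.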

At present, combining large models with knowledge graphs has become an industry consensus. Large models enable language understanding, while knowledge graphs enrich the way knowledge is represented. The two technologies complement each other, strengthening inference capabilities and further improving fault diagnosis accuracy.

Natural Language Interaction for Enhanced NMS Operation Efficiency

Large model technologies are driving the evolution of network management operation from a graphical user interface (GUI) to an artificial intelligence user interface (AUI). During routine O&M, users can add, delete, modify, and query network information and configurations using natural language, without having to learn and remember the specific usage methods and function entries of the NMS. This improves NMS operation efficiency by more than 90%.

Network Traffic Analysis & Prediction

The service traffic carried by each optical channel in an optical network varies over time, making it difficult for users to promptly and accurately identify traffic bottlenecks during network O&M. In addition, users cannot analyze or predict future traffic trends for each channel, nor can they plan bandwidth resources in advance.

To address these challenges, ZTE has launched the network traffic analysis & prediction system (as shown in Fig. 2), enabling fine-grained network O&M and intelligent service operation guidance. It also supports the visual display of a digital map.

Fine-Grained Network O&M

Based on AI modeling and prediction algorithms for network traffic, the solution overcomes the traffic perception blind spots of traditional OTN networks. By combining the advantages of traditional O&M and traffic management, it provides a reference for network capacity expansion, avoiding both the user experience degradation caused by delayed capacity expansion and the wasted investment caused by blind capacity expansion.

Compared to traditional OTN networks, the intelligent system represents an innovative leap in traffic management, transforming it from "nonexistent" to "available" and from "zero" to "one". It has transformed the rigid-pipe O&M philosophy of OTN. By identifying port traffic-related indicators, it enables soft analysis of hard channels, delivering fine-grained management and predictive analysis for network O&M.

  • Real-time traffic analysis

The solution provides multi-dimensional traffic analysis capabilities, covering port traffic at the minute, hour, day, and month levels. When peak bandwidth utilization reaches 90%, a threshold-crossing alarm is triggered to guide maintenance personnel to divert services promptly, thus preventing service quality degradation. In addition, based on current traffic analysis, the system can identify network bottlenecks and recommend targeted capacity expansions.

  • Intelligent traffic prediction

By using AI algorithms such as linear regression and time series analysis, along with long-term big data analysis and prediction, the system implements traffic prediction curve assessment, detects future network bottlenecks and service overload risks, and discovers bandwidth requirements in advance. This helps guide global network traffic optimization and capacity expansion planning.
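The two capabilities above, threshold-crossing alarms and trend-based prediction, can be sketched together as follows. This is only an illustration under assumptions: the sample layout (minute index, rate in Gbps), the function names, and the use of a single straight-line fit are hypothetical choices, and a production system would likely use richer time series models. The 90% threshold is the figure given in the text.

```python
import math
from collections import defaultdict

ALARM_THRESHOLD = 0.9  # 90% peak bandwidth utilization, as stated in the text


def hourly_peak_utilization(samples, capacity_gbps):
    """Aggregate (minute_index, rate_gbps) samples into peak utilization per hour."""
    peaks = defaultdict(float)
    for minute, rate in samples:
        hour = minute // 60
        peaks[hour] = max(peaks[hour], rate / capacity_gbps)
    return dict(peaks)


def threshold_crossings(peaks, threshold=ALARM_THRESHOLD):
    """Return the hours whose peak utilization reached the alarm threshold."""
    return sorted(hour for hour, util in peaks.items() if util >= threshold)


def fit_line(values):
    """Ordinary least-squares fit of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def periods_until_threshold(history, threshold=ALARM_THRESHOLD):
    """Periods after the last sample until the fitted trend reaches the threshold.

    Returns None when the fitted trend is flat or falling.
    """
    slope, intercept = fit_line(history)
    if slope <= 0:
        return None
    crossing_index = (threshold - intercept) / slope
    return max(0, math.ceil(crossing_index - (len(history) - 1)))


if __name__ == "__main__":
    samples = [(0, 40.0), (30, 55.0), (61, 92.0), (119, 70.0), (120, 89.9)]
    peaks = hourly_peak_utilization(samples, capacity_gbps=100.0)
    print("alarm hours:", threshold_crossings(peaks))
    monthly_peak_percent = [50, 55, 60, 65, 70]
    print("months to 90%:", periods_until_threshold(monthly_peak_percent, threshold=90))
```

The first two functions react to what has already happened, while the last two estimate how much lead time remains before a port reaches the same threshold, which is the information capacity planning needs.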

Intelligent Service Operation Guidance

Based on OTN network service traffic data, the system is capable of building models of network service usage patterns. By analyzing behaviors such as zero traffic, traffic decline, traffic surges, and traffic fluctuations, and incorporating traditional OTN port and user status analysis, it enables intelligent service O&M.

  • Service fault management

Through time-based traffic analysis, the system builds a data model of user service usage patterns and evaluates service reliability based on traditional OTN performance analysis data (such as port status, optical power, and bit errors). It can quickly identify zero-traffic user faults and respond to interruptions, minimizing the impact on services.

  • Service anomaly alert

The system analyzes traffic changes over time based on user service usage patterns to generate early warnings of potential customer churn. It also proactively monitors long-term "zero-traffic" user behavior to prevent customer churn risks and the invalid occupation of network resources.

  • Package change alert

The system analyzes the quality degradation of services that exceed package limits and dynamically adjusts customer bandwidth promptly to prevent impacts on service quality. It also detects customer service growth trends early and provides package expansion alerts to the front-end teams.
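The traffic behaviors named in this section (zero traffic, decline, surge, and fluctuation) can be told apart with simple rules. The sketch below is one hypothetical way to do it: it compares the earlier and later halves of a daily traffic series, and every threshold in it is an illustrative assumption rather than a value from the system.

```python
from statistics import mean, pstdev


def classify_traffic(daily_gbps, zero_eps=1e-3, change_ratio=0.5, cv_limit=0.5):
    """Label a non-negative daily traffic series with one usage pattern.

    Compares the mean of the later half with the mean of the earlier half.
    All thresholds are illustrative assumptions.
    """
    if len(daily_gbps) < 2:
        raise ValueError("need at least two samples")
    half = len(daily_gbps) // 2
    earlier, recent = daily_gbps[:half], daily_gbps[half:]
    if max(recent) <= zero_eps:
        return "zero"  # candidate service fault or churn risk
    base, current = mean(earlier), mean(recent)
    if current <= base * (1 - change_ratio):
        return "decline"  # candidate churn warning
    if current >= base * (1 + change_ratio):
        return "surge"  # candidate package expansion alert
    # Coefficient of variation: standard deviation relative to the mean.
    if pstdev(daily_gbps) / mean(daily_gbps) > cv_limit:
        return "fluctuation"
    return "normal"


if __name__ == "__main__":
    for series in ([5, 5, 5, 0, 0, 0], [10, 10, 10, 4, 4, 4],
                   [4, 4, 4, 10, 10, 10], [9, 1, 9, 1, 9, 1, 9, 1],
                   [5, 5, 5, 5, 5, 5]):
        print(series, "->", classify_traffic(series))
```

Each label maps onto one of the alerts described above. In practice the label would be combined with OTN port status, optical power, and bit error data before an alert is raised.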

Visual Display of Digital Map

Real-time traffic monitoring data for network ports and services can be visualized and managed on a digital network map.

  • Multi-dimensional analysis: Provides multi-dimensional statistical analysis based on port, user rate, bandwidth utilization, peak/valley value, average value, and Top-N.
  • Traffic map: Displays the entire network traffic data in real time, including traffic/utilization rankings, coloring, map drill-down, and service proportions.
  • Trend analysis: Displays trends for peak and average values of rates, traffic, and bandwidth utilization for ports and services in the near future, based on historical traffic statistics.
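The statistics behind such a view are straightforward to compute. The following sketch derives peak, valley, average, and peak utilization per port and ranks ports for a Top-N list. The port names, rates, and dictionary layout are hypothetical.

```python
import heapq
from statistics import mean


def port_summary(rates_gbps, capacity_gbps):
    """Peak, valley, and average rate plus peak utilization for one port."""
    peak = max(rates_gbps)
    return {
        "peak": peak,
        "valley": min(rates_gbps),
        "average": mean(rates_gbps),
        "peak_utilization": peak / capacity_gbps,
    }


def top_n_ports(port_rates, capacities, n):
    """Rank ports by peak utilization and return the n busiest with their summaries."""
    summaries = {
        port: port_summary(rates, capacities[port])
        for port, rates in port_rates.items()
    }
    return heapq.nlargest(
        n, summaries.items(), key=lambda item: item[1]["peak_utilization"]
    )


if __name__ == "__main__":
    rates = {"p1": [10, 20, 30], "p2": [80, 95, 90], "p3": [50, 60, 40]}
    capacities = {"p1": 100, "p2": 100, "p3": 100}
    for port, summary in top_n_ports(rates, capacities, n=2):
        print(port, summary)
```

The same summaries could feed the coloring and ranking on the traffic map, with the ranking key swapped for average rate or total traffic as needed.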

 

ZTE's intelligent OTN network fault diagnosis system uses AI technologies to achieve minute-level diagnosis efficiency with an accuracy rate exceeding 90%. Based on real-time traffic analysis, the network traffic analysis & prediction system provides timely alerts for traffic bottlenecks, improves O&M efficiency by over 30%, and predicts future traffic trends based on AI big data analysis, thus facilitating the transformation and upgrade of O&M from passive maintenance to proactive prevention.