ACTap: Integrating Attention and Convolution for Network Modal Recognition

Release Date: 2026-06-26 Author: Ling Zihan, Zhang Tianwei, Ning Yuwei, Cao Yang, Shen Can

Abstract: With the sustained growth of live video streaming, demand for high-quality video services among mobile users across diverse network environments is increasing rapidly. In this paper, we define the network fluctuation characteristics of different environments as network modality. To investigate network modality comprehensively, we construct a network modality dataset by collecting multi-dimensional network metrics from various real-world scenarios, and we suggest that network modality exhibits separability. Network modality recognition, which aims to identify the scenario in which a user is located from network modality sequences, is therefore feasible and can be formulated as a multivariate time series (MTS) classification problem. To address this problem, we propose a novel neural network (NN)-based classification model called ACTap. Specifically, the model first integrates a two-stage attention (TSA) mechanism and a convolutional neural network (CNN) to extract features from network modality sequences. It then filters out noisy feature representations to learn discriminative class prototypes, and finally recognizes the network modality of each sequence based on the distance between its feature representation and the class prototypes. Experimental results validate the separability of network modality and show that ACTap outperforms four benchmark models in classification accuracy on the network modality dataset.
Keywords: network modality; multivariate time series classification; feature fusion; class prototype learning
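
The final recognition step described in the abstract, assigning a sequence to the class whose prototype is nearest to its feature representation, can be illustrated with a minimal sketch. This is not the ACTap implementation: the TSA and CNN feature extractor is omitted and replaced by synthetic 2-D feature vectors, and the noise filter (dropping the samples farthest from the class mean, controlled by a hypothetical `keep_ratio`) and the Euclidean distance are assumptions, since the abstract does not specify either.

```python
import numpy as np

def class_prototypes(features, labels, keep_ratio=0.8):
    """Per-class mean feature vector after dropping the farthest samples.

    The filtering rule and keep_ratio are illustrative assumptions.
    """
    prototypes = {}
    for c in np.unique(labels):
        x = features[labels == c]
        center = x.mean(axis=0)
        dist = np.linalg.norm(x - center, axis=1)
        n_keep = max(1, int(np.ceil(keep_ratio * len(x))))
        kept = x[np.argsort(dist)[:n_keep]]  # closest samples only
        prototypes[int(c)] = kept.mean(axis=0)
    return prototypes

def predict(features, prototypes):
    """Assign each feature vector to the class of its nearest prototype."""
    classes = np.array(sorted(prototypes))
    protos = np.stack([prototypes[c] for c in classes])
    # Pairwise Euclidean distances, shape (n_samples, n_classes).
    d = np.linalg.norm(features[:, None, :] - protos[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

# Synthetic stand-in for extracted features: two well-separated classes.
rng = np.random.default_rng(0)
train_x = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 2)),
])
train_y = np.array([0] * 20 + [1] * 20)

prototypes = class_prototypes(train_x, train_y)
pred = predict(np.array([[0.2, -0.1], [4.8, 5.3]]), prototypes)
```

In this toy setting each query point is assigned to the cluster it was drawn near; with real network modality sequences, the feature vectors would come from the trained extractor rather than from random draws.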
