Abstract: The increasing demand for high throughput and low latency in Wi-Fi 7 necessitates a robust receiver design. Traditional receiver architectures, which rely on a cascade of complex, independently designed signal processing modules, often face performance bottlenecks. Rather than focusing on semantic-level tasks or simplified Additive White Gaussian Noise (AWGN) channels, this paper investigates a bit-level end-to-end receiver for a practical Wi-Fi 7 Multiple-Input Multiple-Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM) physical layer. A lightweight, encoder-only Transformer architecture is proposed that maps synchronized OFDM signals directly to decoded bitstreams, replacing the conventional channel estimation, equalization, and data detection modules. By leveraging the multi-head self-attention mechanism of the Transformer encoder, the model captures long-range spatial–temporal dependencies across antennas and subcarriers and extracts the relevant channel and signal features directly from the received signal. It thereby learns to compensate for channel distortions without explicit channel estimation or channel state information. Experimental results validate the efficacy of the proposed design, demonstrating the significant potential of deep learning for future wireless receiver architectures.
Keywords: Transformer; receiver design; Wi-Fi 7; deep learning