Behavior targeting (BT) based on individual web-browsing history has become increasingly valuable for precision marketing in many companies, because it captures users’ interests and preferences. In practice, however, the behavior data collected from different online shopping applications are often inconsistent: each application labels items with its own taxonomy, so the same behavior can have different representations, which confuses analysis. To address this issue, we propose a semantic-similarity-based strategy that transforms the heterogeneous behaviors extracted from the deep packet inspection (DPI) data of a telecommunication operator into a single standard representation. The Word Mover’s Distance algorithm is used to evaluate the semantic similarity between the distributed representations of two web-browsing histories. Moreover, we implement the architecture of the behavior targeting platform on Hadoop, which is capable of processing petabyte-scale data every day.
BT; online shopping application; DPI; Word Mover’s Distance; hierarchical taxonomy