First, the input text is tokenized into subwords. For example, the sentence "deep learning is fun" becomes:

['deep', 'learning', 'is', 'fun']

Using a pre-trained BERT model, we then generate a contextual embedding for each token. Each embedding is a 768-dimensional vector (for BERT-base), so a 4-token input yields a 4 × 768 matrix.

To collapse these per-token embeddings into a single fixed-size vector for the whole text, let's use mean pooling: average the token embeddings element-wise. The result is the deep feature for the input:

deep_feature = [0.23, 0.41, ..., 0.57]
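The mean-pooling step can be sketched as follows. The numbers here are small placeholder values standing in for real BERT outputs (4 tokens × 5 dimensions instead of 4 × 768), so the shapes stay readable:

```python
import numpy as np

# Placeholder per-token embeddings, standing in for BERT output.
# Real shape would be (num_tokens, 768) for BERT-base.
token_embeddings = np.array([
    [0.1, 0.2, 0.3, 0.4, 0.5],   # 'deep'
    [0.5, 0.4, 0.3, 0.2, 0.1],   # 'learning'
    [0.2, 0.2, 0.2, 0.2, 0.2],   # 'is'
    [0.0, 0.4, 0.0, 0.4, 0.0],   # 'fun'
])

# Mean pooling: average over the token axis (axis 0) to get one
# fixed-size vector for the whole input, independent of length.
deep_feature = token_embeddings.mean(axis=0)
print(deep_feature)  # → [0.2 0.3 0.2 0.3 0.2]
```

In practice you would also mask out padding tokens before averaging, so that padding does not dilute the mean.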