
Enhancing DarkWeb Activities Classification Using Embedding Methods

EasyChair Preprint 15803

8 pages. Date: February 11, 2025

Abstract

The Dark Web is widely known for facilitating illicit activities such as drug trafficking, weapons trade, cybercrime, and hacking services, and tackling these activities poses a significant challenge for law enforcement and cybersecurity experts. Recent advances in Artificial Intelligence (AI) and machine learning have shown great potential in many domains, including the analysis of the Dark Web. This article presents a system that classifies 10 types of illicit activity on the Dark Web. The system builds on the publicly available DUTA dataset, which provides a structured foundation for analyzing Dark Web content. Ensemble learning techniques, namely Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), and CatBoost, were employed to perform the classification. To process the textual data, three word embedding techniques were used to convert text into vector representations. The best-performing combination, the ELMo word embedding model with the XGB classifier, achieved an accuracy and a precision of 99.95%. These results indicate that the system is effective at identifying and classifying illegal activities within the Dark Web ecosystem.
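The abstract describes a two-stage pipeline: each text is converted into a vector with a word embedding model, and the vectors are then classified by an ensemble. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The word vectors are a hypothetical toy table standing in for ELMo or the other embedding models; averaging token vectors into one document vector is an assumption, since the abstract does not say how document vectors are formed; and the classifier is a bagged ensemble of decision stumps, a much simplified relative of Random Forest rather than the XGB model behind the reported results. All names (WORD_VECTORS, embed, fit_stump, fit_bagged, predict) are hypothetical.

```python
import random
from collections import Counter

# HYPOTHETICAL toy word vectors standing in for a real embedding model such as
# ELMo; real models produce contextual vectors with hundreds of dimensions.
WORD_VECTORS = {
    "cocaine": [1.0, 0.0], "pills": [0.9, 0.1], "shipping": [0.6, 0.2],
    "exploit": [0.0, 1.0], "ddos": [0.1, 0.9], "botnet": [0.2, 0.8],
}
DIM = 2


def embed(text):
    """Assumed pooling: average the vectors of known tokens into one document vector."""
    vecs = [WORD_VECTORS[t] for t in text.lower().split() if t in WORD_VECTORS]
    if not vecs:
        return [0.0] * DIM  # no known tokens: fall back to the zero vector
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def fit_stump(X, y):
    """Return the (feature, threshold, left_label, right_label) split with fewest errors."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= thr]
            right = [lab for x, lab in zip(X, y) if x[f] > thr]
            left_lab = Counter(left).most_common(1)[0][0]  # never empty: thr comes from X
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_lab, right_lab)
    return best[1:]


def predict_stump(stump, x):
    f, thr, left_lab, right_lab = stump
    return left_lab if x[f] <= thr else right_lab


def fit_bagged(X, y, n_estimators=15, seed=0):
    """Bagging: train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict(ensemble, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_stump(s, x) for s in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical two-class toy corpus (the DUTA setting has 10 classes).
docs = [
    ("cocaine pills shipping", "drugs"), ("pills shipping", "drugs"),
    ("cocaine shipping", "drugs"), ("exploit botnet", "hacking"),
    ("ddos botnet", "hacking"), ("exploit ddos", "hacking"),
]
X = [embed(text) for text, _ in docs]
y = [label for _, label in docs]
model = fit_bagged(X, y)

for query in ["cocaine pills", "ddos exploit"]:
    print(query, "->", predict(model, embed(query)))
```

Replacing the toy table with a pretrained embedding model and the stump ensemble with a library gradient-boosting classifier (XGBoost, LightGBM, or CatBoost) would bring the sketch closer to the setup the abstract describes; the two-stage structure (embed, then classify fixed-length vectors) stays the same.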

Keyphrases: Balancing data, Classification, Dark Web, Illegal activities detection, Security, Embedding, Machine learning

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15803,
  author    = {Meriem Sennad and Zineb Ellaky and Faouzia Benabbou},
  title     = {Enhancing DarkWeb Activities Classification Using Embedding Methods},
  howpublished = {EasyChair Preprint 15803},
  year      = {EasyChair, 2025}}