Tutorials

Leonidas Akritidis

International Hellenic University, Greece

“Learning from Imbalanced Data”


Abstract: The problem of data imbalance concerns the uneven distribution of the training examples among the involved classes. In such cases, the vast majority of the input samples belong to just one class (the majority class), whereas the remaining classes are significantly underrepresented (minority classes). Today, a broad variety of application areas suffer from class imbalance, including Cybersecurity, Bioinformatics, Natural Language Processing, and Image and Multimedia data processing.

Using such data to train machine learning models is almost always problematic, since the resulting models are strongly biased toward the majority class and fail to learn the minority classes adequately. Consequently, the accuracy and the generalization capabilities of these models, particularly on the minority classes, are significantly degraded.
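As a small illustration of this bias, the sketch below uses hypothetical toy labels (950 majority samples, 50 minority samples) and a degenerate "model" that always predicts the majority class. The helper names (`imbalance_ratio`, `accuracy`, `recall`) are illustrative assumptions, not part of any tutorial material. The example shows how overall accuracy can look high while the minority class is never recognized.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, cls):
    """Fraction of the samples of class `cls` that were predicted as `cls`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)

# Hypothetical toy labels: 950 majority samples (class 0), 50 minority samples (class 1).
y_true = [0] * 950 + [1] * 50
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

ratio = imbalance_ratio(y_true)
acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, 1)
# Balanced accuracy: the mean of the per-class recalls.
balanced = (recall(y_true, y_pred, 0) + minority_recall) / 2

print(ratio)            # 19.0
print(acc)              # 0.95
print(minority_recall)  # 0.0
print(balanced)         # 0.5
```

Per-class metrics such as minority-class recall or balanced accuracy expose the failure that plain accuracy hides.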

In this tutorial, the most recent advances in classification with imbalanced data will be presented. The underlying techniques will be categorized according to the approach they apply to confront the problem. The entire family of resampling techniques (over-sampling, under-sampling, hybrid sampling, etc.) will be reviewed in detail. Subsequently, algorithm-based approaches and cost-sensitive learning methods will be analyzed. The second part will summarize the current conclusions and will include an inspiring description of the most important challenges in the area that remain open. Finally, some insights into ongoing and future research will be discussed.
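For readers new to the area, the following is a minimal sketch of the two simplest resampling baselines (random over-sampling and random under-sampling) and of a common cost-sensitive weighting heuristic. These are illustrative baselines chosen for this example, not the specific methods covered in the tutorial; the function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for cls in counts:
        idx = [i for i, label in enumerate(y) if label == cls]
        for i in rng.sample(idx, target):
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out

def inverse_frequency_weights(y):
    """Cost-sensitive alternative to resampling: weight class c by
    n_samples / (n_classes * n_c), so errors on rare classes cost more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical toy data: 1-D features, 8 majority (0) and 2 minority (1) samples.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
weights = inverse_frequency_weights(y)

print(Counter(y_over))   # Counter({0: 8, 1: 8})
print(Counter(y_under))  # Counter({0: 2, 1: 2})
print(weights)           # {0: 0.625, 1: 2.5}
```

Over-sampling keeps all original information but repeats minority samples (which can encourage overfitting), while under-sampling discards majority samples; the weighting approach leaves the data untouched and instead changes how much each error costs during training. The same weighting formula is the one scikit-learn uses for `class_weight="balanced"`.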

Short Bio: Leonidas Akritidis is a post-doctoral research fellow in the Department of Science and Technology of the International Hellenic University, where he has also been a contract lecturer since 2020. He holds a diploma (2003) and a PhD (2013) in Electrical and Computer Engineering. His research focuses on deep learning from text data, Natural Language Processing, Data Mining and Knowledge Discovery, data engineering, optimal rank aggregation, and parallel/distributed algorithms. He has published multiple research articles in leading international journals and scientific conferences. Moreover, he has designed and developed a broad collection of scientific and commercial applications and systems, and he has contributed to the successful preparation and completion of various research projects with national and international funding.