NORMA eResearch @NCI Library

A critical analysis of Sampling Techniques for imbalanced data classification: An application to Social Media

Nagarajan, Vinitha (2017) A critical analysis of Sampling Techniques for imbalanced data classification: An application to Social Media. Masters thesis, Dublin, National College of Ireland.

Abstract

Imbalanced training datasets appear in a number of real-life problems, such as anomaly detection, network monitoring and social media. While some classes will normally have a sizeable number of records, other classes will be underrepresented. Constructing efficient classifiers for minority classes is a challenge that has been addressed in various ways, which broadly fall into three groups: undersampling the majority class(es), oversampling the minority class(es), or a combination of both. This thesis focuses on imbalanced classification techniques for social media data from Twitter. The classification task at hand is the identification of spam tweets. Social networks are a rich source of information, but they also attract many illegitimate users who spread spam tweets. A machine learning approach is presented in which analytical models are learnt from highly skewed datasets to predict spam messages. A range of techniques for tackling class imbalance is analysed in detail by controlling the class imbalance ratio. It is indeed possible to identify techniques of superior performance according to the imbalance ratio they can cope with. It is shown that classification performance depends heavily on the degree of imbalance in the dataset and on the sampling technique applied.
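
As an illustration of the two families of techniques named in the abstract, the following is a minimal sketch of random undersampling and random oversampling to a chosen majority-to-minority ratio. It is not taken from the thesis: the function names, the `ratio` parameter, and the "spam"/"ham" labels are assumptions made for this example, and the thesis may evaluate different or more elaborate sampling methods.

```python
import math
import random
from collections import Counter


def random_undersample(records, labels, majority_label, ratio, seed=0):
    """Randomly drop majority records so that majority:minority <= ratio:1.

    Hypothetical helper; every label other than `majority_label` is
    treated as minority.
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep_n = min(len(majority), int(ratio * len(minority)))
    kept = sorted(rng.sample(majority, keep_n) + minority)
    return [records[i] for i in kept], [labels[i] for i in kept]


def random_oversample(records, labels, minority_label, ratio, seed=0):
    """Randomly duplicate minority records so that majority:minority <= ratio:1.

    Hypothetical helper; duplicates are appended after the original records.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    if not minority:
        raise ValueError("no records with the minority label")
    majority_n = len(labels) - len(minority)
    target = math.ceil(majority_n / ratio)
    extra = max(0, target - len(minority))
    idx = list(range(len(labels))) + [rng.choice(minority) for _ in range(extra)]
    return [records[i] for i in idx], [labels[i] for i in idx]


# Toy data (invented for this sketch): 95 legitimate tweets, 5 spam tweets.
tweets = [f"tweet {i}" for i in range(100)]
labels = ["ham"] * 95 + ["spam"] * 5

_, under_labels = random_undersample(tweets, labels, "ham", ratio=2)
_, over_labels = random_oversample(tweets, labels, "spam", ratio=1)
print(Counter(under_labels))
print(Counter(over_labels))
```

Varying `ratio` is one simple way to control the imbalance ratio of the training set. Undersampling discards majority records, whereas oversampling repeats minority records, so the two approaches trade information loss against the risk of overfitting to duplicated examples.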

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > World Wide Web > Websites > Online social networks
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > World Wide Web > Websites > Online social networks
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Caoimhe Ní Mhaicín
Date Deposited: 30 Aug 2018 12:35
Last Modified: 30 Aug 2018 12:35
URI: https://norma.ncirl.ie/id/eprint/3104
