A New Synthetic Oversampling Method Using Ontology and Feature Selection in Order to Improve Imbalanced Textual Data Classification in Persian Texts

Jafar Pouramini; Behrouz Minaei-Bidgoli

doi:10.25518/0037-9565.5414

Bulletin de la Société Royale des Sciences de Liège

0037-9565 1783-5720

(Volume 85, Year 2016, Conference Proceedings, Special edition)

Abstract

The rapid growth of textual data has increased the need to process it automatically. Class imbalance is one of the factors that reduce the effectiveness of text classification. Various methods have been suggested to address the imbalance problem, including data-based, cost-based, algorithm-based and feature selection methods, and recent research has also taken ensemble methods into account. In this research, a new oversampling method is proposed. First, the number of minority-class samples is increased using an ontology; then random oversampling is performed for the minority class. Finally, appropriate features are selected using feature selection methods. The new ensemble method was tested on the Hamshahri collection. The results show that, despite a reduced number of features, the ensemble method improves classification results for the multinomial Naïve Bayes and decision tree classifiers.

Keywords: feature selection, imbalanced, ontology, oversampling
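
The three steps named in the abstract (ontology-based synthesis of minority-class samples, random oversampling, feature selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `ONTOLOGY` dictionary, the English tokens used in place of Persian text, the helper names, and the choice of a chi-square score for feature selection are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from collections import Counter

# Hypothetical toy ontology (assumption): each term maps to related terms.
ONTOLOGY = {"football": ["soccer"], "match": ["game"], "team": ["club"]}


def ontology_variants(doc, ontology):
    """Create synthetic documents by swapping one term for a related term."""
    tokens = doc.split()
    variants = []
    for i, tok in enumerate(tokens):
        for related in ontology.get(tok, []):
            variants.append(" ".join(tokens[:i] + [related] + tokens[i + 1:]))
    return variants


def oversample(docs, labels, minority, ontology, seed=0):
    """Ontology-based synthesis first, then random oversampling to balance."""
    rng = random.Random(seed)
    minority_docs = [d for d, y in zip(docs, labels) if y == minority]
    majority_size = max(Counter(labels).values())
    synthetic = [v for d in minority_docs for v in ontology_variants(d, ontology)]
    pool = minority_docs + synthetic
    needed = majority_size - len(minority_docs)
    # Use ontology-derived documents first, up to the number needed.
    added = synthetic[:needed]
    # Random oversampling (sampling with replacement) fills any remaining gap.
    while len(added) < needed:
        added.append(rng.choice(pool))
    return docs + added, labels + [minority] * len(added)


def chi2_top_terms(docs, labels, target, k):
    """Rank terms by a chi-square score of term presence vs. class (assumed criterion)."""
    n = len(docs)
    doc_terms = [set(d.split()) for d in docs]
    vocab = sorted(set().union(*doc_terms))
    n_target = sum(1 for y in labels if y == target)
    scores = {}
    for term in vocab:
        a = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == target)
        b = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != target)
        c = n_target - a          # target documents without the term
        d = n - n_target - b      # other documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Highest score first; ties are broken alphabetically.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


docs = [
    "football match tonight",
    "market prices rise",
    "stock market falls",
    "bank rates rise",
    "economy and market",
]
labels = ["sport", "economy", "economy", "economy", "economy"]

balanced_docs, balanced_labels = oversample(docs, labels, "sport", ONTOLOGY)
selected = chi2_top_terms(balanced_docs, balanced_labels, "sport", k=3)
print(Counter(balanced_labels))
print(selected)
```

In this sketch the ontology step is applied before random duplication, so the minority class gains new wording rather than only repeated copies; the selected terms would then be the only features passed to a classifier such as Naïve Bayes or a decision tree.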

To cite this article

Jafar Pouramini & Behrouz Minaei-Bidgoli, "A New Synthetic Oversampling Method Using Ontology and Feature Selection in Order to Improve Imbalanced Textual Data Classification in Persian Texts", Bulletin de la Société Royale des Sciences de Liège [Online], Volume 85, 2016, Conference Proceedings, Special edition, pp. 358-375. URL: http://popups.ulg.be/0037-9565/index.php?id=5414.

About: Jafar Pouramini

Department of Information Technology, Faculty of Engineering, University of Qom, j_pouramini@pnu.ac.ir

About: Behrouz Minaei-Bidgoli

Faculty of Computer Engineering, Iran University of Science and Technology, b_minaei@iust.ac.ir
