Online Catalog of the Universitas Ma Chung Library, Villa Puncak Tidar N-01, Malang, Jawa Timur.


Title An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
Edition Volume 36, Numbers 1-2
Call Number
ISBN/ISSN 0885-6125
Author(s) Bauer, Eric
Chan, Philip
Stolfo, Salvatore
Wolpert, David
Series Title Machine Learning
GMD Electronic Journal
Language English
Publisher Springer Netherlands
Publishing Year 1999
Publishing Place Netherlands
Collation 35p
Abstract/Notes Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be
very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review
these algorithms and describe a large empirical study comparing several variants in conjunction with a decision
tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding
of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect
classification error. We provide a bias and variance decomposition of the error to show how different methods
and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable
methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods
but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently
than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants,
some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates,
weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic
estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes
and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its
success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and
show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems
that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. We
use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only “hard” areas but
also outliers and noise.
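As an illustration of the two families of voting methods the abstract contrasts, the following is a minimal sketch of Bagging (bootstrap resampling, equal votes) and a common binary formulation of AdaBoost (instance reweighting, weighted votes). It is not the code or the exact variants used in the study: the base learner is a one-level decision stump rather than the paper's decision tree and Naive-Bayes inducers, and all names (`train_stump`, `bagging`, `adaboost`, `vote`) and the toy data are assumptions made for this example.

```python
import math
import random

def train_stump(X, y, w):
    """Weighted decision stump: choose (feature, threshold, sign) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                # The stump predicts `sign` when x[j] <= t, otherwise `-sign`.
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[j] <= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]

def stump_predict(stump, x):
    j, t, sign = stump
    return sign if x[j] <= t else -sign

def bagging(X, y, rounds, rng):
    """Train each stump on a bootstrap sample; every stump gets an equal vote."""
    n = len(X)
    ensemble = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n instances with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        ensemble.append((1.0, train_stump(Xb, yb, [1.0 / n] * n)))
    return ensemble

def adaboost(X, y, rounds):
    """Binary AdaBoost with labels in {-1, +1}, using reweighting (no resampling)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        preds = [stump_predict(stump, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err == 0:            # perfect stump: keep it and stop
            ensemble.append((1.0, stump))
            break
        if err >= 0.5:          # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified instances, lower the rest.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]  # renormalize every round so weights stay a distribution
    return ensemble

def vote(ensemble, x):
    """Weighted majority vote over the ensemble."""
    score = sum(a * stump_predict(s, x) for a, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical 1-D toy data that no single stump can fit.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

boosted = adaboost(X, y, 3)
bagged = bagging(X, y, 5, random.Random(0))
print("AdaBoost training errors:", sum(vote(boosted, x) != t for x, t in zip(X, y)))
print("Bagging training errors:", sum(vote(bagged, x) != t for x, t in zip(X, y)))
```

On this toy set a single stump misclassifies 3 of the 10 points, while three rounds of boosting fit all ten. That matches the abstract's point that boosting can reduce bias as well as variance, though a training-set fit on ten points says nothing about generalization.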
Specific Detail Info
File Attachment