Online Catalog of the Universitas Ma Chung Library
Villa Puncak Tidar N-01 Malang - Jawa Timur.
|
Title |
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants |
Edition |
Volume 36, Numbers 1-2 |
Call Number |
|
ISBN/ISSN |
0885-6125 |
Author(s) |
Bauer, Eric; Kohavi, Ron; Chan, Philip; Stolfo, Salvatore; Wolpert, David
|
Subject(s) |
|
Classification |
|
Series Title |
Machine Learning |
GMD |
Electronic Journal |
Language |
English |
Publisher |
Springer Netherlands |
Publishing Year |
1999 |
Publishing Place |
Netherlands |
Collation |
35p |
Abstract/Notes |
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be
very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review
these algorithms and describe a large empirical study comparing several variants in conjunction with a decision
tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding
of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect
classification error. We provide a bias and variance decomposition of the error to show how different methods
and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable
methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods
but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently
than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants,
some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates,
weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic
estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes
and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its
success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and
show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems
that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. We
use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only “hard” areas but
also outliers and noise. |
Specific Detail Info |
|