BazEkon - The Main Library of the Cracow University of Economics

Author
Zdravevski Eftim (Ss. Cyril and Methodius University in Skopje), Lameski Petre (Ss. Cyril and Methodius University in Skopje), Kulakov Andrea (Ss. Cyril and Methodius University in Skopje), Gjorgjevikj Dejan (Ss. Cyril and Methodius University in Skopje)
Title
Feature selection and allocation to diverse subsets for multi-label learning problems with large datasets
Source
Annals of Computer Science and Information Systems, 2014, vol. 2, s. 387 - 394, rys., tab., bibliogr. 51 poz.
Keyword
Machine learning, Data analysis, Machinery and equipment, Bayesian models
Note
summ.
Abstract
Feature selection is an important phase in machine learning, and in the case of multi-label classification it can be considerably challenging. Likewise, finding the best subset of good features is involved and difficult when the dataset has a very large number of features (more than a thousand). In this paper we address the problem of feature selection for multi-label classification with a large number of features. The proposed method is a hybrid of two phases: preliminary feature selection based on the information value, followed by additional correlation-based selection. We show how the first phase reduces the candidate features from tens of thousands to a couple of hundred, after which the second phase can perform fine-grained selection with more sophisticated but computationally intensive methods. Finally, we analyze ways of allocating the selected features to diverse subsets, which are suitable for an ensemble of classifiers. (original abstract)
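The abstract outlines a two-phase filter followed by subset allocation. The sketch below is an illustration only, not the authors' implementation: features are first ranked by Information Value computed from Weight of Evidence over quantile bins, the survivors are pruned by pairwise correlation, and the remaining features are dealt round-robin into diverse subsets for an ensemble. The bin count, thresholds, and number of subsets are arbitrary assumptions chosen for the example.

```python
# Illustrative sketch of a two-phase feature selection filter (assumed details,
# not the paper's exact procedure): Information Value ranking, then greedy
# correlation-based pruning, then round-robin allocation to diverse subsets.
import numpy as np

def information_value(x, y, n_bins=10, eps=1e-6):
    """IV of a numeric feature x against a binary target y (1 = event)."""
    # Quantile bin edges; np.unique drops duplicate edges for skewed features.
    bins = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    idx = np.digitize(x, bins[1:-1])
    iv = 0.0
    for b in np.unique(idx):
        mask = idx == b
        good = ((y == 0) & mask).sum() / max((y == 0).sum(), 1) + eps
        bad = ((y == 1) & mask).sum() / max((y == 1).sum(), 1) + eps
        woe = np.log(good / bad)          # Weight of Evidence of the bin
        iv += (good - bad) * woe
    return iv

def two_phase_selection(X, y, iv_threshold=0.02, corr_threshold=0.9):
    # Phase 1: cheap univariate filter on Information Value.
    candidates = [j for j in range(X.shape[1])
                  if information_value(X[:, j], y) >= iv_threshold]
    # Phase 2: greedy correlation-based pruning of the reduced candidate set.
    kept = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_threshold
               for k in kept):
            kept.append(j)
    return kept

def allocate_to_subsets(features, n_subsets=3):
    """Round-robin allocation of the selected features to diverse subsets."""
    return [features[i::n_subsets] for i in range(n_subsets)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)
    selected = two_phase_selection(X, y)
    print("selected:", selected)
    print("subsets:", allocate_to_subsets(selected))
```

The key design point is that the expensive pairwise work in phase 2 only runs on the few hundred features surviving the cheap phase 1 filter, which is what makes the approach feasible for datasets with tens of thousands of features.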
Bibliography
  1. "Aaia'14 data mining competition, howpublished = https://fedcsis.org/2014/dm_competition, note = Accessed: 2014-05-30."
  2. Almuallim H. and Dietterich T. G., "Learning with many irrelevant features," in Proceedings of the Ninth National Conference on Artificial Intelligence - Volume 2, ser. AAAI'91. AAAI Press, 1991. ISBN 0-262-51059-6 pp. 547-552. [Online]. Available: http://dl.acm.org/citation.cfm?id=1865756.1865761
  3. Anderson R., The credit scoring toolkit: theory and practice for retail credit risk management and decision automation. Oxford: Oxford University Press, 2007. ISBN 9780199226405
  4. Bekkerman R., El-Yaniv R., Tishby N., and Winter Y., "Distributional word clusters vs. words for text categorization," J. Mach. Learn. Res., vol. 3, pp. 1183-1208, Mar. 2003. [Online]. Available: http://dl.acm.org/citation.cfm?id=944919.944969
  5. Ben-Hur A. and Guyon I., "Detecting stable clusters using principal component analysis," in Functional Genomics, ser. Methods in Molecular Biology, M. Brownstein and A. Khodursky, Eds. Humana Press, 2003, vol. 224, pp. 159-182. ISBN 978-1-58829-291-9. [Online]. Available: http://dx.doi.org/10.1385/1-59259-364-X%3A159
  6. Blum A. L. and Langley P., "Selection of relevant features and examples in machine learning," Artificial Intelligence, vol. 97, no. 1-2, pp. 245-271, 1997. doi: http://dx.doi.org/10.1016/S0004-3702(97)00063-5. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0004370297000635
  7. Bradley A. P., "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145-1159, 1997. doi: http://dx.doi.org/10.1016/S0031-3203(96)00142-2. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320396001422
  8. Bruce L. and Brotherton D., "Information value statistic," in Midwest SAS User Group 2013 Conference Proceedings. Marketing Associates, LLC, 2013, pp. 1-18.
  9. Das S., "Filters, wrappers and a boosting-based hybrid for feature selection," in Proceedings of the Eighteenth International Conference on Machine Learning, ser. ICML '01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2001. ISBN 1-55860-778-1 pp. 74-81. [Online]. Available: http://dl.acm.org/citation.cfm?id=645530.658297
  10. Dash M., Liu H., and Motoda H., "Consistency based feature selection," in Knowledge Discovery and Data Mining. Current Issues and New Applications, ser. Lecture Notes in Computer Science, T. Terano, H. Liu, and A. Chen, Eds. Springer Berlin Heidelberg, 2000, vol. 1805, pp. 98-109. ISBN 978-3-540-67382-8. [Online]. Available: http://dx.doi.org/10.1007/3-540-45571-X_12
  11. Duda R. O., Pattern classification, 2nd ed. New York: Wiley, 2001. ISBN 0471056693
  12. Fayyad U. M., Piatetsky-Shapiro G., Smyth P., and Uthurusamy R., Advances in knowledge discovery and data mining. Menlo Park, Calif.: AAAI Press : MIT Press, 1996. ISBN 0262560976 9780262560979
  13. Finlay S., Credit scoring, response modeling, and insurance rating: a practical guide to forecasting consumer behavior, 2nd ed. Houndmills, Basingstoke, Hampshire ; New York: Palgrave Macmillan, 2012. ISBN 9780230347762
  14. Fleming P. J. and Wallace J. J., "How not to lie with statistics: The correct way to summarize benchmark results," Commun. ACM, vol. 29, no. 3, pp. 218-221, Mar. 1986. doi: 10.1145/5666.5673. [Online]. Available: http://doi.acm.org/10.1145/5666.5673
  15. Forman G., "An extensive empirical study of feature selection metrics for text classification," J. Mach. Learn. Res., vol. 3, pp. 1289-1305, Mar. 2003. [Online]. Available: http://dl.acm.org/citation.cfm?id=944919.944974
  16. Grassberger P., Hegger R., Kantz H., Schaffrath C., and Schreiber T., "On noise reduction methods for chaotic data," Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 3, no. 2, 1993.
  17. Guyon I. and Elisseeff A., "An introduction to variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1157-1182, Mar. 2003. [Online]. Available: http://dl.acm.org/citation.cfm?id=944919.944968
  18. Hall M. A., "Correlation-based feature selection for machine learning," Ph.D. dissertation, The University of Waikato, 1999.
  19. Hancock A. A., Bush E. N., Stanisic D., Kyncl J. J., and Lin C., "Data normalization before statistical analysis: keeping the horse before the cart," Trends in Pharmacological Sciences, vol. 9, no. 1, pp. 29 - 32, 1988. doi: http://dx.doi.org/10.1016/0165-6147(88)90239-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0165614788902398
  20. Hermes L. and Buhmann J., "Feature selection for support vector machines," in Pattern Recognition, 2000. Proceedings. 15th International Conference on, vol. 2, 2000. doi: 10.1109/ICPR.2000.906174. ISSN 1051-4651 pp. 712-715 vol.2.
  21. Huang J. and Ling C., "Using auc and accuracy in evaluating learning algorithms," Knowledge and Data Engineering, IEEE Transactions on, vol. 17, no. 3, pp. 299-310, March 2005. doi: 10.1109/TKDE.2005.50
  22. Jebara T. and Jaakkola T., "Feature selection and dualities in maximum entropy discrimination," in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, ser. UAI'00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000. ISBN 1-55860-709-9 pp. 291-300. [Online]. Available: http://dl.acm.org/citation.cfm?id=2073946.2073981
  23. John G. H., Kohavi R., and Pfleger K., "Irrelevant features and the subset selection problem," in Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 1994, pp. 121-129.
  24. Kira K. and Rendell L. A., "A practical approach to feature selection," in Proceedings of the Ninth International Workshop on Machine Learning, ser. ML92. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1992. ISBN 1-55860-247-X pp. 249-256. [Online]. Available: http://dl.acm.org/citation.cfm?id=141975.142034
  25. Kohavi R. and John G. H., "Wrappers for feature subset selection," Artif. Intell., vol. 97, no. 1-2, pp. 273-324, Dec. 1997. doi: 10.1016/S0004-3702(97)00043-X. [Online]. Available: http://dx.doi.org/10.1016/S0004-3702(97)00043-X
  26. Kononenko I., "Estimating attributes: Analysis and extensions of relief," in Machine Learning: ECML-94, ser. Lecture Notes in Computer Science, F. Bergadano and L. De Raedt, Eds. Springer Berlin Heidelberg, 1994, vol. 784, pp. 171-182. ISBN 978-3-540-57868-0. [Online]. Available: http://dx.doi.org/10.1007/3-540-57868-4_57
  27. Kullback S. and Leibler R. A., "On information and sufficiency," The Annals of Mathematical Statistics, pp. 79-86, 1951.
  28. Langley P., Elements of machine learning. San Francisco, Calif: Morgan Kaufmann, 1996. ISBN 1558603018
  29. Lee C. and Lee G. G., "Information gain and divergence-based feature selection for machine learning-based text categorization," Inf. Process. Manage., vol. 42, no. 1, pp. 155-165, Jan. 2006. doi: 10.1016/j.ipm.2004.08.006. [Online]. Available: http://dx.doi.org/10.1016/j.ipm.2004.08.006
  30. Ling C. X., Huang J., and Zhang H., "Auc: A statistically consistent and more discriminating measure than accuracy," in Proceedings of the 18th International Joint Conference on Artificial Intelligence, ser. IJCAI'03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2003, pp. 519-524. [Online]. Available: http://dl.acm.org/citation.cfm?id=1630659.1630736
  31. Little R. J. A., Statistical analysis with missing data, 2nd ed., ser. Wiley series in probability and statistics. Hoboken, N.J: Wiley, 2002. ISBN 0471183865
  32. Liu H. and Motoda H., Feature Extraction, Construction and Selection a Data Mining Perspective. Boston, MA: Springer US, 1998. ISBN 9781461557258 1461557259. [Online]. Available: http://dx.doi.org/10.1007/978-1-4615-5725-8
  33. Madjarov G., Kocev D., Gjorgjevikj D., and Džeroski S., "An extensive experimental comparison of methods for multi-label learning," Pattern Recognition, vol. 45, no. 9, pp. 3084 - 3104, 2012. doi: http://dx.doi.org/10.1016/j.patcog.2012.03.004. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320312001203
  34. Matheus C. J. and Rendell L. A., "Constructive induction on decision trees," in Proceedings of the 11th International Joint Conference on Artificial Intelligence - Volume 1, ser. IJCAI'89. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1989, pp. 645-650. [Online]. Available: http://dl.acm.org/citation.cfm?id=1623755.1623857
  35. Mays E. and Lynas N., Credit scoring for risk managers: the handbook for lenders. S.l.: CreateSpace, 2010. ISBN 9781450578967
  36. Mitchell T. M., Machine Learning, 1st ed. McGraw-Hill Science/Engineering/Math, 1997. ISBN 9780070428072. [Online]. Available: http://amazon.com/o/ASIN/0070428077/
  37. Mladenic D. and Grobelnik M., "Feature selection for unbalanced class distribution and naive bayes," in Proceedings of the Sixteenth International Conference on Machine Learning, ser. ICML '99. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1999. ISBN 1-55860-612-2 pp. 258-267. [Online]. Available: http://dl.acm.org/citation.cfm?id=645528.657649
  38. Ng A. Y. and Jordan M. I., "Convergence rates of the voting gibbs classifier, with application to bayesian feature selection," in 18th International Conference on Machine Learning. Morgan Kaufmann, 2001.
  39. Osborne J. W. and Overbay A., "The power of outliers (and why researchers should always check for them)," Practical assessment, research & evaluation, vol. 9, no. 6, pp. 1-12, 2004.
  40. Quinlan J. R., C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993. ISBN 1-55860-238-0
  41. Raman B. and Ioerger T. R., "Instance based filter for feature selection," Journal of Machine Learning Research, vol. 1, no. 3, pp. 1-23, 2002.
  42. Royston P., "Multiple imputation of missing values," Stata Journal, vol. 4, pp. 227-241, 2004.
  43. Rényi A., "On measures of entropy and information," in Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1961, pp. 547-561.
  44. Schervish M. J., "P values: what they are and what they are not," The American Statistician, vol. 50, no. 3, pp. 203-206, 1996.
  45. Siddiqi N., Credit risk scorecards: developing and implementing intelligent credit scoring. Hoboken, N.J: Wiley, 2006. ISBN 9780471754510
  46. Sola J. and Sevilla J., "Importance of input data normalization for the application of neural networks to complex industrial problems," Nuclear Science, IEEE Transactions on, vol. 44, no. 3, pp. 1464-1468, Jun 1997. doi: 10.1109/23.589532
  47. Talavera L., "An evaluation of filter and wrapper methods for feature selection in categorical clustering," in Advances in Intelligent Data Analysis VI, ser. Lecture Notes in Computer Science, A. Famili, J. Kok, J. Pena, A. Siebes, and A. Feelders, Eds. Springer Berlin Heidelberg, 2005, vol. 3646, pp. 440-451. ISBN 978-3-540-28795-7. [Online]. Available: http://dx.doi.org/10.1007/11552253_40
  48. Vehtari A. and Lampinen J., "Bayesian input variable selection using posterior probabilities and expected utilities," Report B31, 2002.
  49. Yang P., Liu W., Zhou B., Chawla S., and Zomaya A., "Ensemble-based wrapper methods for feature selection and class imbalance learning," in Advances in Knowledge Discovery and Data Mining, ser. Lecture Notes in Computer Science, J. Pei, V. Tseng, L. Cao, H. Motoda, and G. Xu, Eds. Springer Berlin Heidelberg, 2013, vol. 7818, pp. 544-555. ISBN 978-3-642-37452-4. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-37453-1_45
  50. Yu L. and Liu H., "Feature selection for high-dimensional data: A fast correlation-based filter solution," in ICML, vol. 3, 2003, pp. 856-863.
  51. Zdravevski E., Lameski P., and Kulakov A., "Weight of evidence as a tool for attribute transformation in the preprocessing stage of supervised learning algorithms," in Neural Networks (IJCNN), The 2011 International Joint Conference on, July 2011. doi: 10.1109/IJCNN.2011.6033219. ISSN 2161-4393 pp. 181-188.
ISSN
2300-5963
Language
eng