BazEkon - Main Library of the Cracow University of Economics (Biblioteka Główna Uniwersytetu Ekonomicznego w Krakowie)

Authors
Šulc Zdeněk (University of Economics in Prague, Czech Republic), Řezanková Hana (University of Economics in Prague, Czech Republic)
Title
Evaluation of Selected Approaches to Clustering Categorical Variables
Source
Statistics in Transition, 2014, vol. 15, no. 4, pp. 591-610, figs., tables, bibliography pp. 608-610
Keywords
Statistical analysis, Clusters, Qualitative variables, Similarity measure
Notes
Includes summary.
This work was supported by the University of Economics, Prague under the project IGS F4/104/2014.
Abstract
This paper focuses on recently proposed similarity measures and their performance in categorical variable clustering. It compares clustering results using three recently developed similarity measures (IOF, OF and Lin measures) with results obtained using two association measures for nominal variables (Cramér's V and the uncertainty coefficient) and with the simple matching coefficient (the overlap measure). To eliminate the influence of a particular linkage method on the structure of final clusters, three linkage methods are examined (complete, single, average). The created groups (clusters) of variables can be considered as the basis for dimensionality reduction, e.g. by choosing one of the variables from a given group as a representative for the whole group. The quality of the resulting clusters is evaluated by the within-cluster variability, expressed by the WCM coefficient, and by dendrogram analysis. The examined similarity measures are compared and evaluated using two real data sets from a social survey. (original abstract)
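To make the variable-clustering pipeline described in the abstract more concrete, here is a minimal sketch in Python (standard library only). It is not the authors' code. It assumes that the dissimilarity between two nominal variables is 1 minus their association (Cramér's V or the symmetric uncertainty coefficient; the abstract does not say which variant of the uncertainty coefficient was used), and then runs a naive agglomerative procedure with single, complete or average linkage. The OF, IOF and Lin measures and the WCM coefficient are not reproduced here, and all function names (`cramers_v`, `uncertainty_coefficient`, `dissimilarity_matrix`, `agglomerate`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cramers_v(x, y):
    """Cramér's V between two nominal variables given as equal-length sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    chi2 = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            chi2 += (cxy.get((a, b), 0) - expected) ** 2 / expected
    k = min(len(cx), len(cy))
    return 0.0 if k < 2 else math.sqrt(chi2 / (n * (k - 1)))


def _entropy(counts, n):
    """Shannon entropy (natural log) of a frequency table."""
    return -sum(c / n * math.log(c / n) for c in counts.values() if c > 0)


def uncertainty_coefficient(x, y):
    """Symmetric uncertainty coefficient (assumed variant): 2*I(X;Y) / (H(X)+H(Y))."""
    n = len(x)
    hx, hy = _entropy(Counter(x), n), _entropy(Counter(y), n)
    hxy = _entropy(Counter(zip(x, y)), n)
    return 0.0 if hx + hy == 0 else 2 * (hx + hy - hxy) / (hx + hy)


def dissimilarity_matrix(variables, measure):
    """Map each ordered pair of variable names to 1 - association (assumption)."""
    d = {(a, a): 0.0 for a in variables}
    for a, b in combinations(variables, 2):
        d[(a, b)] = d[(b, a)] = 1.0 - measure(variables[a], variables[b])
    return d


def agglomerate(names, d, linkage="average"):
    """Naive hierarchical clustering; returns the merge history as (left, right, distance)."""
    if linkage not in ("single", "complete", "average"):
        raise ValueError("linkage must be 'single', 'complete' or 'average'")

    def dist(c1, c2):
        vals = [d[(a, b)] for a in c1 for b in c2]
        if linkage == "single":
            return min(vals)
        if linkage == "complete":
            return max(vals)
        return sum(vals) / len(vals)

    clusters = [frozenset([n]) for n in names]
    history = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters under the chosen linkage.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        history.append((sorted(clusters[i]), sorted(clusters[j]),
                        dist(clusters[i], clusters[j])))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


if __name__ == "__main__":
    # Toy data (invented for illustration): y is a relabelling of x, z is only weakly related.
    data = {
        "x": ["a", "a", "b", "b", "a", "b"],
        "y": ["p", "p", "q", "q", "p", "q"],
        "z": ["u", "v", "u", "v", "v", "u"],
    }
    for method in ("single", "complete", "average"):
        print(method, agglomerate(list(data), dissimilarity_matrix(data, cramers_v), method))
```

Because the merge history records the distance at each step, it carries the same information a dendrogram would display. Picking one representative variable per cluster at a chosen cut height mirrors the dimensionality-reduction idea mentioned in the abstract.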
Available in
Main Library of the Cracow University of Economics
SGH Library named after Professor Andrzej Grodek
Main Library of the University of Economics in Katowice
Main Library of the Poznań University of Economics
Main Library of the Wrocław University of Economics
Bibliography
  1. ANDERBERG, M. R., (1973). Cluster Analysis for Applications. Academic Press, New York.
  2. BORIAH, S., CHANDOLA, V., KUMAR, V., (2008). Similarity measures for categorical data: a comparative evaluation. In: Proceedings of the 8th International Conference on Data Mining. SIAM, pp. 243-254.
  3. CHANDOLA, V., BORIAH, S., KUMAR, V., (2009). A framework for exploring categorical data. In: Proceedings of the 9th International Conference on Data Mining. SIAM, pp. 187-198.
  4. CHAVENT, M., KUENTZ, V., LIQUET, B., SARACCO, J., (2012). ClustOfVar: An R package for the clustering of variables. Journal of Statistical Software, 50(13):1-16. Available at: [Accessed 16 October 2014].
  5. CHAVENT, M., KUENTZ, V., SARACCO, J., (2010). A partitioning method for the clustering of categorical variables. In: Locarek-Junge, H., Weihs, C., eds, Classification as a Tool for Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin Heidelberg, pp. 91-99.
  6. D'ENZA, A. I., GREENACRE, M. J., (2012). Multiple correspondence analysis for the quantification and visualization of large categorical data sets. In: Advanced Statistical Methods for the Analysis of Large Data-Sets. Springer, Berlin Heidelberg, pp. 453-463.
  7. EVERITT, B. S., LANDAU, S., LEESE, M., STAHL, D., (2011). Cluster Analysis, 5th edn, Wiley, Chichester.
  8. GAN, G., MA, C., WU, J., (2007). Data Clustering: Theory, Algorithms, and Applications, ASA-SIAM, Philadelphia.
  9. GORDON, A. D., (1999). Classification, 2nd edn, Chapman & Hall/CRC, Boca Raton.
  10. GREENACRE, M. J., (2010). Correspondence analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(5):613-619.
  11. JOLLIFFE, I. T., (2002). Principal Component Analysis, 2nd edn, Springer, New York.
  12. LIN, D., (1998). An information-theoretic definition of similarity. In: Proceedings of the 15th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, pp. 296-304.
  13. PALLA, K., KNOWLES, D. A., GHAHRAMANI, Z., (2012). A nonparametric variable clustering model. In: Pereira, F., Burges, C. J. C., Bottou, L., Weinberger, K. Q., eds, Advances in Neural Information Processing Systems 25. NIPS Foundation. Available at: [Accessed 16 October 2014].
  14. PAYNE, T. R., EDWARDS, P., (1999). Dimensionality reduction through correspondence analysis. Available at: [Accessed 16 October 2014].
  15. REZANKOVA, H., LÖSTER, T., HÜSEK, D., (2011). Evaluation of categorical data clustering. In: Mugellini, E., Szczepaniak, P. S., Pettenati, M. C. et al., eds, Advances in Intelligent Web Mastering 3. Springer Verlag, Berlin, pp. 173-182.
  16. REZANKOVA, H., (2014). Nominal variable clustering and its evaluation. In: Proceedings of the 8th International Days of Statistics and Economics. Melandrium, Slany, pp. 1293-1302. Available at: <http://msed.vse.cz/msed_2014/article/276-Rezankova-Hana-paper.pdf> [Accessed 5 November 2014].
  17. SPARCK-JONES, K., (1972, 2002). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11-21. Later: Journal of Documentation, 60(5):493-502.
  18. SULC, Z., REZANKOVA, H., (2014). Evaluation of recent similarity measures for categorical data. In: Proceedings of the 17th International Conference Applications of Mathematics and Statistics in Economics. Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu, Wroclaw, pp. 249-258. Available at: <http://www.amse.ue.wroc.pl/papers/Sulc,Rezankova.pdf> [Accessed 5 November 2014].
ISSN
1234-7655
Language
English