- Author
- Rozmus Dorota (University of Economics in Katowice)
- Title
- Miara stabilności w wyborze liczby grup w taksonomii zagregowanej z zastosowaniem analizy spektralnej i metody propagacji podobieństwa
Stability Measure in the Selection of the Number of Groups in Aggregate Taxonomy Using Spectral Analysis and the Method of Affinity Propagation
- Source
- Wiadomości Statystyczne, 2024, no. 3, pp. 1-17, charts, tables, bibliography pp. 15-17
- Keyword
- Taksonomia, Metody grupowania, Rozwój zrównoważony
Taxonomy, Grouping methods, Sustainable development
- Note
- JEL Classification: C38
abstract in Polish, summary in English
- Abstract
- Since the 1990s, the aggregate approach and the stability of grouping methods have been frequently discussed topics in taxonomy. Until now they have been considered separately, but a proposal has recently appeared in the literature to combine the two concepts in a stability measure, PAC (proportion of ambiguously clustered pairs), which can be applied within the aggregate approach and is intended to serve as a criterion for selecting the optimal number of groups. The aim of the research presented in this article is to compare the results of selecting the optimal number of groups in aggregate taxonomy, using the example of the attainment of three Sustainable Development Goals (SDGs) by EU countries. The PAC measure and selected classic indices, namely the Caliński-Harabasz, Dunn and Davies-Bouldin indices, were applied for this purpose. The affinity propagation method and spectral clustering served as base methods in the aggregate approach. The study used Eurostat data for 2019. The results demonstrate that both the choice of the criterion for determining the number of groups and the choice of the base method in aggregate taxonomy influence the final decision on the number of groups. Regardless of whether affinity propagation or spectral clustering was used with the classic indices, or whether these methods served as base methods in the aggregate approach with the number of groups selected using the PAC measure, the differences between the indicated numbers of groups were very large. (original abstract)
- Accessibility
- The Main Library of the Cracow University of Economics
The Library of University of Economics in Katowice
- Bibliography
- Ben-Hur, A., Guyon, I. (2003). Detecting stable clusters using principal component analysis. In: M. J. Brownstein, A. B. Khodursky (Eds.), Functional Genomics: Methods and Protocols (pp. 159-182). Humana Press. https://doi.org/10.1385/1-59259-364-X:159.
- Bodenhofer, U., Kothmeier, A., Hochreiter, S. (2011). APCluster: an R package for affinity propagation clustering. Bioinformatics, 27(17), 2463-2464. https://doi.org/10.1093/bioinformatics/btr406.
- Brock, G., Pihur, V., Datta, S., Datta, S. (2008). clValid: An R Package for Cluster Validation. Journal of Statistical Software, 25(4), 1-22. https://doi.org/10.18637/jss.v025.i04.
- Chiu, D. S., Talhouk, A. (2018). diceR: an R package for class discovery using an ensemble driven approach. BMC Bioinformatics, 19(11), 1-4. https://doi.org/10.1186/s12859-017-1996-y.
- Dudoit, S., Fridlyand, J. (2003). Bagging to improve the accuracy of a clustering procedure. Bioinformatics, 19(9), 1090-1099. https://doi.org/10.1093/bioinformatics/btg038.
- Fang, Y., Wang, J. (2012). Selection of the number of clusters via the bootstrap method. Computational Statistics and Data Analysis, 56(3), 468-477. https://doi.org/10.1016/j.csda.2011.09.003.
- Fred, A. L. N., Jain, A. K. (2002). Data clustering using evidence accumulation. In: 2002 International Conference on Pattern Recognition (pp. 276-280). IEEE. https://doi.org/10.1109/ICPR.2002.1047450.
- Fred, A. L. N., Jain, A. K. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 835-850. https://doi.org/10.1109/TPAMI.2005.113.
- Frey, B. J., Dueck, D. (2007). Clustering by Passing Messages Between Data Points. Science, 315(5814), 972-976. https://doi.org/10.1126/science.1136800.
- Hennig, C. (2007). Cluster-wise assessment of cluster stability. Computational Statistics and Data Analysis, 52(1), 258-271. https://doi.org/10.1016/j.csda.2006.11.025.
- Hornik, K. (2005). A CLUE for CLUster Ensembles. Journal of Statistical Software, 14(12), 1-25. https://doi.org/10.18637/jss.v014.i12.
- Kannan, R., Vempala, S., Vetta, A. (2004). On clustering: Good, Bad and Spectral. Journal of the ACM, 51(3), 497-515. https://doi.org/10.1145/990308.990313.
- Kuncheva, L. I., Vetrov, D. P. (2006). Evaluation of Stability of k-Means Cluster Ensembles with Respect to Random Initialization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1798-1808. https://doi.org/10.1109/TPAMI.2006.226.
- Leisch, F. (1999). Bagged Clustering (SFB Working Papers No. 51). https://doi.org/10.57938/9b129f95-b53b-44ce-a129-5b7a1168d832.
- Leone, M., Sumedha, Weigt, M. (2007). Clustering by soft-constraint affinity propagation: applications to gene-expression data. Bioinformatics, 23(20), 2708-2715. https://doi.org/10.1093/bioinformatics/btm414.
- Lord, E., Willems, M., Lapointe, F. J., Makarenkov, V. (2017). Using the stability of objects to determine the number of clusters in datasets. Information Sciences, 393, 29-46. https://doi.org/10.1016/j.ins.2017.02.010.
- Marino, V., Presti, L. L. (2019). Stay in touch! New insights into end-user attitudes towards engagement platforms. Journal of Consumer Marketing, 36(6), 772-783. https://doi.org/10.1108/JCM-05-2018-2692.
- Meng, J., Hao, H., Luan, Y. (2016). Classifier ensemble selection based on affinity propagation clustering. Journal of Biomedical Informatics, 60, 234-242. https://doi.org/10.1016/j.jbi.2016.02.010.
- Monti, S., Tamayo, P., Mesirov, J., Golub, T. (2003). Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 52(1-2), 91-118. https://doi.org/10.1023/A:1023949509487.
- Ng, A. Y., Jordan, M. I., Weiss, Y. (2001). On Spectral Clustering: Analysis and an algorithm. In: T. G. Dietterich, S. Becker, Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14. The MIT Press.
- Rozmus, D. (2011). Porównanie stabilności zagregowanych algorytmów taksonomicznych opartych na macierzy współwystąpień. Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu. Research Papers of Wrocław University of Economics, (176), 212-220.
- Rozmus, D. (2013). Porównanie dokładności taksonomicznej metody propagacji podobieństwa oraz zagregowanych algorytmów taksonomicznych opartych na idei metody bagging. Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu. Research Papers of Wrocław University of Economics, (279), 106-114.
- Rozmus, D. (2021). The Number of Groups in an Aggregated Approach in Taxonomy with the Use of Stability Measures and Classical Indices - A Comparative Analysis. Acta Universitatis Lodziensis. Folia Oeconomica, 6(357), 55-67. https://doi.org/10.18778/0208-6018.357.04.
- Rozmus, D. (2022). Cluster Ensemble Stability in Clustering of EU Members in Terms of Sustainable Development Goals. In: K. Jajuga, G. Dehnel, M. Walesiak (Eds.), Modern Classification and Data Analysis. Methodology and Applications to Micro- and Macroeconomic Problems (pp. 289-301). Springer. https://doi.org/10.1007/978-3-031-10190-8_20.
- Șenbabaoğlu, Y., Michailidis, G., Li, J. Z. (2014). Critical limitations of consensus clustering in class discovery. Scientific Reports, 4, 1-13. https://doi.org/10.1038/srep06207.
- Shamir, O., Tishby, N. (2008). Cluster stability for finite samples. In: J. C. Platt, D. Koller, Y. Singer, S. T. Roweis (Eds.), Advances in Neural Information Processing Systems 20 (NIPS 2007) (pp. 1297-1304). Curran Associates.
- Shi, J., Malik, J. (2000). Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905. https://doi.org/10.1109/34.868688.
- Suzuki, R., Shimodaira, H. (2006). Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics, 22(12), 1540-1542. https://doi.org/10.1093/bioinformatics/btl117.
- Volkovich, Z., Barzily, Z., Toledano-Kitai, D., Avros, R. (2010). The Hotelling's metric as a cluster stability measure. Computer Modelling and New Technologies, 14(4), 65-72. http://www.cmnt.lv/upload-files/ns_3914_4_cmnt2010.pdf.
- Yu, Z., Li, L., Liu, J., Zhang, J., Han, G. (2015). Adaptive noise immune cluster ensemble using affinity propagation. IEEE Transactions on Knowledge and Data Engineering, 27(12), 3176-3189. https://doi.org/10.1109/TKDE.2015.2453162.
- ISSN
- 0043-518X
- Language
- pol
- URI / DOI
- http://dx.doi.org/10.59139/ws.2024.03.1
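- Illustrative note
- The PAC measure named in the abstract (proportion of ambiguously clustered pairs) can be sketched roughly as follows. This is a minimal illustration, not the article's procedure: it assumes every clustering run labels every object (no subsampling), that "ambiguous" means a co-clustering frequency in the interval (0.1, 0.9], and that the number of groups with the lowest PAC is preferred. The thresholds and the function names `consensus_matrix`, `pac` and `choose_k` are hypothetical choices made for this example.

```python
import numpy as np


def consensus_matrix(labelings):
    """Fraction of runs in which each pair of objects falls in the same group.

    labelings: array-like of shape (n_runs, n_objects) with group labels.
    Assumption: every run labels every object (no subsampling).
    """
    labels = np.asarray(labelings)
    same_group = labels[:, :, None] == labels[:, None, :]
    return same_group.mean(axis=0)


def pac(consensus, lower=0.1, upper=0.9):
    """Share of object pairs with a consensus value in (lower, upper].

    The default thresholds are illustrative assumptions.
    """
    consensus = np.asarray(consensus, dtype=float)
    rows, cols = np.triu_indices_from(consensus, k=1)  # each pair once
    values = consensus[rows, cols]
    ambiguous = (values > lower) & (values <= upper)
    return float(ambiguous.mean())


def choose_k(pac_by_k):
    """Return the candidate number of groups with the lowest PAC."""
    return min(pac_by_k, key=pac_by_k.get)


if __name__ == "__main__":
    # Toy labelings of four objects from repeated runs of a base method.
    runs_k2 = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
    runs_k3 = [[0, 0, 1, 2], [0, 1, 1, 2], [0, 1, 2, 2]]
    scores = {
        2: pac(consensus_matrix(runs_k2)),
        3: pac(consensus_matrix(runs_k3)),
    }
    print(scores, choose_k(scores))
```

In practice the labelings would come from repeated runs of a base method such as affinity propagation or spectral clustering, typically on resampled data, in which case the consensus matrix has to be normalised by how often each pair was drawn together.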