- Author
- Woźniak Rafał (Lodz University of Technology), Ożdżyński Piotr (Lodz University of Technology), Zakrzewska Danuta (Lodz University of Technology)
- Title
- Cluster Analysis of Medical Text Documents by Using Semi-Clustering Approach Based on Graph Representation
- Source
- Information Systems in Management, 2018, vol. 7, no. 3, pp. 213-224, figs., tabs., bibliogr. 14 items
Systemy Informatyczne w Zarządzaniu
- Keywords
- Cluster analysis, Text mining
- Notes
- summ.
- Abstract
- The development of the Internet has resulted in an increasing number of online text repositories. In many cases, documents are assigned to more than one class, and automatic multi-label classification needs to be used. When the number of labels exceeds the number of documents, effective label space dimension reduction may significantly improve classification accuracy, which is a major priority in the medical field. In the paper, we propose document clustering for label selection. We use a semi-clustering method based on a graph representation, where documents are represented by vertices and edge weights are calculated according to the documents' mutual similarity. Assigning documents to semi-clusters helps reduce the number of labels that are then used in the multi-label classification process. The performance of the method is examined in experiments conducted on real medical datasets. (original abstract)
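The graph representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not state which similarity measure or parameters the paper uses, so the bag-of-words cosine similarity, the helper names `build_graph` and `semi_cluster_score`, the `threshold` and `boundary_factor` values, and the toy documents are all assumptions made for illustration. The score follows the semi-cluster formula from the Pregel paper listed in the bibliography, S_c = (I_c - f_B * B_c) / (V_c * (V_c - 1) / 2).

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(docs, threshold=0.0):
    """Treat each document as a vertex; edge (i, j), i < j, holds the similarity.

    Assumption: whitespace tokenization and a similarity threshold,
    neither of which is taken from the paper.
    """
    bags = [Counter(d.lower().split()) for d in docs]
    edges = {}
    for i, j in combinations(range(len(docs)), 2):
        weight = cosine_similarity(bags[i], bags[j])
        if weight > threshold:
            edges[(i, j)] = weight
    return edges


def semi_cluster_score(cluster, edges, boundary_factor=0.5):
    """Score a candidate semi-cluster (a set of vertex indices).

    inner    = total weight of edges with both ends inside the cluster (I_c)
    boundary = total weight of edges with exactly one end inside (B_c)
    """
    n = len(cluster)
    if n < 2:
        # No vertex pairs exist; returning 0.0 is this sketch's convention.
        return 0.0
    inner = sum(w for (i, j), w in edges.items() if i in cluster and j in cluster)
    boundary = sum(w for (i, j), w in edges.items() if (i in cluster) != (j in cluster))
    return (inner - boundary_factor * boundary) / (n * (n - 1) / 2)


# Hypothetical toy documents, not drawn from the paper's datasets.
docs = ["chest pain heart attack", "heart attack risk", "skin rash allergy"]
edges = build_graph(docs)
related = semi_cluster_score({0, 1}, edges)
unrelated = semi_cluster_score({0, 2}, edges)
print(related > unrelated)
```

Grouping the two cardiology documents scores higher than grouping unrelated ones, because the only edge in this toy graph then counts as internal weight rather than as a boundary penalty.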
- Bibliography
- Tsoumakas G., Katakis I., Vlahavas I. (2008) Effective and Efficient Multilabel Classification in Domains with Large Number of Labels, Proceedings of ECML/PKDD Workshop on Mining Multidimensional Data, MMD'08, 30-44.
- Balasubramanian K., Lebanon G. (2012) The Landmark Selection Method for Multiple Output Prediction, Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 983-990.
- Read J., Pfahringer B., Holmes G. (2008) Multi-label Classification Using Ensembles of Pruned Sets, Proceedings of 8th IEEE International Conference on Data Mining, 995-1000.
- Bi W., Kwok J. (2013) Efficient Multi-label Classification with Many Labels, Proceedings of the 30th International Conference on Machine Learning 28, Atlanta, Georgia, USA, III-405-III-413.
- Hsu D., Kakade S.M., Langford J., Zhang T. (2009) Multi-label Prediction via Compressed Sensing, Bengio Y., Schuurmans D., Lafferty J.D., Williams C.K.I., Culotta A. [eds]: Advances in Neural Information Processing Systems 22, Curran Associates Inc., 772-780.
- Lin Z., Ding G., Hu M., Wang J. (2014) Multi-label Classification via Feature-aware Implicit Label Space Encoding, Proceedings of the 31st International Conference on Machine Learning 32, Beijing, China, II-325-II-333.
- Chen Y.-N., Lin H.-T. (2012) Feature-aware Label Space Dimension Reduction for Multi-label Classification, Proceedings of the 25th International Conference on Neural Information Processing Systems 1, Nevada, USA, 1529-1537.
- Herrera F., Charte F., Rivera A.J., del Jesus M.J. (2016) Multilabel Classification. Problem Analysis, Metrics and Techniques, Springer Switzerland.
- Hangal S., MacLean D., Lam M.S., Heer J. (2010) All Friends are Not Equal: Using Weights in Social Graphs to Improve Search, Proceedings of the 4th ACM Workshop on Social Network Mining and Analysis, Washington, USA, 1-7.
- Andersen J.S., Zukunft O. (2016) Semi-Clustering that Scales: An Empirical Evaluation of GraphX, Proceedings of the 2016 IEEE International Congress on Big Data, San Francisco, USA, 333-336.
- Malewicz G., Austern M.H., Bik A.J.C., Dehnert J.C., Horn I., Leiser N., Czajkowski G. (2010) Pregel: A System for Large-Scale Graph Processing, Proceedings of the 2010 International Conference on Management of Data, New York, USA, 135-146.
- http://disi.unitn.it/moschitti/corpora.htm (accessed November 20, 2017).
- http://grafos.ml/okapi.html (accessed November 20, 2017).
- Boring C.C., Squires T.S., Tong T. (1991) Cancer statistics, 1991, CA: A Cancer Journal for Clinicians, 41(6), 19-36.
- ISSN
- 2084-5537
- Language
- eng
- URI / DOI
- http://dx.doi.org/10.22630/ISIM.2018.7.3.19