BazEkon - Main Library of the Krakow University of Economics

Authors
Woźniak Rafał (Lodz University of Technology), Ożdżyński Piotr (Lodz University of Technology), Zakrzewska Danuta (Lodz University of Technology)
Title
Cluster Analysis of Medical Text Documents by Using Semi-Clustering Approach Based on Graph Representation
Source
Information Systems in Management, 2018, vol. 7, no. 3, pp. 213-224, figures, tables, bibliography: 14 items
Systemy Informatyczne w Zarządzaniu
Keywords
Cluster analysis, Text mining
Notes
Includes summary.
Abstract
The development of the Internet has resulted in an increasing number of online text repositories. In many cases, documents are assigned to more than one class, and automatic multi-label classification needs to be used. When the number of labels exceeds the number of documents, effective label space dimension reduction may significantly improve classification accuracy, which is a major priority in the medical field. In the paper, we propose document clustering for label selection. We use a semi-clustering method based on a graph representation, in which documents are represented by vertices and edge weights are calculated according to the documents' mutual similarity. Assigning documents to semi-clusters helps reduce the number of labels subsequently used in the multi-label classification process. The performance of the method is examined in experiments conducted on real medical datasets. (original abstract)
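The graph representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity, the `threshold` parameter, the function names, and the sample documents are all assumptions made for illustration. The scoring formula is the semi-cluster score from the Pregel paper (entry 11 in the bibliography), with `boundary_factor` standing for its user-chosen boundary edge factor.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(docs, threshold=0.1):
    """Documents are vertices; edge (i, j) is weighted by mutual similarity.

    Whitespace tokenisation and the threshold are illustrative assumptions.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    edges = {}
    for i, j in combinations(range(len(docs)), 2):
        weight = cosine_similarity(vectors[i], vectors[j])
        if weight >= threshold:
            edges[(i, j)] = weight
    return edges


def semi_cluster_score(cluster, edges, boundary_factor=0.5):
    """Score S = (I - f_B * B) / (V * (V - 1) / 2).

    I: total weight of edges inside the cluster,
    B: total weight of edges crossing the cluster boundary,
    V: number of vertices in the cluster.
    """
    size = len(cluster)
    if size < 2:
        # The formula is undefined for one vertex; 0.0 is an assumption here.
        return 0.0
    inner = sum(w for (i, j), w in edges.items()
                if i in cluster and j in cluster)
    boundary = sum(w for (i, j), w in edges.items()
                   if (i in cluster) != (j in cluster))
    return (inner - boundary_factor * boundary) / (size * (size - 1) / 2)


# Hypothetical toy documents, not taken from the paper's datasets.
docs = [
    "heart attack risk",
    "heart attack treatment",
    "skin cancer screening",
]
edges = build_graph(docs)
good = semi_cluster_score({0, 1}, edges)  # two similar documents
bad = semi_cluster_score({0, 2}, edges)   # two unrelated documents
```

In this toy case the two cardiology documents share an edge and form a higher-scoring semi-cluster than the unrelated pair, which is penalised for the edge it cuts. A vertex may belong to several semi-clusters at once, which is what makes the approach a natural fit for multi-label data.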
Bibliography
  1. Tsoumakas G., Katakis I., Vlahavas I. (2008) Effective and Efficient Multilabel Classification in Domains with Large Number of Labels, Proceedings of ECML/PKDD Workshop on Mining Multidimensional Data, MMD'08, 30-44.
  2. Balasubramanian K., Lebanon G. (2012) The Landmark Selection Method for Multiple Output Prediction, Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 983-990.
  3. Read J., Pfahringer B., Holmes G. (2008) Multi-label Classification Using Ensembles of Pruned Sets, Proceedings of 8th IEEE International Conference on Data Mining, 995-1000.
  4. Bi W., Kwok J. (2013) Efficient Multi-label Classification with Many Labels, Proceedings of the 30th International Conference on Machine Learning 28, Atlanta, Georgia, USA, III-405-III-413.
  5. Hsu D., Kakade S.M., Langford J., Zhang T. (2009) Multi-label Prediction via Compressed Sensing, Bengio Y., Schuurmans D., Lafferty J.D., Williams C.K.I., Culotta A. [eds]: Advances in Neural Information Processing Systems 22, Curran Associates Inc., 772-780.
  6. Lin Z., Ding G., Hu M., Wang J. (2014) Multi-label Classification via Feature-aware Implicit Label Space Encoding, Proceedings of the 31st International Conference on Machine Learning 32, Beijing, China, II-325-II-333.
  7. Chen Y.-N., Lin H.-T. (2012) Feature-aware Label Space Dimension Reduction for Multi-label Classification, Proceedings of the 25th International Conference on Neural Information Processing Systems 1, Nevada, USA, 1529-1537.
  8. Herrera F., Charte F., Rivera A.J., del Jesus M.J. (2016) Multilabel Classification. Problem Analysis, Metrics and Techniques, Springer Switzerland.
  9. Hangal S., MacLean D., Lam M.S., Heer J. (2010) All Friends are Not Equal: Using Weights in Social Graphs to Improve Search, Proceedings of the 4th ACM Workshop on Social Network Mining and Analysis, Washington, USA, 1-7.
  10. Andersen J.S., Zukunft O. (2016) Semi-Clustering that Scales: An Empirical Evaluation of GraphX, Proceedings of the 2016 IEEE International Congress on Big Data, San Francisco, USA, 333-336.
  11. Malewicz G., Austern M.H., Bik A.J.C., Dehnert J.C., Horn I., Leiser N., Czajkowski G. (2010) Pregel: A System for Large-Scale Graph Processing, Proceedings of the 2010 International Conference on Management of Data, New York, USA, 135-146.
  12. http://disi.unitn.it/moschitti/corpora.htm (accessed November 20, 2017).
  13. http://grafos.ml/okapi.html (accessed November 20, 2017).
  14. Boring C.C., Squires T.S., Tong T. (1991) Cancer statistics, 1991, CA: A Cancer Journal for Clinicians, 41(6), 19-36.
ISSN
2084-5537
Language
eng
URI / DOI
http://dx.doi.org/10.22630/ISIM.2018.7.3.19