BazEkon - The Main Library of the Cracow University of Economics

Ceglarek Dariusz (Wyższa Szkoła Bankowa w Poznaniu)
Zastosowanie kompresji semantycznej w zadaniach przetwarzania języka naturalnego
Applying Semantic Compression in Natural Language Processing Tasks
Zeszyty Naukowe Wyższej Szkoły Bankowej w Poznaniu, 2012, no. 40, pp. 39-64, 28 references.
The Poznan School of Banking Research Journal
Issue title
Information and communication technology w gospodarce opartej na wiedzy. Wybrane aspekty teoretyczne i aplikacyjne ; Information and Communication Technology in Knowledge Economy Selected Theoretical and Application Aspects
Semantic network, Intellectual property protection
Semantic Web Service (SWS), Intellectual property protection
abstract, summary
Semantic compression is a technique for obtaining the appropriate generalisation of concepts depending on context, so that the same idea can be found in different documents even when it is phrased differently or expressed with different concepts. The development of the concept of semantic compression and of new algorithms has made it possible to apply it to document classification and to the extension of knowledge representation structures such as semantic networks. The article presents the results of research on new semantic compression methods and tools that have been adapted to natural language processing tasks. (original abstract)

Semantic compression is a new technique that makes it possible to generalise terms correctly in a given context. Thanks to this generalisation, a common idea can be detected in different documents. The rules governing the generalisation process are based on a data structure referred to as a domain frequency dictionary. Once the domain of a given text fragment has been established, disambiguating among possibly many hypernyms becomes feasible. Semantic compression, that is, informed generalisation, is made possible by the use of semantic networks as a knowledge representation structure. In light of this overview, semantic compression enables a number of improvements over established natural language processing techniques. These improvements, along with a detailed discussion of the algorithms and data structures needed to make semantic compression a viable solution, are the core of this work. Semantic compression can be applied in a variety of scenarios. The scenario for which it was originally introduced was plagiarism detection. As more effort was spent on developing semantic compression, new domains of application were discovered. By remodeling existing data sources to match the algorithms that enable semantic compression, it became possible to use it as a base for an automaton. By exploring hypernym-hyponym and synonym relations, the automaton is capable of discovering new terms that may be included in the knowledge representation structures. (original abstract)
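
The generalisation step described in the abstract can be illustrated with a minimal sketch. It is not the author's algorithm: the hypernym hierarchy `HYPERNYM`, the function names, and the sample tokens are all hypothetical, and frequencies are counted from the input tokens, whereas the abstract speaks of a domain frequency dictionary. The idea shown is only that terms are replaced by their nearest hypernym within a chosen set of retained concepts, so that differently worded texts map to the same sequence of concepts.

```python
from collections import Counter

# Hypothetical toy hypernym hierarchy (child -> parent); not taken from the article.
HYPERNYM = {
    "spaniel": "dog",
    "poodle": "dog",
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
}

def ancestors(term):
    """Yield the term itself, then each of its hypernyms up to the root."""
    while term is not None:
        yield term
        term = HYPERNYM.get(term)

def cumulative_frequencies(tokens):
    """Count every token once for itself and once for each of its hypernyms."""
    freq = Counter()
    for tok in tokens:
        for concept in ancestors(tok):
            freq[concept] += 1
    return freq

def compress(tokens, keep):
    """Replace each token with its nearest ancestor found in `keep`.

    Tokens with no ancestor in `keep` are left unchanged.
    """
    return [next((c for c in ancestors(tok) if c in keep), tok) for tok in tokens]

def semantic_compress(tokens, vocabulary_size):
    """Keep the `vocabulary_size` most frequent concepts and generalise the rest."""
    freq = cumulative_frequencies(tokens)
    keep = {concept for concept, _ in freq.most_common(vocabulary_size)}
    return compress(tokens, keep)

# Two differently worded fragments collapse to the same concept sequence.
keep = {"dog", "cat", "chased"}
doc_a = compress(["spaniel", "chased", "cat"], keep)
doc_b = compress(["poodle", "chased", "cat"], keep)
```

Here `doc_a` and `doc_b` both become `['dog', 'chased', 'cat']`, which is the kind of match a plagiarism detector could then look for. In practice the set of retained concepts would be derived from domain-specific frequencies rather than supplied by hand.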
The Main Library of the Cracow University of Economics
The Library of Warsaw School of Economics
The Library of University of Economics in Katowice
The Main Library of Poznań University of Economics and Business
Full text
  1. Baeza-Yates R.A., Ribeiro-Neto B., Modern Information Retrieval, Addison-Wesley Longman Publishing, Boston 1999.
  2. Baziz M., Towards a Semantic Representation of Documents by Ontology-Document Mapping, in: Artificial Intelligence: Methodology, Systems, and Applications. 11th International Conference, AIMSA 2004, Varna, Bulgaria, September 2-4, 2004. Proceedings, ed. Ch. Bussler, D. Fensel, Springer, 2004, "Lecture Notes in Computer Science" 2004, vol. 3192, pp. 33-43.
  3. Boyd-Graber J., Blei D.M., Zhu X., A Topic Model for Word Sense Disambiguation, in: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, June 2007, pp. 1024-1033.
  4. Burrows S., Tahaghoghi S.M.M., Zobel J., Efficient plagiarism detection for large code repositories, "Software: Practice and Experience" 2007, vol. 37, no. 2, pp. 151-175.
  5. Ceglarek D., Zastosowanie sieci semantycznej do disambiguacji pojęć w języku naturalnym, in: Systemy wspomagania organizacji SWO 2006, Wyd. AE w Katowicach, Katowice 2006.
  6. Ceglarek D., Koncepcja komponentowego systemu ochrony własności intelektualnej wykorzystującego semantyczne struktury informacji, in: Technologie informatyczne w zarządzaniu wiedzą - uwarunkowania i realizacja, ed. P. Adamczewski, M. Zakrzewicz, Wyd. WSB w Poznaniu, Poznań 2009.
  7. Ceglarek D., Haniewicz K., Rutkowski W., Quality of Semantic Compression in Classification, in: Computational Collective Intelligence, Second International Conference, ICCCI 2010, Kaohsiung, Taiwan, November 10-12, 2010. Proceedings, part 1, ed. J.-S. Pan, S.-M. Chen, N.T. Nguyen, Springer-Verlag, Berlin - Heidelberg 2010, "Lecture Notes in Computer Science" 2010, vol. 6421, pp. 162-171.
  8. Ceglarek D., Haniewicz K., Rutkowski W., Semantic Compression for Specialised Information Retrieval Systems, in: Advances in Intelligent Information and Database Systems, ed. N.T. Nguyen, R. Katarzyniak, S.-M. Chen, Springer Verlag, Berlin - Heidelberg 2010, "Studies in Computational Intelligence" 2010, vol. 283, pp. 111-121.
  9. Ceglarek D., Haniewicz K., Rutkowski W., Domain Based Semantic Compression for Automatic Text Comprehension Augmentation and Recommendation, in: Computational Collective Intelligence. Technologies and Applications. Third International Conference, ICCCI 2011, Gdynia, Poland, September 21-23, 2011, Proceedings, vol. 2, ed. P. Jędrzejowicz, N.T. Nguyen, K. Hoang, Springer-Verlag, Berlin - Heidelberg 2011, "Lecture Notes in Computer Science" 2011, vol. 6923, pp. 40-49.
  10. Ceglarek D., Haniewicz K., Rutkowski W., Towards Knowledge Acquisition with WiSENet, in: New Challenges for Intelligent Information and Database Systems, ed. N.T. Nguyen, B. Trawinski, J.J. Jung, Springer Verlag, Berlin - Heidelberg 2011, "Studies in Computational Intelligence" 2011, vol. 351, pp. 75-84.
  11. Ceglarek D., Haniewicz K., Fast Plagiarism Detection by Sentence Hashing, in: Artificial Intelligence and Soft Computing. 11th International Conference, ICAISC 2012, Zakopane, Poland, April 29-May 3, 2012, Proceedings, vol. 2, ed. L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L.A. Zadeh, J.M. Zurada, Springer-Verlag, Berlin - Heidelberg 2012, "Lecture Notes in Computer Science" 2012, vol. 7268, pp. 30-38.
  12. Erk K., Padó S., A Structured Vector Space Model for Word Meaning in Context, in: EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA 2008, pp. 897-906.
  13. Hotho A., Staab S., Stumme G., Explaining Text Clustering Results Using Semantic Structures, in: Knowledge Discovery in Databases: PKDD 2003. 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, September 22-26, 2003, Proceedings, ed. N. Lavrač, D. Gamberger, H. Blockeel, L. Todorovski, PKDD, Springer Verlag, Berlin - Heidelberg 2003, "Lecture Notes in Computer Science" 2003, vol. 2838, pp. 217-228.
  14. Information Retrieval: Data Structures & Algorithms, ed. W.B. Frakes, R.A. Baeza-Yates, Prentice-Hall, 1992.
  15. Krovetz R., Croft W.B., Lexical ambiguity and information retrieval, "ACM Transactions on Information Systems" 1992, no. 10, pp. 115-141.
  16. Lukashenko R., Graudina V., Grundspenkis J., Computer-based plagiarism detection methods and tools: an overview, in: Proceedings of the 2007 International Conference on Computer Systems and Technologies, CompSysTech '07. New York, USA, ACM, 2007, pp. 401-406.
  17. Miller G.A., Wordnet: a lexical database for English, "Communications of the ACM" 1995, vol. 38, no. 11.
  18. Miłkowski M., Automated Building of Error Corpora of Polish, in: Corpus Linguistics, Computer Tools, and Applications - State of the Art, PALC 2007, ed. B. Lewandowska-Tomaszczyk, Peter Lang, Frankfurt am Main 2008, pp. 631-639.
  19. Nock R., Nielsen F., On weighting clustering, "The IEEE Transactions on Pattern Analysis and Machine Intelligence" 2006, no. 28(8), pp. 1223-1235.
  20. Ota T., Masuyama S., Automatic plagiarism detection among term papers, in: Proceedings of the 3rd International Universal Communication '09, ACM, 2009, pp. 395-399.
  21. Percova N.N., On the types of semantic compression of text, in: COLING '82. Proceedings of the 9th conference on Computational linguistics, vol. 2, Academia Praha, 1982, pp. 229-231.
  22. Rosenzweig J., Mihalcea R., Csomai A., "WordNet bibliography". Web page: a bibliography referring to research involving the WordNet lexical database, wordnet [1.09.2007].
  23. Sanderson M., Word Sense Disambiguation and Information Retrieval, in: SIGIR '94. Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, ed. W.B. Croft, C.J. van Rijsbergen, SIGIR, ACM/Springer, New York 1994, pp. 142-151.
  24. Sanderson M., Retrieving with Good Sense, "Information Retrieval" 2000, vol. 2, no. 1, pp. 49-69.
  25. Sinha R., Mihalcea R., Unsupervised graph-based word sense disambiguation using measures of word semantic similarity, in: International Conference on Semantic Computing ICSC 2007, IEEE 2007, pp. 363-369.
  26. Snow R., Jurafsky D., Ng A.Y., Learning syntactic patterns for automatic hypernym discovery, in: Advances in Neural Information Processing Systems (NIPS), 2005.
  27. Staab S., Hotho A., Ontology-based text document clustering, in: IIS, Advances in Soft Computing, ed. M.A. Kłopotek, S.T. Wierzchoń, K. Trojanowski, Springer, 2003, pp. 451-452.
  28. Stokoe Ch., Oakes M.P., Tait J., Word Sense Disambiguation in Information Retrieval Revisited, SIGIR, 2003.