BazEkon - The Main Library of the Cracow University of Economics

Kreński Karol (The Main School of Fire Service), Fliszkiewicz Mateusz (The Main School of Fire Service)
Data Cleansing of the Fire & Rescue Text Corpus. The Case Study of Correction of the Misspellings and Segmentation into Sentences
Annals of Computer Science and Information Systems, 2014, vol. 2, pp. 331-335, tables, bibliography: 11 items
Fire service (firefighters), Case study, Data analysis
The article presents a case study of applying data cleansing methods and segmentation procedures in order to correct and enhance the structure of the domain corpus of fire service. During the study we present our approach and the results in the task of correcting the misspellings, as well as the method of segmenting the corpus into sentences. (original abstract)
  1. Elzinga P., Poelmans J., Viaene S., Dedene G., and Morsing S., "Terrorist threat assessment with formal concept analysis," in Intelligence and Security Informatics (ISI), 2010 IEEE International Conference on. IEEE, 2010, pp. 77-82.
  2. Hernández M. A. and Stolfo S. J., "Real-world data is dirty: Data cleansing and the merge/purge problem," Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 9-37, 1998.
  3. Krasuski A., Kreński K., Wasilewski P., and Łazowy S., "Granular approach in knowledge discovery," in Rough Sets and Knowledge Technology. Springer, 2012, pp. 416-421.
  4. Lee M. L., Lu H., Ling T. W., and Ko Y. T., "Cleansing data for mining and warehousing," in Database and Expert Systems Applications. Springer, 1999, pp. 751-760.
  5. Levenshtein V. I., "Binary codes capable of correcting deletions, insertions, and reversals," Soviet Physics Doklady, vol. 10, pp. 707-710, 1966.
  6. Müller H. and Freytag J.-C., Problems, methods, and challenges in comprehensive data cleansing. Professoren des Inst. für Informatik, 2005.
  7. Poelmans J., Elzinga P., Dedene G., Viaene S., and Kuznetsov S., "A concept discovery approach for fighting human trafficking and forced prostitution," Conceptual Structures for Discovering Knowledge, pp. 201-214, 2011.
  8. Rudolf M. and Świdziński M., "Automatic utterance boundaries recognition in large Polish text corpora," in Intelligent Information Processing and Web Mining. Springer, 2004, pp. 247-256.
  9. Wikipedia, "Zipf's law," [Access: 23.04.2014].
  10. Work C., "Ewidencja zdarzeń - EWID99," Abacus, Tech. Rep., [Access: 23.04.2014].
  11. Zipf G. K., "Selected studies of the principle of relative frequency in language." 1932.