Comparative study between Part-of-Speech and statistical methods of text extraction in the tourism domain

Kuntarto, Guson P. and Moechtar, Fahmi Lutfiansyah and Santoso, Berkah I. and Gunawan, Irwan Prasetya (2015) Comparative study between Part-of-Speech and statistical methods of text extraction in the tourism domain. In: 2015 International Conference on Information Technology Systems and Innovation (ICITSI), 16-19 Nov. 2015, Bandung.

Abstract

This paper compares two text extraction methods: a linguistic method (Part-of-Speech, POS) and a statistical method (Term Frequency-Inverse Document Frequency, TF-IDF). Text extraction was performed as part of ontology population in the Indonesian tourism domain. The paper also contributes a multimedia corpus built from three different web resources in the Balinese tourism domain. The performance of each method is evaluated using several relevance measures. The statistical method was found to give higher relevance than the linguistic method. Our analysis attributes this result to the limited set of reference terms in the initial ontology from our previous research.
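
As an illustration of the statistical side of this comparison, the following is a minimal sketch of TF-IDF term scoring followed by a relevance check against a reference term list. It is not the authors' implementation: the abstract does not state which TF-IDF weighting variant or which relevance measures were used, so the plain `tf * log(N / df)` weighting, the use of precision and recall, the function names `tf_idf` and `precision_recall`, and the toy Indonesian corpus and reference terms are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed weighting: (term count / document length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def precision_recall(extracted, reference):
    """Compare extracted terms with reference terms (assumed measures)."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy corpus, invented for illustration only (not the paper's corpus).
docs = [
    "pantai kuta bali pantai".split(),
    "pura besakih bali".split(),
    "hotel ubud bali".split(),
]
scores = tf_idf(docs)

# Keep the highest-scoring term of each document as the extracted term.
top_terms = [max(doc_scores, key=doc_scores.get) for doc_scores in scores]

# Hypothetical reference terms standing in for an initial ontology.
reference_terms = ["pantai", "pura", "danau"]
precision, recall = precision_recall(top_terms, reference_terms)
```

In this sketch a term that occurs in every document, such as "bali", receives a score of zero. The reference list also caps the measured relevance: a correctly extracted term that is missing from the reference counts as a miss, which is the kind of limitation the abstract attributes to the initial ontology.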

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Bali tourism; linguistic method; ontology population; part of speech; statistical method; TF-IDF
Subjects: Computer Science
Computer Science > Database management
Computer Science > Web-Based Group Decision Support System (WGDSS) > Web-Based
Computer Science > Web-Based
Divisions: Fakultas Teknik dan Ilmu Komputer > Program Studi Informatika
Date Deposited: 22 Jul 2016 03:26
Last Modified: 10 Feb 2022 02:00
URI: https://repository.bakrie.ac.id/id/eprint/125
