Portuguese corpus-based learning using ETL

We present Entropy Guided Transformation Learning (ETL) models for three Portuguese language processing tasks: Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. For Part-of-Speech Tagging, we separately use the Mac-Morpho Corpus and the Tycho Brahe Corpus. For Noun Phrase Chunking, we use the SNR-CLIC Corpus. For Named Entity Recognition, we separately use three corpora: HAREM, MiniHAREM and LearnNEC06. For each task, the ETL modeling phase is quick and simple: ETL requires only the training set and no handcrafted templates. ETL also simplifies the incorporation of new input features, such as capitalization information, which are successfully used in the ETL-based systems. Using the ETL approach, we obtain performance competitive with the state of the art in all six corpus-based tasks. These results indicate that ETL is a suitable approach for the construction of Portuguese corpus-based systems.

Bibliographic Details
Main Authors: Milidiú, Ruy Luiz; Santos, Cícero Nogueira dos; Duarte, Julio Cesar
Format: Digital journal
Language: English
Published: Sociedade Brasileira de Computação 2008
Online Access: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002008000400003
id oai:scielo:S0104-65002008000400003
record_format ojs
spelling oai:scielo:S0104-65002008000400003
2009-03-09
Portuguese corpus-based learning using ETL
Milidiú, Ruy Luiz
Santos, Cícero Nogueira dos
Duarte, Julio Cesar
Entropy Guided Transformation Learning
transformation-based learning
decision trees
natural language processing
info:eu-repo/semantics/openAccess
Sociedade Brasileira de Computação
Journal of the Brazilian Computer Society v.14 n.4 2008
2008-12-01
info:eu-repo/semantics/article
text/html
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002008000400003
en
10.1007/BF03192569
institution SCIELO
collection OJS
country Brazil
countrycode BR
component Journal
access Online
databasecode rev-scielo-br
tag journal
region South America
libraryname SciELO
language English
format Digital