Less is More, More or Less... Finding the Optimal Threshold for Lexicalization in Chunking

Abstract: Lexicalization of the input of sequential taggers has come a long way since it was introduced by Molina and Pla [4]. In this paper we thoroughly investigate the method introduced by Indig and Endrédy [2] to find the best lexicalization level for chunking and to explore the behavior of different IOB representations. Both tasks are applied to the CoNLL-2000 dataset. Our goal is to introduce a transformation method that adapts the parameters of the development set to the training set using their frequency distributions, from which other tasks such as POS tagging or NER could also benefit.

Bibliographic Details
Main Author: Indig, Balázs
Format: Digital journal
Language: English
Published: Instituto Politécnico Nacional, Centro de Investigación en Computación 2017
Online Access: http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1405-55462017000400637