Semi-Automatic Parallel Corpora Extraction from Comparable News Corpora

A parallel corpus is a necessary resource in many multilingual and cross-lingual natural language processing applications, including Machine Translation and Cross-Lingual Information Retrieval. Preparing a large-scale parallel corpus takes time and demands linguistic skill. In the present work, a technique has been developed that extracts a parallel corpus between Manipuri, a morphologically rich and resource-constrained Indian language, and English from comparable news corpora collected from the web. A medium-sized Manipuri-English bilingual lexicon and a list of Manipuri-English transliterated entities have been developed and used in the present work. Using morphological information for the agglutinative and inflective Manipuri language further improves the alignment quality based on the similarity measure. A high level of performance is desirable, since errors in sentence alignment cause further errors in systems that use the aligned text. The system has been evaluated, and an error analysis has also been carried out. The technique is effective for the Manipuri-English language pair and is extendable to other resource-constrained, agglutinative and inflective Indian languages.

Bibliographic Details
Main Authors:	Singh,Thoudam Doren, Bandyopadhyay,Sivaji
Format:	Digital journal
Language:	English
Published:	Instituto Politécnico Nacional, Centro de Innovación y Desarrollo Tecnológico en Cómputo 2010
Online Access:	http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442010000100003

Similar Items

Segmenting corpora of texts
by: Sardinha,Tony Berber
Published: (2002)

The future of multimodal corpora
by: Knight,Dawn
Published: (2011)

Corpora and historical linguistics
by: Kytö,Merja
Published: (2011)

Spoken corpora and pragmatics
by: Moneglia,Massimo
Published: (2011)

Corpora and cognitive linguistics
by: Newman,John
Published: (2011)

Automatic Semantic Role Labeling using Selectional Preferences with Very Large Corpora
by: Calvo,Hiram, et al.
Published: (2008)

How Children’s Literature is Translated: Suggestions for Stylistic Research Using Parallel Corpora
by: Toolan,Michael
Published: (2018)

Corpora from a sociolinguistic perspective
by: Kendall,Tyler
Published: (2011)

PADI-web: ASF corpora
by: Roche, Mathieu, et al.
Published: (2018)

Retrieving Lexical Semantics from Multilingual Corpora
by: Shahid,Ahmad R., et al.
Published: (2010)

Total corpora mobilization for penile reconstruction
by: Barroso Jr.,Ubirajara, et al.
Published: (2022)

Collocation lists as instruments for metaphor detection in corpora
by: Sardinha,Tony Berber
Published: (2006)

Formation and maintenance of corpora lutea in laboratory animals
by: Greenwald, G.S., et al.

Looking for opinion in land-use planning corpora
by: Kergosien, Eric, et al.

Spatial representations of texts from the BVLAC and PADI-WEB corpora
by: Fize, Jacques
Published: (2018)

Linguistic corpora of understudied languages: do they make sense?
by: Vinogradov,Igor
Published: (2016)

Parsing Arabic Nominal Sentences with Transducers to Annotate Corpora
by: Hammouda,Nadia Ghezaiel, et al.
Published: (2017)

Glandectomy with preservation of corpora cavernosa in the treatment of penile carcinoma
by: Fonseca,Aluizio G. da, et al.
Published: (2003)

Fracture of corpora cavernosa with massive cavernosal-venous shunts
by: Lang,Erich K., et al.
Published: (2014)

Uso de corpora na formação de tradutores
by: Sardinha,Antonio P. Berber
Published: (2003)

Children's Literature Parallel Corpora: a Hybrid Experimental Model to Evaluate Transfers of Language Complexity Via Linguistic Transcoding
by: Durão,Adja Balbino de Amorim Barbieri, et al.
Published: (2018)

Corpora amylacea in temporal lobe epilepsy associated with hippocampal sclerosis
by: Ribeiro,Marlise de Castro, et al.
Published: (2003)

DISCOVERY LEARNING IN THE LANGUAGE-FOR-TRANSLATION CLASSROOM: CORPORA AS LEARNING AIDS
by: Bernardini,Silvia
Published: (2016)

Accuracy of transrectal ultrasonography for evaluation of corpora lutea in hair sheep
by: Contreras-Solís, Ignacio, et al.
Published: (2010)

Enriching Epidemiological Thematic Features For Disease Surveillance Corpora Classification
by: Menya, Edmond, et al.

Working with specialized language: a practical guide to using corpora
by: Bowker, Lynne, et al.

Metaphor in corpora: a corpus-driven analysis of Applied Linguistics dissertations
by: Sardinha,Tony Berber
Published: (2007)

Evaluación de relaciones ontológicas en corpora de dominio restringido
by: Tovar,Mireya, et al.
Published: (2015)

Natural Language Processing Using Very Large Corpora [electronic resource]
by: Armstrong, Susan. editor., et al.
Published: (1999)

Use of hCG on the induction of accessory corpora lutea in Morada Nova ewes.
by: VERGANI, Gabriel Brun, et al.
Published: (2018-12-12)

Corpora lutea morphological and echotextural attributes related to plasma progesterone concentration in heifers.
by: SIQUEIRA, L. G. B., et al.
Published: (2012-06-22)

Detailed Anatomy of the Corpora-Glans Ligament via the P45 Plastination Method
by: Jiang,Wen-Bin, et al.
Published: (2023)

Deleterious effects of progestagen treatment in VEGF expression in corpora lutea of pregnant ewes
by: Letelier, C. A., et al.
Published: (2011)

Cell hyperproliferation in corpora allata of Blattella germanica females in the premetamorphic nymphal instar
by: Santos, Carolina G., et al.
Published: (2013-07)

Biosynthesis of juvenile hormones by locust corpora allata and possible applications to tsetse flies
by: Tobo, S.S., et al.

Detecting Salient Events in Large Corpora by a Combination of NLP and Data Mining Techniques
by: Battistelli,Delphine, et al.
Published: (2013)

USING CORPORA IN SCIENTIFIC AND TECHNICAL TRANSLATION TRAINING: RESOURCES TO IDENTIFY CONVENTIONALITY AND PROMOTE CREATIVITY
by: Rodríguez,Clara Inés López
Published: (2016)
