Generation of Bilingual Dictionaries using Structural Properties

Dubey, Ajay; Varma, Vasudeva

Building bilingual dictionaries from Wikipedia has been extensively studied in computational linguistics. These dictionaries play a crucial role in Natural Language Processing (NLP) applications such as Cross-Lingual Information Retrieval, Machine Translation and Named Entity Recognition. To build them, most existing approaches use information present in Wikipedia titles, infoboxes and categories. Few use the structural properties of a document, such as its sections and subsections. In this work we exploit the structural properties of documents to build a bilingual English-Hindi dictionary. The main intuition behind this approach is that documents in different languages discussing the same topic are likely to have similar structural elements. Though we present our experiments only for Hindi, our approach is language independent and can be easily extended to other languages. The major contribution of our work is that the dictionary contains both translations and transliterations of words, a large share of which are Named Entities. We evaluate our dictionary using manually computed precision. Our approach generated a list of 72k tokens with a precision of 0.75.
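The intuition stated in the abstract can be illustrated with a toy sketch. This is not the authors' published method, which this record does not describe; it assumes, purely for illustration, that section headings of linked English and Hindi articles are paired by position when both articles have the same number of sections, and that a pair is kept once it recurs across articles. The function names, the `min_count` threshold and the sample headings are all hypothetical.

```python
from collections import Counter


def candidate_pairs(article_pairs):
    """Count how often an English and a Hindi heading share the same position."""
    counts = Counter()
    for en_headings, hi_headings in article_pairs:
        # Assumption: only align articles whose section counts match.
        if len(en_headings) != len(hi_headings):
            continue
        for en, hi in zip(en_headings, hi_headings):
            counts[(en.strip().lower(), hi.strip())] += 1
    return counts


def build_dictionary(counts, min_count=2):
    """Keep, for each English heading, its most frequent Hindi partner."""
    best = {}
    for (en, hi), n in counts.items():
        if n >= min_count and n > best.get(en, ("", 0))[1]:
            best[en] = (hi, n)
    return {en: hi for en, (hi, _) in best.items()}


def precision(judgements):
    """Fraction of entries that a human judge marked as correct."""
    return sum(judgements) / len(judgements)


# Hypothetical section headings of four linked article pairs.
pairs = [
    (["History", "Geography", "Economy"], ["इतिहास", "भूगोल", "अर्थव्यवस्था"]),
    (["History", "Geography"], ["इतिहास", "भूगोल"]),
    (["History", "Culture", "Economy"], ["इतिहास", "संस्कृति", "अर्थव्यवस्था"]),
    (["History", "Climate"], ["इतिहास"]),  # skipped: section counts differ
]
dictionary = build_dictionary(candidate_pairs(pairs))
sample_precision = precision([True, True, True, False])
```

Manually computed precision is simply the share of judged entries that are correct, so three correct entries out of four judged gives 0.75. If the reported 0.75 held across the whole 72k list, roughly 54,000 entries would be correct; the record does not say how many entries were actually judged.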

Bibliographic Details
Main Authors:	Dubey, Ajay; Varma, Vasudeva
Format:	Digital journal
Language:	English
Published:	Instituto Politécnico Nacional, Centro de Investigación en Computación 2013
Online Access:	http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1405-55462013000200006

id	oai:scielo:S1405-55462013000200006
record_format	ojs
spelling	oai:scielo:S1405-55462013000200006 | 2013-11-27 | Generation of Bilingual Dictionaries using Structural Properties | Dubey, Ajay; Varma, Vasudeva | Bilingual dictionary comparable corpora structural elements | info:eu-repo/semantics/openAccess | Instituto Politécnico Nacional, Centro de Investigación en Computación | Computación y Sistemas v.17 n.2 2013 | 2013-06-01 | info:eu-repo/semantics/article | text/html | http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1405-55462013000200006 | en
institution	SCIELO
collection	OJS
country	México
countrycode	MX
component	Journal
access	Online
databasecode	rev-scielo-mx
tag	journal
region	North America
libraryname	SciELO
language	English
format	Digital
author	Dubey, Ajay; Varma, Vasudeva
spellingShingle	Dubey, Ajay; Varma, Vasudeva; Generation of Bilingual Dictionaries using Structural Properties
author_facet	Dubey, Ajay; Varma, Vasudeva
author_sort	Dubey, Ajay
title	Generation of Bilingual Dictionaries using Structural Properties
title_short	Generation of Bilingual Dictionaries using Structural Properties
title_full	Generation of Bilingual Dictionaries using Structural Properties
title_fullStr	Generation of Bilingual Dictionaries using Structural Properties
title_full_unstemmed	Generation of Bilingual Dictionaries using Structural Properties
title_sort	generation of bilingual dictionaries using structural properties
publisher	Instituto Politécnico Nacional, Centro de Investigación en Computación
publishDate	2013
url	http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1405-55462013000200006
work_keys_str_mv	AT dubeyajay generationofbilingualdictionariesusingstructuralproperties AT varmavasudeva generationofbilingualdictionariesusingstructuralproperties
_version_	1756225743460761600
