Annotation of epidemiological information in animal disease-related news articles: guidelines and manually labelled corpus

This dataset contains two files:

(i) An annotated corpus ("epi_info_corpus.xlsx") containing 486 manually annotated sentences extracted from 32 animal disease-related news articles. These news articles were obtained from the database of PADI-web (https://padi-web.cirad.fr/en/), an event-based biosurveillance system dedicated to animal health surveillance.

The first sheet (‘article_metadata’) provides metadata about the news articles: (1) id_article, the unique id of a news article, (2) title, the title of the news article, (3) source, the name of the news article website, (4) publication_date, the publication date of the news article (mm-dd-yyyy) and (5) URL, the web URL of the news article.

The second sheet (‘annot_sentences’) contains the annotated sentences: each row corresponds to one sentence from a news article. Each sentence has two distinct labels, Event type and Information type. The columns are: (1) id_article, the id of the news article to which the sentence belongs, (2) id_sentence, the unique id of the sentence, indicating its position in the news content (integer ranging from 1 to n, n being the total number of sentences in the news article), (3) sentence_text, the textual content of the sentence, (4) event_type, the Event type label and (5) information_type, the Information type label.

Event type labels indicate the relation between the sentence and the epidemiological context, i.e. current event (CE), risk event (RE), old event (OE), general (G) and irrelevant (IR). Information type labels indicate the type of epidemiological information, i.e. descriptive epidemiology (DE), distribution (DI), preventive and control measures (PCM), economic and political consequences (EPC), transmission pathway (TP), concern and risk factors (CRF), general epidemiology (GE) and irrelevant (IR).

(ii) The annotation guidelines ("epi_info_guidelines.doc") providing a detailed description of each category.
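The sheet layout and label sets described above can be sketched in Python as follows. This is only an illustrative sketch, not part of the dataset: it assumes the sheet and column names match the description exactly and that the label cells hold the abbreviated codes (CE, DE, ...) rather than the full names, neither of which is confirmed here. The helper names `load_sheets` and `check_labels` are hypothetical, and reading the workbook assumes pandas and openpyxl are installed.

```python
from collections import Counter

# Label codes as listed in the dataset description.
EVENT_TYPES = {
    "CE": "current event",
    "RE": "risk event",
    "OE": "old event",
    "G": "general",
    "IR": "irrelevant",
}
INFORMATION_TYPES = {
    "DE": "descriptive epidemiology",
    "DI": "distribution",
    "PCM": "preventive and control measures",
    "EPC": "economic and political consequences",
    "TP": "transmission pathway",
    "CRF": "concern and risk factors",
    "GE": "general epidemiology",
    "IR": "irrelevant",
}


def load_sheets(path="epi_info_corpus.xlsx"):
    """Read both sheets into DataFrames (hypothetical helper; needs pandas + openpyxl)."""
    import pandas as pd  # imported lazily so check_labels works without pandas

    # Sheet names are taken from the description and assumed to match exactly.
    articles = pd.read_excel(path, sheet_name="article_metadata")
    sentences = pd.read_excel(path, sheet_name="annot_sentences")
    return articles, sentences


def check_labels(rows):
    """Count (event_type, information_type) pairs and collect unrecognised rows.

    `rows` is an iterable of dicts with 'event_type' and 'information_type' keys.
    Rows whose codes are not in the documented sets are returned separately.
    """
    counts = Counter()
    unknown = []
    for row in rows:
        event = str(row["event_type"]).strip()
        info = str(row["information_type"]).strip()
        if event in EVENT_TYPES and info in INFORMATION_TYPES:
            counts[(event, info)] += 1
        else:
            unknown.append(row)
    return counts, unknown
```

With the workbook downloaded, one possible use would be `articles, sentences = load_sheets()` followed by `counts, unknown = check_labels(sentences.to_dict("records"))`. A non-empty `unknown` list would suggest that the cells use a different encoding than the one assumed in this sketch.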

Bibliographic Details
Main Authors: Valentin Sarah, De Waele Valérie, Vilain Aline, Arsevska Elena, Lancelot Renaud, Roche Mathieu
Other Authors: Valentin, Sarah
Format: Text corpus
Language: English, French
Published: CIRAD Dataverse 2019
Subjects: Agricultural Sciences, Computer and Information Science, Medicine, Health and Life Sciences, annotation, disease surveillance, event-based surveillance, text mining, African swine fever
Online Access: https://doi.org/10.18167/DVN1/YGAKNB