Fasta sequences for the Drosophila melanogaster Manually Curated Transposable Elements (MCTE) library

The compressed file contains two files: MCTE.fasta, with the consensus TE sequences in FASTA format, and MCTE_info.xlsx, an Excel file with information on every sequence in MCTE.fasta. We used the REPET package (v.2.5) (Flutre et al. 2011; Hoede et al. 2014; Quesneville et al. 2005) to annotate TEs with a manually curated TE (MCTE) library of consensus sequences. Briefly, REPET comprises two main pipelines: TEdenovo, dedicated to de novo detection of TE families, and TEannot, for the annotation and analysis of TEs in genomic sequences. To create the MCTE library, we first ran the REPET TEdenovo pipeline with default parameters on the genomes of 13 Drosophila melanogaster natural strains. Manual curation of the identified consensuses involved three main procedures: removal of redundant sequences, manual identification of potentially artifactual sequences, and classification of consensuses into families. Redundant sequences (consensus sequences present in more than one genome) were identified by first running the PASTEClassifier module from PASTEC with default options (Hoede et al. 2014). We also performed similarity clustering, built multiple sequence alignments (MSA) of the clusters, and generated a consensus sequence for each MSA in order to obtain a consensus representative of all the genomes. We manually explored the consensus sequences and their copies using the plotCoverage tool from REPET and discarded consensuses represented mainly by a large number of small copies. Consensus sequences were assigned to families using BLAT (Kent 2002) against the curated canonical sequences of Drosophila TEs from the Berkeley Drosophila Genome Project (BDGP) v.9.4.1 (https://fruitfly.org/p_disrupt/TE.html).
When no matches were found, we used RepeatMasker (v.4) (Smit 2015) with the RepBaseRepeatMaskerEdition-20181026 release of RepBase (Bao et al. 2015).
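
For readers who want to inspect the library, the following is a minimal sketch of how the consensus sequences in MCTE.fasta could be loaded and summarized using only the Python standard library. It assumes a conventional FASTA layout (a ">" header line followed by one or more sequence lines) and takes the first whitespace-delimited token of each header as the sequence identifier; the function names read_fasta and length_summary are illustrative and not part of the dataset or of REPET.

```python
from typing import Dict, Iterable, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence_id: sequence} dict.

    Assumption: the sequence ID is the first whitespace-delimited token
    after '>' on each header line.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def length_summary(records: Dict[str, str]) -> dict:
    """Return the count, shortest, longest and total sequence length."""
    lengths = sorted(len(seq) for seq in records.values())
    if not lengths:
        return {"count": 0, "min": None, "max": None, "total": 0}
    return {
        "count": len(lengths),
        "min": lengths[0],
        "max": lengths[-1],
        "total": sum(lengths),
    }


if __name__ == "__main__":
    # Toy input with made-up IDs, standing in for the real file.
    demo = [">TE_A example", "ACGT", "AC", ">TE_B", "GGG"]
    print(length_summary(read_fasta(demo)))
```

To run this on the actual library, the file handle can be passed directly, for example `with open("MCTE.fasta") as handle: records = read_fasta(handle)`, assuming the archive has been extracted into the working directory.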

Bibliographic Details
Main Author: Rech, Gabriel E.
Other Authors: European Research Council
Format: dataset biblioteca
Language: English
Published: DIGITAL.CSIC 2021-03-02
Subjects: Drosophila melanogaster, Transposable elements, Library, Consensuses, Transposable Elements Drosophila Melanogaster Rech 2021
Online Access:http://hdl.handle.net/10261/231577
http://dx.doi.org/10.13039/501100000781
http://dx.doi.org/10.13039/501100000780