I am trying to build a comprehensive database of prokaryotic (bacterial and archaeal) and fungal genomes to be used for screening ancient DNA reads for contamination. Unfortunately, I found that many of the genomes in the NCBI and EMBL databases contain a lot of poly-N inserts, which obviously need to be eliminated. This can be done either by removing the inserts from each FASTA record, which may be difficult, or by splitting records at the poly-N inserts and trimming Ns from the ends. Is there a tool/script to do this? Alternatively, I may have to abandon the genomes and just concatenate the relevant GenBank records, but I would first have to extract FASTA from them. Any advice?
-
Well, the downloaded data files need to be preprocessed by BWA-SW to make databases for a local install of DeconSeq, and the author removed Ns by splitting, as BWA-SW replaces each N with one of A, G, C, or T at random (citing the paper). But the paper does not say how this was done...
-
I am also not convinced that you should remove N's, but if you must, you can with Biopieces (www.biopieces.org):
Code:
read_fasta -i in.fna | transliterate_seq -d 'nN' | write_fasta -o out.fna -x
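For the splitting approach raised in the original question (cut each record at poly-N inserts, then trim Ns from the ends of the pieces), here is a minimal Python sketch using only the standard library. It is not the method used by the DeconSeq author, which the paper reportedly does not describe. The function names, the `_partK` header suffix, and the `min_run` and `min_len` thresholds are all assumptions made for illustration; N runs shorter than `min_run` are left in place.

```python
import re
import textwrap


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_at_n_runs(header, seq, min_run=10, min_len=50):
    """Split seq at runs of >= min_run Ns and trim Ns from piece ends.

    min_run and min_len are illustrative defaults, not recommended values.
    Pieces shorter than min_len after trimming are dropped.
    """
    seq_id, _, desc = header.partition(" ")
    pieces = re.split(r"[Nn]{%d,}" % min_run, seq)
    out = []
    for piece in pieces:
        piece = piece.strip("Nn")  # trim any leftover Ns at both ends
        if len(piece) >= min_len:
            new_header = f"{seq_id}_part{len(out) + 1} {desc}".rstrip()
            out.append((new_header, piece))
    return out


def split_fasta_file(in_path, out_path, min_run=10, min_len=50, width=60):
    """Read in_path, split every record at poly-N runs, write to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            for new_header, piece in split_at_n_runs(header, seq, min_run, min_len):
                dst.write(f">{new_header}\n")
                dst.write("\n".join(textwrap.wrap(piece, width)) + "\n")
```

A call such as `split_fasta_file("in.fna", "out.fna")` would write the split records to `out.fna`. Unlike deleting the Ns outright, splitting avoids joining flanking sequences that are not actually adjacent in the genome, which matters when reads are later aligned against the database.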