a little help with creating annotations out of lower case bases

teetee1

Junior Member

Join Date: May 2013

Posts: 2
#1


06-23-2014, 11:05 AM

I recently noticed that UCSC genome data such as the one for C. elegans below (30MB file)

http://hgdownload.soe.ucsc.edu/goldenPath/ce10/bigZips/chromFa.tar.gz

contains lowercase bases in the sequence to mark repeats or low-complexity regions (soft-masking). I would like to mask them out for mapping or variant calling by creating annotations from those regions.

The only two formats I can think of are BED and GFF, but I wonder whether anyone has a better idea, or whether UCSC or another tool already does this. TIA.
Brian Bushnell

Super Moderator

Join Date: Jan 2014

Posts: 2709
#2

06-23-2014, 12:07 PM

Not sure if this is exactly what you want, but BBTools contains a script that will convert lower-case letters to Ns:

reformat.sh in=reference.fasta out=masked.fasta lowercaseton

It works on gzipped files but not on tar archives, so you'll have to untar it first.
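
If the goal is a BED annotation of the soft-masked regions rather than an N-masked reference, a minimal Python sketch along the following lines could work. It is not part of BBTools or any UCSC tool; the function names `lowercase_runs` and `write_bed` are made up for illustration. It assumes an uncompressed, already untarred FASTA file, treats any run of a-z (including lowercase n) as masked, and reports 0-based, half-open coordinates as BED expects.

```python
import re
from typing import Iterator, TextIO, Tuple

# Any run of lowercase letters counts as soft-masked (assumption: includes 'n').
LOWER_RUN = re.compile(r"[a-z]+")


def lowercase_runs(fasta: TextIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) for each lowercase run, 0-based half-open."""
    chrom = ""
    offset = 0       # bases already consumed on the current sequence
    pending = None   # last run seen, held back in case it continues on the next line
    for raw in fasta:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New sequence: flush the held-back run and reset the coordinates.
            if pending is not None:
                yield (chrom, *pending)
            chrom = (line[1:].split() or [""])[0]
            offset = 0
            pending = None
            continue
        for match in LOWER_RUN.finditer(line):
            start = offset + match.start()
            end = offset + match.end()
            if pending is not None and pending[1] == start:
                # Run continues across a line break: extend it.
                pending = (pending[0], end)
            else:
                if pending is not None:
                    yield (chrom, *pending)
                pending = (start, end)
        offset += len(line)
    if pending is not None:
        yield (chrom, *pending)


def write_bed(fasta_path: str, bed_path: str) -> None:
    """Write one tab-separated BED line per lowercase run in fasta_path."""
    with open(fasta_path) as fasta, open(bed_path, "w") as bed:
        for chrom, start, end in lowercase_runs(fasta):
            bed.write(f"{chrom}\t{start}\t{end}\n")
```

Calling `write_bed("chrI.fa", "chrI.lowercase.bed")` on one of the extracted chromosome files (file names here are placeholders) would produce a three-column BED file that most interval tools accept.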