SEQanswers

Old 05-27-2014, 03:56 AM   #1
ritandr
Junior Member
 
Location: Finland

Join Date: May 2014
Posts: 2
Unmapped RNA-seq reads consist of repeated nucleotides (short homopolymeric regions)

Hello!

I have a low mapping rate for SOLiD RNA-seq data (organism: bacteria), around 30-40%, although we usually get 70-80%. I extracted the unmapped reads and the reads with multiple hits (all of them are poorly aligned and discarded from further analysis) and found the following:

1) average quality is the same as for good samples (~26 bases)
2) unmapped reads are enriched for TTTTT, and multiple-hit reads are enriched for various other k-mers (most of them consistent between samples)
3) GC content is higher (53-55%) for unmapped and multiple-hit reads than for mapped reads (40%)
4) the reads themselves look like they consist of short stretches of repeated nucleotides:

>178_1751_207_F3
AGGGAAAGGCGAAAAGAACCCCGGCGAGGGGAGTGAAAAAGAACCTGAAACCGTGTACGT
ACAAGGAGGGGAGAT
>178_1751_758_F3
CGAAAGGCGTAGTCGATGGGAAACAGGTTAATATTCCTGTACTTGGTGTTACTGCGAAGG
GGGGACGGAGATGCG
>178_1752_2_F3
AAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGGAACTCCT
TGCATCTAAATTTAT

I also tried to assemble the reads with Trinity, but all the resulting contigs map to our bacterium. Mapping against the human genome did not give anything. It does not look like biological contamination. I checked for adapters and did trimming, but that changed nothing.
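The checks listed in this post (GC content, homopolymer stretches, k-mer enrichment) can be reproduced with a few lines of Python. The following is only a minimal sketch, not the poster's actual workflow: the function names `parse_fasta`, `gc_fraction`, `longest_homopolymer` and `kmer_counts` are hypothetical, and it assumes the reads are already in base-space FASTA as shown above.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (name, sequence) pairs; a sequence may span several lines."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_homopolymer(seq):
    """Length of the longest run of one repeated nucleotide."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def kmer_counts(seqs, k=5):
    """Count every overlapping k-mer across a collection of reads."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts
```

Running `kmer_counts` separately on the mapped, unmapped and multiple-hit read sets and comparing the most common entries (`counts.most_common(10)`) would show whether a k-mer such as TTTTT is really specific to one set.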

Last edited by ritandr; 05-28-2014 at 01:24 AM.
Old 05-27-2014, 05:03 AM   #2
TiborNagy
Senior Member
 
Location: Budapest

Join Date: Mar 2010
Posts: 329

Just because the unmapped reads do not match the human genome does not mean they are not contamination. I have found mouse contamination in tomato sequences.
Old 05-27-2014, 09:37 AM   #3
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

SOLiD reads should be in colorspace; you can't accurately convert them to base-space without mapping them. So, how did you generate those base-space reads in your post? Multi-hit and unmapped reads are fundamentally different. Also, it's hard to correctly convert a poorly-aligned read to base-space.

In summary, I think you need to BLAST the original colorspace reads (assuming there's a colorspace version of BLAST) to see what they are.
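To illustrate why direct conversion is fragile, here is a minimal sketch of the standard di-base colorspace encoding, in which each color is the transition between two adjacent bases. The function names and the default primer base `T` are assumptions made for this example, and real csfasta reads can also contain `.` for missing colors, which this sketch does not handle.

```python
# Two-bit base codes; the color between two adjacent bases is the XOR of
# their codes (0 = same base, 1 = A<->C or G<->T, 2 = A<->G or C<->T,
# 3 = A<->T or C<->G).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode_colorspace(bases, primer="T"):
    """Encode base-space as colorspace: primer base followed by colors."""
    seq = primer + bases
    colors = [str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:])]
    return primer + "".join(colors)


def decode_colorspace(read):
    """Naively decode colorspace by walking from the primer base.

    Every base depends on all colors before it, so one wrong color
    changes every base downstream of it.
    """
    prev = CODE[read[0]]
    out = []
    for color in read[1:]:
        prev ^= int(color)
        out.append(BASE[prev])
    return "".join(out)
```

For example, `encode_colorspace("ACGT")` gives `T3131`, which decodes back to `ACGT`. If the second color is misread, so the read becomes `T3231`, naive decoding gives `AGCA`: every base after the error is wrong. Aligners that work in colorspace can recognize and correct such an isolated color error against the reference, which naive conversion cannot do.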
Old 06-04-2014, 04:03 AM   #4
ritandr
Junior Member
 
Location: Finland

Join Date: May 2014
Posts: 2

Thank you for the answers,

I did not find any difference in mapping percentage between color-space reads mapped with LifeScope and base-space reads mapped with Bowtie2, so the problem is not the conversion. I BLASTed around 1300 of the unmapped reads against the NT nucleotide database: quite a lot of reads (25%) hit rRNA genes, and 50% hit complete genome sequences of several bacteria (Bacillus and Enterococcus). These species are the same for two different 'bad' samples, but it is impossible that they contaminate our samples. If I map against Bacillus and Enterococcus species, I get a higher percentage of mapped reads (40-50%) than for our bacteria, but all of them are multiple-hit reads, with almost no unique reads. So it looks like rRNA contamination, but I do not understand from which source. The sample preparation also included an rRNA removal step...
All times are GMT -8.