SEQanswers

Old 10-23-2014, 03:43 PM   #1
bbm
Member
 
Location: North Carolina

Join Date: Sep 2011
Posts: 38
How to increase the mapping rate?

I have an RNA-seq dataset of single-end 100 bp reads (30 million per sample). My first TopHat2 run gave a mapping rate of only 5% to the reference genome. I then trimmed the raw data to 40-100 bp, and the mapping rate increased to 18%. I'm now running the mapping on the untrimmed data...

What else can I try to increase the mapping rate? Trim the reads to a 50-100 bp range? Raise the Phred score threshold based on the FastQC results?

Any comments will be appreciated!
Old 10-23-2014, 03:47 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080
Default

Can you post the FastQC plots of what the data looks like? No point in doing random trimming of data.

Take a few reads and do an old fashioned blast to make sure the data is from your sample/correct genome. Mistakes sometimes happen at sequencing cores.
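
A minimal sketch of the spot check suggested above: pull the first few reads out of a FASTQ file and write them as FASTA so they can be pasted into a BLAST search. It assumes plain four-line FASTQ records; the function name `first_reads_as_fasta` and the example reads are hypothetical and do not come from the thread.

```python
import io


def first_reads_as_fasta(fastq_handle, n_reads=5):
    """Return the first n_reads records of a FASTQ stream as FASTA text.

    Assumes each record is exactly four lines:
    header, sequence, '+' separator, quality string.
    """
    records = []
    while len(records) < n_reads:
        header = fastq_handle.readline().rstrip("\n")
        if not header:  # end of input
            break
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # skip the '+' separator line
        fastq_handle.readline()  # skip the quality line
        # FASTA headers start with '>' instead of '@'
        records.append(">" + header[1:] + "\n" + sequence)
    return "\n".join(records)


# Made-up reads, used only to show the conversion.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nTTGGCCAA\n+\nIIIIIIII\n"
)
fasta_text = first_reads_as_fasta(example, n_reads=2)
print(fasta_text)
```

For a real file, the same function could be called on `open("1.fastq")` in place of the in-memory example.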
Old 10-23-2014, 03:49 PM   #3
bbm
Member
 
Location: North Carolina

Join Date: Sep 2011
Posts: 38
Default

The alignment of the untrimmed data just finished, and the rate is about 15%:

15.66% overall alignment rate

I will post the fastqc plots soon. Thank you!
Old 10-23-2014, 03:55 PM   #4
bbm
Member
 
Location: North Carolina

Join Date: Sep 2011
Posts: 38
Default

Attached is the FastQC output from before trimming.
Attached Images
File Type: png crop.png (77.4 KB, 13 views)
Old 10-23-2014, 04:00 PM   #5
bbm
Member
 
Location: North Carolina

Join Date: Sep 2011
Posts: 38
Default

This is the FastQC output after trimming with Trimmomatic, which left reads of 40-100 bp:

java -jar /usr/local/apps/trimmomatic/Trimmomatic-0.32/trimmomatic-0.32.jar SE 1.fastq 1.trimmed.fastq ILLUMINACLIP:/usr/local/apps/trimmomatic/Trimmomatic-0.32/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:40
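
To illustrate why the SLIDINGWINDOW:4:15 and MINLEN:40 steps in the command above leave reads of 40-100 bp, here is a simplified sketch of quality-window trimming followed by a length filter. It is not Trimmomatic's actual implementation, which may place the cut slightly differently; the function names and the example quality values are hypothetical.

```python
def sliding_window_trim(qualities, window=4, min_avg=15):
    """Return how many bases to keep from the 5' end.

    Simplified rule: scan windows from the start of the read and cut at
    the start of the first window whose mean Phred quality is below
    min_avg. If no window fails, the whole read is kept.
    """
    for start in range(0, len(qualities) - window + 1):
        chunk = qualities[start:start + window]
        if sum(chunk) / window < min_avg:
            return start
    return len(qualities)


def passes_minlen(kept_length, minlen=40):
    """Mimic a minimum-length filter: keep reads with at least minlen bases."""
    return kept_length >= minlen


# Made-up Phred scores: one read whose quality drops near the end,
# one whose quality drops early.
late_drop = [30] * 50 + [2] * 10
early_drop = [30] * 20 + [2] * 80

for name, quals in [("late_drop", late_drop), ("early_drop", early_drop)]:
    kept = sliding_window_trim(quals)
    print(name, kept, passes_minlen(kept))
```

Under this simplified rule, a 100 bp read with uniformly high quality is left at full length, and anything cut to fewer than 40 bases is discarded, hence the 40-100 bp range.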
Attached Images
File Type: png crop.png (75.7 KB, 7 views)
Old 10-23-2014, 04:12 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080
Default

Q-score wise there is no issue, so the problem must lie elsewhere. It is possible to get great data that may not align at all so this is only part of the QC. Report back on the blast result. Do the GC plots look strange?
Old 10-23-2014, 04:27 PM   #7
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

I don't really understand what you mean by "trimming to 40-100bp". But, it would not surprise me if your problem was adapter contamination; do you know what kind of adapters were used? They might not be TruSeq.
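
A quick way to gauge adapter contamination is to count how many reads contain the start of the adapter sequence. The sketch below assumes the adapter prefix `AGATCGGAAGAGC`, the commonly cited shared start of Illumina TruSeq adapters; that sequence is an assumption here and should be replaced with whatever the library kit actually used. The function name and example reads are hypothetical.

```python
# Assumed adapter prefix (TruSeq-style); replace with the kit's real adapter.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(sequences, adapter=ADAPTER_PREFIX):
    """Return the fraction of reads containing the adapter as an exact substring."""
    sequences = list(sequences)
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if adapter in seq)
    return hits / len(sequences)


# Made-up reads: two contain the adapter prefix, two do not.
reads = [
    "ACGTAGATCGGAAGAGCACAC",
    "ACGTACGTACGT",
    "TTTTAGATCGGAAGAGC",
    "GGGGCCCC",
]
fraction = adapter_fraction(reads)
print(fraction)
```

Exact matching misses adapters with sequencing errors or adapters cut off at the read end, so this gives only a rough lower bound.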
Old 10-24-2014, 12:36 AM   #8
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

What organism are you working with and what is your reference?
I have seen such fastqc results just recently.
The reason was a severe rRNA contamination. Maybe mRNA enrichment / ribo-depletion didn't work (or wasn't done)?
If the respective sequences are not (or are only partially) represented in your reference, you can of course not map to them. Look at the sequence duplication levels: if there is an increase at 10k, that is an indication of rRNA contamination. If you are working with human samples, the relatively high GC content is another one.
To verify this, simply use the rRNA sequences as reference and map to them.
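
Before mapping to rRNA sequences, the two signals mentioned above (GC content and heavily duplicated reads) can be spot-checked on a sample of reads. This is a rough sketch with hypothetical function names and made-up reads; the most frequent sequences it reports are only candidates to compare against known rRNA, not proof of contamination.

```python
from collections import Counter


def gc_fraction(sequence):
    """Return the fraction of G and C bases in a sequence (0.0 for empty input)."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence.upper() if base in "GC")
    return gc / len(sequence)


def most_common_reads(sequences, top=3):
    """Return the `top` most frequent read sequences with their counts."""
    return Counter(sequences).most_common(top)


# Made-up reads: one sequence is heavily over-represented.
reads = ["GGCCGGCC"] * 5 + ["AATTAATT", "ACGTACGT"]

mean_gc = sum(gc_fraction(r) for r in reads) / len(reads)
print(round(mean_gc, 3))
print(most_common_reads(reads, top=1))
```

The over-represented sequences can then be run through BLAST to see whether they hit rRNA.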
Old 10-27-2014, 05:34 AM   #9
bbm
Member
 
Location: North Carolina

Join Date: Sep 2011
Posts: 38
Default

Quote:
Originally Posted by GenoMax View Post
Q-score wise there is no issue, so the problem must lie elsewhere. It is possible to get great data that may not align at all so this is only part of the QC. Report back on the blast result. Do the GC plots look strange?
Here is the full FastQC report.
Attached Files
File Type: pdf WO_20h_CGATGT_L003_R1_001.trimmed.pdf (804.4 KB, 7 views)
Old 10-27-2014, 05:50 AM   #10
bbm
Member
 
Location: North Carolina

Join Date: Sep 2011
Posts: 38
Default

Quote:
Originally Posted by WhatsOEver View Post
What organism are you working with and what is your reference?
I have seen such fastqc results just recently.
The reason was a severe rRNA contamination. Maybe mRNA enrichment / ribo-depletion didn't work (or wasn't done)?
If the respective sequences are not (or are only partially) represented in your reference, you can of course not map to them. Look at the sequence duplication levels: if there is an increase at 10k, that is an indication of rRNA contamination. If you are working with human samples, the relatively high GC content is another one.
To verify this, simply use the rRNA sequences as reference and map to them.
The reference is the honeybee genome, which is the 2nd version so far. Thank you for your suggestion. I think the problem may be a low-quality library prep.
Old 10-27-2014, 05:51 AM   #11
bbm
Member
 
Location: North Carolina

Join Date: Sep 2011
Posts: 38
Default

Quote:
Originally Posted by Brian Bushnell View Post
I don't really understand what you mean by "trimming to 40-100bp". But, it would not surprise me if your problem was adapter contamination; do you know what kind of adapters were used? They might not be TruSeq.
The library was prepared with the NEBNext® RNA Library Prep Kit for Illumina, so the adapters should be TruSeq.
Old 10-27-2014, 09:47 AM   #12
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

From looking at the fastqc output (btw: there is a new, slightly better fastqc version available), I can only say again that it looks very similar to our rRNA "contaminated" samples. More interestingly, we also used the NEB kit...
Old 10-27-2014, 10:13 AM   #13
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080
Default

@bbm: Were you mapping to the entire genome or just the transcriptome?