SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Adapter trimming and trimming by quality question alisrpp Bioinformatics 5 04-08-2013 04:55 PM
Illumina RNA-seq: should I trim low quality bases prior to mapping? JonB RNA Sequencing 1 02-14-2013 03:49 AM
Aligning paired end read files that have different numbers of reads after trimming d1antho RNA Sequencing 2 01-21-2013 09:41 AM
Quality testing of ChIP-seq library prior to sequencing. The Bric Sample Prep / Library Generation 8 09-17-2010 04:10 PM
Considering Quality scores of reads when aligning thinkRNA Bioinformatics 2 06-01-2010 07:40 AM

Reply
 
Thread Tools
Old 02-28-2017, 02:24 PM   #1
gstone
Junior Member
 
Location: Boston

Join Date: Jan 2017
Posts: 8
Default Quality trimming prior to aligning

I am aligning reads to the GrCh37 human reference genome using STAR and am confused as to how to get the most reliable results. At first I aligned my reads with no pre-processing using STAR and got essentially 2 groups: 1 group that aligned well, with 85-95+% of reads uniquely mapping, and another group with only about 60% uniquely mapping. I then tried using sickle to trim low quality reads (q<20). Reads that mapped 85-95+% unique lost about 1 million reads and map a similar % uniquely, and reads that mapped about 60% now map >80% but lost millions of uniquely mapped reads. Though millions of reads are lost after quality trimming, on average ~90% of reads survive in every case, with a max of 21% reads lost after quality trimming. My question is which pipeline should I trust? Should I be more interested in the raw number of mapping reads, or the percentage? Are the reads lost after quality trimming untrustworthy anyways?

I have done a lot of reading on this and have yet to find a definitive consensus. Any help or guidance would be much appreciated!

Last edited by gstone; 02-28-2017 at 02:48 PM.
gstone is offline   Reply With Quote
Old 02-28-2017, 04:27 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

If you want to do quality-trimming, it should be done conservatively to avoid losing so much information that the reads no longer map uniquely, and increasing bias. Typically, I suggest either not trimming, or to at most Q8~Q12. On the other hand, you should definitely adapter-trim prior to mapping.

It's also possible to map with a more error-tolerant aligner to avoid the issues incurred by quality-trimming entirely. BBMap is slower than STAR, but yields a higher mapping rate.

Oh, and one more thing - for overlapping paired-ended reads, it's also possible to increase read quality by using the consensus in the overlapping portion, by using BBMerge like this:

Code:
bbmerge.sh in=reads.fq out=corrected.fq ecco mix vstrict adapters=default
That may negate the need for quality-trimming, and it's a very fast operation.

Last edited by Brian Bushnell; 02-28-2017 at 04:29 PM.
Brian Bushnell is offline   Reply With Quote
Old 02-28-2017, 05:02 PM   #3
gstone
Junior Member
 
Location: Boston

Join Date: Jan 2017
Posts: 8
Default

Hi Brian, thank you so much for your quick and thoughtful response. If you don't mind taking the time I was hoping to ask 2 questions. 1) How would I map the consensus file, would I treat it as a single-end read? and 2) When I use bbrepair to match the order of my paired end reads and then quality trim I get less input reads to my aligner than if I just quality trim then align, but no reads are placed in the singletons file when I call bbrepair. Why is this?

Thank you for being so generous with you time!

Last edited by gstone; 02-28-2017 at 05:06 PM.
gstone is offline   Reply With Quote
Old 02-28-2017, 05:40 PM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by gstone View Post
1) How would I map the consensus file, would I treat it as a single-end read?
Sorry, let me clarify -

This command assumes you have paired reads interleaved in "reads.fq". In that case, you can use BBMerge like this:

Code:
bbmerge.sh in=reads.fq out=merged.fq outu=unmerged.fq
...which produces 2 files. merged.fq contains consensus singletons made from overlapping pairs (each pair yields one read), and unmerged.fq contains interleaved pairs that did not confidently overlap so they remain unchanged. This is nice for assembly, but sometimes annoying otherwise (like for mapping). Also, some programs only accept paired reads. So you can also do this:

Code:
bbmerge.sh in=reads.fq out=corrected.fq ecco mix
...which produces a single output file that still contains paired, interleaved reads. Reads that overlapped will have the overlapping portion corrected to the consensus - where they match, the quality score is increased, and where they don't match, the quality score is set to the difference of the quality scores and the base is set to the base with the higher quality score. With this command, the result is used for mapping just like the input would have been.

Quote:
2) When I use bbrepair to match the order of my paired end reads and then quality trim I get less input reads to my aligner than if I just quality trim then align, but no reads are placed in the singletons file when I call bbrepair. Why is this?
Can you give the exact sequence of commands you are using? Also, how did the order of your paired reads get scrambled in the first place? If you start with the raw reads and use a series of tools that respects pairing, you'll never need to use repair.sh.

Quote:
Thank you for being so generous with you time!
You're welcome!
Brian Bushnell is offline   Reply With Quote
Old 02-28-2017, 06:01 PM   #5
gstone
Junior Member
 
Location: Boston

Join Date: Jan 2017
Posts: 8
Default

Great, thank you for the clarification on the first point. As for my use of BBRepair, when I was aligning with STAR I got a large amount of reads too short to map, which was due to the paired-end reads being out of order. I used BBRepair and the mapping rate drastically improved (~40% to anywhere from ~65%-97%). I then figured I'd use BBRepair then quality trim just to be safe. My repair command is as follows: bbmap/repair.sh in1=file1 in2=file2 out1=out1 out2=out2 outs=out_singles repair. As to how the reads got scrambled is a mystery to me, but I was very excited to find BBRepair.

My confusion essentially comes from the fact that using BBRepair prior to quality trimming results in fewer input reads than just quality trimming, even though BBRepair identifies no singleton reads. My understanding is that BBRepair just orders the reads between the pairs.

Thank you again for all of your help.
gstone is offline   Reply With Quote
Old 03-02-2017, 09:09 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by gstone View Post
My confusion essentially comes from the fact that using BBRepair prior to quality trimming results in fewer input reads than just quality trimming, even though BBRepair identifies no singleton reads. My understanding is that BBRepair just orders the reads between the pairs.
Your understanding is correct. The repair tool does not discard any reads - they either get sent to "out1", "out2", or "outs". How are you counting the reads before and after? Can you post the commands and results? It's possible that there's a bug somewhere, and if so, I'd like to fix it ASAP.
Brian Bushnell is offline   Reply With Quote
Reply

Tags
alignment, bioinfomatics, quality trimming, rna sequencing, star

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:27 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO