SEQanswers

Old 12-15-2016, 04:40 PM   #101
tshalev
Junior Member
 
Location: Canada

Join Date: Dec 2016
Posts: 4
Default

The latest version, release 36_62. I'll keep you posted.
Old 12-15-2016, 05:48 PM   #102
j.m.c
Junior Member
 
Location: Vancouver BC

Join Date: Dec 2016
Posts: 2
Default

Thank you for your reply.

Yes, my reads were 87 bp after trimming with Trimmomatic. I had also removed adapter sequences with Trimmomatic, and now I think I see the issue, if I understood correctly what you said:

"The 35bp reads you ended up with are because of the short insert. When you have 2x87bp reads with a 35bp insert, you get 35bp of overlap on the 3' end and then 52bp of the 5' end overhanging on each side; that's adapter sequence. BBMerge trims that off so you are left with only the 35bp of genomic sequence. "

That means the overhangs are removed because BBMerge thinks they are adapter sequences. My reads are from RNA-seq data (not genomic data; I am sorry I didn't specify that earlier), and since I had already removed adapter sequences with Trimmomatic, I am actually losing data if the 5' overhangs are trimmed off...
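The arithmetic in the quoted explanation can be checked with a small sketch. The function name `overlap_geometry` and its return layout are illustrative assumptions, not anything BBMerge exposes; it models only idealized, error-free read geometry.

```python
def overlap_geometry(read_len, insert_size):
    """Return (overlap, overhang) in bp for an idealized read pair.

    overlap  : number of insert bases covered by both reads
    overhang : bases per read that extend past the insert
               (these can only be adapter sequence)
    Hypothetical helper for illustration; not BBMerge code.
    """
    if insert_size < read_len:
        # Each read runs through the whole insert and into the adapter.
        return insert_size, read_len - insert_size
    # Reads stay inside the insert; they overlap only if 2*read_len > insert.
    return max(0, 2 * read_len - insert_size), 0


# 2x87 bp reads with a 35 bp insert, as in the quoted example.
print(overlap_geometry(87, 35))  # (35, 52)
```

Under this model, a pair only has a nonzero overhang when the insert is shorter than the read length.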

Is there any way to prevent that with BBmerge?

Otherwise I will try BBmerge with my raw reads without removing adapters.

Thanks!
Old 12-15-2016, 07:36 PM   #103
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

If you know your adapter sequences (or have a list of typical adapter sequences, or actually, you can just say "adapter=default"), you can do this:

Code:
bbmerge.sh in=reads.fq adapter=adapter.fa out=merged.fq outu=unmerged.fq
If BBMerge thinks that you still have untrimmed adapters in those cases... I am quite confident it is correct. Adapter-trimming programs are not perfect (nor is BBMerge or BBDuk). I recommend BBDuk for adapter-trimming because it uses both adapter sequences and overlap information (very conservatively), but you will still end up with some untrimmed reads that actually had adapters. The problem is that Illumina sequence quality declines with each cycle, so by the end of the read (the part that typically overlaps, or has adapter sequence) the error rate can be pretty high. If you use an adapter-trimming program that solely uses sequence-matching to a list of provided adapter sequences, then the high mismatch rate will yield poor adapter-trimming for low-quality reads. BBDuk with the "tbo" flag uses both adapter sequences and overlap information, which for short-insert reads, gives added weight to the high-quality initial bases in a read pair.

So - it's not surprising that Trimmomatic did not do complete trimming. I recommend you use BBDuk instead. It still won't give perfect adapter-trimming, but it will be much better than Trimmomatic.
Old 02-15-2017, 04:38 PM   #104
peerah
Junior Member
 
Location: USA

Join Date: Jan 2015
Posts: 6
Default

Hi Brian! I have a question: I am working on a fungal ITS metagenomic amplicon library with a pretty wide variation in sizes (200-500 bp). We are doing 2x300, and my second reads are a little bit lower in quality compared to the firsts. Is there any setting on the BBMerge that I should modify in order to get the most out of the data? I'm pretty new to the field, so please let me know if you need more information! Thank you.
Old 02-15-2017, 04:52 PM   #105
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Hi! With that range you should have, at worst, a 100bp overlap, which is plenty. But 2x300 MiSeq runs have had major quality problems in the past, so it's possible that trimming would help. I'd suggest adding the flags "qtrim2 trimq=10,15". This will first try to merge the reads, and if unsuccessful (because the quality was too low so there were too many mismatches), quality-trim to Q10 on the right side and retry; then, if still unsuccessful, do the same at Q15. This isn't necessary unless the data is quite bad, but it will generally increase your merge rate, and is better than simply quality-trimming all reads prior to merging.
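The merge, trim, and retry order described above can be sketched as a simple control flow. This is not BBMerge's implementation: `try_merge` is a hypothetical callable standing in for the real overlap detection, `trim_right` is a naive trailing-base trimmer rather than BBTools' quality-trimming algorithm, and trimming from the original reads at each threshold (rather than cumulatively) is an assumption.

```python
def trim_right(seq, quals, q):
    """Naively drop trailing bases whose quality is below q (simplified)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < q:
        end -= 1
    return seq[:end], quals[:end]


def merge_with_retries(r1, r2, try_merge, thresholds=(10, 15)):
    """Try to merge untrimmed reads first, then retry after trimming.

    r1, r2    : (sequence, qualities) tuples
    try_merge : hypothetical callable returning a merged read or None
    Returns (merged, threshold_used); threshold_used is None when the
    untrimmed reads merged or when every attempt failed.
    """
    merged = try_merge(r1, r2)
    if merged is not None:
        return merged, None
    for q in thresholds:
        # Assumption: each retry trims the original reads, not the
        # already-trimmed ones from the previous round.
        merged = try_merge(trim_right(*r1, q), trim_right(*r2, q))
        if merged is not None:
            return merged, q
    return None, None


# Toy stand-in: "merges" only if no base in either read is below Q10.
def toy_merge(a, b):
    return "merged" if all(x >= 10 for x in a[1] + b[1]) else None


r1 = ("ACGT", [30, 30, 30, 5])
r2 = ("ACGT", [30, 30, 12, 5])
print(merge_with_retries(r1, r2, toy_merge))  # ('merged', 10)
```

The point of the ordering is that reads which merge untrimmed are never shortened; trimming is applied only to pairs that fail.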
Old 04-11-2017, 04:37 AM   #106
mdavrandi
Junior Member
 
Location: London, UK

Join Date: Apr 2017
Posts: 1
Default

Quote:
Originally Posted by peerah View Post
Hi Brian! I have a question: I am working on a fungal ITS metagenomic amplicon library with a pretty wide variation in sizes (200-500 bp). We are doing 2x300, and my second reads are a little bit lower in quality compared to the firsts. Is there any setting on the BBMerge that I should modify in order to get the most out of the data? I'm pretty new to the field, so please let me know if you need more information! Thank you.
Hi Peerah,

We are having the same problem in our lab with 2x300 MiSeq runs (very poor Read 2 >Q30 scores), and I was wondering if Brian's recommendation improved the number of paired sequences you obtained from that run.

Cheers
Old 04-11-2017, 05:02 AM   #107
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,576
Default

Quote:
Originally Posted by mdavrandi View Post
Hi Peerah,

We are having the same problem in our lab with 2x300 MiSeq runs (very poor Read 2 >Q30 scores), and I was wondering if Brian's recommendation improved the number of paired sequences you obtained from that run.

Cheers
In case you missed it, this post has the first explanation for the poor read 2 scores.
Old 10-10-2017, 05:11 PM   #108
ashuchawla
Member
 
Location: san diego

Join Date: Jan 2012
Posts: 38
Default Confusion regarding read merging

Dear Brian, or anybody else who could help me,

I used the following command for BBMerge:
bbmerge.sh in=reads.fq out=merged.fq pfilter=1

I got these stats:
Pairs: 2545201
Joined: 1491688 58.61%
Ambiguous: 439613 17.27%
No Solution: 613393 24.10%
Too Short: 0 0.00%
Avg Insert: 322.6

My questions:
1. What happens to the bases during read merging if there is a mismatch outside of the 12 bases this command considers? As I understand it, the minimum number of overlapping bases required for merging is 12. In other words, could you please explain exactly how the merge happens between two paired-end reads when I use the above command for a perfect overlap?

2. Could you please explain what "Ambiguous" and "No solution" mean?

Thank you so much,
Ashu
Old 10-11-2017, 01:48 PM   #109
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Hi Ashu,

"Ambiguous" means there are multiple possible overlaps. For example, if read 1 and read 2 both end with "ACACACACACACACACACACAC", there are lots of possible overlap frames, none of which is particularly better than another. So, that would be ambiguous.

"No solution" means there is no overlap satisfying BBMerge's fairly strict criteria for the number of matching and mismatching bases in the best possible overlap frame.

If there is no frame in which the length, entropy (this determines the minimum necessary length), number of matching bases, and number of mismatching bases satisfy the cutoffs, the pair will not be merged and it will be declared "No solution". If there are multiple frames satisfying those cutoffs, and the second-best frame is sufficiently close to the best frame that it's really hard to tell which one is correct, the pair will not be merged and it will be declared "Ambiguous".

The pair will only be merged if there seems to be an unambiguously good solution.
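As a rough illustration of the three outcomes, here is a toy classifier. It is deliberately much cruder than BBMerge: it ignores entropy and base qualities, uses a fixed mismatch cutoff, and treats any second passing frame as "ambiguous". The names `overlap_frames` and `classify` are hypothetical, and `r2_rc` is assumed to be read 2 already reverse-complemented.

```python
def overlap_frames(r1, r2_rc, min_overlap=12):
    """List (overlap_length, mismatches) for every candidate frame.

    A frame of length n aligns the last n bases of read 1 with the
    first n bases of the reverse-complemented read 2.
    """
    frames = []
    for n in range(min_overlap, min(len(r1), len(r2_rc)) + 1):
        mismatches = sum(a != b for a, b in zip(r1[-n:], r2_rc[:n]))
        frames.append((n, mismatches))
    return frames


def classify(r1, r2_rc, min_overlap=12, max_mismatches=0):
    """Toy decision rule; not BBMerge's actual scoring."""
    good = [n for n, m in overlap_frames(r1, r2_rc, min_overlap)
            if m <= max_mismatches]
    if not good:
        return "no solution"
    if len(good) > 1:
        return "ambiguous"
    return "merged"


# Repetitive ends: many frames fit equally well.
print(classify("TTGCA" + "AC" * 11, "AC" * 11 + "GGTCA"))            # ambiguous
# One clean 14 bp overlap.
print(classify("ACGTTGCAAGGCTTAGCCATG", "AAGGCTTAGCCATGTTTCCGA"))    # merged
# Nothing matches in any frame.
print(classify("A" * 20, "C" * 20))                                  # no solution
```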

"minoverlap=12" means that reads will never be merged if the best overlap is shorter than 12 bp. pfilter=1 will prevent reads from merging if there are any mismatches (I don't particularly recommend this, but it might be useful in some situations...). pfilter means probability filter, and considers the base qualities, so a read with a mismatch on a Q2 base might pass while an otherwise identical read with a mismatch in a Q40 base might fail. BBMerge will still look for all possible overlaps, and if, say, you have a 30bp overlap with 1 mismatch and a 20bp overlap with 0 mismatches, that would still be declared ambiguous.
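To see why a mismatch on a Q2 base is treated so differently from one on a Q40 base, it helps to convert Phred scores to error probabilities. Only the Phred definition below is standard; the function `mismatch_explained_by_error` is a hypothetical illustration and is not the formula pfilter actually uses.

```python
def phred_to_error(q):
    """Standard Phred definition: probability that a base call is wrong."""
    return 10 ** (-q / 10)


def mismatch_explained_by_error(q1, q2):
    """Probability that at least one of two disagreeing base calls is wrong.

    Illustrative only; assumes the two calls err independently.
    """
    p1, p2 = phred_to_error(q1), phred_to_error(q2)
    return 1 - (1 - p1) * (1 - p2)


# A mismatch involving a Q2 base is easily explained by sequencing error...
print(round(mismatch_explained_by_error(2, 40), 3))
# ...while a mismatch between two Q40 bases almost never is, which
# suggests the overlap frame itself is wrong.
print(round(mismatch_explained_by_error(40, 40), 5))
```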

Incidentally! The BBMerge paper was accepted by PLOS ONE and will be published soon, so you can read all the algorithmic details there =) But I don't actually know the date it will be published, so feel free to ask me more questions in the meantime if I have not sufficiently clarified things.
Old 10-11-2017, 02:17 PM   #110
ashuchawla
Member
 
Location: san diego

Join Date: Jan 2012
Posts: 38
Default

Thank you, Brian, for your reply. I have to merge paired-end reads from a MiSeq run (I quality trimmed them at Q30). The overlap is around 100bp according to the experimentalist. What options would you recommend to merge these reads? Once I have the merged reads, I will use dedupe to get all unique merged reads and run further analysis on them.

Ashu
Old 10-11-2017, 04:52 PM   #111
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,576
Default

Quote:
I quality trimmed them at Q30
That is overly strict. What type of dataset is this and do you have a reference genome available?
Old 10-11-2017, 06:52 PM   #112
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

As GenoMax says, trimming to Q30 is not beneficial before merging reads. BBMerge has some internal quality-trimming options, so it can try to merge, then quality-trim if it is unsuccessful, then try to merge again, etc. That can slightly increase the merge rate. But typically I just use the whole untrimmed reads as input. The longer the input reads are, the less likely it is for BBMerge to make an accidental incorrect merge, and it does take quality scores into account, so I do not recommend quality-trimming prior to BBMerge. Adapter-trimming is fine though.
Old 10-14-2017, 01:54 PM   #113
finswimmer
Member
 
Location: Europe

Join Date: Oct 2016
Posts: 41
Default

Hello Brian,

Quote:
Originally Posted by Brian Bushnell View Post
Adapter-trimming is fine though.
Do you recommend adapter trimming prior to using BBMerge? I thought that if I provide the adapter sequence to BBMerge, it can more easily find those pairs that completely overlap.

fin swimmer

Tags
bbmap, bbmerge, bbtools, flash, pair end
