#101
Junior Member
Location: Canada Join Date: Dec 2016
Posts: 4
The latest version, release 36_62. I'll keep you posted.
#102
Junior Member
Location: Vancouver BC Join Date: Dec 2016
Posts: 2
Thank you for your reply.
Yes, my reads were 87 bp after trimming with Trimmomatic. I had also removed adapter sequences with Trimmomatic, and now I think I see the issue, if I understood correctly what you said: "The 35bp reads you ended up with are because of the short insert. When you have 2x87bp reads with a 35bp insert, you get 35bp of overlap on the 3' end and then 52bp of the 5' end overhanging on each side; that's adapter sequence. BBMerge trims that off so you are left with only the 35bp of genomic sequence."

That means the overhangs are removed because BBMerge thinks they are adapter sequences. My reads are from RNA-seq data (not genomic data; I am sorry I didn't specify that earlier), and since I already removed adapter sequences with Trimmomatic, I am actually losing data if those 5' overhangs were trimmed off... Is there any way to prevent that with BBMerge? Otherwise I will try BBMerge on my raw reads, without removing adapters first. Thanks!
#103
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
If you know your adapter sequences (or have a list of typical adapter sequences, or actually, you can just say "adapter=default"), you can do this:
Code:
bbmerge.sh in=reads.fq adapter=adapter.fa out=merged.fq outu=unmerged.fq

So, it's not surprising that Trimmomatic did not do complete trimming. I recommend you use BBDuk instead. It still won't give perfect adapter-trimming, but it will be much better than Trimmomatic.
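For reference, a typical BBDuk paired-end adapter-trimming command might look something like this (the file names are placeholders, and these are just the commonly suggested settings rather than anything tuned to your dataset):

Code:
bbduk.sh in1=r1.fq in2=r2.fq out1=trimmed_r1.fq out2=trimmed_r2.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

Here ktrim=r removes everything to the right of an adapter kmer match, mink allows shorter kmer matches at read tips, hdist allows one mismatch per kmer, and tpe/tbo use pairing information (trim pairs evenly, trim by overlap).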
#104
Junior Member
Location: USA Join Date: Jan 2015
Posts: 6
Hi Brian! I have a question: I am working on a fungal ITS metagenomic amplicon library with a pretty wide variation in insert sizes (200-500 bp). We are doing 2x300, and my second reads are a bit lower in quality than the first reads. Are there any settings in BBMerge that I should modify in order to get the most out of the data? I'm pretty new to the field, so please let me know if you need more information! Thank you.
#105
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Hi! With that size range you should have at worst a 100bp overlap, which is plenty. But 2x300 MiSeq runs have had major quality problems in the past, so it's possible that trimming would help. I'd suggest adding the flags "qtrim2 trimq=10,15". This will first try to merge the reads; if unsuccessful (because the quality was too low, so there were too many mismatches), it will quality-trim to Q10 on the right side and retry; then, if still unsuccessful, it will do the same at Q15. This isn't necessary unless the data is quite bad, but it will generally increase your merge rate, and it is better than simply quality-trimming all reads prior to merging.
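In practice that might look something like this (input and output names are placeholders):

Code:
bbmerge.sh in1=r1.fq in2=r2.fq out=merged.fq outu=unmerged.fq qtrim2=t trimq=10,15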
#106
Junior Member
Location: London, UK Join Date: Apr 2017
Posts: 1
Quote:
We are having the same problem in our lab with 2x300 MiSeq runs - very poor Read 2 >Q30 scores - and I was wondering if Brian's recommendation improved the number of paired sequences you obtained from that run. Cheers
#107
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
Quote:
#108
Member
Location: san diego Join Date: Jan 2012
Posts: 38
Dear Brian, or anybody else who could help me,
I used the following command for BBMerge:

Code:
bbmerge.sh in=reads.fq out=merged.fq pfilter=1

I got these stats:

Code:
Pairs:          2545201
Joined:         1491688    58.61%
Ambiguous:       439613    17.27%
No Solution:     613393    24.10%
Too Short:            0     0.00%
Avg Insert:       322.6

My questions:
1. What happens to the bases during merging if there is a mismatch outside of the 12 bases this command considers? As I understand it, the minimum number of overlapping bases to allow merging is 12. In other words, could you please explain exactly how the merge happens between two paired-end reads when I use the above command with a perfect overlap?
2. Could you please explain what "Ambiguous" and "No Solution" mean?

Thank you so much,
Ashu
#109
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Hi Ashu,
"Ambiguous" means there are multiple possible overlaps. For example, if read 1 and read 2 both end with "ACACACACACACACACACACAC", there are lots of possible overlap frames, none of which is particularly better than another. So, that would be ambiguous. "No solution" means there is no overlap satisfying BBMerge's fairly strict criteria for the number of matching and mismatching bases in the best possible overlap frame. If there is no frame in which the length, entropy (this determines the minimum necessary length), number of matching bases, and number of mismatching bases satisfy the cutoffs, the pair will not be merged and it will be declared "No solution". If there are multiple frames satisfying those cutoffs, and the second-best frame is sufficiently close to the best frame that it's really hard to tell which one is correct, the pair will not be merged and it will be declared "Ambiguous". The pair will only be merged if there seems to be an unambiguously good solution. "minoverlap=12" means that reads will never be merged if the best overlap is shorter than 12 bp. pfilter=1 will prevent reads from merging if there are any mismatches (I don't particularly recommend this, but it might be useful in some situations...). pfilter means probability filter, and considers the base qualities, so a read with a mismatch on a Q2 base might pass while an otherwise identical read with a mismatch in a Q40 base might fail. BBMerge will still look for all possible overlaps, and if, say, you have a 30bp overlap with 1 mismatch and a 20bp overlap with 0 mismatches, that would still be declared ambiguous. Incidentally! The BBMerge paper was accepted by PLOS ONE and will be published soon, so you can read all the algorithmic details there =) But I don't actually know the date it will be published, so feel free to ask me more questions in the meantime if I have not sufficiently clarified things. |
#110
Member
Location: san diego Join Date: Jan 2012
Posts: 38
Thank you Brian for your reply. I have to merge paired-end reads from a MiSeq run (I quality-trimmed them at Q30). The overlap is around 100bp according to the experimentalist. What options would you recommend to merge these reads? Once I have the merged reads, I will use Dedupe to get all unique merged reads and run further analysis on them.
Ashu

Quote:
#111
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
Quote:
#112
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
As GenoMax says, trimming to Q30 is not beneficial before merging reads. BBMerge has some internal quality-trimming options, so it can try to merge, then quality-trim if it is unsuccessful, then try to merge again, etc. That can slightly increase the merge rate. But typically I just use the whole untrimmed reads as input. The longer the input reads are, the less likely it is for BBMerge to make an accidental incorrect merge, and it does take quality scores into account, so I do not recommend quality-trimming prior to BBMerge. Adapter-trimming is fine though.
#113
Member
Location: Europe Join Date: Oct 2016
Posts: 60
#114
Junior Member
Location: Switzerland Join Date: Oct 2017
Posts: 7
Hello, I'm building a pipeline for metagenomics.
I follow the BBTools user guide and do:
- normalization with BBNorm
- error correction with Tadpole
- merging (with extension) with BBMerge

I want to increase the merge rate to get a better assembly. I suspect that many reads which could be merged are thrown away during the normalization. Wouldn't it be better to do the merging (without extension) first, then take primarily the merged reads, normalize, error-correct, and merge with extension?

What is the best way of normalizing paired-end reads together with merged pairs or singletons in BBNorm? For now I do two rounds of BBNorm and supply the other reads via the `extra` parameter (roughly as in the sketch below); is there a better way to do this?
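Something like this, with placeholder file names and an arbitrary target depth:

Code:
# Round 1: normalize the pairs, using the singletons only for depth estimation.
bbnorm.sh in1=r1.fq in2=r2.fq out1=norm_r1.fq out2=norm_r2.fq extra=singletons.fq target=40

# Round 2: normalize the singletons, using the pairs only for depth estimation.
bbnorm.sh in=singletons.fq out=norm_singletons.fq extra=r1.fq,r2.fq target=40

# Error-correct, then merge with extension.
tadpole.sh in1=norm_r1.fq in2=norm_r2.fq out1=ecc_r1.fq out2=ecc_r2.fq mode=correct
bbmerge-auto.sh in1=ecc_r1.fq in2=ecc_r2.fq out=merged.fq outu=unmerged.fq extend2=50 k=62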
#115
Junior Member
Location: knoxville Join Date: Oct 2017
Posts: 7
Hi,
I have shotgun data: paired-end reads, 100bp on each end. I want to run MetaPhlAn2 next to get a general taxonomy profile, so I am considering merging the reads before MetaPhlAn2. However, I do not know whether I should run BBMap first to do quality control, or run BBMerge first to merge the sequences. Any suggestions? Thanks in advance.
#116
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
@chloe - It's normally simplest and most effective to do QC first on the raw data, then anything else (such as merging) later.
@silask - the way you are doing it is currently the most effective approach. It's a little bit annoying to have to run BBNorm twice, but that's the only way to process both paired and unpaired reads.
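@chloe - to be concrete, that QC-then-merge order might look something like this (placeholder file names; the BBDuk settings are just the typical adapter-trimming values, echoing the example earlier in the thread):

Code:
# Step 1: QC (adapter-trim) the raw pairs first.
bbduk.sh in1=r1.fq.gz in2=r2.fq.gz out1=clean_r1.fq out2=clean_r2.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

# Step 2: then merge the cleaned pairs.
bbmerge.sh in1=clean_r1.fq in2=clean_r2.fq out=merged.fq outu=unmerged.fq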
#117
Junior Member
Location: knoxville Join Date: Oct 2017
Posts: 7
Hi, Brian,
Thanks for the reply. However, I have already tried the QC. I used:

Code:
bbduk.sh in=R1.fastq.gz out=filter_R1.fq maq=30
bbduk.sh in=R2.fastq.gz out=filter_R2.fq maq=30

(no reads in R1/R2 will be trimmed)

Code:
bbduk.sh in=R1.fastq.gz out=clean_R1.fq trimq=30
bbduk.sh in=R2.fastq.gz out=clean_R2.fq trimq=30

(it will trim 50% of reverse reads, but no forward reads)

Code:
bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1_001.fq out2=R2.fq outm=fail.fq bhist=hist_base.txt qhist=hist_q.txt aqhist=hist_aq.txt bqhist=hist_bq.txt ecco=t

(also no reads will be trimmed)

But when I run BBMerge, only 32.268% of the reads can be joined. Do you have any suggestions? Thanks in advance.
#118
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
@chloe1005: It is possible that only 32% of your reads have inserts of a size that allows the reads to merge.

`trimq=30` is too severe a bar for trimming. If you have a reference genome, then not doing any quality trimming works fine. If you are doing any de novo work, you may want to trim at Q20 or Q25 instead.
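A gentler quality trim might look something like this (placeholder output names; qtrim=r trims from the 3' end only):

Code:
bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=trim_R1.fq out2=trim_R2.fq qtrim=r trimq=20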
#119
Junior Member
Location: knoxville Join Date: Oct 2017
Posts: 7
Hi,
I am still confused about the difference between quality trimming and quality filtering. What exactly is the difference between them? May I also ask how to get the reference genome? I saw it mentioned in the earlier posts in this thread. Looking forward to the answer.
#120
Member
Location: Germany Join Date: Mar 2013
Posts: 44
Hi Brian, somehow the t=x flag doesn't reduce the number of nodes in use. Any suggestions as to what goes wrong, or can I somehow include Java flags?
Best,
Ulrike
Tags
bbmap, bbmerge, bbtools, flash, pair end |