SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Single end read with paired end reads tahamasoodi Bioinformatics 2 01-16-2016 07:46 AM
MetaSim: why paired end reverse read is much shorter than forward read?? gen_argentino Bioinformatics 0 09-06-2012 06:38 AM
Average Read Coverage for 454 paired end read data lisa1102 Core Facilities 8 10-18-2011 08:40 AM
Difference in paired-end and single-end read ? darshan Bioinformatics 1 09-30-2009 11:44 PM

Reply
 
Thread Tools
Old 11-22-2016, 07:43 AM   #81
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,795
Default

Quote:
Originally Posted by moistplus View Post
is it possible to estimate insert size of non overlapped library ?
By mapping with bbmap (assuming you have a reference).
GenoMax is offline   Reply With Quote
Old 11-22-2016, 11:37 AM   #82
moistplus
Member
 
Location: Germany

Join Date: Feb 2016
Posts: 40
Default

Quote:
Originally Posted by GenoMax View Post
By mapping with bbmap (assuming you have a reference).
Thank you, I'm gonna try it.

What is the equivalent option to bwa mem -M (for single mapping) ?
moistplus is offline   Reply With Quote
Old 11-22-2016, 12:19 PM   #83
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

There is no equivalent to -M, since BBMap does not produce split alignments. Also, by default, BBMap does not consider a pair "proper" unless it aligns in the fragment orientation, with reads on opposite strands. if you are trying to quantify the insert size distribution of a long-mate-pair library with a different orientation, you should add the flag "requirecorrectstrand=f".

To generate an insert size histogram, you can do this:
Code:
(for interleaved reads)
bbmap.sh ref=ref.fa in=reads.fq ihist=ihist.txt

(for reads in two files)
bbmap.sh ref=ref.fa in1=r1.fq in2=r2.fq ihist=ihist.txt
Brian Bushnell is offline   Reply With Quote
Old 11-23-2016, 10:50 AM   #84
moistplus
Member
 
Location: Germany

Join Date: Feb 2016
Posts: 40
Default

In fact I would like to estimate my insert size and the orientation also.

I have ran the classical bwa mem and picard tools CollectInsertSizeMetrics after an assembly.

For some lib, I see something weird. For a library with an insert size of 350 bp:

- 90% are FR and insert size of 350
- 10% are in tandem, so either FF or RR.

I don't know whether it's a problem with my library or the assembler has done something crappy.

I would like to run bbmap to get an another estimation. If I got the same thing, I would like to extract only the PE with the right orientation (FR).

Is it possible to do that with bbtools ?
moistplus is offline   Reply With Quote
Old 11-23-2016, 10:59 AM   #85
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,795
Default

@moistplus: Take a look at these options for BBMap and their default values.

Code:
samestrandpairs=f       (ssp) Specify whether paired reads should map to the
                        same strand or opposite strands.
requirecorrectstrand=t  (rcs) Forbid pairing of reads without correct strand 
                        orientation.  Set to false for long-mate-pair libraries.
killbadpairs=f          (kbp) If a read pair is mapped with an inappropriate
                        insert size or orientation, the read with the lower  
                        mapping quality is marked unmapped.
pairedonly=f            (po) Treat unpaired reads as unmapped.  Thus they will 
                        be sent to 'outu' but not 'outm'.
GenoMax is offline   Reply With Quote
Old 11-23-2016, 11:39 AM   #86
moistplus
Member
 
Location: Germany

Join Date: Feb 2016
Posts: 40
Default

Quote:
Originally Posted by GenoMax View Post
@moistplus: Take a look at these options for BBMap and their default values.

Code:
samestrandpairs=f       (ssp) Specify whether paired reads should map to the
                        same strand or opposite strands.
requirecorrectstrand=t  (rcs) Forbid pairing of reads without correct strand 
                        orientation.  Set to false for long-mate-pair libraries.
killbadpairs=f          (kbp) If a read pair is mapped with an inappropriate
                        insert size or orientation, the read with the lower  
                        mapping quality is marked unmapped.
pairedonly=f            (po) Treat unpaired reads as unmapped.  Thus they will 
                        be sent to 'outu' but not 'outm'.
Thanks Genomax.
So if I undertand well, I should set killbadpairs and pairedonly to true ?
moistplus is offline   Reply With Quote
Old 11-23-2016, 12:25 PM   #87
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Basically, you can do something like this:

bbmap.sh in=reads.fq outm=normal_pairs.sam outu=abnormal.fq killbadpairs=t pairedonly=t

Then the proper pairs will be in normal_pairs.sam, and everything else will be in abnormal.fq. You can subsequently apply further filters to abnormal.fq to look at it in more detail. Either file can put output in fastq or sam (depending on the extension) but everything in outu will have its mapping information removed so if you want to see the orientations, you'd need to remap them. You could in a subsequent pass, do this:

bbmap.sh in=abnormal.fq outm=strange_pairs.sam outu=unmapped.fq requirecorrectstrand=f pairedonly=t

That would give you FF and RR pairs in strange_pairs.sam and all the unmapped or half-mapped pairs in unmapped.fq. You can further split them like this:

bbmap.sh in=unmapped.fq outm=half_mapped.sam outu=both_unmapped.fq

Last edited by Brian Bushnell; 11-23-2016 at 12:28 PM.
Brian Bushnell is offline   Reply With Quote
Old 11-23-2016, 12:46 PM   #88
moistplus
Member
 
Location: Germany

Join Date: Feb 2016
Posts: 40
Default

Quote:
Originally Posted by Brian Bushnell View Post
Basically, you can do something like this:

bbmap.sh in=reads.fq outm=normal_pairs.sam outu=abnormal.fq killbadpairs=t pairedonly=t

Then the proper pairs will be in normal_pairs.sam, and everything else will be in abnormal.fq. You can subsequently apply further filters to abnormal.fq to look at it in more detail. Either file can put output in fastq or sam (depending on the extension) but everything in outu will have its mapping information removed so if you want to see the orientations, you'd need to remap them. You could in a subsequent pass, do this:

bbmap.sh in=abnormal.fq outm=strange_pairs.sam outu=unmapped.fq requirecorrectstrand=f pairedonly=t

That would give you FF and RR pairs in strange_pairs.sam and all the unmapped or half-mapped pairs in unmapped.fq. You can further split them like this:

bbmap.sh in=unmapped.fq outm=half_mapped.sam outu=both_unmapped.fq
Your tool is amazing ! there are everythings !

If I set
HTML Code:
 outm=normal_pairs.fq
, I will have interleaved paired end ?
moistplus is offline   Reply With Quote
Old 11-23-2016, 12:57 PM   #89
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by moistplus View Post
If I set
HTML Code:
 outm=normal_pairs.fq
, I will have interleaved paired end ?
Yes, they will be interleaved. If you want you can use "outm1=r1.fq outm2=r2.fq" to keep them in 2 files but I find interleaved reads more convenient.
Brian Bushnell is offline   Reply With Quote
Old 12-01-2016, 02:07 AM   #90
moistplus
Member
 
Location: Germany

Join Date: Feb 2016
Posts: 40
Default

Quote:
Originally Posted by Brian Bushnell View Post
Yes, they will be interleaved. If you want you can use "outm1=r1.fq outm2=r2.fq" to keep them in 2 files but I find interleaved reads more convenient.
Is that possible to output the 2 fastq and a sam file at the same time ?
moistplus is offline   Reply With Quote
Old 12-01-2016, 09:19 AM   #91
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

You can output fastq to outm and sam to out or outu (or any similar combination), but you can't output both sam and fastq to outm.
Brian Bushnell is offline   Reply With Quote
Old 12-06-2016, 09:12 AM   #92
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

@Moistplus

To clarify, "mix" puts both merged and unmerged reads in the same file, the target of "out=". It should only be used in conjunction with "ecco" which does not actually merge the reads. With "ecco" and "mix" you will get the same number of output reads as input reads. Without "ecco" and "mix" your merged reads will go to "out=" and unmerged reads will go to "outu=", and the total number of reads afterward will be less than the number of input reads.
Brian Bushnell is offline   Reply With Quote
Old 12-15-2016, 09:32 AM   #93
tshalev
Junior Member
 
Location: Canada

Join Date: Dec 2016
Posts: 4
Default Confusion regarding quality/adapter trimming and read merging

Hi everyone,

First off, I'd like to say thanks Brian, BBMerge has been really useful in speeding up my assemblies and improving assembly quality.

I'm having some confusion regarding how to combine trimming and merging. In the quick guide for BBMerge, it says:

"Adapter-trimming reads is fine and is recommended prior to running BBDuk, particularly if kmers will be used. Quality-trimming is usually not recommended, unless the library has major quality issues at the right end of reads resulting in a low merge rate, in which case only weak trimming (such as qtrim=r trimq=8) should be used."

I'm not sure if "BBDuk" is a typo, as this is the guide for BBMerge. Basically I understand that the process needs to be something like this:

1)
Code:
 bbduk.sh in=reads.fq out=clean.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo
To trim adapters;

2)
Code:
bbmerge-auto.sh in=clean.fq out=merged.fq adapter1=something adapter2=something rem k=62 extend2=50 ecct
To merge reads;

3)
Code:
bbduk.sh in=merged.fq out=merged_trimmed.fq qtrim=r trimq=10
To further trim poor quality reads.

My main issues here are:

1) This seems a bit convoluted, and feels like too much messing with the data.

2) I'm still unsure whether I am actually supposed to trim adapters before merging, and if so, does it make sense to use the adapter flags in BBMerge? Can it use the adapter information if I've already trimmed them?

Thanks!
tshalev is offline   Reply With Quote
Old 12-15-2016, 12:29 PM   #94
j.m.c
Junior Member
 
Location: Vancouver BC

Join Date: Dec 2016
Posts: 2
Default BBmerge smaller insert sizes than expected

Hi,

I just run BBmerge on a set of paired-end reads (illumina 2x100bp) that had been trimmed with trimmomatic to remove adapter sequences and low quality bases at the end of the reads. Trimmed reads (87 bases long) were merged with BBmerge (following the recommended command for optimal accuracy) and the merged-reads had an insert size that ranged from 35 to 265 bases with an avg of 124 bases.

My questions:

How can I get merged reads that are 35 bases long if my input PE reads were 87 bases long?

Is there something wrong in my commands?

Below the code I used with trimmomatic and BBmerge:


trimmomatic-0.36.jar PE -threads 4 -trimlog YC1.trimlog.txt YC1_R1.fastq YC1_R2.fastq YC1_R1.trimmed.paired.fastq YC1_R1.trimmed.unpaired.fastq YC1_R2.trimmed.paired.fastq YC1_R2.trimmed.unpaired.fastq HEADCROP:13 SLIDINGWINDOW:4:15 MINLEN:60 &

/usr/local/BBMap_36.62/bbmerge.sh in1=YC1_R1.trimmed.paired.fastq in2=YC1_R2.trimmed.paired.fastq out=YC1.trimmed.paired.merged.fastq outu1=YC1_R1.trimmed.paired.unmerged.fastq outu2=YC1_R2.trimmed.paired.unmerged.fastq rem k=62 extend2=50 ecct -Xmx48000m -threads=6 &

Thank you very much!

Last edited by j.m.c; 12-15-2016 at 12:40 PM.
j.m.c is offline   Reply With Quote
Old 12-15-2016, 01:05 PM   #95
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

@tshalev:

Hi! Sorry it's a little convoluted. Well, allow me to explain:

First off, yep, BBDuk was a typo, thanks for catching it. Secondly, I would change your commands to something like this:

Code:
bbduk.sh in=reads.fq out=clean.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

bbmerge-auto.sh in=clean.fq out=merged.fq outu=unmerged.fq adapter1=something adapter2=something rem k=62 extend2=50 ecct

bbduk.sh in=merged.fq out=merged_trimmed.fq qtrim=rl trimq=10

bbduk.sh in=unmerged.fq out=unmerged_trimmed qtrim=r trimq=10
I always recommend keeping the unmerged reads and using those too for the assembler, particularly if you are assembling with Spades. Some reads, like those in low-complexity areas, will not merge without a very long overlap, so tossing out unmerged reads can incur bias.

As for using adapter sequences for BBMerge - that reduces the false-positive rate. If two reads appear to have a good overlap for merging, but that overlap indicates the insert size is shorter than read length, then if and only if you have specified adapter sequences, BBMerge will look to see if adapter sequences are in the expected locations (the overhanging part outside of the overlap). If there are not adapter sequences there, it will assume that was a false positive and the merge will be rejected. Therefore, if you already did adapter-trimming, all of these pairs that merge with insert shorter than read length should be false positives, but BBMerge won't know that unless you actually give it the adapter sequences. So it will print at the end something like "300 adapter sequences expected, 7 found" or similar, indicating that 293 false-positive merges were prevented.

As for over-processing... that's a good thing to worry about. All the processing steps are beneficial as long as the assembly improves, so I often like to assemble after every step to determine its impact. Adapter-trimming is basically always beneficial. Merging depends on the library, the sequencing mode, and the assembler, but is (in my experience) always beneficial with Spades, Tadpole and Ray, and not beneficial with Megahit. Quality-trimming is generally beneficial if you pick the correct cutoff, but 10 is a nice, conservative value that should generally be fine.
Brian Bushnell is offline   Reply With Quote
Old 12-15-2016, 01:12 PM   #96
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by j.m.c View Post
My questions:

How can I get merged reads that are 35 bases long if my input PE reads were 87 bases long?

Is there something wrong in my commands?
Your commands look fine, although I'm a little confused about what you mean by 87-bp reads. Were they initially 87 bp long, or was that the average after trimming? Also, I suggest trying BBMerge prior to quality-trimming unless you have very low quality data, but you can also try either way and see which gives a higher merge rate.

Note that BBMerge's default "mininsert" is 35; it does not, by default, look for overlaps indicating an insert size smaller than that. If you have a lot of short-insert reads you may want to set it lower, or just do adapter-trimming prior to BBMerge and throw away the short junk. You can do adapter-trimming with BBDuk as in the above post.

The 35bp reads you ended up with are because of the short insert. When you have 2x87bp reads with a 35bp insert, you get 35bp of overlap on the 3' end and then 52bp of the 5' end overhanging on each side; that's adapter sequence. BBMerge trims that off so you are left with only the 35bp of genomic sequence.
Brian Bushnell is offline   Reply With Quote
Old 12-15-2016, 01:12 PM   #97
tshalev
Junior Member
 
Location: Canada

Join Date: Dec 2016
Posts: 4
Default

@Brian Bushnell

Ah OK, I see. So it "expects" to not find the adapter sequence there, since it has hopefully been removed by BBDuk. Slightly unrelated, I am using RNA-Seq data for a coniferous tree species, and am assembling using conventional assemblers such as Trinity, Velvet-Oases, etc. BBMerge is appropriate for this purpose, right? I keep noticing a lot of threads talking about 16S data, or amplicon data, and I haven't even heard of the assemblers that you mentioned .

Thanks!
tshalev is offline   Reply With Quote
Old 12-15-2016, 01:26 PM   #98
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

The primary reason people use read merging is for 16S or other amplicon analyses, I believe. But I don't personally work with 16S very often, so BBMerge is designed and optimized for improving assemblies. Of course, it works on 16S as well, but I use it to optimize assembly pipelines. That said, I have never used Trinity, so I don't know how it would affect a Trinity assembly. As long as you assemble with both the merged and unmerged reads, most assemblers benefit from BBMerge (some quite a lot) so I would expect it to improve a Trinity assembly, but I'd be interested to hear what you experience, if you have the time and interest to run Trinity both ways. Of course RNA-seq assembly quality is especially hard to measure, but metrics like mapping rate, N50, and size are at least somewhat useful.

What kind of assembly are you doing, how long are your reads, and what organism? Is it just RNA-seq?
Brian Bushnell is offline   Reply With Quote
Old 12-15-2016, 01:47 PM   #99
tshalev
Junior Member
 
Location: Canada

Join Date: Dec 2016
Posts: 4
Default

@Brian Bushnell

I am working with foliage tissue from a species of coniferous tree. I'm using 100bp reads, on Illumina HiSeq 4000. I did actually do the comparison tests about a year ago for using merging vs. not merging, on some different data that I had. For these I trimmed first using Trimmomatic though and did not use kmer information or adapter recognition (not sure if these were implemented in BBMerge back then).

My overall consensus was that merging and then assembling with both merged and unmerged reads produced better assemblies than not merging, over four different assemblers (Trinity, Velvet+Oases, SOAPdenovoTrans and transABySS). This was gauged using the optimized assembly score from Transrate, as well as by assembly completeness as measured by BUSCO and and contiguity as measured by Conditional Reciprocal Best BLAST (from Transrate) against gene sets of other conifer species. In all cases the gains were enough to warrant the use of merging.

I'm interested to see now how using some of these new features will affect my assembly. I already see an increase in merging rate from about ~57% to ~83% using the verystrict parameter, although I won't know whether this includes false positives until I assemble. Regarding adapters expected versus adapters found, I'm seeing ~430000 adapters expected versus ~6000 adapters found in ~91.5 million reads after adapter trimming, so I guess this is good?
tshalev is offline   Reply With Quote
Old 12-15-2016, 02:20 PM   #100
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by tshalev View Post
My overall consensus was that merging and then assembling with both merged and unmerged reads produced better assemblies than not merging, over four different assemblers (Trinity, Velvet+Oases, SOAPdenovoTrans and transABySS). This was gauged using the optimized assembly score from Transrate, as well as by assembly completeness as measured by BUSCO and and contiguity as measured by Conditional Reciprocal Best BLAST (from Transrate) against gene sets of other conifer species. In all cases the gains were enough to warrant the use of merging.
Great, thanks for that info!

Quote:
I'm interested to see now how using some of these new features will affect my assembly. I already see an increase in merging rate from about ~57% to ~83% using the verystrict parameter, although I won't know whether this includes false positives until I assemble.
OK, please let me know the results - it's useful for giving people guidance on when to use rem flag. I've never tried it in conjunction with RNA-seq data, just isolates, metagenomes, and single-cell, though it improved all of those cases.

Quote:
Regarding adapters expected versus adapters found, I'm seeing ~430000 adapters expected versus ~6000 adapters found in ~91.5 million reads after adapter trimming, so I guess this is good?
That indicates the adapter trimming was fairly complete. What version of BBMap are you using, by the way?
Brian Bushnell is offline   Reply With Quote
Reply

Tags
bbmap, bbmerge, bbtools, flash, pair end

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:31 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO