
**GenoMax** · 05-05-2016, 11:49 AM

Can you provide the exact command you are using? Your reads do have TruSeq adapters in them, so the inserts may be smaller than you expected.

Code:

@M02344:9008:000000000-AJ1PF:1:1110:28244:16073 1:N:0:TGACCAAT+ATAGAGGC
TCTGCCGTCATCGACTTCGAAGGTTCGAATCCTTCCCCTCTAACCACGGCCGAAATTCAATACCCGGATCAAGCTCAATTCGGGTCGAGGTCGGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGA[COLOR="Red"]GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATGTCGTATGCCGTCTTCTGCTTG[/COLOR]AAAAAAAATAAGTGGTGCGAAGAGAGCCTGTGGCCAACCTCATATGCGTGGAGATGTCTCG

**Brian Bushnell** · 05-05-2016, 11:57 AM

In this case it looks like BBMerge's output is correct... as GenoMax said, you have adapter sequences indicating short inserts. Specifically, read1's first 126 bases exactly match BBMerge's output, and subsequently there is:
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATGTCGTATGCCGTCTTCTGCTTG
...a known Illumina adapter sequence, followed by AAAAAA, which is common after the Illumina machine runs off the end of the adapter sequence and has no signal.

No matter what you expect/design your insert size to be, shorter fragments will almost always be present.
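To check a read for this yourself, a minimal shell sketch follows. It reports how many bases come before the first exact match to the start of the adapter, which approximates the insert length when the read runs through the insert into the adapter. The function name insert_length and the toy read are made up for illustration, and exact matching is an assumption: real reads with sequencing errors need a tolerant matcher such as a dedicated adapter trimmer.

```shell
# Print the number of bases before the first occurrence of an adapter prefix.
# Prints the full read length when the adapter is not found.
insert_length() {
  il_read=$1
  il_adapter=$2
  # Remove the longest suffix that starts with the adapter,
  # i.e. everything from the first adapter hit onward.
  il_before=${il_read%%"$il_adapter"*}
  echo "${#il_before}"
}

adapter=AGATCGGAAGAGC                                   # leading bases of the adapter quoted above
toy_read="ACGTACGTACGTTTGACCAA${adapter}ACACGTCAAAAAA"   # hypothetical read, 20 bp insert
insert_length "$toy_read" "$adapter"
```

A result shorter than the read length suggests the fragment was shorter than the read.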

**lcmb** · 05-05-2016, 11:59 AM

I am using:

Code:

bbmerge.sh in1=<read1> in2=<read2> out=<mergedreads> outu1=<unmerged1> outu2=<unmerged2> mismatches=0

**lcmb** · 05-05-2016, 12:23 PM

Thank you both for your help! Do you have any suggestions on how to set optional parameters to ensure that my merged file only contains sequences of my intended/designed insert size?

**Brian Bushnell** · 05-05-2016, 12:46 PM

You can postfilter it afterward:

reformat.sh in=reads.fq out=filtered.fq minlength=202 maxlength=202

But, bear in mind that you may be losing important data by doing so. For example, there could be a whole bunch of sequences that are 199bp long for some real biological reason (rather than problems with library prep). So, just be cautious.
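Before filtering to a single length, it may help to look at the length distribution of the merged reads, so that a secondary peak (such as the 199 bp example) is noticed rather than silently discarded. The sketch below is one way to do that with awk. It assumes an uncompressed FASTQ with exactly four lines per record, and fastq_length_hist is a name invented here, not a BBTools command.

```shell
# Print "length<TAB>count" for each read length in a 4-line-per-record FASTQ file.
fastq_length_hist() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) printf "%d\t%d\n", len, n[len] }' "$1" | sort -n
}

# Small demonstration on a throwaway file with two 4 bp reads and one 5 bp read.
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTA\n+\nIIIII\n@r3\nACGT\n+\nIIII\n' > "$demo"
fastq_length_hist "$demo"
rm -f "$demo"
```

On real data you would run it as fastq_length_hist merged.fq and inspect the counts around the designed insert size.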

**lcmb** · 05-05-2016, 12:52 PM

Thank you!

**shimingt** · 06-14-2016, 01:30 AM

Hello there! Is there a way we could generate a stats or log file with bbmerge?

**GenoMax** · 06-14-2016, 02:59 AM

Originally posted by shimingt:

Hello there! Is there a way we could generate a stats or log file with bbmerge?

Stats are automatically generated with each run of BBMerge. They look something like this:

Code:

Pairs:                  2879431
Joined:                 2052925         71.296%
Ambiguous:              810015          28.131%
No Solution:            16491           0.573%
Too Short:              0               0.000%

Avg Insert:             396.7
Standard Deviation:     98.1
Mode:                   415

Insert range:           35 - 591
90th percentile:        524
75th percentile:        469
50th percentile:        402
25th percentile:        332
10th percentile:        262

You can capture STDOUT/STDERR to a file with standard bash redirection (for example, > log.txt 2>&1 at the end of the command) to get the log info.
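The order of the two redirections matters. The sketch below shows this with a stand-in function instead of BBMerge, so it runs anywhere. fake_tool is hypothetical, and the sketch assumes the real tool writes its statistics to STDERR.

```shell
# Stand-in for a tool that writes to both streams (hypothetical, for illustration only).
fake_tool() {
  echo "result on stdout"
  echo "stats on stderr" >&2
}

log=$(mktemp)

# Correct order: send stdout to the file, then point stderr at the same place.
fake_tool > "$log" 2>&1
cat "$log"                 # both lines are in the log

# Reversed order: stderr is attached to the terminal before stdout is moved,
# so only the stdout line reaches the file.
fake_tool 2>&1 > "$log"
cat "$log"

rm -f "$log"
```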

**shimingt** · 06-14-2016, 03:30 AM

Log File

Hello,

I am a novice with the command prompt. I can't seem to get a log file.

This is my command:

for x in *_R1_001.fastq;do echo bbmerge.sh -Xmx20g in1=$x in2=${x%_R1_001.*}_R2_001.fastq out=bbmerge\/${x%_*-*_R1_001.*}_R1R2_bbmerge.fastq>bbmerge\/${x%_*-*_L001_R1_001.*}_bbmerge_log.txt 2>&1 outu1=outu1\/$x outu2=outu2\/${x%_R1_001.*}_R2_001.fastq >> bbmerge.sh;done

Am I doing something wrong here?

There are log files but they are all empty?

**GenoMax** · 06-14-2016, 03:56 AM

Try this (I am assuming that your variable expressions are correct). The log file and the redirect should be at the end of the command.

Code:

for x in *_R1_001.fastq;do your_bbmerge_command_along_with_options >bbmerge\/${x%_*-*_L001_R1_001.*}_bbmerge_log.txt 2>&1;done
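One way to confirm that the variable expressions expand as intended is to print the derived names for a single file before running the real loop. The sketch below does that with the simpler ${x%_R1_001.*} pattern. The file name sample1_R1_001.fastq and the helper pair_names are hypothetical; substitute one of your own file names and your own patterns.

```shell
# Show the mate and log names derived from one read-1 file name.
pair_names() {
  pn_x=$1
  pn_base=${pn_x%_R1_001.*}     # drop the shortest suffix matching _R1_001.*
  echo "in1=$pn_x in2=${pn_base}_R2_001.fastq log=bbmerge/${pn_base}_bbmerge_log.txt"
}

pair_names sample1_R1_001.fastq
```

If the printed names are not what you expect, fix the pattern before adding the redirect.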

**shimingt** · 06-14-2016, 05:31 PM

Log file from bbmerge

Hello GenoMax,

Thanks for your reply.

I tried a command like this:

for x in *_R1_001.fastq;do echo bbmerge.sh -Xmx20g in1=$x in2=${x%_R1_001.*}_R2_001.fastq out=bbmerge\/${x%_*-*_R1_001.*}_R1R2_bbmerge.fastq outu1=outu1\/$x outu2=outu2\/${x%_R1_001.*}_R2_001.fastq > bbmerge\/${x%_*-*_L001_R1_001.*}_bbmerge_log.txt 2>&1 >> bbmerge.sh;done

The log files are generated according to the names, but the log files are empty.

Am I doing something wrong here?

**GenoMax** · 06-15-2016, 03:17 AM

Can you try this?

Code:

for x in *_R1_001.fastq;do bbmerge.sh -Xmx20g in1=$x in2=${x%_R1_001.*}_R2_001.fastq out=bbmerge\/${x%_*-*_R1_001.*}_R1R2_bbmerge.fastq outu1=outu1\/$x outu2=outu2\/${x%_R1_001.*}_R2_001.fastq > bbmerge\/${x%_*-*_L001_R1_001.*}_bbmerge_log.txt 2>&1;done
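The earlier attempts produced empty logs for two reasons: echo only prints the command text without running BBMerge, and when stdout is redirected twice on one command the last redirect wins. The sketch below reproduces that redirect order with a harmless echo and temporary files. demo_redirects and the file name reads_R1.fastq are invented for the illustration.

```shell
# Reproduce the redirect order "> log 2>&1 >> script" with a harmless echo.
demo_redirects() {
  dr_log=$1
  dr_script=$2
  # The shell creates (and truncates) the log, then re-points stdout at the
  # script file, so the echoed text is appended there and the log stays empty.
  echo bbmerge.sh in1=reads_R1.fastq > "$dr_log" 2>&1 >> "$dr_script"
}

log=$(mktemp)
script=$(mktemp)
demo_redirects "$log" "$script"
wc -c < "$log"       # size of the log in bytes
cat "$script"        # the echoed command text ended up here
rm -f "$log" "$script"
```

Dropping echo and the trailing >> bbmerge.sh, as in the command above, lets BBMerge itself run and write to the log.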

**shimingt** · 06-16-2016, 12:45 AM

Dear GenoMax,

Thanks for your help!

**sganschow** · 11-02-2016, 08:24 AM

Hello,

How does BBMerge behave when the reads contain repetitive regions at the right end?

My amplicons are variable in length and derive from STRs, meaning that they look like this:
(non-repetitive flanking region) - (tandem repeats) - (non-repetitive flanking region).
If the amplicon is long enough, I could imagine a case where the paired reads overlap only in the repetitive region, so that multiple ways of merging are theoretically correct.
Until now, merging works fine. Can BBMerge ensure that merged reads are always consistent with the "real" amplicon sequence?

Sebastian

**Brian Bushnell** · 11-02-2016, 09:07 AM

If reads overlap only in a repetitive region, they will not be merged. BBMerge looks at all the possible overlaps, and keeps track of the two top-scoring ones (based on length and match/mismatch ratio). If those two are close, the pair will be classified as "ambiguous" and not merged. However, that does not mean it will always be correct; say you have a tandem repeat of two copies, like this:

ARRB

...where A and B are unique, and R is repeat. If the reads look like this:

1: ARR
2: RRB

...then there are 2 good overlap frames, forming ARRB or ARRRB, so the merge will be rejected as ambiguous. But if the reads only span a single repeat each, like this:

1: AR
2: RB

...then there will only be one apparently good overlap frame, and the reads will probably be merged incorrectly to form ARB. BBMerge's false-positive merge rate is extremely low, but it's not perfect. With shotgun data you can add the flag "rsem" to greatly reduce the chances of false-positive merges due to short tandem repeats, but that does not really work with amplicon data. You may want to simulate some data using your expected sequence to see what the actual behavior is.
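The two cases above can be reproduced with a toy script that lists every exact overlap frame between two reads. This is only an illustration of the ambiguity, not BBMerge's algorithm: it uses exact matching with no scoring, assumes read 2 has already been reverse-complemented into the same orientation as read 1, and the sequences standing in for A, R and B are made up.

```shell
# Print each overlap length k >= min where the last k bases of read 1
# equal the first k bases of read 2 (longest first).
overlap_frames() {
  of_r2=$2
  of_min=$3
  of_suffix=$1
  while [ "${#of_suffix}" -ge "$of_min" ]; do
    case $of_r2 in
      "$of_suffix"*) echo "${#of_suffix}" ;;
    esac
    of_suffix=${of_suffix#?}    # drop the first base, try the next shorter suffix
  done
}

A=ACGTTGCA   # hypothetical unique left flank
R=GGCTTAAC   # hypothetical repeat unit
B=TGCAATCG   # hypothetical unique right flank

echo "ARR vs RRB:"
overlap_frames "$A$R$R" "$R$R$B" 8    # two frames, giving ARRB or ARRRB
echo "AR vs RB:"
overlap_frames "$A$R" "$R$B" 8        # a single frame, giving ARB
```

With two candidate frames the pair can be flagged as ambiguous, but with only one frame nothing in the reads themselves reveals that a repeat copy is missing.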
