SEQanswers

07-08-2018, 10:28 AM   #121
kokyriakidis
Member
 
Location: Thessaloniki, Greece

Join Date: Jul 2018
Posts: 12
RQCFilter Norm and EC

Hi Brian,

I am trying to trim and filter my data with RQCFilter, but I cannot find an option for normalisation or error correction. Are there parameters for these in this package? There is also a parameter called -merge. Does it merge reads? Should I set it to false and normalise and error-correct first?
07-08-2018, 11:37 AM   #122
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

Can you clarify which program you are referring to? I don't think there is an RQCFilter program in the BBMap suite.
07-08-2018, 12:11 PM   #123
kokyriakidis
Member
 
Location: Thessaloniki, Greece

Join Date: Jul 2018
Posts: 12

Source: https://jgi.doe.gov/data-and-tools/b...preprocessing/

"These steps replicate the QA protocol implemented at JGI for Illumina reads. There is a program “RQCFilter” which implements them as a pipeline, but that is not publically available because it has numerous hard-coded paths to reference datasets of contaminants."

It is in the bbtools files.

Never mind! 1) Is it a good plan to normalise and error-correct BEFORE merging? 2) Do I need a different trimming and filtering approach for short vs. long mate-pair reads (Nextera)?

Last edited by kokyriakidis; 07-08-2018 at 12:15 PM.
07-08-2018, 10:19 PM   #124
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

Since notes on the page you linked say this:
Quote:
There is a program “RQCFilter” which implements them as a pipeline, but that is not publically available because it has numerous hard-coded paths to reference datasets of contaminants.
You should follow the steps listed on the linked page to replicate that functionality.

In general, @Brian has recommended merging reads before doing any additional manipulations.
07-08-2018, 10:25 PM   #125
kokyriakidis
Member
 
Location: Thessaloniki, Greece

Join Date: Jul 2018
Posts: 12

Quote:
Originally Posted by GenoMax View Post
Since notes on the page you linked say this:


You should follow the steps that are denoted to replicate that functionality on the linked page.

In general @Brian has recommended merging reads before doing any additional manipulations.
For long mate-pair reads, do I just add the extra splitNextera step? Otherwise, does the pipeline remain the same?
07-09-2018, 02:26 AM   #126
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

I would think so. I don't have first-hand experience with mate-pair reads, but I recall that you need to switch one of the reads around.
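To illustrate what "switching a read around" usually means at the sequence level: the read is reverse-complemented, and its quality string is reversed so each score stays with its base. The Java below is a minimal, hypothetical sketch (the class and method names are made up for illustration and are not BBTools code). Whether one mate or both need flipping depends on the library orientation, so check the protocol before applying it; in practice an existing tool would normally do this step.

```java
/**
 * Hypothetical illustration (not BBTools code): reverse-complementing a read,
 * the operation usually meant by "switching a read around".
 */
public final class ReverseComplement {

    /** Complement of one base; lowercase input is upper-cased, anything other than A/C/G/T becomes N. */
    static char complement(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N';
        }
    }

    /** Walks the sequence from its 3' end and emits the complement of each base. */
    public static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }

    /** Qualities are only reversed (never complemented) so each score stays with its base. */
    public static String reverseQuality(String qual) {
        return new StringBuilder(qual).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("ACGTTN")); // NAACGT
        System.out.println(reverseQuality("IIH#"));      // #HII
    }
}
```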
08-06-2018, 03:16 PM   #127
ilya
Junior Member
 
Location: Boston

Join Date: Jul 2012
Posts: 2

The BBMerge guide recommends trimming adapters before merging, but elsewhere it also recommends providing the adapter sequences to BBMerge. Which is best?
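For context on why a tool might ask for the adapter sequences: when the insert is shorter than the read, the read runs through into the adapter, and knowing the adapter lets software find and clip that tail. The Java below is a deliberately simplified, hypothetical sketch of 3' adapter clipping by exact matching. The class name and the minMatch threshold are illustrative assumptions, not BBMerge or BBDuk parameters, and real trimmers tolerate mismatches and use additional evidence such as read overlap.

```java
/**
 * Hypothetical sketch (not BBTools code) of 3'-adapter clipping by exact match.
 * A hit is either the full adapter inside the read, or an adapter prefix of at
 * least minMatch bases that runs off the read's 3' end.
 */
public final class AdapterClip {

    /** Returns the read truncated at the first adapter hit, or the read unchanged if none is found. */
    public static String clip(String read, String adapter, int minMatch) {
        for (int start = 0; start < read.length(); start++) {
            // Compare the whole adapter, or only as much of it as still fits in the read.
            int len = Math.min(adapter.length(), read.length() - start);
            if (len < minMatch) {
                break; // len never grows as start increases, so no later position can qualify
            }
            if (read.regionMatches(start, adapter, 0, len)) {
                return read.substring(0, start);
            }
        }
        return read;
    }

    public static void main(String[] args) {
        String adapter = "AGATCGGAAGAGC"; // common Illumina adapter prefix, used here only as an example
        System.out.println(clip("ACGTACGTAGATCGGAAG", adapter, 5)); // ACGTACGT
        System.out.println(clip("TTTTTAGATC", adapter, 5));         // TTTTT
        System.out.println(clip("ACGTACGTAC", adapter, 5));         // ACGTACGTAC (no hit)
    }
}
```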
02-24-2019, 05:18 AM   #128
Shriram369
Junior Member
 
Location: Ireland

Join Date: Feb 2019
Posts: 1
Program ran out of memory on large dataset: Need some tips

Hi folks,

We have a shotgun metagenomic dataset (approx. 120 GB compressed). I want to merge the paired-end reads, since longer reads should improve assembly. I tried this on a small subset of the data and it increased N50 and scaffold length remarkably.

Now I want to merge the full ~120 GB of compressed data for subsequent assembly. We have a system with 32 threads and 120 GB of memory. After going through the tips on the BBTools page, I tried the following command and ran out of memory (error message: "This program ran out of memory. Try increasing the -Xmx flag and using tool-specific memory-related parameters.").

Code:
bbmerge-auto.sh in1=in_R1.fastq.gz in2=in_R2.fastq.gz out=merged.fastq.gz outu1=1_um.fastq.gz outu2=2_um.fastq.gz outa=adapters.txt ihist=insert_histogram.txt k=62 vstrict rem extend2=50 ecct mininsert=150 -Xmx80g minprob=0.8 prefilter=2 prealloc ziplevel=5

My questions are:

1. Are there any other parameters that would make this command manageable on the server described above?

2. Can I subset the data using the partition.sh BBTools wrapper and run the command on each subset? As I understand it, subsetting the data will reduce the merging of reads. Is that true?

Any tips or advice would be appreciated.

Thanks
02-24-2019, 06:49 AM   #129
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

@Shriram369: As long as the reads are in the proper order in both files (each mate at the same position in R1 and R2), it is fine to split the data into manageable chunks and then do the merging on each chunk.
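A minimal sketch of what keeping the reads in proper order means when chunking: R1 and R2 are read in lockstep, one 4-line record from each file at a time, so a pair always lands in the same chunk, and mismatched IDs are reported instead of silently producing out-of-sync files. This is hypothetical Java for illustration (class and method names are made up; partition.sh is the BBTools route mentioned above). It assumes plain 4-line FASTQ and mate IDs that agree up to the first whitespace or a trailing /1 or /2.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not BBTools code): splits paired FASTQ input into chunks
 * of at most pairsPerChunk pairs, reading R1 and R2 in lockstep so mates are
 * never separated. Assumes plain 4-line records.
 */
public final class PairedChunker {

    /** One read pair, each mate stored as its four FASTQ lines. */
    public static final class Pair {
        final String[] r1;
        final String[] r2;
        Pair(String[] r1, String[] r2) { this.r1 = r1; this.r2 = r2; }
    }

    /** Reads one 4-line record, or returns null at a clean end of input. */
    static String[] readRecord(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (header == null) return null;
        String seq = in.readLine(), plus = in.readLine(), qual = in.readLine();
        if (qual == null) throw new IOException("Truncated FASTQ record: " + header);
        return new String[] {header, seq, plus, qual};
    }

    /** Read ID without '@', any comment after whitespace, or a trailing /1 or /2. */
    static String readId(String header) {
        String id = header.substring(1).split("\\s+", 2)[0];
        if (id.endsWith("/1") || id.endsWith("/2")) id = id.substring(0, id.length() - 2);
        return id;
    }

    public static List<List<Pair>> split(BufferedReader in1, BufferedReader in2, int pairsPerChunk)
            throws IOException {
        List<List<Pair>> chunks = new ArrayList<>();
        List<Pair> current = new ArrayList<>();
        while (true) {
            String[] a = readRecord(in1);
            String[] b = readRecord(in2);
            if (a == null && b == null) break; // both files ended together
            if (a == null || b == null) throw new IOException("R1 and R2 contain different numbers of reads");
            if (!readId(a[0]).equals(readId(b[0]))) {
                throw new IOException("Mates out of order: " + a[0] + " vs " + b[0]);
            }
            current.add(new Pair(a, b));
            if (current.size() == pairsPerChunk) { // chunk full: start a new one
                chunks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) chunks.add(current); // last, possibly smaller, chunk
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        String r1 = "@p1/1\nACGT\n+\nIIII\n@p2/1\nGGCC\n+\nIIII\n@p3/1\nTTAA\n+\nIIII\n";
        String r2 = "@p1/2\nTTTT\n+\nIIII\n@p2/2\nAAAA\n+\nIIII\n@p3/2\nCCCC\n+\nIIII\n";
        List<List<Pair>> chunks = split(new BufferedReader(new StringReader(r1)),
                                        new BufferedReader(new StringReader(r2)), 2);
        System.out.println(chunks.size() + " chunks, last has " + chunks.get(1).size() + " pair(s)");
    }
}
```

For gzipped input like the files in the post, each reader could instead be built as new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream(path)))), with each finished chunk written to its own pair of output files rather than kept in memory. Plain overlap merging is decided pair by pair, so chunking alone should not change which pairs can overlap, although options that rely on k-mer counts gathered from the whole input may behave differently on smaller chunks.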
04-04-2020, 04:05 PM   #130
yy273826987
Junior Member
 
Location: Cincinnati, Ohio, USA

Join Date: Mar 2020
Posts: 6
Can we merge two forward reads with this tool?

Hi Brian,

I am really new to bioinformatics data analysis and just found this wonderful tool. I have a question: I have several environmental samples (A, B, and C). I sequenced them (shotgun metagenomic sequencing, paired-end) and found that the sequencing depth for sample B was not high enough, so I asked the sequencing center to sequence sample B again. In the end, I got two sets of sequencing results for sample B: B.R1, B.R2, B.2nd.R1, and B.2nd.R2. For my downstream analysis (e.g., co-assembly), do you think I should merge B.R1 and B.2nd.R1 first? If so, how can I use BBMerge to do that? As I understand it, BBMerge is designed to merge R1 with R2. Can it be used to merge two sets of R1s (from two separate sequencing runs)? Or is that merging even necessary?

Thanks a lot!

Yours,
04-06-2020, 03:12 AM   #131
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

If you have two separate sequencing runs, you can't "merge" reads across them, since they are not sequencing the same fragment. The reason you can (in some cases) merge R1 and R2 into a longer sequence is that both reads come from the same fragment; when that fragment is shorter than the two reads combined, they overlap and can be joined.
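To make that reasoning concrete: R1 reads the fragment from one end and R2 from the other, so when the fragment is shorter than the two reads combined, the end of R1 matches the start of R2's reverse complement, and the two can be joined. Reads from different fragments (for example, from different runs) have no such shared overlap to find. The Java below is a minimal, hypothetical sketch of that idea (exact matches only, no quality scoring, and inserts shorter than a single read are not handled); it is not how BBMerge is implemented.

```java
/**
 * Hypothetical, simplified sketch of overlap-based pair merging (not BBMerge's algorithm).
 * R2 is reverse-complemented; overlap lengths are tried from longest down to minOverlap,
 * and the first exact match is used. Inserts shorter than one read are not handled.
 */
public final class OverlapMergeSketch {

    /** Reverse complement; any base other than A/C/G/T becomes N. */
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            char c = seq.charAt(i);
            sb.append(c == 'A' ? 'T' : c == 'T' ? 'A' : c == 'C' ? 'G' : c == 'G' ? 'C' : 'N');
        }
        return sb.toString();
    }

    /** Returns the merged fragment, or null when no exact overlap of at least minOverlap bases exists. */
    public static String merge(String r1, String r2, int minOverlap) {
        String r2rc = reverseComplement(r2);
        int maxOverlap = Math.min(r1.length(), r2rc.length());
        for (int ov = maxOverlap; ov >= minOverlap; ov--) {
            // The last ov bases of R1 must equal the first ov bases of rc(R2).
            if (r1.regionMatches(r1.length() - ov, r2rc, 0, ov)) {
                return r1 + r2rc.substring(ov);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Both reads come from the 16-bp fragment ACGTACCGTTAGCATG and overlap by 4 bases.
        System.out.println(merge("ACGTACCGTT", "CATGCTAACG", 4)); // ACGTACCGTTAGCATG
        // R2 from an unrelated fragment: nothing to overlap, so no merge.
        System.out.println(merge("ACGTACCGTT", "GGGGGGGGGG", 4)); // null
    }
}
```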
04-21-2020, 10:18 AM   #132
shunyip
Member
 
Location: San Francisco South Bay Area

Join Date: Oct 2013
Posts: 21
bbmerge minoverlap default value is different between usage and source code?

Hi Brian,

I am currently using version 38.82, and I am looking into the default value of the "minoverlap" argument.

BBMerge's usage text says:
Code:
minoverlap=12        Minimum number of overlapping bases to allow merging
However, when I looked into ./current/jgi/BBMerge.java, I found:
Code:
...
}else if(a.equals("minoverlappingbases") || a.equals("minoverlapbases") || a.equals("minoverlap")){
    MIN_OVERLAPPING_BASES=Integer.parseInt(b);
...
private static int MIN_OVERLAPPING_BASES=11;
...
It seems that the source code sets minoverlap to 11, while the usage text says the default is 12. Am I reading this correctly?

Thank you for developing this amazing tool!
Ken
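Whichever number turns out to be the effective default, passing the value explicitly sidesteps the question, because an explicit key=value overrides whatever the field was initialized to. The Java below is a hypothetical sketch modeled loosely on the quoted excerpt (the class name and structure are illustrative, not BBMerge's actual code): several aliases set one field, and the initializer only matters when none of them is given. Since the excerpt elides the rest of BBMerge.java, other code there could still adjust the value, so this sketch does not settle which default BBMerge really uses.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not the BBMerge source): key=value parsing where several
 * aliases set the same field, and the field initializer is used only when none
 * of the aliases appears on the command line.
 */
public final class MinOverlapArg {

    /** Same initial value as the line quoted from BBMerge.java (the usage text says 12). */
    static final int INITIAL_MIN_OVERLAP = 11;

    public static int parseMinOverlap(String... args) {
        int minOverlap = INITIAL_MIN_OVERLAP;
        for (String arg : args) {
            String[] kv = arg.split("=", 2);
            if (kv.length != 2) continue; // this sketch ignores arguments without '='
            String a = kv[0].toLowerCase(Locale.ROOT);
            String b = kv[1];
            if (a.equals("minoverlappingbases") || a.equals("minoverlapbases") || a.equals("minoverlap")) {
                minOverlap = Integer.parseInt(b); // later occurrences overwrite earlier ones
            }
        }
        return minOverlap;
    }

    public static void main(String[] args) {
        System.out.println(parseMinOverlap("in=reads.fq"));                  // 11: initializer applies
        System.out.println(parseMinOverlap("in=reads.fq", "minoverlap=12")); // 12: explicit value wins
    }
}
```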

Tags
bbmap, bbmerge, bbtools, flash, pair end

All times are GMT -8.