SEQanswers

07-08-2018, 10:28 AM   #121
kokyriakidis
Member
Location: Thessaloniki, Greece
Join Date: Jul 2018
Posts: 12

RQCFilter Norm and EC

Hi Brian,

I am trying to trim and filter my data with RQCFilter, but I cannot find an option for normalisation or error correction. Does the package have parameters for these? There is also a parameter called -merge. Does it merge the reads? Should I set it to false and normalise and error-correct first?

07-08-2018, 11:37 AM   #122
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,978

Can you clarify which program you are referring to? I don't think there is an RQCFilter program in the BBMap suite.

07-08-2018, 12:11 PM   #123
kokyriakidis
Member
Location: Thessaloniki, Greece
Join Date: Jul 2018
Posts: 12

Source: https://jgi.doe.gov/data-and-tools/b...preprocessing/

"These steps replicate the QA protocol implemented at JGI for Illumina reads. There is a program “RQCFilter” which implements them as a pipeline, but that is not publically available because it has numerous hard-coded paths to reference datasets of contaminants."

It is in the bbtools files.

Never mind! 1) Is it a good plan to normalise and error-correct before merging? 2) Do I need a different approach to trimming and filtering short vs. long mate pair reads (Nextera)?

Last edited by kokyriakidis; 07-08-2018 at 12:15 PM.

07-08-2018, 10:19 PM   #124
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,978

Since the notes on the page you linked say this:
Quote:
There is a program “RQCFilter” which implements them as a pipeline, but that is not publically available because it has numerous hard-coded paths to reference datasets of contaminants.
you should follow the steps listed on the linked page to replicate that functionality.

In general @Brian has recommended merging reads before doing any additional manipulations.

07-08-2018, 10:25 PM   #125
kokyriakidis
Member
Location: Thessaloniki, Greece
Join Date: Jul 2018
Posts: 12

For long mate pair reads, do I just add the splitNextera step? Otherwise, does the pipeline remain the same?

07-09-2018, 02:26 AM   #126
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,978

I would think so. I don't have first-hand experience with mate pair reads, but I recall that you need to switch one of the reads around.
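
The post does not say which operation "switching around" means. One common reading, and this is an assumption, is reverse-complementing reads so that an outward-facing mate pair ends up in the inward-facing orientation of an ordinary paired-end library. A minimal Python sketch of that operation on a single read follows; the function names are made up for illustration and are not part of BBTools. Note that the quality string has to be reversed along with the sequence.

```python
# Hypothetical helpers for illustration only; not BBTools code.
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_read(seq: str, qual: str) -> tuple:
    """Reverse-complement a read and reverse its quality string to match."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return reverse_complement(seq), qual[::-1]


if __name__ == "__main__":
    # Toy read; real reads would come from a FASTQ file.
    print(flip_read("AACGT", "ABCDE"))
```

Whether one read, both reads, or neither needs flipping depends on the library prep and on what the downstream tool expects, so check the tool's documentation before applying this.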

08-06-2018, 03:16 PM   #127
ilya
Junior Member
Location: Boston
Join Date: Jul 2012
Posts: 2

The BBMerge guide recommends trimming adapters before merging but also, in a different place, recommends providing the adapter sequences to BBMerge. Which is best?

02-24-2019, 05:18 AM   #128
Shriram369
Junior Member
Location: Ireland
Join Date: Feb 2019
Posts: 1

Program ran out of memory on large dataset: Need some tips

Hi folks,

We have a shotgun metagenomic dataset (approx. 120 Gb compressed). I want to merge the paired-end reads, as longer reads will improve assembly performance. I tried this on a small subset of the data, and it remarkably increased N50 and scaffold length.

Now I want to merge the full dataset (approx. 120 Gb compressed) for subsequent assembly. We have a system with 32 threads and 120 Gb of memory. After going through the tips on the BBTools page, I tried the following command and ran out of memory (error message: "This program ran out of memory. Try increasing the -Xmx flag and using tool-specific memory-related parameters").

bbmerge-auto.sh in1=in_R1.fastq.gz in2=in_R2.fastq.gz out=merged.fastq.gz outu1=1_um.fastq.gz outu2=2_um.fastq.gz outa=adapters.txt ihist=insert_histogram.txt k=62 vstrict rem extend2=50 ecct mininsert=150 -Xmx80g minprob=0.8 prefilter=2 prealloc ziplevel=5

My questions are:

1. Are there any other specific parameters that would make this command manageable on the server described above?

2. Can I subset the data using the partition.sh BBTools wrapper and then run the command? As I understand it, sub-setting the data will reduce the merging of reads. Is that true?

Any tips/advice in this case is appreciated.

Thanks

02-24-2019, 06:49 AM   #129
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,978

@Shriram369: As long as your reads are in proper order in the files it would be fine to sub-set the data into manageable chunks and then do the merging.
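
To illustrate what "proper order" means when chunking, here is a minimal Python sketch (not a BBTools command) that walks the R1 and R2 files in lockstep and yields chunks holding the same pairs in the same order. The function names and the chunk size are made up for illustration, and the sketch assumes plain 4-line FASTQ records with matching read names, optionally ending in /1 and /2.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, ...]  # the four lines of one FASTQ record


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield FASTQ records as 4-tuples of lines (assumes 4-line records)."""
    it = iter(lines)
    while True:
        rec = tuple(itertools.islice(it, 4))
        if not rec:
            return
        if len(rec) != 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def read_name(header: str) -> str:
    """Return the read name without a trailing /1 or /2 mate suffix."""
    name = header.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def paired_chunks(
    r1_lines: Iterable[str], r2_lines: Iterable[str], pairs_per_chunk: int
) -> Iterator[Tuple[List[Record], List[Record]]]:
    """Yield (r1_chunk, r2_chunk) lists containing the same pairs in order."""
    r1 = read_records(r1_lines)
    r2 = read_records(r2_lines)
    while True:
        c1 = list(itertools.islice(r1, pairs_per_chunk))
        c2 = list(itertools.islice(r2, pairs_per_chunk))
        if len(c1) != len(c2):
            raise ValueError("R1 and R2 contain different numbers of reads")
        if not c1:
            return
        for a, b in zip(c1, c2):
            # Mates must sit at the same position in both files.
            if read_name(a[0]) != read_name(b[0]):
                raise ValueError("read names out of sync: %s vs %s" % (a[0], b[0]))
        yield c1, c2


if __name__ == "__main__":
    # Toy in-memory input; real input would be two open FASTQ file handles.
    r1 = ["@a/1\n", "AC\n", "+\n", "II\n", "@b/1\n", "GG\n", "+\n", "II\n"]
    r2 = ["@a/2\n", "GT\n", "+\n", "II\n", "@b/2\n", "CC\n", "+\n", "II\n"]
    for i, (c1, c2) in enumerate(paired_chunks(r1, r2, pairs_per_chunk=1)):
        print(i, [rec[0].strip() for rec in c1], [rec[0].strip() for rec in c2])
```

With gzipped files, handles opened with `gzip.open(path, "rt")` can be passed in place of the lists. Each chunk can then be written out and merged separately, and a mismatch in read counts or names raises an error instead of silently producing mispaired chunks.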

Tags
bbmap, bbmerge, bbtools, flash, pair end

All times are GMT -8.