SEQanswers

06-29-2021, 01:15 AM   #1
Pluto
Junior Member
 
Location: London

Join Date: Jun 2021
Posts: 3
Does BBDuk work on concatenated fastq files?

I have around 200 samples and wanted to sequence each one to a depth of 50 million reads. That was not possible in a single run, so I sequenced all 200 samples 14 times. For each sample, I concatenated all the R1 files together and all the R2 files together, so I now have 200 R1 files and 200 R2 files.

I was wondering whether BBDuk can deal with these files, as each one is made up of 14 fastq outputs with 14 headers.

Thank you in advance.
06-29-2021, 04:29 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,121

BBDuk will work fine, provided you concatenated the files in exactly the same order for both the R1 and R2 files.
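As an illustration of the ordering point, here is a minimal sketch of joining per-run files so that R1 and R2 end up in the same run order. The file naming scheme (`<sample>_run01_R1.fastq.gz` and so on) and the function name `concat_runs` are assumptions made for this example, not something stated in the thread. It relies on the shell expanding globs in sorted order, so both mates are joined in the same sequence as long as the names differ only in `R1`/`R2`.

```shell
#!/usr/bin/env bash
# Hypothetical naming: <sample>_run01_R1.fastq.gz, <sample>_run01_R2.fastq.gz, ...
# Gzip files can be joined with plain cat; the result is still valid gzip.
concat_runs() {
    local sample=$1
    local r1=( "${sample}"_run*_R1.fastq.gz )
    local r2=( "${sample}"_run*_R2.fastq.gz )
    local i

    # Stop if the glob matched nothing or the mates have different file counts.
    if [ ! -e "${r1[0]}" ] || [ "${#r1[@]}" -ne "${#r2[@]}" ]; then
        echo "Missing or unequal R1/R2 run files for ${sample}" >&2
        return 1
    fi

    # Every R1 file must have its R2 partner at the same position in the list.
    for i in "${!r1[@]}"; do
        if [ "${r1[$i]/_R1.fastq.gz/_R2.fastq.gz}" != "${r2[$i]}" ]; then
            echo "Unpaired run file: ${r1[$i]}" >&2
            return 1
        fi
    done

    cat "${r1[@]}" > "${sample}_R1.fastq.gz"
    cat "${r2[@]}" > "${sample}_R2.fastq.gz"
}
```

Calling `concat_runs SampleA` in the directory holding the run files would write `SampleA_R1.fastq.gz` and `SampleA_R2.fastq.gz`, or stop with a message if a run is missing one of its mates.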
07-23-2021, 06:32 AM   #3
Pluto
Junior Member
 
Location: London

Join Date: Jun 2021
Posts: 3
Adaptor trimming is not working

Thank you for getting back to me. Sadly, the adaptor trimming is not working.

This is what my script looks like:

Ordered=t #Set to true to output reads in same order as input
Ktrim=r #once a reference kmer is matched in a read, that kmer and all the bases to the right will be trimmed
K=21 #specifies the kmer size
Mink=8 #"mink" allows it to use shorter kmers at the ends of the read
Hdist=2 #number of permitted mismatches


for Prefix in `ls -1 *_R1.fastq.gz | sed 's/_R1.fastq.gz//'`
do

bbduk.sh -Xmx128g in1=$Prefix\_R1.fastq.gz in2=$Prefix\_R2.fastq.gz out1=$Prefix\_clean_R1.fastq.gz out2=$Prefix\_clean_R2.fastq.gz ref=$adapters ordered=$Ordered ktrim=$Ktrim k=$K mink=$Mink hdist=$Hdist tpe tbo

done

Remember that my R1 and R2 files consist of concatenated sequences from different runs. Do you think this could be the reason?

Many thanks
07-23-2021, 06:49 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,121

As I said before, as long as the files were concatenated in the same order AND the R1/R2 files had the same number of reads, in sync, to begin with, this should work without any problems. If things are not working, you need to make sure that the reads in your files are in sync. You can check that using a different BBTools program called "repair.sh".
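For a quick look before reaching for repair.sh, a rough sketch like the following compares read names between the two mates. The function names `read_names` and `in_sync` are made up for this example, and it assumes standard four-line FASTQ records whose names match between mates up to an optional trailing `/1` or `/2`.

```shell
#!/usr/bin/env bash
# Print the read names of a FASTQ file (plain or gzipped): the first field of
# every 4th line, with any trailing /1 or /2 removed.
read_names() {
    gzip -cdf -- "$1" |
        awk 'NR % 4 == 1 { name = $1; sub(/\/[12]$/, "", name); print name }'
}

# Succeeds only if both files list the same read names in the same order.
in_sync() {
    cmp -s <(read_names "$1") <(read_names "$2")
}
```

For example, `in_sync SampleA_R1.fastq.gz SampleA_R2.fastq.gz || echo "out of sync"` would flag a pair whose names diverge. If that happens, repair.sh can re-pair the reads with a call along the lines of `repair.sh in1=... in2=... out1=... out2=... outs=...`, where the output names are yours to choose.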

bbduk.sh needs very little memory, so there is no need to assign 128G for this job. 4G would be perfectly fine.
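Putting the memory advice into the posted loop might look like the sketch below. Two things here are assumptions rather than part of the thread. First, `$adapters` is not set anywhere in the script as posted; if it is not defined elsewhere, `ref=` would be empty, which may be worth checking, so the sketch gives it a placeholder default. Second, the `BBDUK` variable is a hypothetical override added only so the command can be swapped for a dry run.

```shell
#!/usr/bin/env bash
# Placeholder path: point this at the adapter FASTA you want to trim against
# (BBTools ships one as resources/adapters.fa in its install directory).
adapters=${adapters:-/path/to/bbmap/resources/adapters.fa}

# Hypothetical override for dry runs, e.g. BBDUK=echo.
BBDUK=${BBDUK:-bbduk.sh}

# Trim one pair; same BBDuk settings as the posted script, but with 4G of memory.
trim_pair() {
    local prefix=$1
    "$BBDUK" -Xmx4g \
        in1="${prefix}_R1.fastq.gz" in2="${prefix}_R2.fastq.gz" \
        out1="${prefix}_clean_R1.fastq.gz" out2="${prefix}_clean_R2.fastq.gz" \
        ref="$adapters" ordered=t ktrim=r k=21 mink=8 hdist=2 tpe tbo
}

for r1 in *_R1.fastq.gz; do
    [ -e "$r1" ] || continue                            # glob matched nothing
    case $r1 in *_clean_R1.fastq.gz) continue ;; esac   # skip outputs of an earlier run
    trim_pair "${r1%_R1.fastq.gz}"
done
```

Skipping `*_clean_R1.fastq.gz` keeps a second run of the loop from feeding already-trimmed output back in as input, since those files also match `*_R1.fastq.gz`.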
All times are GMT -8.