SEQanswers

12-05-2015, 05:01 AM #1
sebl
Member
Location: Israel
Join Date: Mar 2014
Posts: 26

Cut 5' and 3' ends of paired-end reads

Dear all,

I have 2x250 paired-end reads for de novo assembly of bacterial genomes.

The reads were processed with cutadapt to remove Illumina adaptors.

In FastQC I see that I should remove some (~10) bases from the beginning of the reads (both reads in the pair) because of the base composition in that region, even though quality is fine. I also wish to cut the end of the reads due to low quality: ~10 bases on the first read and ~40 on the second one. This is almost constant across all the genomes sequenced.

Finally, I wish to filter out reads that are too short.

I went through cutadapt, Trimmomatic and a couple of other tools, but I cannot find how to cut a defined number of bases from both ends of both reads at the same time, so as to keep the reads paired.
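The task described above (fixed 5' and 3' cuts with different amounts per mate, then a length filter that keeps R1 and R2 in sync) can be sketched in plain Python. This is a hypothetical minimal illustration, not any tool's implementation: it assumes uncompressed 4-line FASTQ files that list the mates in the same order, and the function names and parameters are made up for the example.

```python
import io
from typing import Iterator, TextIO, Tuple

# One FASTQ record as (header, sequence, quality); the '+' line is rebuilt on output.
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, not needed
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_fixed(rec: Record, head: int, tail: int) -> Record:
    """Remove `head` bases from the 5' end and `tail` bases from the 3' end."""
    header, seq, qual = rec
    end = max(head, len(seq) - tail)  # never let the 3' cut wrap around
    return header, seq[head:end], qual[head:end]


def trim_pairs(r1: TextIO, r2: TextIO, out1: TextIO, out2: TextIO,
               head1: int, tail1: int, head2: int, tail2: int,
               min_len: int) -> Tuple[int, int]:
    """Trim both mates and keep a pair only if both mates reach min_len.

    Assumes R1 and R2 list the same reads in the same order; zip() stops
    silently at the shorter file, so check record counts separately.
    """
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        t1 = trim_fixed(rec1, head1, tail1)
        t2 = trim_fixed(rec2, head2, tail2)
        if len(t1[1]) < min_len or len(t2[1]) < min_len:
            dropped += 1  # drop the whole pair so R1/R2 stay in sync
            continue
        for out, (h, s, q) in ((out1, t1), (out2, t2)):
            out.write(f"{h}\n{s}\n+\n{q}\n")
        kept += 1
    return kept, dropped


if __name__ == "__main__":
    # Toy pair of 20-base mates: R1 loses 2+1 bases, R2 loses 2+4 bases.
    r1 = io.StringIO("@a/1\n" + "ACGT" * 5 + "\n+\n" + "I" * 20 + "\n")
    r2 = io.StringIO("@a/2\n" + "TTGCA" * 4 + "\n+\n" + "I" * 20 + "\n")
    o1, o2 = io.StringIO(), io.StringIO()
    print(trim_pairs(r1, r2, o1, o2, 2, 1, 2, 4, min_len=10))  # -> (1, 0)
```

For the numbers in the question this would be head1=head2=10, tail1=10, tail2=40. Dropping the whole pair when one mate is too short is only one policy; Trimmomatic in paired-end mode instead keeps the surviving mate in an "unpaired" output file.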

Suggestions would be much appreciated. Any other ideas on how to process these reads?

Thanks!
12-05-2015, 05:54 AM #2
blancha
Senior Member
Location: Montreal
Join Date: May 2013
Posts: 367

The bias in the first bases of reads from libraries generated with random hexamer primers has been documented and discussed over and over again. Do not cut them! You will just be discarding perfectly good bases.
http://seqanswers.com/forums/showthread.php?t=64396

With Trimmomatic, you can set the minimum quality of the leading or trailing bases with the options LEADING and TRAILING. It's true that there doesn't seem to be an option to cut a specified number of bases off the tail; there is only an option for the head, HEADCROP. But it makes much more sense to trim by quality score anyway, unless you are using an aligner that absolutely requires all reads to have the same length.
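To make the difference between quality-based and fixed trimming concrete, here is a rough Python model of what LEADING/TRAILING and HEADCROP do to a single read. It is only a sketch of the documented behaviour (LEADING/TRAILING remove bases from an end as long as their quality is below the threshold; HEADCROP removes a fixed count from the start), assuming Phred+33 qualities; the function names are hypothetical and this is not Trimmomatic's code.

```python
from typing import List, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def leading_trailing(seq: str, qual: str, leading: int,
                     trailing: int) -> Tuple[str, str]:
    """Strip bases from each end while their quality is below the threshold.

    Stops at the first base (counting from that end) that meets the threshold,
    so a good base near the end protects everything inside it.
    """
    scores = phred33(qual)
    start = 0
    while start < len(scores) and scores[start] < leading:
        start += 1
    end = len(scores)
    while end > start and scores[end - 1] < trailing:
        end -= 1
    return seq[start:end], qual[start:end]


def headcrop(seq: str, qual: str, n: int) -> Tuple[str, str]:
    """HEADCROP-style cut: drop a fixed number of bases from the 5' end."""
    return seq[n:], qual[n:]


if __name__ == "__main__":
    # '#' is Q2 and 'I' is Q40 in Phred+33.
    print(leading_trailing("ACGTACGT", "##IIII##", leading=3, trailing=3))
    # -> ('GTAC', 'IIII')
```

The quality-based version adapts to each read, which is why it is usually preferred over removing a fixed number of bases that may be perfectly good in many reads.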

Frankly, I would just use the example command given in the Trimmomatic manual and change only the minimum length (MINLEN), given that you will want to keep only reads long enough for a proper assembly.
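In paired-end mode, Trimmomatic applies MINLEN (drop a read shorter than the given length) to each mate and writes four outputs: paired and unpaired files for the forward reads, and paired and unpaired files for the reverse reads. The sketch below models only that routing decision. The bucket names and example lengths are hypothetical; the threshold is your choice (if I recall the manual correctly, its paired-end example uses MINLEN:36, which is low for assembling 2x250 reads).

```python
from collections import Counter
from typing import Iterable, Tuple


def route_pair(len1: int, len2: int, min_len: int) -> str:
    """Decide which output a trimmed pair goes to, Trimmomatic-PE style."""
    keep1, keep2 = len1 >= min_len, len2 >= min_len
    if keep1 and keep2:
        return "paired"
    if keep1:
        return "forward_unpaired"
    if keep2:
        return "reverse_unpaired"
    return "discarded"


def tally(pairs: Iterable[Tuple[int, int]], min_len: int) -> Counter:
    """Count how many trimmed pairs (given as mate lengths) land in each output."""
    return Counter(route_pair(a, b, min_len) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical post-trimming lengths for four read pairs.
    lengths = [(240, 210), (240, 40), (30, 200), (20, 25)]
    print(tally(lengths, min_len=50))
```

For assembly you would normally feed the paired files as pairs and, optionally, the unpaired files as single-end reads.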

With Cutadapt, you do have the option --cut, which lets you specify the number of bases you want to trim off the 5' or 3' end. Again, it is preferable to trim by quality unless your assembler requires all reads to be the same length, which is generally not the case.
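Cutadapt's -u/--cut removes bases from the 5' end when the value is positive and from the 3' end when it is negative, and it can be given twice with opposite signs; -U does the same for the second read in paired-end mode. So something along the lines of `-u 10 -u -10 -U 10 -U -40 -m <min length> -o out_R1.fastq -p out_R2.fastq in_R1.fastq in_R2.fastq` should cover the original question, keeping the files paired (check `cutadapt --help` for your version; by default I believe a pair is discarded if either read fails the length filter). The Python below only models the sign convention; `apply_cuts` is a hypothetical helper, not part of cutadapt.

```python
from typing import Sequence


def apply_cuts(seq: str, cuts: Sequence[int]) -> str:
    """Apply fixed cuts: positive = remove from 5' end, negative = from 3' end."""
    for c in cuts:
        if c > 0:
            seq = seq[c:]
        elif c < 0:
            seq = seq[:c]  # e.g. c = -40 removes the last 40 bases
    return seq


if __name__ == "__main__":
    read = "A" * 250
    # Values from the question: 10 off each end of R1,
    # 10 off the 5' end and 40 off the 3' end of R2.
    print(len(apply_cuts(read, [10, -10])), len(apply_cuts(read, [10, -40])))
    # -> 230 200
```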

There is also BBDuk, written by Brian Bushnell, an active member of this forum, which seems to have just about every option imaginable.
12-05-2015, 07:17 AM #3
mastal
Senior Member
Location: uk
Join Date: Mar 2009
Posts: 667

The trimmomatic command CROP removes bases from the 3' end of the reads.
12-05-2015, 07:40 AM #4
blancha

@mastal is correct.

The parameter works differently from HEADCROP, though. Rather than specifying the number of bases to cut, you specify the read length remaining after cutting.
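Because CROP takes the length to keep rather than the number of bases to remove, the right value depends on the read length and on whether HEADCROP runs first (Trimmomatic applies steps in the order they appear on the command line). The sketch below does that arithmetic and models both steps as plain string slices; the helper names are hypothetical. It also shows a caveat: a read that is already shorter, for example after adapter trimming, loses nothing from its 3' end under a fixed CROP. As far as I know, Trimmomatic's PE mode also applies the same step list to both mates, so different 3' cuts for R1 and R2 (10 vs 40 here) would be easier with cutadapt's separate -u/-U options.

```python
def headcrop(seq: str, n: int) -> str:
    """HEADCROP:n semantics: remove n bases from the start of the read."""
    return seq[n:]


def crop(seq: str, length: int) -> str:
    """CROP:length semantics: keep at most `length` bases, cutting from the end."""
    return seq[:length]


def crop_value(read_len: int, head: int, tail: int) -> int:
    """CROP argument that removes `tail` 3' bases after HEADCROP:head (full-length reads)."""
    return read_len - head - tail


if __name__ == "__main__":
    n = crop_value(250, head=10, tail=40)  # i.e. HEADCROP:10 then CROP:200
    full = "A" * 250
    short = "A" * 180  # e.g. a read already shortened by adapter trimming
    print(n, len(crop(headcrop(full, 10), n)), len(crop(headcrop(short, 10), n)))
    # -> 200 200 170
```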
12-08-2015, 02:48 AM #5
sebl

Thank you for the reply.

I guess you are right and I should work on quality trimming.

Our libraries, however, were prepared by mechanical shearing of gDNA, not with the usual random hexamer protocol in Nextera, if that is what you meant.
All times are GMT -8.