
**Wallysb01** · 09-11-2012, 07:35 AM

If you're using single-end reads, the fastx toolkit is simple and easy to use. You can filter on quality in several different ways, e.g. requiring that the average quality score of the read be at least X, or that at least Y% of the bases be at or above quality Z. It is pretty user-friendly, but it doesn't handle paired-end reads well: it can remove one read of a pair while keeping its mate, and it doesn't flag that surviving mate as now being a single-end read. I've used sickle for trimming PE reads and highly recommend it. I believe the same group at UC Davis has a paired-read filtering script too.
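As a rough illustration of the two filtering criteria described above, here is a minimal Python sketch. It assumes Phred+33 quality strings, and the function names are hypothetical. This is not the fastx toolkit's code, only the same idea: keep a read if its mean quality is high enough, or if a large enough share of its bases pass a per-base threshold.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_mean_quality(scores, min_mean):
    """Keep the read if its average quality is at least min_mean."""
    return bool(scores) and sum(scores) / len(scores) >= min_mean


def passes_percent_quality(scores, min_quality, min_percent):
    """Keep the read if at least min_percent % of its bases have quality >= min_quality."""
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_quality)
    return 100 * good >= min_percent * len(scores)


if __name__ == "__main__":
    scores = phred_scores("IIIIIIII####")  # 8 bases at Q40, 4 bases at Q2
    print(passes_mean_quality(scores, 20))         # mean is about 27.3, so it passes
    print(passes_percent_quality(scores, 20, 80))  # only 8/12 bases reach Q20, so it fails
```

Running a filter like this on R1 and R2 separately is exactly what breaks the pairing: each file can lose a different set of reads.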

**SES** · 09-12-2012, 04:35 AM

Check out PRINSEQ; it has both a standalone and a web-based version.

**tahamasoodi** · 11-27-2012, 12:47 AM

I have some low-quality PE FASTQ files (read 1 and read 2 in separate files). Will the fastx toolkit work for these samples?

**westerman** · 11-27-2012, 10:48 AM

Read up on Trimmomatic before you start using the fastx toolkit on PE sequences. The fastx toolkit can be used with PE data, but you'll need to do extra work to keep the reads in R1 and R2 matched.
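One way to do that extra work is to re-pair the two filtered files by read name afterwards. The sketch below assumes the records are already parsed into (header, sequence, quality) tuples and that mate names differ only by a /1 and /2 suffix or by the comment after the first space. `pair_key` and `resync_pairs` are hypothetical helpers. Because it holds R2 in memory, it illustrates the logic rather than something you would run on tens of millions of reads.

```python
def pair_key(header):
    """Return the part of a FASTQ header shared by both mates.

    Assumed styles: '@READ/1' vs '@READ/2', and '@READ 1:N:0:...' vs '@READ 2:N:0:...'.
    """
    name = header.split()[0]  # drop the comment after the first whitespace
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def resync_pairs(r1_records, r2_records):
    """Split independently filtered R1/R2 records into pairs and orphans.

    Each record is a (header, sequence, quality) tuple; R1 order is preserved.
    """
    r2_by_key = {pair_key(rec[0]): rec for rec in r2_records}
    r1_keys = set()
    paired, r1_only = [], []
    for rec in r1_records:
        key = pair_key(rec[0])
        r1_keys.add(key)
        if key in r2_by_key:
            paired.append((rec, r2_by_key[key]))  # both mates survived
        else:
            r1_only.append(rec)  # mate was filtered out of R2
    r2_only = [rec for rec in r2_records if pair_key(rec[0]) not in r1_keys]
    return paired, r1_only, r2_only
```

The orphans returned in `r1_only` and `r2_only` are what you would then treat as single-end reads.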

**tahamasoodi** · 11-27-2012, 11:01 AM

Hi,

I tried Trimmomatic but didn't succeed; I'm getting empty output files. I used the following command:
java -classpath trimmomatic-0.15.jar org.usadellab.trimmomatic.TrimmomaticPE read1.fastq.gz read2.fastq.gz read1_forward_paired.fastq.gz read1_forward_unpaired.fastq.gz read2_reverse_paired.fastq.gz read2_reverse_unpaired.fastq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

I have some questions about the command I used:
1) The input is read1 and read2, so why do we need unpaired output files (in this case read1_forward_unpaired.fastq.gz and read2_reverse_unpaired.fastq.gz)?
2) I didn't use Illumina clipping, as I don't have it!
3) Judging from the FastQC graphs above, what quality threshold should I use?

Where am I making the mistake?
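To make the trimming steps in the command above concrete, here is a simplified Python approximation of LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36, applied to one read's decoded quality scores. It is a sketch under assumptions, not Trimmomatic's source. In particular, cutting at the start of the first failing window is a simplification of how SLIDINGWINDOW chooses its cut point, and `trim_read` is a hypothetical name.

```python
def trim_read(scores, leading_q=3, trailing_q=3, window=4, window_q=15, min_len=36):
    """Return (start, end) of the kept region, or None if the read is dropped.

    Steps run in the order given on the command line:
    LEADING, TRAILING, SLIDINGWINDOW, MINLEN (simplified).
    """
    start, end = 0, len(scores)
    # LEADING: remove bases below leading_q from the 5' end
    while start < end and scores[start] < leading_q:
        start += 1
    # TRAILING: remove bases below trailing_q from the 3' end
    while end > start and scores[end - 1] < trailing_q:
        end -= 1
    # SLIDINGWINDOW: scan 5' -> 3'; cut where a window's mean falls below window_q
    for i in range(start, end - window + 1):
        if sum(scores[i:i + window]) < window_q * window:
            end = i
            break
    # MINLEN: drop the read if what is left is too short
    if end - start < min_len:
        return None
    return start, end


if __name__ == "__main__":
    read = [2, 2] + [30] * 40 + [10] * 8  # poor first two bases, poor 3' tail
    print(trim_read(read))
```

A read that comes back as `None` is dropped; whether its mate survives decides which output file the pair ends up in.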

**westerman** · 11-27-2012, 11:17 AM

> Originally posted by tahamasoodi
>
> Where i'm doing the mistake?

If I had to make a single guess, I'd bet that you should use the most recent version (0.22) instead of version 0.15.

Are there no log files or error messages you can examine?

As for the unpaired output files: they exist so that reads that no longer have a mate (e.g. because the mate was dropped by MINLEN) have somewhere to go.
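The bookkeeping behind the four output files can be sketched as follows. Each mate is trimmed on its own, and the pair is then routed according to which mates survived. The function names are hypothetical. The category labels mirror the counts in TrimmomaticPE's final summary (both surviving, forward only, reverse only, dropped).

```python
from collections import Counter


def route_pair(r1_kept, r2_kept):
    """Return where a read pair goes, given whether each mate survived trimming."""
    if r1_kept and r2_kept:
        return "both"          # -> forward_paired and reverse_paired files
    if r1_kept:
        return "forward_only"  # -> forward_unpaired file
    if r2_kept:
        return "reverse_only"  # -> reverse_unpaired file
    return "dropped"           # -> written to no output


def tally(pairs):
    """Count outcomes for an iterable of (r1_kept, r2_kept) booleans."""
    return Counter(route_pair(r1, r2) for r1, r2 in pairs)


if __name__ == "__main__":
    print(tally([(True, True), (True, False), (False, True), (False, False), (True, True)]))
```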

**tahamasoodi** · 11-27-2012, 11:41 AM

I'm at home now; tomorrow I'll let you know the version. There are no error messages, just four empty output files after 5-8 minutes.

Further, I'm using BWA for the alignment. If I give the flag -q 15 or -q 20, isn't that enough to keep the low-quality sequences from aligning to the reference genome?

**westerman** · 11-27-2012, 12:42 PM

> Originally posted by tahamasoodi
>
> Further I'm using bwa for the alignment, if I give the flag -q 15 or q 20, is it not enough to keep the low quality sequences apart from aligning the reference genome?

That was not your original question, which was:

> Will fastx toolkit work for these samples?

Yes, you could use BWA with a high -q value, or bowtie2 with or without the qseq quality filtering. In either case, you could do further processing on the resulting BAM file to get rid of poorly mapping reads.

**tahamasoodi** · 11-27-2012, 12:55 PM

Because the fastx toolkit failed on the PE data and I hit some error with Trimmomatic, my question has shifted to BWA. Is -q 15 OK for the attached data?

What further processing is needed for the BAM files?

**westerman** · 11-27-2012, 01:05 PM

Sure. -q 15 will get rid of the bad parts of your reads. Personally I like a -q 20 cutoff, but it depends on how much of the data you want to retain. For mapping to a known reference (as opposed to de novo work), a poor-quality dataset can be tolerated.
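For reference, the BWA manual describes `bwa aln -q INT` as trimming a read down to argmax_x of the sum over i = x+1..l of (INT - q_i) when the last base's quality q_l is below INT, where l is the read length. Below is a minimal Python sketch of that rule. The minimum kept length of 35 is an assumption about BWA's internal limit, and `bwa_trim_length` is a hypothetical name.

```python
def bwa_trim_length(quals, trim_qual, min_len=35):
    """Return how many bases to keep under the `bwa aln -q` trimming rule.

    quals: decoded Phred scores, 5' -> 3'. min_len (assumed 35) stops the
    scan so a read is never trimmed below that length.
    """
    if trim_qual < 1:
        return len(quals)  # -q 0 (the default) disables trimming
    running, best, keep = 0, 0, len(quals)
    # Walk in from the 3' end, accumulating (trim_qual - q_i).
    for pos in range(len(quals) - 1, min_len - 1, -1):
        running += trim_qual - quals[pos]
        if running < 0:
            break  # the remaining bases are good enough; stop scanning
        if running > best:
            best, keep = running, pos  # best cut so far: keep bases [0, pos)
    return keep


if __name__ == "__main__":
    read = [30] * 40 + [2] * 10  # 40 good bases followed by a poor 3' tail
    print(bwa_trim_length(read, 15))  # 40
```

Raising the cutoff from 15 to 20 makes every low-quality base count for more in the running sum, so tails get trimmed more aggressively.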

The BAM file will have a MAPQ (mapping quality) score for each alignment, which can be used to get rid of reads that do not map well, whether due to low base quality, repeat regions, etc.
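In practice this filtering is usually a one-liner such as `samtools view -b -q 20 in.bam > filtered.bam`. To show what that does, here is a minimal Python sketch that works on plain SAM text lines, where FLAG is the second column and MAPQ the fifth. The threshold of 20 and the function name are illustrative assumptions. Unmapped reads (FLAG bit 0x4) are dropped as well.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield SAM header lines plus mapped alignments with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4:
            continue  # unmapped read
        if mapq >= min_mapq:
            yield line


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.0",
        "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    for kept in filter_by_mapq(sam):
        print(kept.split("\t")[0])
```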

**tahamasoodi** · 11-27-2012, 01:23 PM

Thanks a lot!
Another question; it isn't related to this thread, but I thought this was the best time to ask. How can I calculate the following:

- Reads mapped to the human genome
- Reads mapped to the target regions (exome)
- Coverage of target regions at 1x
- Coverage of target regions at 10x
- Coverage of target regions at 50x

Thanks again,
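The arithmetic behind these numbers is simple, even though real pipelines usually get them from tools such as samtools, bedtools or Picard. Below is a toy Python sketch. It assumes alignments have already been reduced to (flag, chrom, start, end) tuples and targets to BED-style half-open (chrom, start, end) intervals. All names are hypothetical, and the per-base loop is only practical for tiny inputs. "Coverage at Nx" is taken here to mean the fraction of target bases covered by at least N reads.

```python
from collections import defaultdict

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def summarize(alignments, targets, thresholds=(1, 10, 50)):
    """alignments: (flag, chrom, start, end); targets: (chrom, start, end), half-open."""
    # Count each read once: skip secondary and supplementary records.
    primary = [a for a in alignments if not a[0] & (SECONDARY | SUPPLEMENTARY)]
    mapped = [a for a in primary if not a[0] & UNMAPPED]

    target_bases = {(c, p) for c, s, e in targets for p in range(s, e)}
    depth = defaultdict(int)
    on_target = 0
    for _, chrom, start, end in mapped:
        hits = [(chrom, p) for p in range(start, end) if (chrom, p) in target_bases]
        if hits:
            on_target += 1  # the read overlaps at least one target base
        for pos in hits:
            depth[pos] += 1

    # Fraction of target bases with depth >= t, for each threshold t
    breadth = {
        t: sum(1 for pos in target_bases if depth[pos] >= t) / len(target_bases)
        for t in thresholds
    }
    return {"reads": len(primary), "mapped": len(mapped),
            "on_target": on_target, "breadth": breadth}


if __name__ == "__main__":
    targets = [("chr1", 100, 110)]
    alignments = [
        (0, "chr1", 95, 105),     # partly on target
        (0, "chr1", 100, 110),    # fully on target
        (0, "chr1", 500, 510),    # mapped, off target
        (4, "*", 0, 0),           # unmapped
        (256, "chr1", 100, 110),  # secondary: ignored
    ]
    print(summarize(alignments, targets))
```

The percentages asked about are then `mapped / reads` and `on_target / reads` (or `/ mapped`, depending on convention), plus the `breadth` values.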

**tahamasoodi** · 11-28-2012, 07:07 AM

Hi Westerman,

I'm using the latest version of Trimmomatic but am still getting empty output files. My command is:

root@taha-MacPro:~/Softwares/Trimmomatic-0.22# java -classpath trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE /home/taha/Desktop/Sample_lane5/lane5_NoIndex_L005_R1_001.fastq.gz /home/taha/Desktop/Sample_lane5/lane5_NoIndex_L005_R2_001.fastq.gz /home/taha/Desktop/Sample_lane5/123_paired.fastq.gz /home/taha/Desktop/Sample_lane5/123_unpaired.fastq.gz /home/taha/Desktop/Sample_lane5/123_pairedR2.fastq.gz /home/taha/Desktop/Sample_lane5/123_unpairedR2.fastq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15

The final message is:
Input Read Pairs: 87970561 Both Surviving: 0 (0.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 87970561 (100.00%)
TrimmomaticPE: Completed successfully

How can I fix this?

**westerman** · 11-28-2012, 07:12 AM

Thanks for the complete report. I am guessing here, but perhaps you need to specify the correct quality encoding? I suspect you are using data from the latest Illumina technology and thus should add '-phred33' to the command line (right after TrimmomaticPE, before the input files).
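A likely reason every pair was dropped: if Phred+33 qualities are decoded with a +64 offset, each score comes out 31 lower, so even Q40 bases ('I') look like Q9 and no 4-base window can average 15. The Python sketch below shows both decodings and a rough offset-guessing heuristic. The function names and the ASCII cut-offs (below 59 implies +33, 64 and above suggests +64) are assumptions for illustration, not Trimmomatic behavior.

```python
def decode(quality_string, offset):
    """Convert a FASTQ quality string to Phred scores using the given ASCII offset."""
    return [ord(ch) - offset for ch in quality_string]


def guess_offset(quality_strings):
    """Rough heuristic: return 33, 64, or None if the range is ambiguous.

    Characters below ';' (ASCII 59) cannot occur in +64 encodings, so they imply +33.
    If nothing is below '@' (ASCII 64), the data are probably +64, although a +33
    file made only of very high-quality bases would look the same.
    """
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None


if __name__ == "__main__":
    qual = "IIIIHHGG#"          # a typical Illumina 1.8+ (Phred+33) string
    print(decode(qual, 33))     # the intended scores
    print(decode(qual, 64))     # what a +64 decoder sees: all far below 15
    print(guess_offset([qual]))
```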

**tahamasoodi** · 11-28-2012, 07:26 AM

Hi Westerman,

Fantastic, it is now running. I'll let you know again when it finishes.

Thanks,

how to filter low quality reads?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News