**tonybolger** · 10-28-2014, 01:14 AM

Originally posted by BADE:

I performed the quality trimming also but the number of surviving reads was only 66% (described in earlier post). I am not sure how many will align to the genome.

From the quality plots, I would guess the machine had some kind of problem during cycles 2 and 3 of the reverse read. This could be caused by environmental problems (e.g. external vibration) or by something within the machine (bubbles in the flow cell etc.). If you have access to the 'per tile' quality information, there may be a clear pattern to the low-quality reads.

I would suggest you perform a 'HEADCROP' of 3 bp on the data, immediately after the ILLUMINACLIP step, and before the other quality filtering steps. This will simply drop the problem bases entirely from all reads.

Alternatively, you could widen the window or lower the threshold of the SLIDINGWINDOW step, which would help bridge these dodgy patches. However, even after getting the dodgy data past the trimming stage, you still have the issue of aligning it, so the HEADCROP might be better.
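To illustrate why a couple of bad cycles near the start can sink a whole read, here is a minimal Python sketch of a sliding-window trim. It is a simplification and not Trimmomatic's actual implementation; the function name `sliding_window_keep` is hypothetical and the quality values are invented for illustration.

```python
def sliding_window_keep(quals, window=4, threshold=15):
    """Return how many bases are kept from the 5' end.

    Simplified model: scan windows from the start of the read and cut
    at the first window whose mean quality falls below the threshold.
    """
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < threshold:
            return start
    return len(quals)


# Invented example: a 50 bp read with very poor bases at cycles 2 and 3.
read = [20, 2, 2, 30] + [35] * 46

kept_default = sliding_window_keep(read)                 # like SLIDINGWINDOW:4:15
kept_headcrop = sliding_window_keep(read[3:])            # like HEADCROP:3 first
kept_relaxed = sliding_window_keep(read, threshold=10)   # like SLIDINGWINDOW:4:10

print(kept_default, kept_headcrop, kept_relaxed)
```

With these made-up values, the very first window already averages below 15, so the read is cut to zero length and a later MINLEN step would discard it. Cropping the first 3 bases, or lowering the threshold to 10, lets the rest of the read through.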

You might also want to consider more liberal alignment settings in TopHat, since many of the reads are probably failing due to these poor-quality bases.

**BADE** · 10-29-2014, 09:00 AM

Hi tonybolger,

I would suggest you perform a 'HEADCROP' of 3 bp on the data, immediately after the ILLUMINACLIP step, and before the other quality filtering steps. This will simply drop the problem bases entirely from all reads.

I ran Trimmomatic with the suggested settings. Here is my code and output:

Code:

TrimmomaticPE: Started with arguments: -threads 28 -phred33 _WT_CTTGTA_L001_R1_001.fastq _WT_CTTGTA_L001_R2_001.fastq Out_paired_WT_CTTGTA_L001_R1_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R1_001.fastq.gz Out_paired_WT_CTTGTA_L001_R2_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R2_001.fastq.gz ILLUMINACLIP:/home/kakrana/tools/Trimmomatic-0.32/TruSeq3-PE-2.fa:2:30:10:8:true HEADCROP:3 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

output:
Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) Forward Only Surviving: 6416736 (29.09%) Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)
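As a rough aid for reading this summary, the following Python sketch parses the line and works out how often each mate was lost. The helper name `parse_summary` is hypothetical; the counts are copied from the output above.

```python
import re

summary = (
    "Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) "
    "Forward Only Surviving: 6416736 (29.09%) "
    "Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)"
)


def parse_summary(line):
    """Extract the five counts from a Trimmomatic PE summary line."""
    pattern = (
        r"(Input Read Pairs|Both Surviving|Forward Only Surviving|"
        r"Reverse Only Surviving|Dropped): (\d+)"
    )
    return {name: int(count) for name, count in re.findall(pattern, line)}


counts = parse_summary(summary)
total = counts["Input Read Pairs"]

# A reverse read was lost whenever only the forward mate survived,
# or when both mates were dropped (and vice versa for forward reads).
reverse_lost = counts["Forward Only Surviving"] + counts["Dropped"]
forward_lost = counts["Reverse Only Surviving"] + counts["Dropped"]

print(f"reverse reads lost: {100 * reverse_lost / total:.2f}%")
print(f"forward reads lost: {100 * forward_lost / total:.2f}%")
```

The reverse mate is lost in roughly a third of the pairs, compared with under 5% for the forward mate, which is consistent with the problem being confined to the reverse read.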

As you can see, HEADCROP is not helping to get more surviving reads. What do you suggest? Should I use these paired reads to run TopHat with changed settings?

You might also want to consider more liberal alignment settings in TopHat, since many of the reads are probably failing due to these poor-quality bases.

For the previous TopHat run I used the following standard analysis options:
* FASTQ Quality Scale: Sanger (PHRED33)
* Anchor length: 8
* Maximum number of mismatches that can appear in the anchor region of spliced alignment: 0
* The minimum intron length: 70
* The maximum intron length: 50000
* Minimum isoform fraction: 0.15
* Maximum number of alignments to be allowed: 20
* Minimum intron length that may be found during split-segment (default) search: 50
* Maximum intron length that may be found during split-segment (default) search: 500000
* Number of mismatches allowed in each segment alignment for reads mapped independently: 2
* Minimum length of read segments: 20
* Mate-Pair Inner Distance: 50
* Bowtie 2 speed and sensitivity: Sensitive (slower)

Could you please elaborate on which settings I should use?

Thanks for your suggestions.

**tonybolger** · 10-29-2014, 09:37 AM

Originally posted by BADE:

Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) Forward Only Surviving: 6416736 (29.09%) Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)

As you can see, HEADCROP is not helping to get more surviving reads. What do you suggest?

OK, not as much improvement as I had hoped. I guess you will also need to be a bit more liberal with the SLIDINGWINDOW step, perhaps requiring an average quality of 10 or 12 rather than 15. You could get a major improvement with 5, but that is very liberal.

You can also remove the MINLEN step to see precisely how short the reads are getting after filtering.

Another alternative is the MAXINFO quality filter mode (rather than sliding window) - it adaptively gets stricter during the read, so almost all reads will get close to the target length.
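The variants suggested above could be assembled along these lines. This is only a sketch: the helper `trimmomatic_steps` is hypothetical, the ILLUMINACLIP, HEADCROP, LEADING and TRAILING steps are copied from the earlier run, and the MAXINFO values (target length 40, strictness 0.5) are placeholder assumptions rather than recommendations.

```python
def trimmomatic_steps(window_quality=10, use_maxinfo=False, min_len=36):
    """Build the list of Trimmomatic trimming steps, in processing order."""
    steps = [
        # Copied from the earlier run in this thread.
        "ILLUMINACLIP:/home/kakrana/tools/Trimmomatic-0.32/TruSeq3-PE-2.fa:2:30:10:8:true",
        "HEADCROP:3",
        "LEADING:3",
        "TRAILING:3",
    ]
    if use_maxinfo:
        # MAXINFO:<targetLength>:<strictness>; both values are assumptions.
        steps.append("MAXINFO:40:0.5")
    else:
        steps.append(f"SLIDINGWINDOW:4:{window_quality}")
    if min_len is not None:
        # Pass min_len=None to leave MINLEN out and inspect read lengths.
        steps.append(f"MINLEN:{min_len}")
    return steps


relaxed = trimmomatic_steps(window_quality=10)
no_minlen = trimmomatic_steps(window_quality=12, min_len=None)
maxinfo = trimmomatic_steps(use_maxinfo=True)

for variant in (relaxed, no_minlen, maxinfo):
    print(" ".join(variant))
```

Each printed line can be appended to the same input and output file arguments used in the earlier run, so the survival summaries of the variants can be compared directly.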

Originally posted by BADE:

Should I use these paired reads to do the TopHat with changed settings?

I think you need to gain a few more reads first, and then try to improve the alignment rate with TopHat. I would try alternative settings of --initial-read-mismatches and --segment-mismatches.
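A hedged sketch of how such a run could be put together as an argument list follows. The helper `tophat_command` and the index name `genome_index` are hypothetical, and the mismatch values are placeholders. As far as I recall, `--initial-read-mismatches` is the TopHat 1.x spelling and TopHat 2 calls the option `--read-mismatches`, so the flag name is left as a parameter; please check `tophat --help` for your version, including whether `--read-edit-dist` also needs raising.

```python
def tophat_command(index, reads1, reads2,
                   read_mismatches=3, segment_mismatches=3,
                   read_flag="--read-mismatches"):
    """Build a TopHat argument list with more liberal mismatch settings.

    read_flag is an assumption: use "--initial-read-mismatches" if that
    is what your TopHat version expects.
    """
    return [
        "tophat",
        read_flag, str(read_mismatches),
        "--segment-mismatches", str(segment_mismatches),
        index, reads1, reads2,
    ]


cmd = tophat_command(
    "genome_index",  # hypothetical Bowtie index base name
    "Out_paired_WT_CTTGTA_L001_R1_001.fastq.gz",
    "Out_paired_WT_CTTGTA_L001_R2_001.fastq.gz",
)
print(" ".join(cmd))
```

Building the command as a list keeps each option next to its value, which makes it easy to try a few mismatch settings and compare the resulting mapping rates.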

You could also consider aligning against the reference transcriptome - it won't get you the alternative splicing, but it will indicate if the reads are mostly ok with a few errors, or completely random, since most standard aligners are a bit more liberal than tophat.

In any case, given the quality plots and the mapping rates, it is a question of how much effort you want to spend on such low-quality data.
