Thread | Thread Starter | Forum | Replies | Last Post |
Count the Number of Paired-End Reads Mapped by Tophat | Fernas | Bioinformatics | 5 | 10-23-2015 02:26 AM |
Asymmetric trimmomatic output with paired-end RNA seq. data | alpha2zee | Bioinformatics | 6 | 11-19-2014 04:57 AM |
Metagenomics/ Number of paired end reads in metasim | kumara | Introductions | 0 | 07-24-2014 11:49 AM |
Trimming paired end RNAseq - Trimmomatic | sjeschonek | Bioinformatics | 1 | 07-14-2014 11:52 AM |
How to count number of mapped paired-end and single-end rna-seq reads | repinementer | Bioinformatics | 8 | 01-06-2013 05:06 AM |
#1
Member
Location: Boston Join Date: Jan 2014
Posts: 13
Hi friends,
I am having a problem with Illumina HiSeq (v. 1.9) paired-end libraries (150 nt reads): very few reads survive trimming. This is my command:
Code:
java -jar trimmomatic-0.32.jar PE -threads 28 -phred33 \
    WT_CTTGTA_L001_R1_001.fastq WT_CTTGTA_L001_R2_001.fastq \
    Out_paired_WT_CTTGTA_L001_R1_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R1_001.fastq.gz \
    Out_paired_WT_CTTGTA_L001_R2_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R2_001.fastq.gz \
    ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Quote:
Quote:
BADE
#2
Member
Location: Germany Join Date: Sep 2013
Posts: 14
Can you give some more information about the run itself? The sample, cluster density, insert size, size selection and library prep would all help with troubleshooting.
It might be worth checking whether the actual issue is the quality of the reverse read or the insert size, by running adapter trimming and quality trimming as two separate steps.
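A minimal sketch of that two-step approach, building the two Trimmomatic PE command lines in Python. The jar name, thread count and trimming steps are the ones from post #1; the `step1_*`/`step2_*` output names are hypothetical. The commands are only printed here, not executed.

```python
# Sketch: split adapter clipping and quality trimming into two Trimmomatic PE
# runs so each step's survival statistics can be inspected separately.
# Output file names are hypothetical; steps and options come from post #1.
import shlex
from typing import List

JAR = "trimmomatic-0.32.jar"


def pe_command(r1: str, r2: str, prefix: str, steps: List[str]) -> List[str]:
    """Return a Trimmomatic PE argv writing <prefix>_{paired,unpaired}_R{1,2}.fastq.gz."""
    outputs = [
        f"{prefix}_paired_R1.fastq.gz",
        f"{prefix}_unpaired_R1.fastq.gz",
        f"{prefix}_paired_R2.fastq.gz",
        f"{prefix}_unpaired_R2.fastq.gz",
    ]
    return ["java", "-jar", JAR, "PE", "-threads", "28", "-phred33", r1, r2, *outputs, *steps]


# Step 1: adapter clipping only.
step1 = pe_command(
    "WT_CTTGTA_L001_R1_001.fastq",
    "WT_CTTGTA_L001_R2_001.fastq",
    "step1_adapters",
    ["ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:TRUE"],
)

# Step 2: quality trimming on the pairs that survived step 1
# (the unpaired step-1 outputs are not carried forward in this sketch).
step2 = pe_command(
    "step1_adapters_paired_R1.fastq.gz",
    "step1_adapters_paired_R2.fastq.gz",
    "step2_quality",
    ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36"],
)

if __name__ == "__main__":
    for cmd in (step1, step2):
        # Print the command line; subprocess.run(cmd, check=True) would run it.
        print(shlex.join(cmd))
```

Comparing the "Both Surviving" counts reported by the two runs shows whether pairs are lost to adapter clipping or to quality trimming.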
#4
Member
Location: Boston Join Date: Jan 2014
Posts: 13
Hi Avo,
Quote:
Quote:
Quote:
Thanks, BADE
#5
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,091
Median scores for R2 are still above Q30, so things are not that bad. If this is a re-sequencing project, you shouldn't worry about trimming based on Q-scores. Is this a MiSeq run?
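For reference, a per-position median like the one discussed here can be computed straight from the FASTQ quality lines. This is a minimal sketch assuming Phred+33 encoding (the `-phred33` setting used in this thread), where a quality character decodes as Q = ord(char) - 33, so 'I' is Q40 and '?' is Q30. It keeps a score histogram per position rather than storing every score.

```python
# Sketch: per-position median Phred score from FASTQ lines, assuming
# Phred+33 encoding (Q = ord(char) - 33). A histogram per position keeps
# memory small even for tens of millions of reads.
from typing import Iterable, Iterator, List


def quality_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def _kth_smallest(counts: List[int], k: int) -> int:
    """Return the k-th smallest (0-based) score described by a histogram."""
    seen = 0
    for q, c in enumerate(counts):
        seen += c
        if seen > k:
            return q
    raise IndexError(k)


def median_quality_per_position(quals: Iterable[str]) -> List[float]:
    hist: List[List[int]] = []
    for qual in quals:
        for pos, ch in enumerate(qual):
            if pos == len(hist):
                hist.append([0] * 94)  # Phred+33 covers Q0..Q93
            hist[pos][ord(ch) - 33] += 1
    medians: List[float] = []
    for counts in hist:
        n = sum(counts)
        if n % 2:
            medians.append(float(_kth_smallest(counts, n // 2)))
        else:
            lo, hi = _kth_smallest(counts, n // 2 - 1), _kth_smallest(counts, n // 2)
            medians.append((lo + hi) / 2)
    return medians


example = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACGT\n", "+\n", "??#5\n"]
print(median_quality_per_position(quality_lines(example)))  # [35.0, 35.0, 21.0, 30.0]
```

For a gzipped file, pass `gzip.open(path, "rt")` as the line iterable.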
#6
Member
Location: Boston Join Date: Jan 2014
Posts: 13
Hi Genomax,
Quote:
Quote:
Thanks, BADE
#7
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,091
The reason I asked whether this was a MiSeq run was the number of reads: 22 million PE reads (11 million unique clusters) seems to be on the low end for a HiSeq 2500 run.
If you have a reference genome available, I would suggest trimming only the adapters (and very low Q-scores, < 5, if you are worried about those). That should leave you with more reads to work with.
#8
Member
Location: Boston Join Date: Jan 2014
Posts: 13
Hi Genomax,
Quote:
Quote:
Quote:
Any further suggestions would be helpful.
Bade
#9
Member
Location: Boston Join Date: Jan 2014
Posts: 13
Hi All,
As suggested in this thread, I pre-processed all the samples and then mapped the reads with TopHat2, keeping the standard analysis options (pasted below):
Quote:
Quote:
Please advise.
BADE
#10
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
That normally means your read ordering was scrambled by some preprocessing step, so the reads are no longer properly paired. Note, for example:
Quote:
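One way to test for scrambled ordering is to walk both files in step and compare the read ID of each record. A minimal sketch, assuming plain 4-line FASTQ records with either Casava 1.8+ style headers ("@ID 1:N:0:...") or older "/1" and "/2" suffixes, both normalised to the bare ID:

```python
# Sketch: report the first record where R1 and R2 read IDs disagree.
# Assumes 4-line FASTQ records and Casava 1.8+ headers ("@ID 1:N:0:...")
# or older "/1" and "/2" suffixes; both are reduced to the bare ID.
import gzip
from itertools import zip_longest
from typing import IO, Iterable, Iterator, Optional, Tuple


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header by position, so '@' in quality lines is harmless
            rid = line.rstrip("\n").split()[0][1:]  # drop '@' and any comment
            if rid.endswith(("/1", "/2")):
                rid = rid[:-2]
            yield rid


def first_mismatch(
    r1: Iterable[str], r2: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record index, R1 id, R2 id) of the first out-of-sync pair, or None.

    An id of None means that file ran out of records first.
    """
    for n, (a, b) in enumerate(zip_longest(read_ids(r1), read_ids(r2))):
        if a != b:
            return n, a, b
    return None


def open_fastq(path: str) -> IO[str]:
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Usage on the paired Trimmomatic outputs would look like `with open_fastq("Out_paired_WT_CTTGTA_L001_R1_001.fastq.gz") as f1, open_fastq("Out_paired_WT_CTTGTA_L001_R2_001.fastq.gz") as f2: print(first_mismatch(f1, f2))`; `None` means the files are still in sync.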
#11
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,091
@BADE: You had 22059856 pairs surviving at the end of the Trimmomatic run. Did you do something to the files afterwards?
#12
Member
Location: Boston Join Date: Jan 2014
Posts: 13
Quote:
Quote:
Quote:
Please advise.
Thanks, BADE
#13
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
The data has an unexpectedly low mapping and pairing rate. You may want to do quality-trimming first, or use local alignment, or use a more error-tolerant aligner. As a first step, I would suggest quality-trimming. It's also possible that the quality is so low that adapter-trimming tools can't detect adapter sequence. In that case, unless the genomic material is incredibly precious, you should just resequence it.
What organism is this, and do you have a reference or at least some assembly?
#14
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Hi BADE,
You could try skewer for preprocessing your data. It has been shown to produce better input for downstream analysis of RNA-Seq data, and it is easy to use and runs fast.
Last edited by relipmoc; 10-22-2014 at 10:12 PM. Reason: typo
#15
Member
Location: Boston Join Date: Jan 2014
Posts: 13
Quote:
Quote:
Quote:
#16
Senior Member
Location: berlin Join Date: Feb 2010
Posts: 156
Quote:
I would suggest you perform a HEADCROP of 3 bp on the data, immediately after the ILLUMINACLIP step and before the other quality-filtering steps. This will simply drop the problem bases entirely from all reads. Alternatively, you could widen the window or lower the threshold of SLIDINGWINDOW, which would help bridge these dodgy patches. However, even after getting the dodgy data past the trimming stage, you still have the issue of aligning it, so HEADCROP might be the better option. You might also want to consider more liberal alignment settings in TopHat, since many of the reads are probably failing because of these poor-quality bases.
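To see why step order and window settings matter, here is a simplified sketch of HEADCROP:n followed by a sliding-window cut, modelled as "truncate the read at the start of the first w-base window whose mean Phred score is below q". Trimmomatic's real SLIDINGWINDOW differs in details, so this is illustrative only, and it assumes the problem bases are the first three of the read.

```python
# Simplified sketch of two trimming steps applied in order:
# HEADCROP:n drops the first n bases; SLIDINGWINDOW:w:q is modelled as
# "cut at the start of the first w-base window with mean score < q".
# Trimmomatic's actual SLIDINGWINDOW differs in details; illustration only.
from typing import List


def headcrop(scores: List[int], n: int) -> List[int]:
    return scores[n:]


def sliding_window(scores: List[int], window: int, min_mean: float) -> List[int]:
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return scores[:start]  # keep only the bases before the failing window
    return scores


# Hypothetical read: three Q2 bases at the start, then ten Q35 bases.
bad_start = [2, 2, 2] + [35] * 10
print(len(sliding_window(bad_start, 4, 15)))               # 0: the whole read is lost
print(len(sliding_window(headcrop(bad_start, 3), 4, 15)))  # 10 after HEADCROP:3
print(len(sliding_window(bad_start, 8, 10)))               # 13: a wider, laxer window bridges the patch
```

Without HEADCROP the very first window already fails (mean 10.25 < 15), so the read is cut to nothing and MINLEN then drops it.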
#17
Member
Location: Boston Join Date: Jan 2014
Posts: 13
Hi tonybolger,
Quote:
Code:
TrimmomaticPE: Started with arguments: -threads 28 -phred33 _WT_CTTGTA_L001_R1_001.fastq _WT_CTTGTA_L001_R2_001.fastq Out_paired_WT_CTTGTA_L001_R1_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R1_001.fastq.gz Out_paired_WT_CTTGTA_L001_R2_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R2_001.fastq.gz ILLUMINACLIP:/home/kakrana/tools/Trimmomatic-0.32/TruSeq3-PE-2.fa:2:30:10:8:true HEADCROP:3 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) Forward Only Surviving: 6416736 (29.09%) Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)
As you can see, HEADCROP is not helping more reads survive. What do you suggest? Should I run TopHat on these paired reads with changed settings?
Quote:
* FASTQ Quality Scale: Sanger (PHRED33)
* Anchor length: 8
* Maximum number of mismatches that can appear in the anchor region of spliced alignment: 0
* The minimum intron length: 70
* The maximum intron length: 50000
* Minimum isoform fraction: 0.15
* Maximum number of alignments to be allowed: 20
* Minimum intron length that may be found during split-segment (default) search: 50
* Maximum intron length that may be found during split-segment (default) search: 500000
* Number of mismatches allowed in each segment alignment for reads mapped independently: 2
* Minimum length of read segments: 20
* Mate-Pair Inner Distance: 50
* Bowtie 2 speed and sensitivity: Sensitive (slower)
Could you please elaborate on which settings I should use? Thanks for your suggestions.
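The summary line above can be broken down per mate: R1 survives whenever the pair is "Both" or "Forward Only", and R2 whenever it is "Both" or "Reverse Only". A minimal sketch that parses the line as printed in this post (the regular expression is an assumption based on that one line, not a documented format):

```python
# Sketch: parse a Trimmomatic PE summary line (format assumed from the log
# in this post) and compute how often each mate survives trimming.
import re
from typing import Dict, Tuple

PATTERN = re.compile(
    r"Input Read Pairs: (\d+) Both Surviving: (\d+) .*?"
    r"Forward Only Surviving: (\d+) .*?Reverse Only Surviving: (\d+) .*?Dropped: (\d+)"
)


def parse_pe_summary(line: str) -> Dict[str, int]:
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a Trimmomatic PE summary line")
    keys = ("input", "both", "forward_only", "reverse_only", "dropped")
    return dict(zip(keys, map(int, m.groups())))


def per_mate_survival(s: Dict[str, int]) -> Tuple[float, float]:
    """Fraction of input pairs in which R1 (resp. R2) survived."""
    n = s["input"]
    return (s["both"] + s["forward_only"]) / n, (s["both"] + s["reverse_only"]) / n


log = ("Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) "
       "Forward Only Surviving: 6416736 (29.09%) Reverse Only Surviving: 295779 (1.34%) "
       "Dropped: 759094 (3.44%)")
r1, r2 = per_mate_survival(parse_pe_summary(log))
print(f"R1 survives in {r1:.1%} of pairs, R2 in {r2:.1%}")
```

On these numbers R1 survives in roughly 95% of pairs but R2 in only about 67%, which suggests the reverse read is what limits the number of surviving pairs.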
#18
Senior Member
Location: berlin Join Date: Feb 2010
Posts: 156
Quote:
You can also remove the MINLEN step to see exactly how short the reads are getting after filtering. Another alternative is the MAXINFO quality-filter mode (instead of SLIDINGWINDOW): it adaptively gets stricter along the read, so almost all reads will end up close to the target length.
Quote:
You could also consider aligning against the reference transcriptome. It won't give you alternative splicing, but since most standard aligners are a bit more liberal than TopHat, it will indicate whether the reads are mostly OK with a few errors or completely random. In any case, given the quality plots and the mapping rates, it is a question of how much effort you want to spend on such low-quality data.
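Once MINLEN is removed from the Trimmomatic run, the length distribution of what remains can be tallied directly from the output FASTQ. A minimal sketch, assuming 4-line records; the cutoff of 36 is the MINLEN value used in this thread:

```python
# Sketch: read-length histogram of a trimmed FASTQ (sequence = 2nd line of
# each 4-line record) and the fraction of reads below a MINLEN-style cutoff.
from collections import Counter
from typing import Iterable


def length_histogram(lines: Iterable[str]) -> Counter:
    return Counter(len(line.rstrip("\n")) for i, line in enumerate(lines) if i % 4 == 1)


def fraction_shorter_than(hist: Counter, cutoff: int) -> float:
    total = sum(hist.values())
    short = sum(c for length, c in hist.items() if length < cutoff)
    return short / total if total else 0.0


records = ["@a\n", "A" * 36 + "\n", "+\n", "I" * 36 + "\n",
           "@b\n", "A" * 10 + "\n", "+\n", "I" * 10 + "\n",
           "@c\n", "A" * 150 + "\n", "+\n", "I" * 150 + "\n",
           "@d\n", "A" * 10 + "\n", "+\n", "I" * 10 + "\n"]
hist = length_histogram(records)
print(sorted(hist.items()))             # [(10, 2), (36, 1), (150, 1)]
print(fraction_shorter_than(hist, 36))  # 0.5
```

A large fraction just under the cutoff would support relaxing the trimming; a pile-up near zero length points back to the read quality itself.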
Tags: illumina, paired end, trimming, trimmomatic