#1 |
Member
Location: France Join Date: Jul 2013
Posts: 20
Hi Everyone,
I am comparing tophat2 and STAR alignments of my RNA-seq data, using both trimmed and untrimmed reads. (I was getting different results with tophat2 than the bioinformaticians were getting with STAR, but they don't seem too interested in determining why, so I am testing it myself.) With tophat2, trimming my reads with cutadapt made quite a large difference in mapping efficiency (up to 35% more reads mapped) compared to the untrimmed reads. I know STAR is supposed to soft-clip the reads, but I'm still curious whether trimming makes any difference and how the percentages compare to tophat2.
STAR has no problems with my raw input data, but it doesn't seem to like my trimmed reads and gives the following error:
EXITING because of FATAL ERROR: Read1 and Read2 are not consistent, reached the end of the one before the other one
SOLUTION: Check you your input files: they may be corrupted
I assume this is because, after trimming, the reads are no longer all 100 bp long and some reads are lost altogether. I tried the following option:
--readMatesLengthsIn NotEqual
but it still gave the same error.
Any suggestions? Will STAR let me run the files if they aren't equal? Or is it pointless to test trimmed reads with STAR at all?
Thanks for your help!
#2 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
It sounds like you just trimmed incorrectly. What was the exact command you used?
#3 |
Member
Location: France Join Date: Jul 2013
Posts: 20
I trimmed two adapters based on the overrepresented sequences found by FastQC: the Nextera barcodes and a primer used during the cDNA synthesis.
Code:
cutadapt -q 10 -a CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTGAAAAA -b AAGCAGTGGTATCAACGCAGAGTACNNNNN --minimum-length 36 Sample1_R1.fastq > Sample1trim_R1.fastq 2> Sample1trimlogR1
#4 |
Member
Location: Manchester, UK Join Date: Feb 2011
Posts: 52
Just as a side note, I have used STAR on trimmed reads (unequal lengths) and it works fine.
Have you checked whether the reads in the R1 and R2 files are in the same order? From the error message, it seems that one of the files has more reads than the other. Check using wc -l.
I use Trimmomatic in paired-end mode for clipping adapters. The final files contain only those reads that passed QC in both R1 and R2. Check whether this is the case for your cutadapt output.
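The two checks suggested here (same number of reads, same read order) could be sketched in shell roughly as follows. This is only a minimal sketch, assuming standard four-line FASTQ records with no wrapped sequence lines; count_reads and check_pairs are hypothetical helper names, not part of STAR, cutadapt, or Trimmomatic.

```shell
# Number of reads in a FASTQ file, assuming exactly 4 lines per record.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Succeeds only if both files hold the same number of reads and the
# read IDs (header text up to the first space, minus any /1 or /2
# suffix) appear in the same order in both files.
check_pairs() {
    local r1="$1" r2="$2" n1 n2
    n1=$(count_reads "$r1")
    n2=$(count_reads "$r2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read counts differ: $n1 vs $n2"
        return 1
    fi
    # paste joins line i of R1 and line i of R2 with a tab, so awk can
    # compare the two header lines of each record side by side.
    paste "$r1" "$r2" | awk -F '\t' '
        NR % 4 == 1 {
            split($1, a, " "); split($2, b, " ")
            sub(/\/[12]$/, "", a[1]); sub(/\/[12]$/, "", b[1])
            if (a[1] != b[1]) {
                n = (NR + 3) / 4
                print "ID mismatch at read " n ": " a[1] " vs " b[1]
                bad = 1
                exit
            }
        }
        END { exit bad }'
}
```

Calling check_pairs on the trimmed R1 file and its R2 counterpart prints nothing and returns 0 when the files are in sync; otherwise it reports the first problem it finds and returns a non-zero status.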
#5 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
It sounds like the error message is poorly worded and actually means there are different numbers of reads in the two files. Most likely you did your trimming incorrectly, such that paired reads were not kept together. When trimming paired reads, you must trim both files together, not one file at a time in separate processes.
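With cutadapt, trimming both files in one run might look like the sketch below. It assumes a cutadapt version with paired-end support (-A for the read 2 adapter, -p for the read 2 output file). The read 1 adapters and thresholds are copied from the command posted earlier in this thread; R2_ADAPTER is a placeholder that has to be replaced with the adapter expected on read 2, and trim_pair and the output file names are hypothetical.

```shell
# Path to cutadapt; can be overridden to point at a specific install.
CUTADAPT="${CUTADAPT:-cutadapt}"

# Read 1 adapters, copied from the single-end command in this thread.
R1_ADAPTER="CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTGAAAAA"
R1_PRIMER="AAGCAGTGGTATCAACGCAGAGTACNNNNN"
# Placeholder (assumption): set this to the adapter expected on read 2.
R2_ADAPTER="${R2_ADAPTER:-REPLACE_WITH_READ2_ADAPTER}"

# trim_pair R1.fastq R2.fastq PREFIX
# Trims both mates in a single cutadapt process and writes
# PREFIXtrim_R1.fastq and PREFIXtrim_R2.fastq.
trim_pair() {
    local r1="$1" r2="$2" prefix="$3"
    "$CUTADAPT" -q 10 \
        -a "$R1_ADAPTER" -b "$R1_PRIMER" \
        -A "$R2_ADAPTER" \
        --minimum-length 36 \
        -o "${prefix}trim_R1.fastq" -p "${prefix}trim_R2.fastq" \
        "$r1" "$r2"
}
```

For example, trim_pair Sample1_R1.fastq Sample1_R2.fastq Sample1 would process both mates together. In current cutadapt versions, a pair is then discarded as a whole when a read falls below --minimum-length, so the two output files should keep the same number of reads in the same order.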
#7 |
Member
Location: France Join Date: Jul 2013
Posts: 20
Ok, I checked with cutadapt and indeed, I hadn't trimmed them properly for paired data. I reran the STAR alignment and it worked. Thank you all for taking the time to help me.
As a note, I originally trimmed my data with Trimmomatic but got errors with both tophat and STAR, so I opted for cutadapt instead.
Code:
java -jar /path/to/Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 8 -phred33 -trimlog Sample1trimlog sample1_R1.fastq sample1_R2.fastq sample1_R1_TP.fastq sample1_R1_TU.fastq sample1_R2_TP.fastq sample1_R2_TU.fastq ILLUMINACLIP:/path/to/Trimmomatic-0.32/adapters/adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
STAR error:
EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >
tophat2 error:
Error: beginning of quality values record not found! (@D3VDZHS1:119:H036PADXX:1:1103:8363:72199 1:N:0:GGACTCCTTATCCTCT)
#8 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
And if you want another trimming option, I recommend BBDuk. Syntax:
bbduk.sh -Xmx1g in1=reads1.fq in2=reads2.fq out1=trimmed1.fq out2=trimmed2.fq ref=truseq.fa.gz,nextera.fa.gz k=25 ktrim=r hdist=1 tbo tpe
truseq.fa.gz and nextera.fa.gz are included with the package, in the /resources/ directory.
Tags |
cutadapt, star, trimmed |