SEQanswers

Old 10-14-2014, 05:34 AM   #1
travelk
Member
 
Location: France

Join Date: Jul 2013
Posts: 20
STAR with trimmed reads

Hi Everyone,

I am comparing tophat2 and STAR alignments of my RNA-seq data, using both trimmed and untrimmed reads. (I was getting different results with tophat2 than the bioinformaticians were getting with STAR, but they don't seem too interested in determining why, so I am testing it myself.) With tophat2 I found quite a large difference in mapping efficiency when I trimmed my reads with cutadapt (up to 35% more mapping) compared to untrimmed reads. I know STAR is supposed to soft-clip the reads, but I'm still curious to see whether trimming makes any difference and how the percentages compare to tophat2. While I have no problems with my raw input data in STAR, it doesn't seem to like my trimmed reads and gives the following error:

EXITING because of FATAL ERROR: Read1 and Read2 are not consistent, reached the end of the one before the other one
SOLUTION: Check you your input files: they may be corrupted

I assume this is because after trimming the reads are no longer all 100 bp long, and some reads are lost altogether. I tried the option --readMatesLengthsIn NotEqual, but it still gave the same error.

Any suggestions? Will STAR let me run the files if they aren't equal? Or is it pointless to test trimmed reads with STAR at all?

Thanks for your help!
Old 10-14-2014, 05:49 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

It sounds like you just trimmed incorrectly. What was the exact command you used?
Old 10-14-2014, 06:03 AM   #3
travelk
Member
 
Location: France

Join Date: Jul 2013
Posts: 20

I trimmed two adapters based on the overrepresented sequences found by FastQC: the Nextera barcodes and a primer used during the cDNA synthesis.

Code:
cutadapt -q 10 -a CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTGAAAAA -b AAGCAGTGGTATCAACGCAGAGTACNNNNN --minimum-length 36 Sample1_R1.fastq > Sample1trim_R1.fastq 2> Sample1trimlogR1
Old 10-14-2014, 12:29 PM   #4
amitm
Member
 
Location: Manchester, UK

Join Date: Feb 2011
Posts: 52

Just as a side note, I have used STAR on trimmed reads (unequal lengths) and it works fine.

Have you checked whether the reads in the R1 and R2 files are in the same order? From the error message it seems that one of the files has more reads than the other. Check using wc -l on both files; each FASTQ record is four lines, so the line counts should match.

I use Trimmomatic in paired-end mode for clipping adapters. The final paired files contain only those reads that passed QC in both R1 and R2. Check whether this is the case for your cutadapt output.
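
A minimal sketch of the check suggested above, assuming plain uncompressed FASTQ with unwrapped four-line records. The function names count_reads and check_pairs are made up for illustration; they are not part of STAR, cutadapt, or any other tool mentioned here. Read IDs are compared on the first whitespace-delimited token of the header, with a trailing /1 or /2 stripped.

```shell
# Count FASTQ records in a file (assumes exactly 4 lines per record).
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Compare two mate files: first the record counts, then the read IDs in order.
# Prints the first problem found and returns non-zero if the files disagree.
check_pairs() {
    r1=$1
    r2=$2
    n1=$(count_reads "$r1")
    n2=$(count_reads "$r2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read counts differ: $r1=$n1 $r2=$n2"
        return 1
    fi
    awk -v mate="$r2" '
        BEGIN { bad = 0 }
        NR % 4 == 1 {
            id1 = $1
            sub(/\/[12]$/, "", id1)
            # Pull the matching 4-line record from the mate file, keeping its header.
            if ((getline hdr < mate) <= 0) {
                printf "mate file ended early at record %d\n", (NR + 3) / 4
                bad = 1
                exit
            }
            getline skip < mate
            getline skip < mate
            getline skip < mate
            split(hdr, parts, " ")
            id2 = parts[1]
            sub(/\/[12]$/, "", id2)
            if (id1 != id2) {
                printf "record %d: %s vs %s\n", (NR + 3) / 4, id1, id2
                bad = 1
                exit
            }
        }
        END { exit bad }
    ' "$r1"
}
```

For the files in this thread, the call would look something like check_pairs Sample1trim_R1.fastq Sample1trim_R2.fastq (the R2 file name is assumed; only the R1 name appears above).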
Old 10-14-2014, 04:36 PM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It sounds like the error message is poorly worded and actually means that the two files contain different numbers of reads. It sounds like you did your trimming incorrectly, such that paired reads were not kept together. When trimming paired reads, you must trim both files together, not one file at a time in separate processes.
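
As a rough illustration of trimming both mates in one run, here is a sketch using cutadapt's paired-end options (-A for the adapter on the second read, -p for the second output file). This assumes a cutadapt release that supports those options, which may be newer than the one used in this thread. The wrapper name trim_pair, the output naming, and the R2 adapter argument are placeholders; the thread only gives adapter sequences for R1, so the R2 adapter has to be supplied by whoever runs it.

```shell
# Trim both mates in a single cutadapt run so that a pair is kept or dropped as a unit.
# Arguments (all placeholders): R1 input, R2 input, 3' adapter for R1, 3' adapter for R2.
trim_pair() {
    in1=$1
    in2=$2
    adapter_r1=$3
    adapter_r2=$4
    # -a / -A: adapters for read 1 / read 2; -o / -p: outputs for read 1 / read 2.
    # -q 10 and --minimum-length 36 are the values from the original single-end command.
    cutadapt -q 10 \
        -a "$adapter_r1" -A "$adapter_r2" \
        --minimum-length 36 \
        -o "${in1%.fastq}_trimmed.fastq" \
        -p "${in2%.fastq}_trimmed.fastq" \
        "$in1" "$in2"
}
```

Because both files go through one process, the two output files should end up with the same number of reads in the same order, which is what STAR expects.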
Old 10-14-2014, 11:30 PM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

Trimming the input files separately will lead to a lot of problems. As suggested, use trimmomatic or trim_galore or skewer to trim both files at once.
Old 10-15-2014, 04:40 AM   #7
travelk
Member
 
Location: France

Join Date: Jul 2013
Posts: 20

Ok, I checked with cutadapt and indeed, I hadn't trimmed them properly for paired data. I reran the STAR alignment and it worked. Thank you all for taking the time to help me.

As a note, I originally trimmed my data with trimmomatic but got errors with both tophat and STAR so I opted for cutadapt instead.

Code:
java -jar /path/to/Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 8 -phred33 -trimlog Sample1trimlog sample1_R1.fastq sample1_R2.fastq sample1_R1_TP.fastq sample1_R1_TU.fastq sample1_R2_TP.fastq sample1_R2_TU.fastq ILLUMINACLIP:/path/to/Trimmomatic-0.32/adapters/adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

STAR error:

EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >

tophat2 error:

Error: beginning of quality values record not found! (@D3VDZHS1:119:H036PADXX:1:1103:8363:72199 1:N:0:GGACTCCTTATCCTCT)
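
Both messages complain about record structure (a header that does not start with @, a missing quality block), so one way to narrow things down is to scan the output for the first malformed record. The sketch below assumes uncompressed FASTQ with unwrapped four-line records; check_fastq is a made-up helper name, and it only locates a broken record, it does not explain how the file got that way.

```shell
# Report the first malformed record in a FASTQ file (assumes 4 lines per record).
# Returns 0 if every record looks structurally sound, non-zero otherwise.
check_fastq() {
    awk '
        BEGIN { bad = 0 }
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR
            bad = 1
            exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +\n", NR
            bad = 1
            exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length differs from sequence length\n", NR
            bad = 1
            exit
        }
        END {
            # A line count that is not a multiple of 4 means a truncated record.
            if (!bad && NR % 4 != 0) {
                printf "file ends mid-record after line %d\n", NR
                bad = 1
            }
            exit bad
        }
    ' "$1"
}
```

Running it on each Trimmomatic output in turn, for example check_fastq sample1_R1_TP.fastq, prints nothing when the file is structurally fine.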
Old 10-15-2014, 08:00 AM   #8
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by dpryan
Trimming the input files separately will lead to a lot of problems. As suggested, use trimmomatic or trim_galore or skewer to trim both files at once.

It looks like the output files were corrupted somehow. Can you output the top 8 lines of each file?
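
Eight lines correspond to the first two FASTQ records. A small sketch for printing them from several files at once, with show_head as a hypothetical helper name:

```shell
# Print the first two FASTQ records (8 lines) of each file given as an argument,
# preceded by a marker line naming the file.
show_head() {
    for f in "$@"; do
        echo "== $f"
        head -n 8 "$f"
    done
}
```

For the Trimmomatic outputs above this would be something like show_head sample1_R1_TP.fastq sample1_R2_TP.fastq.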

And if you want another trimming option, I recommend BBDuk.

Syntax:

bbduk.sh -Xmx1g in1=reads1.fq in2=reads2.fq out1=trimmed1.fq out2=trimmed2.fq ref=truseq.fa.gz,nextera.fa.gz k=25 ktrim=r hdist=1 tbo tpe

truseq.fa.gz and nextera.fa.gz are included with the package, in the /resources/ directory.

Tags
cutadapt, star, trimmed

All times are GMT -8.