SEQanswers

02-04-2014, 11:55 PM   #1
shocker8786
Member
 
Location: Urbana Illinois

Join Date: Jan 2013
Posts: 28
Align paired and unpaired reads with Tophat

I have a stranded PE RNAseq data set that I want to align with tophat using the --library-type fr-firststrand option. After adaptor trimming I end up with my 4 files:

paired_1
paired_2
unpaired_1
unpaired_2

Is there a way to align these so I do not lose the strand information for the unpaired reads? When I run tophat with the files listed like this:

paired_1,unpaired_1 paired_2,unpaired_2

It seems to want to try and align the two unpaired files as paired files. If I combine the two unpaired files and run tophat with the files listed like this:

paired_1 paired_2,unpaired

It recognizes the last file as unpaired. But am I losing my strand specificity by aligning this way? Thanks.
02-05-2014, 12:46 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Inputting the files like:
Code:
paired_1,unpaired_1 paired_2,unpaired_2
will result in "unpaired_1" and "unpaired_2" being treated as a pair, which is the opposite of what you want. To keep the strandedness correct, you'll need to run tophat twice: first with "paired_1,unpaired_1 paired_2" and library-type set to fr-firststrand, then with just "unpaired_2" by itself and library-type set to fr-secondstrand. I should note that aligning unpaired_2 is usually not worthwhile (the reads are often crap), but perhaps you'll get luckier than I have with that.
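
A minimal sketch of the two runs described above. The Bowtie index prefix (genome_index), the .fastq extensions, and the output directory names are assumptions for illustration and do not come from the thread. The script only prints the two commands so they can be checked before anything is run.

```shell
#!/bin/sh
# Hypothetical names: genome_index, the *.fastq file names, and the -o directories.
INDEX=genome_index

# Run 1: the intact pairs plus the orphaned first mates.
# Orphaned first mates have the same orientation as read 1 of a pair,
# so they can share the fr-firststrand library type.
cmd_first="tophat --library-type fr-firststrand -o tophat_first $INDEX paired_1.fastq,unpaired_1.fastq paired_2.fastq"

# Run 2: the orphaned second mates on their own.
# Aligned as single-end reads they need the opposite library type.
cmd_second="tophat --library-type fr-secondstrand -o tophat_second $INDEX unpaired_2.fastq"

# Print the commands for review; paste them into a shell to execute.
echo "$cmd_first"
echo "$cmd_second"
```

Writing to separate output directories keeps the second run from overwriting the first; the two resulting BAM files can be merged afterwards if a single file is needed.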
02-05-2014, 01:40 AM   #3
shocker8786
Member
 
Location: Urbana Illinois

Join Date: Jan 2013
Posts: 28

Thank you very much for your response, that makes a lot of sense.
02-06-2014, 07:07 PM   #4
natar210@gmail.com
Junior Member
 
Location: Melbourne

Join Date: Jun 2013
Posts: 2

On this note, I have a question. I am doing PE RNA-Seq analysis of mouse data. My read length is 90 bp and the fragment size is 127 bp, which means I have overlapping reads. I used FLASH and found that not all reads overlap.

I now have one file of merged reads (the pairs that overlap) and two files of non-overlapping reads, so basically three fastq files.

How do I go about running Tophat with these? I am also not really sure how to calculate the mean inner mate distance and SD.

Has anyone come across this situation?
02-07-2014, 12:47 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

There's no need to run FLASH on them; just use tophat2 with bowtie2 instead of bowtie1.

For the mean inner distance, just try 0 and see if that produces acceptable results (I recall reading that tophat re-estimates the insert length as it runs, though I can't say I've ever checked if that's correct).
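
For reference, the TopHat manual defines the mean inner distance (`-r` / `--mate-inner-dist`) as the fragment length minus both read lengths, so with the numbers quoted above the nominal value comes out negative. Below is a small sketch of that arithmetic followed by the suggested starting command with `-r 0`. It assumes the 127 bp fragment size excludes adapters; the index prefix, file names, and output directory are placeholders, not from the thread.

```shell
#!/bin/sh
# Values quoted in the question above.
fragment_len=127
read_len=90

# Inner distance = fragment length minus the two reads.
# A negative result means the mates overlap.
inner_dist=$(( fragment_len - 2 * read_len ))
echo "nominal inner distance: $inner_dist"

# Starting point suggested in the reply: try -r 0 and inspect the results.
# genome_index, reads_1.fastq, reads_2.fastq and tophat_out are hypothetical names.
cmd="tophat2 -r 0 -o tophat_out genome_index reads_1.fastq reads_2.fastq"
echo "$cmd"
```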
02-07-2014, 12:59 AM   #6
natar210@gmail.com
Junior Member
 
Location: Melbourne

Join Date: Jun 2013
Posts: 2

Thanks a lot, Ryan. I will try that.
02-07-2014, 01:02 AM   #7
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264

Or just use STAR, which does not need the inner distance to be specified. It is also much faster and gives the same (or even better) results.
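
A rough sketch of what a paired-end STAR run might look like, with no inner-distance parameter involved. The index directory (star_index, which would have to be built beforehand with STAR's genomeGenerate mode), the read file names, the thread count, and the output prefix are all assumptions for illustration. The script only prints the command.

```shell
#!/bin/sh
# Hypothetical names: star_index, reads_1.fastq, reads_2.fastq, star_out_.
# star_index must already contain a genome index built with STAR.
cmd="STAR --runThreadN 4 --genomeDir star_index --readFilesIn reads_1.fastq reads_2.fastq --outFileNamePrefix star_out_"

# Print the command for review; paste it into a shell to execute.
echo "$cmd"
```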
All times are GMT -8.