SEQanswers

Align paired and unpaired reads with Tophat (http://seqanswers.com/forums/showthread.php?t=40581)

shocker8786 02-05-2014 12:55 AM

Align paired and unpaired reads with Tophat
 
I have a stranded PE RNAseq data set that I want to align with tophat using the --library-type fr-firststrand option. After adaptor trimming I end up with my 4 files:

paired_1
paired_2
unpaired_1
unpaired_2

Is there a way to align these so I do not lose the strand information for the unpaired reads? When I run tophat with the files listed like this:

paired_1,unpaired_1 paired_2,unpaired_2

It seems to want to try and align the two unpaired files as paired files. If I combine the two unpaired files and run tophat with the files listed like this:

paired_1 paired_2,unpaired

It recognizes the last file as unpaired. But am I losing my strand specificity by aligning this way? Thanks.

dpryan 02-05-2014 01:46 AM

Inputting the files like:
Code:

paired_1,unpaired_1 paired_2,unpaired_2
will result in "unpaired_1" and "unpaired_2" being treated as paired, which is the opposite of what you want. To keep the strandedness correct you'll need to run things twice: first using "paired_1,unpaired_1 paired_2" with library-type set to fr-firststrand, and then just "unpaired_2" by itself with "fr-secondstrand". I should note that aligning unpaired_2 is usually not worthwhile (the reads are often crap), but perhaps you'll get luckier than I have with that.
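
For reference, the two runs described above could be laid out roughly as follows. This is only a sketch: `genome_index` and the two output directory names are placeholders that do not come from the thread, and the commands are printed rather than executed so they can be checked first.

```shell
# Sketch of the two-run approach. genome_index and the -o directories are
# hypothetical placeholders; the read file names are the ones from the thread.
INDEX="genome_index"

# Run 1: proper pairs plus the orphaned first mates, aligned as fr-firststrand.
run1="tophat --library-type fr-firststrand -o tophat_run1 $INDEX paired_1,unpaired_1 paired_2"

# Run 2: the orphaned second mates on their own, with the opposite library type.
run2="tophat --library-type fr-secondstrand -o tophat_run2 $INDEX unpaired_2"

# Print the commands for inspection (dry run).
echo "$run1"
echo "$run2"
```

Each run writes to its own output directory, so the second does not overwrite the first.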

shocker8786 02-05-2014 02:40 AM

Thank you very much for your response, that makes a lot of sense.

natar210@gmail.com 02-06-2014 08:07 PM

On this note, I have a question. I am doing PE RNA-Seq analysis of mouse data. My read length is 90 bp and my fragment size is 127 bp, which means I have overlapping reads. I used FLASH and found that not all reads overlap.

I have one file with the merged reads that overlap and two files with the non-overlapping reads, so basically three fastq files.

How do I go about running Tophat with this? I am also not really sure how to calculate the mean inner mate distance and sd.

Has anyone come across this situation?

dpryan 02-07-2014 01:47 AM

There's no need to run Flash on them, just use tophat2 and bowtie2 instead of bowtie1.

For the mean inner distance, just try 0 and see if that produces acceptable results (I recall reading that tophat re-estimates the insert length as it runs, though I can't say I've ever checked if that's correct).
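
As a quick check of the numbers in the question: the inner mate distance is the fragment length minus both read lengths, so 90 bp reads from a 127 bp fragment overlap and the nominal value comes out negative. The sketch below does that arithmetic and prints a tophat2 command using the suggested starting value of 0. `genome_index`, `reads_1`, `reads_2` and the output directory are placeholders, not names from the thread; `-r` is TopHat's mate inner distance option, and `--mate-std-dev` is the matching option for the standard deviation.

```shell
# Values taken from the question above.
READ_LEN=90
FRAGMENT_LEN=127

# Inner distance = fragment length minus the two read lengths.
inner_dist=$((FRAGMENT_LEN - 2 * READ_LEN))
echo "nominal inner distance: $inner_dist"

# Dry run: print the command with the suggested starting value of 0.
# genome_index, reads_1, reads_2 and tophat_out are hypothetical placeholders.
cmd="tophat2 -r 0 -o tophat_out genome_index reads_1 reads_2"
echo "$cmd"
```

The arithmetic gives -53, which is consistent with the overlap described in the question.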

natar210@gmail.com 02-07-2014 01:59 AM

Thanks a lot, Ryan... I will try that.

NicoBxl 02-07-2014 02:02 AM

Or just use STAR, which does not need you to specify the inner distance. It is also much faster for the same (or even better) results.


All times are GMT -8.
