12-03-2012, 04:23 AM
bob-loblaw
Converting TopHat's BAM output back to separate paired-end FASTQ files

Hi all,

I'd appreciate some advice on handling TopHat's output for paired-end reads. I'm planning to use TopHat as part of a pipeline for processing my sequence data. The reads that map are going to be easy to deal with, but the unmapped.bam file is proving a bit problematic. I would like to convert that BAM file back into two FASTQ files containing the read pairs that didn't map to the reference genome (hg19 in this case). My plan was to convert it to SAM and then run Picard's SamToFastq, but that fails with the following error:

MAPQ must be zero if RNAME is not specified;

I haven't been able to find anything about this error online. I'm also not sure how time-consuming this step will be. At the moment I'm just experimenting with a random sample of my data to get everything working, but my real data files will probably be 20 GB or more, at least in FASTQ format.
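One possible workaround, assuming the error comes from TopHat writing unmapped records with a nonzero MAPQ (worth checking with `samtools view unmapped.bam | head`), is to set MAPQ to 0 on every record whose RNAME is `*` before handing the file to SamToFastq. Below is a minimal Python sketch of that idea; the script name `fix_mapq.py` and the file names are hypothetical. Picard's VALIDATION_STRINGENCY=LENIENT (or SILENT) option may also get past the check without rewriting the file.

```python
import sys
from typing import Iterable, Iterator


def zero_unmapped_mapq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines unchanged, except that records with RNAME '*' get MAPQ 0."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @PG, ...) pass through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Mandatory SAM columns: QNAME FLAG RNAME POS MAPQ ... (RNAME = index 2, MAPQ = index 4).
        if len(fields) >= 11 and fields[2] == "*" and fields[4] != "0":
            fields[4] = "0"
            yield "\t".join(fields) + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python fix_mapq.py unmapped.sam unmapped_fixed.sam
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(zero_unmapped_mapq(src))
```

For example, run `samtools view -h unmapped.bam > unmapped.sam`, then `python fix_mapq.py unmapped.sam unmapped_fixed.sam`, and pass the result to SamToFastq with its FASTQ and SECOND_END_FASTQ outputs. Because the script handles one line at a time, memory use stays flat even on 20 GB inputs.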

I was also thinking of converting the accepted_hits.bam file to SAM and then writing a Unix script that takes the FASTQ files I gave TopHat as input and writes every read not present in accepted_hits into two new files.
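The accepted_hits approach could look roughly like the Python sketch below. It assumes plain 4-line FASTQ records, that both input files list mates in the same order, and that names match once a trailing `/1` or `/2` (or anything after the first space) is removed; all function and file names are hypothetical. Since both mates share one QNAME in SAM, a pair is dropped if either mate appears in accepted_hits.

```python
import sys
from typing import Iterable, Iterator, Set, TextIO, Tuple


def read_name(header: str) -> str:
    """Normalize a read name: drop a leading '@', anything after whitespace, and a /1 or /2 suffix."""
    name = header.lstrip("@").split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def mapped_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect normalized QNAMEs from SAM records, skipping header and blank lines."""
    return {
        read_name(line.split("\t", 1)[0])
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    }


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Group a 4-line-per-record FASTQ stream into (header, seq, plus, qual) tuples."""
    it = iter(lines)
    # Zipping the same iterator four times takes four consecutive lines per record;
    # a truncated final record is silently dropped.
    return zip(it, it, it, it)


def write_unmapped_pairs(fq1: Iterable[str], fq2: Iterable[str], mapped: Set[str],
                         out1: TextIO, out2: TextIO) -> int:
    """Write pairs whose name is absent from `mapped`; return how many pairs were written."""
    kept = 0
    for rec1, rec2 in zip(fastq_records(fq1), fastq_records(fq2)):
        name = read_name(rec1[0])
        if name != read_name(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0].strip()} vs {rec2[0].strip()}")
        if name not in mapped:
            out1.writelines(rec1)
            out2.writelines(rec2)
            kept += 1
    return kept


if __name__ == "__main__" and len(sys.argv) == 6:
    # Hypothetical usage: python unmapped_pairs.py accepted_hits.sam in_1.fq in_2.fq out_1.fq out_2.fq
    sam_path, in1, in2, o1, o2 = sys.argv[1:6]
    with open(sam_path) as sam:
        names = mapped_names(sam)
    with open(in1) as f1, open(in2) as f2, open(o1, "w") as w1, open(o2, "w") as w2:
        n = write_unmapped_pairs(f1, f2, names, w1, w2)
    print(f"wrote {n} unmapped pairs", file=sys.stderr)
```

The FASTQ files are streamed, but the set of mapped names is held in memory, which could need several GB for tens of millions of reads; sorting both inputs by name and merging would avoid that at the cost of extra passes.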

What do you think?