SEQanswers (
-   Bioinformatics (
-   -   Converting Tophats bam output back to separate paired end read fastq files (

bob-loblaw 12-03-2012 04:23 AM

Converting Tophats bam output back to separate paired end read fastq files
Hi all,

I was wondering if anyone could offer me some advice on using paired end reads with Tophat, specifically with the output. I'm planning on using Tophat as part of a pipeline for processing my sequence data. The reads that map are obviously going to be easy to deal with, but the unmapped.bam file is proving a bit problematic. I would like to get that bam file back to two fastq files containing the paired reads which didn't map to the reference genome (hg19 in this case). What I was thinking was to convert to sam, and then use Picard's SamToFastq function, but that is returning the following error

MAPQ must be zero if RNAME is not specified;

Which I haven't been able to find anything about online. I'm also not sure how time consuming this will be. I'm currently just playing around with a random sample of my data just trying to get everything working, but my actual data files are probably going to be 20gb + at least in fastq format anyway.

I was also thinking of converting the accepted_hits.bam file to sam and then writing a unix script which would take the files which were input into tophat and write any read which isn't present in the accepted_hits file into 2 new files.

What do you think?

All times are GMT -8. The time now is 01:57 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.