SEQanswers

Old 01-13-2014, 02:48 AM   #1
danielsbrewer
Member
 
Location: UK

Join Date: Feb 2009
Posts: 27
Default Tophat2: prepare unmapped.bam file for input into a tophat run on alternative genome

I have some paired-end Illumina RNAseq data and have run tophat2 on it against the human genome. I would now like to run tophat2 again to align the unmapped reads against some alternative genomes to check for contamination/infection. To do this I need to convert the unmapped.bam into fastq files.

To do this I do the following:
1) Remove any reads without a matching pair
Code:
samtools view -f1 -b unmapped.bam > unmapped_paired.bam
2) Sort the reads according to name
Code:
samtools sort -n unmapped_paired.bam unmapped_paired_sort.bam
3) Run tophat's bam2fastx to get fastq
Code:
bam2fastx -q -Q -A -P -o test unmapped_paired_sort.bam
Unfortunately this reports an error:
Code:
Error: couldn't retrieve both reads for pair HISEQ2500-01:110:H7AGVADXX:1:1101:1336:2967. Perhaps the input file is not sorted by name?
The problem is that the unmapped.bam file does not seem to have any information about the mate in the RNEXT column. Anyway, three steps just to convert the data back to fastq seems over the top.
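For reference, a minimal sketch of what the pairing-related FLAG bits mean. The `-f1` in step 1 only selects reads flagged as paired in sequencing (bit 1); it does not check that the mate is actually present in the file. The function name `describe_flag` is made up for illustration, and the bit values are the ones defined in the SAM specification.

```shell
#!/bin/sh
# describe_flag: print the pairing-related bits set in a SAM FLAG value.
# Hypothetical helper for illustration; bit values come from the SAM spec.
describe_flag() {
    flag=$1
    desc=""
    if [ $((flag & 1)) -ne 0 ];   then desc="$desc paired"; fi
    if [ $((flag & 4)) -ne 0 ];   then desc="$desc unmapped"; fi
    if [ $((flag & 8)) -ne 0 ];   then desc="$desc mate_unmapped"; fi
    if [ $((flag & 64)) -ne 0 ];  then desc="$desc first_in_pair"; fi
    if [ $((flag & 128)) -ne 0 ]; then desc="$desc second_in_pair"; fi
    # Strip the leading space before printing.
    echo "${desc# }"
}
```

For example, `describe_flag 69` prints `paired unmapped first_in_pair`: an unmapped read whose mate is not flagged as unmapped, so the mate would not be expected in unmapped.bam.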

Does anyone have any idea how to fix this problem, or provide a better way to do it?

Thanks

Last edited by danielsbrewer; 01-13-2014 at 02:51 AM.
Old 01-13-2014, 03:12 AM   #2
danielsbrewer
Member
 
Location: UK

Join Date: Feb 2009
Posts: 27
Default

On further examination, it appears that the FLAGs in the unmapped.bam are inaccurate: even after filtering out the reads without the paired flag, there are still reads whose mate is missing from the file. I assume this is because the other read of the pair has been mapped.
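Since the flags cannot be trusted here, one rough way to find the affected reads is to count how often each read name occurs instead. The sketch below is not part of tophat or samtools: `list_orphans` is a made-up helper that reads SAM text (for example from `samtools view`) and prints every read name that does not occur exactly twice. It assumes that read names may carry a trailing /1 or /2, which it strips before counting.

```shell
#!/bin/sh
# list_orphans: read SAM records on stdin and print the names of reads
# whose mate is absent from the stream (name does not occur exactly twice).
# Hypothetical helper for illustration only.
list_orphans() {
    awk -F '\t' '
        /^@/ { next }                  # skip header lines if present
        {
            name = $1
            sub(/\/[12]$/, "", name)   # assumption: drop a trailing /1 or /2
            count[name]++
        }
        END {
            for (name in count)
                if (count[name] != 2) print name
        }
    ' | sort                           # awk array order is unspecified
}
```

Usage would be something like `samtools view unmapped.bam | list_orphans > orphans.txt`; the resulting list could then be used to exclude those reads before running bam2fastx with -P.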
Old 01-13-2014, 03:47 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,476
Default

You might want to try the "--no-mixed" option for tophat2 next time.
Old 01-13-2014, 03:49 AM   #4
danielsbrewer
Member
 
Location: UK

Join Date: Feb 2009
Posts: 27
Default

Yes that would have done the trick. Still playing around with RNAseq data so I am definitely in the learning phase!

The script in the following thread looks like it will help:
http://seqanswers.com/forums/showthread.php?t=34520

Just giving it a go now.
Old 10-15-2014, 01:08 PM   #6
bpb9
Member
 
Location: NYC

Join Date: Aug 2012
Posts: 24
Default bam2fastx libz error

I too am trying to make a fastq file out of the unmapped reads so that I can run tophat on an alternative genome, but I get a different error. I ran:
Code:
samtools sort -n unmapped.bam unmapped_sort.bam
bam2fastx -q -Q -A -o outfile unmapped_sort.bam.bam
and got this error:
Code:
bam2fastx: /lib64/libz.so.1: no version information available (required by bam2fastx)

Anyone come across this error before?
Old 10-15-2014, 05:53 PM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,585
Default

One possibility is that you are running older versions of libz/libxml2. Are you able to get the bam2fastx to complete (that "error" is likely a warning) otherwise?
Old 10-16-2014, 05:43 AM   #8
bpb9
Member
 
Location: NYC

Join Date: Aug 2012
Posts: 24
Default Warning can be ignored

Quote:
Originally Posted by GenoMax View Post
One possibility is that you are running older versions of libz/libxml2. Are you able to get the bam2fastx to complete (that "error" is likely a warning) otherwise?
Hm…sure enough, despite the warning, there is in fact a fastq file produced anyway.

But when I run the program from the cluster's login node (shame on me, I know) I don't get the error, and I still get the fastq file. Could that be due to different versions of the program running on the login vs. compute nodes? Any idea?
Old 10-16-2014, 05:51 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,585
Default

Quote:
Originally Posted by bpb9 View Post
But when I run the program from the cluster's login node (shame on me, I know) I don't get the error, and I still get the fastq file. Could that be due to different versions of the program running on the login vs. compute nodes? Any idea?
That is certainly a possibility. On large clusters sometimes a few stray nodes don't get updated properly/fully. If you know which node gave you the error let the admins know. They should be able to manually update that node.
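One way to check this, as a sketch: compare which libz the dynamic linker picks up for bam2fastx on the login node and on the compute node. `libz_path` is a made-up helper that just pulls the libz line out of `ldd` output; it assumes the usual glibc `ldd` format (`name => path (address)`).

```shell
#!/bin/sh
# libz_path: given `ldd` output on stdin, print the path that libz resolves to.
# Hypothetical helper; assumes lines of the form "libz.so.1 => /path (0x...)".
libz_path() {
    awk '$1 ~ /^libz\.so/ && $2 == "=>" { print $3 }'
}
```

Running `ldd "$(command -v bam2fastx)" | libz_path` on both nodes and comparing the paths (and the library versions behind them) should show whether the nodes differ.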
Old 10-16-2014, 06:16 AM   #10
offspring
Member
 
Location: Lund, Sweden

Join Date: Mar 2013
Posts: 31
Default

Just a note on this general topic, the script fix_tophat_unmapped_reads.py in https://github.com/cbrueffer/misc_bioinf/ fixes various issues in unmapped.bam files that prevent them from being used in downstream tools.
Old 05-20-2016, 01:57 AM   #11
fchatonnet
Junior Member
 
Location: France

Join Date: Sep 2014
Posts: 5
Default

This may be a very late answer, but apparently tophat can accept BAM files directly as input. I tried it by mistake and it works perfectly: the alignment showed no differences from one run on a fastq file obtained after BAM-to-fastq conversion.
If anyone can confirm that I'm not doing anything wrong, it would be nice.