  • danielsbrewer
    Member
    • Feb 2009
    • 35

    Tophat2: prepare unmapped.bam file for input into a tophat run on alternative genome

    I have some paired-end Illumina RNAseq data and have run tophat2 on it against the human genome. I would now like to run tophat2 again to align the unmapped reads against some alternative genomes to check for contamination/infection. For that I need to convert the unmapped.bam into fastq files.

    My current approach is the following:
    1) Remove any reads without a matching pair
    Code:
    samtools view -f1 -b unmapped.bam > unmapped_paired.bam
    2) Sort the reads according to name
    Code:
    samtools sort -n unmapped_paired.bam unmapped_paired_sort.bam
    3) Run tophat's bam2fastx to get fastq
    Code:
    bam2fastx -q -Q -A -P -o test unmapped_paired_sort.bam
    Unfortunately this reports an error:
    Code:
    Error: couldn't retrieve both reads for pair HISEQ2500-01:110:H7AGVADXX:1:1101:1336:2967. Perhaps the input file is not sorted by name?
    The problem seems to be that the unmapped.bam file does not carry any information about the matched pair in the RNEXT column. In any case, three steps just to convert the data back to fastq files seems over the top.
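
    For reference, the -f1 in step 1 filters on the FLAG column, which is a bit field. A minimal sketch that decodes the pairing-related bits, assuming the bit values from the SAM specification (the function name describe_flag is made up for illustration):

```shell
# Print the pairing-related bits that are set in a decimal SAM FLAG value.
describe_flag() {
  flag=$1
  out=""
  if [ $(( flag & 1 )) -ne 0 ]; then out="$out paired"; fi            # 0x1  read is paired
  if [ $(( flag & 4 )) -ne 0 ]; then out="$out unmapped"; fi          # 0x4  read is unmapped
  if [ $(( flag & 8 )) -ne 0 ]; then out="$out mate_unmapped"; fi     # 0x8  mate is unmapped
  if [ $(( flag & 64 )) -ne 0 ]; then out="$out first_in_pair"; fi    # 0x40 first read of the pair
  if [ $(( flag & 128 )) -ne 0 ]; then out="$out second_in_pair"; fi  # 0x80 second read of the pair
  echo "${out# }"   # strip the single leading space
}

describe_flag 77   # prints: paired unmapped mate_unmapped first_in_pair
describe_flag 69   # prints: paired unmapped first_in_pair
```

    A read with FLAG 69 is marked as paired but its mate was mapped, so -f1 alone keeps it even though its partner is not in unmapped.bam. Using -f 12 instead would require both the unmapped and mate-unmapped bits, although that only helps if those bits were set correctly in the first place.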

    Does anyone have any idea how to fix this problem, or provide a better way to do it?

    Thanks
    Last edited by danielsbrewer; 01-13-2014, 02:51 AM.
  • danielsbrewer
    Member
    • Feb 2009
    • 35

    #2
    On further examination, it appears that the FLAGs in the unmapped.bam are inaccurate: even after filtering out the reads without the paired flag, there are still reads whose mate is missing. I assume this is because the other read of the pair has been mapped.
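
    Since the flags cannot be trusted here, one option is to ignore them and keep only reads whose mate sits next to them in the name-sorted file. A minimal sketch working on SAM text, assuming both mates share exactly the same read name (no /1 or /2 suffix) and the input is sorted by name; keep_complete_pairs is a hypothetical helper, not part of samtools or tophat:

```shell
# Read name-sorted SAM text on stdin; write header lines and complete pairs to stdout.
keep_complete_pairs() {
  awk -F '\t' '
    /^@/ { print; next }                  # header lines pass through unchanged
    held != "" && $1 == held_name {       # same name as the held read: a complete pair
      print held
      print $0
      held = ""
      next
    }
    { held = $0; held_name = $1 }         # hold this read; it is dropped if no mate follows
  '
}
```

    It would be used along these lines (file names are placeholders):
    Code:
    samtools view -h unmapped_paired_sort.bam | keep_complete_pairs | samtools view -bS - > unmapped_pairs_only.bam
    The FLAG values themselves are left untouched, so downstream tools may still object to them.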

    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #3
      You might want to try the "--no-mixed" option for tophat2 next time.

      • danielsbrewer
        Member
        • Feb 2009
        • 35

        #4
        Yes that would have done the trick. Still playing around with RNAseq data so I am definitely in the learning phase!

        The script in the following looks like it will help:
        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


        Just giving it a go now.

          • bpb9
            Member
            • Aug 2012
            • 24

            #6
            bam2fastx libz error

            I too am trying to make a fastq file out of the unmapped reads so that I can run tophat on an alternative genome. I run into a different problem with these commands:

            samtools sort -n unmapped.bam unmapped_sort.bam
            bam2fastx -q -Q -A -o outfile unmapped_sort.bam.bam

            I get this error:
            bam2fastx: /lib64/libz.so.1: no version information available (required by bam2fastx)

            Anyone come across this error before?

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              One possibility is that you are running older versions of libz/libxml2. Are you able to get the bam2fastx to complete (that "error" is likely a warning) otherwise?

              • bpb9
                Member
                • Aug 2012
                • 24

                #8
                Warning can be ignored

                Originally posted by GenoMax
                One possibility is that you are running older versions of libz/libxml2. Are you able to get the bam2fastx to complete (that "error" is likely a warning) otherwise?
                Hm…sure enough, despite the warning, there is in fact a fastq file produced anyway.

                But when I run the program from the cluster's login node (shame on me, I know) I don't get the error, and I still get the fastq file. Could that be due to different versions of the program running on the login vs. compute nodes? Any idea?

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Originally posted by bpb9
                  But when I run the program from the cluster's login node (shame on me, I know) I don't get the error, and I still get the fastq file. Could that be due to different versions of the program running on the login vs. compute nodes? Any idea?
                  That is certainly a possibility. On large clusters sometimes a few stray nodes don't get updated properly/fully. If you know which node gave you the error let the admins know. They should be able to manually update that node.
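
                  A quick way to compare nodes is to print which zlib the binary resolves to on each of them. A minimal sketch, assuming a Linux system with ldd available; zlib_report is a made-up helper name:

```shell
# Report the host name, the resolved path of a program, and the zlib it links against.
zlib_report() {
  bin=$(command -v "$1") || { echo "$1: not found on $(uname -n)"; return 1; }
  echo "host: $(uname -n)"
  echo "binary: $bin"
  # ldd lists the shared libraries the dynamic loader would pick on this machine
  ldd "$bin" 2>/dev/null | grep 'libz' || echo "no libz dependency listed"
}
```

                  Running zlib_report bam2fastx on the login node and again inside a job on a compute node should show whether the two resolve to different libz.so.1 files.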

                  • offspring
                    Member
                    • Mar 2013
                    • 32

                    #10
                    Just a note on this general topic, the script fix_tophat_unmapped_reads.py in https://github.com/cbrueffer/misc_bioinf/ fixes various issues in unmapped.bam files that prevent them from being used in downstream tools.

                    • fchatonnet
                      Member
                      • Sep 2014
                      • 30

                      #11
                      It might be a very late answer, but apparently tophat can even accept BAM files as input. I tried it by mistake and it works perfectly: I see no differences from an alignment with a fastq file obtained after bam2fastq conversion.
                      If anyone can confirm that I'm not doing anything wrong, it would be nice.
