  • danielsbrewer
    Member
    • Feb 2009
    • 35

    Tophat2: prepare unmapped.bam file for input into a tophat run on alternative genome

    I have some paired-end Illumina RNAseq data and have run tophat2 on it against the human genome. I would now like to run tophat2 again to align the unmapped reads against some alternative genomes to check for contamination/infection. To do this I need to convert unmapped.bam into fastq files.

    To do this I do the following:
    1) Remove any reads without a matching pair
    Code:
    samtools view -f1 -b unmapped.bam > unmapped_paired.bam
    2) Sort the reads according to name
    Code:
    samtools sort -n unmapped_paired.bam unmapped_paired_sort.bam
    3) Run tophat's bam2fastx to get fastq
    Code:
    bam2fastx -q -Q -A -P -o test unmapped_paired_sort.bam
    Unfortunately this reports an error:
    Code:
    Error: couldn't retrieve both reads for pair HISEQ2500-01:110:H7AGVADXX:1:1101:1336:2967. Perhaps the input file is not sorted by name?
    The problem seems to be that the unmapped.bam file does not have any information about the mate in the RNEXT column. In any case, three steps just to convert the data back to fastq seems over the top.

    Does anyone have any idea how to fix this problem, or provide a better way to do it?

    Thanks
    Last edited by danielsbrewer; 01-13-2014, 02:51 AM.
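Before filtering, it can help to see which FLAG values the file actually contains. The following is a minimal sketch, not part of samtools or tophat: `flag_counts` is a hypothetical helper that tabulates column 2 of SAM text. In standard SAM flag arithmetic, 77 and 141 are the first and second mate of a pair in which both reads are unmapped (0x1 + 0x4 + 0x8 + 0x40 or 0x80), while 69 and 133 are unmapped reads whose "mate unmapped" bit is not set.

```shell
# Sketch: count how often each FLAG value occurs in SAM text read from stdin.
# flag_counts is a hypothetical helper name, not a samtools or tophat command.
flag_counts() {
  awk -F'\t' '
    /^@/ { next }                                   # skip SAM header lines
    { n[$2]++ }                                     # column 2 is FLAG
    END { for (f in n) printf "%s\t%d\n", f, n[f] } # print "flag<TAB>count"
  ' | sort -n
}

# Usage (assumes samtools is on PATH):
#   samtools view unmapped.bam | flag_counts
```

If most reads turn out to carry 69 or 133 rather than 77 or 141, filtering on the paired bit alone (`-f1`) cannot separate complete pairs from reads whose mate is elsewhere.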
  • danielsbrewer
    Member
    • Feb 2009
    • 35

    #2
    On further examination, it appears that the FLAGs in unmapped.bam are inaccurate: even after filtering out the reads that lack the paired flag, there are still reads whose mate is missing. I assume this is because the other read of the pair has been mapped.
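Since the flags cannot be trusted here, one possible workaround is to ignore them and keep only reads whose name occurs twice in a row in the name-sorted file. This is a rough sketch under two assumptions: the input is sorted by read name, and both mates carry exactly the same QNAME (no /1 or /2 suffix). `keep_complete_pairs` is a hypothetical helper, not a samtools or tophat command.

```shell
# Sketch: from name-sorted SAM text on stdin, keep header lines and only those
# records that form an adjacent pair with identical QNAME (column 1).
# keep_complete_pairs is a hypothetical helper name.
keep_complete_pairs() {
  awk -F'\t' '
    /^@/ { print; next }                     # pass header lines through
    ($1 "") == prev_name {                   # same name as the held record
      print prev_line; print                 # emit both mates
      prev_name = ""; next                   # start looking for a new pair
    }
    { prev_name = $1 ""; prev_line = $0 }    # hold this record, drop any orphan before it
  '
}

# Usage (assumes samtools is on PATH; -S tells older samtools the input is SAM):
#   samtools view -h unmapped_paired_sort.bam \
#     | keep_complete_pairs \
#     | samtools view -bS - > unmapped_complete_pairs.bam
```

Reads whose mate was mapped to the first genome appear only once in unmapped.bam, so they are dropped by this filter.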

    Comment

    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #3
      You might want to try the "--no-mixed" option for tophat2 next time.

      Comment

      • danielsbrewer
        Member
        • Feb 2009
        • 35

        #4
        Yes that would have done the trick. Still playing around with RNAseq data so I am definitely in the learning phase!

        The script in the following thread looks like it will help:
        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


        Just giving it a go now.

        Comment

          • bpb9
            Member
            • Aug 2012
            • 24

            #6
            bam2fastx libz error

            I too am trying to make a fastq file out of the unmapped reads so that I can run tophat on an alternative genome. I get a different error:

            samtools sort -n unmapped.bam unmapped_sort.bam
            bam2fastx -q -Q -A -o outfile unmapped_sort.bam.bam

            I get this error:
            bam2fastx: /lib64/libz.so.1: no version information available (required by bam2fastx)

            Anyone come across this error before?

            Comment

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              One possibility is that you are running older versions of libz/libxml2. Are you able to get the bam2fastx to complete (that "error" is likely a warning) otherwise?

              Comment

              • bpb9
                Member
                • Aug 2012
                • 24

                #8
                Warning can be ignored

                Originally posted by GenoMax
                One possibility is that you are running older versions of libz/libxml2. Are you able to get the bam2fastx to complete (that "error" is likely a warning) otherwise?
                Hm…sure enough, despite the warning, there is in fact a fastq file produced anyway.

                But when I run the program from the cluster's login node (shame on me, I know) I don't get the error, and I still get the fastq file. Could that be due to different versions of the program running on the login vs. compute nodes? Any idea?

                Comment

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Originally posted by bpb9
                  But when I run the program from the cluster's login node (shame on me, I know) I don't get the error, and I still get the fastq file. Could that be due to different versions of the program running on the login vs. compute nodes? Any idea?
                  That is certainly a possibility. On large clusters sometimes a few stray nodes don't get updated properly/fully. If you know which node gave you the error let the admins know. They should be able to manually update that node.
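One way to check this is to compare which libz the binary resolves to on each node. The sketch below assumes a Linux system with `ldd` available; `zlib_for` is a hypothetical helper name, and the tool is looked up on PATH, so the result also shows whether the two nodes pick up the same copy of the program.

```shell
# Sketch: report the host, the resolved path of a tool, and the libz it links to.
# zlib_for is a hypothetical helper name; assumes ldd exists (Linux).
zlib_for() {
  bin=$(command -v "$1") || { echo "$1: not found on PATH" >&2; return 1; }
  printf 'host: %s\nbinary: %s\n' "$(uname -n)" "$bin"
  # Print the libz line from ldd, or a note if none is reported.
  ldd "$bin" 2>/dev/null | grep 'libz' || echo "no libz dependency reported by ldd"
}

# Usage: run once on the login node and once inside a job on a compute node,
# then compare the two outputs.
#   zlib_for bam2fastx
```

If the two nodes report different library paths or different binaries, that would be consistent with the warning appearing on only one of them, and the output is something concrete to pass on to the admins.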

                  Comment

                  • offspring
                    Member
                    • Mar 2013
                    • 32

                    #10
                    Just a note on this general topic, the script fix_tophat_unmapped_reads.py in https://github.com/cbrueffer/misc_bioinf/ fixes various issues in unmapped.bam files that prevent them from being used in downstream tools.

                    Comment

                    • fchatonnet
                      Member
                      • Sep 2014
                      • 30

                      #11
                      It might be a very late answer, but apparently tophat can even accept BAM files as input. I tried it by mistake and it works perfectly: no differences from an alignment with a fastq file obtained after bam2fastq conversion...
                      If anyone can confirm that I'm not doing anything wrong, it would be nice.

                      Comment
