  • indexing tophat bam files

    Hi,

    I am having trouble using samtools to index my TopHat output for viewing in IGV. The TopHat output BAM should already be sorted (although I am also having trouble using samtools to sort it).

    This is how I call TopHat:
    tophat2 -M --b2-very-sensitive --GTF ~/Documents/transcriptome_gtf/genes.gtf -p 7 --read-realign-edit-dist 0 --output-dir ./example ~/Documents/genome_UCSC/genome ~/Documents/Data/example.fastq

    I then index with samtools using:
    samtools index accepted_hits.bam

    But I get this error:
    [bam_index_build2] fail to create the index file.

    Sorting with samtools using the command below gives me this error:
    samtools sort ./accepted_hits.bam sort.prefix

    [bam_sort_core] merging from 12 files...
    open: No such file or directory
    [bam_merge_core] fail to open file sort.prefix.0000.bam

    At this point, I'm not sure what is going on. Please help!

    Zach

  • #2
    Can you sort using this command

    Code:
    $ samtools sort ./accepted_hits.bam accepted_hits_sorted
    and then try indexing the sorted file.

    • #3
      Originally posted by GenoMax
      Can you sort using this command

      Code:
      $ samtools sort ./accepted_hits.bam accepted_hits_sorted
      and then try indexing the sorted file.

      Nope, I get the same error message.

      • #4
        Which version of samtools are you using?

        Is the sorting process creating temporary files (with names containing 0001.bam, etc.) before you get that error?
        Last edited by GenoMax; 12-05-2014, 05:41 PM.
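GenoMax's question can be checked directly by counting temporary files while the sort runs. The helper below is a hypothetical sketch (`count_sort_tmp` is not a samtools command); it assumes samtools 0.1.x names its temporary files PREFIX.NNNN.bam, which is consistent with the sort.prefix.0000.bam name in the error message from the first post.

```shell
#!/usr/bin/env bash
# count_sort_tmp PREFIX  (hypothetical helper, not part of samtools)
# Assumption: samtools 0.1.x writes sort temp files as PREFIX.NNNN.bam
# (four digits), e.g. sort.prefix.0000.bam.
count_sort_tmp() {
  local prefix=$1 n=0 f
  for f in "$prefix".[0-9][0-9][0-9][0-9].bam; do
    # If nothing matches, the glob stays literal and -e is false.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

While `samtools sort ./accepted_hits.bam sort.prefix` is running, run `count_sort_tmp sort.prefix` in a second terminal in the same directory; a count that stays at 0 would suggest the temporary files are never being written.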

        • #5
          Originally posted by GenoMax
          Which version of samtools are you using?

          The version is 0.1.19-4428cd.

          Originally posted by GenoMax
          Is the sorting process creating temporary files (with names containing 0001.bam, etc.) before you get that error?

          It looks like no temporary files are created. The command fails with the error message after less than a minute (I'm not sure how long sorting normally takes). It seems to stop right after loading the file, since running the same command on the unmapped BAM file reaches the error much faster.

          • #6
            Is this the version bundled with TopHat (which is the one tested to work with it)?

            • #7
              Originally posted by GenoMax
              Is this the version bundled with TopHat (which is the one tested to work with it)?

              I think I installed samtools before TopHat. Everything actually works with TopHat, and I am able to use the BAM files with HTSeq and then DESeq2.

              • #8
                Check how much free disk space you have.
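A quick way to follow this suggestion from the shell is sketched below. `free_kb` is a hypothetical helper name; it relies on `df -P`, whose POSIX output format puts the available space (in 1K blocks with `-k`) in the fourth column of the second line. It is worth checking the directory where samtools writes its output, since in the commands above the sorted file and its temporary files go to the current directory.

```shell
#!/usr/bin/env bash
# free_kb [DIR]  (hypothetical helper)
# Print the space available, in 1K blocks, on the filesystem holding DIR
# (default: the current directory).
free_kb() {
  # -P: POSIX format, one line per filesystem; -k: 1024-byte units.
  # Line 2 is the data row; field 4 is the "Available" column.
  df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}
```

For example, `free_kb .` run in the TopHat output directory prints a single number of kilobytes.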

                • #9
                  Originally posted by blancha
                  Check how much free disk space you have.

                  That shouldn't be a problem; there are more than 700 GB left on the hard drive.

                  • #10
                    Devon Ryan seems to describe the bug here.


                    I would just install samtools 1.1, which has many useful new features anyway. It should fix the issue.

                    • #11
                      There's no harm in trying the latest samtools, but the TopHat page has this to say:

                      Removed SAMtools as an external dependency in order to avoid incompatibility issues with recent and future changes of SAMtools and its code library (an older, stable SAMtools version is now packaged with TopHat)
                      I also see a v0.1.20 on the samtools download page, so if you want to stay with the old series, give that a try.
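Since the advice in this thread depends on which samtools series is installed, here is a hedged sketch for checking it. The function names are hypothetical. It assumes that samtools 1.x supports `samtools --version` (first line such as `samtools 1.1`), while 0.1.x does not, but prints a `Version:` line in its usage text when run with no arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for telling the 0.1.x and 1.x samtools series apart.

# Print the version of the samtools found on PATH (empty if none found).
samtools_version() {
  local v
  # 1.x: first line of `samtools --version` is e.g. "samtools 1.1".
  v=$(samtools --version 2>/dev/null | awk 'NR == 1 { print $2 }')
  if [ -z "$v" ]; then
    # 0.1.x (assumed): no --version; bare `samtools` prints usage text
    # containing a line such as "Version: 0.1.19-...".
    v=$(samtools 2>&1 | awk '/^Version:/ { print $2 }')
  fi
  echo "$v"
}

# Succeed (exit status 0) if VERSION belongs to the old 0.x series.
is_legacy_samtools() {
  case "$1" in
    0.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_legacy_samtools "$(samtools_version)" && echo "old 0.1.x series"` reports whether the samtools on PATH is from the old series.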

                      • #12
                        Right, you should also get the latest version of TopHat, which comes bundled with the version of samtools it requires.

                        You'll then have the best of both worlds: the latest version of TopHat running with a tried and tested version of samtools, and the latest version of samtools with all the new bells and whistles.

                        I'm basing all these assumptions on Devon Ryan's post, but his explanations are quite convincing and his description of the bug matches yours.

                        My advice:
                        1- Install the very latest version of samtools with all the new bells and whistles, and without the bug.
                        2- Install the latest version of TopHat2, which comes bundled with a version of samtools that has been tested for compatibility with TopHat2. (This version will be used internally by TopHat.)

                        • #13
                          Incidentally, you will still need to sort the BAM file before indexing it, as GenoMax pointed out.
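Putting the sort-then-index advice together, the step could look like the sketch below. `sort_index_cmds` is a hypothetical helper that only prints the commands so they can be reviewed before running. It assumes the 0.1.x syntax `samtools sort IN.bam OUT_PREFIX` (as used earlier in this thread, which writes OUT_PREFIX.bam) and the `-T`/`-o` form for 1.x; the sort options changed across 1.x releases, so it is worth checking the usage printed by `samtools sort` for the installed version.

```shell
#!/usr/bin/env bash
# sort_index_cmds VERSION IN_BAM OUT_PREFIX  (hypothetical helper)
# Prints, without running, the samtools commands that sort IN_BAM by
# coordinate into OUT_PREFIX.bam and then index it.
# Assumes file names without spaces, since the output is plain text.
sort_index_cmds() {
  local version=$1 in_bam=$2 out_prefix=$3
  case "$version" in
    0.*)
      # 0.1.x: the second argument is an output prefix; ".bam" is appended.
      echo "samtools sort $in_bam $out_prefix"
      ;;
    *)
      # 1.x: -o names the final file, -T sets the temporary-file prefix.
      echo "samtools sort -T $out_prefix.tmp -o $out_prefix.bam $in_bam"
      ;;
  esac
  # Indexing the sorted BAM writes OUT_PREFIX.bam.bai next to it.
  echo "samtools index $out_prefix.bam"
}
```

For example, `sort_index_cmds 1.1 accepted_hits.bam accepted_hits_sorted` prints the two commands; piping that output to `sh -e` runs them and stops at the first failure.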

                          • #14
                            Thanks for all the input; it looks like updating fixed this bug!
