  • Do I need to sort and index a BAM file?

    Hi all,

    I have a large number of BAM files from multiple individuals, and I need to call SNPs on them using mpileup. When I sort and index them with samtools, the I/O load on my cluster goes through the roof. Is the sorting/indexing actually necessary, or does it just speed things up?

    Thanks,

  • #2
    Almost certainly necessary. One way to think about it is you'll either pay this penalty now or pay it later -- any caller needs all the reads from the same region, and it is quick & convenient if they are together. I think most callers won't even call on unsorted data.

    One thing to look at: are you sorting/indexing on an NFS drive? If so, can you do the sort/index on a drive local to the machine instead? That might improve performance.

    • #3
      Thanks for your reply. Yes, I'm indexing on the NFS. I'll try to write a bash script that sorts and indexes on the local drive and then moves the results to central storage.
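
A minimal sketch of that plan, written in Python rather than bash so it is easy to check before running anything. It only builds the commands; it does not execute them. The function name `local_sort_plan` and all paths are hypothetical, and the `samtools sort` options (`-@`, `-T`, `-o`) assume a reasonably recent samtools (1.3 or later); older releases used a different `sort` syntax. The idea, under the assumption that the sort's temporary files and output are what hammer the NFS, is to point both at local scratch and move only the finished files.

```python
import shlex
from pathlib import PurePosixPath


def local_sort_plan(bam, scratch_dir, dest_dir, threads=4):
    """Build (but do not run) commands that sort and index `bam` on local
    scratch and then move the results to `dest_dir`.

    Assumes samtools >= 1.3 option syntax; all paths are illustrative.
    """
    bam = PurePosixPath(bam)
    scratch = PurePosixPath(scratch_dir)
    dest = PurePosixPath(dest_dir)

    sorted_bam = scratch / (bam.stem + ".sorted.bam")
    tmp_prefix = scratch / (bam.stem + ".tmp")  # prefix for sort's temporary files

    return [
        # Sort by coordinate; temporary files and output both go to local scratch.
        ["samtools", "sort", "-@", str(threads), "-T", str(tmp_prefix),
         "-o", str(sorted_bam), str(bam)],
        # Index the sorted file; by default this writes <sorted_bam>.bai next to it.
        ["samtools", "index", str(sorted_bam)],
        # Move the finished BAM and its index to central storage in one go.
        ["mv", str(sorted_bam), str(sorted_bam) + ".bai", str(dest) + "/"],
    ]


if __name__ == "__main__":
    for cmd in local_sort_plan("/nfs/data/sample1.bam", "/scratch/tmp", "/nfs/results"):
        print(shlex.join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` once the printed commands look right for your cluster.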

      • #4
        You will likely need to sort no matter what software you are using. For samtools mpileup, you don't actually need the BAM index (the .bai file) unless you restrict the call to a region, but those files are small and fast to make, so you might as well make them.
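
Before paying for a sort, it can be worth checking whether a file already claims to be coordinate-sorted. The sketch below parses header text such as the output of `samtools view -H file.bam` and reads the `SO` tag on the `@HD` line. The function names are hypothetical, and the tag only reports what the producing tool declared; it does not prove the records are actually in order.

```python
def sort_order(header_text):
    """Return the SO value from the @HD line of a SAM/BAM header,
    or None if there is no @HD line or no SO tag."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None


def is_coordinate_sorted(header_text):
    """True only if the header explicitly declares coordinate sort order."""
    return sort_order(header_text) == "coordinate"


if __name__ == "__main__":
    example = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
    print(is_coordinate_sorted(example))
```

Files that report `unsorted`, `queryname`, `unknown`, or nothing at all are the ones to queue for sorting.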

        • #5
          One good reason to sort/index your BAMs is so that you can split the work by chromosome, or even by smaller regions, and parallelize your mpileup calls.
          http://kevin-gattaca.blogspot.com/
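
The per-chromosome split can be sketched as follows. The reference names are taken from `samtools idxstats` output (tab-separated: name, length, mapped reads, unmapped reads, with a final `*` row for unplaced reads), and one `samtools mpileup -r` command is built per name. The function names and file names are hypothetical, the commands are only constructed and not run, and region access with `-r` requires sorted, indexed BAMs.

```python
def reference_names(idxstats_text):
    """Extract reference sequence names from `samtools idxstats` output,
    skipping the trailing '*' row that counts unplaced reads."""
    names = []
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name = line.split("\t")[0]
        if name != "*":
            names.append(name)
    return names


def mpileup_commands(bams, reference_fasta, regions):
    """Build one mpileup command per region; each writes to stdout,
    so redirect it to a per-region file when running."""
    return [
        ["samtools", "mpileup", "-r", region, "-f", reference_fasta, *bams]
        for region in regions
    ]


if __name__ == "__main__":
    stats = "chr1\t1000\t50\t0\nchr2\t800\t30\t1\n*\t0\t0\t7\n"
    for cmd in mpileup_commands(["a.bam", "b.bam"], "ref.fa", reference_names(stats)):
        print(" ".join(cmd))
```

Each command is independent, so they can be handed to separate cluster jobs, one per chromosome.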
