  • About Samtools view -L

    Can samtools view -L be made more efficient on a whole-genome alignment, for example a 15 GB BAM file?

    I had expected it to use the BAM index file, but it does not.
    Would building an index for such a large BAM file speed up the process?

    Any other suggestions?
    Last edited by Brace; 03-29-2013, 01:04 AM.

  • #2
    samtools -L

    Looking at the code for samtools view, it does not appear that the -L flag uses the index to jump to the regions specified in a BED file. Instead, the -L flag goes through the BAM file record by record and prints out the records that overlap a region in the BED file. Have you tried using the -L flag with a BED file? Does it run slower than you'd like?
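To make the record-by-record behaviour concrete, here is a minimal Python sketch of an overlap filter over a stream of alignments. It is only an illustration of the idea, not the samtools implementation: the names `overlaps` and `scan_filter` and the tuple layout of the records are assumptions made for this example, and coordinates are taken to be 0-based, half-open, as in BED.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical layouts for this sketch (not samtools data structures):
#   region: (start, end), 0-based half-open, grouped by chromosome name
#   record: (chrom, start, end, name)
Region = Tuple[int, int]
Record = Tuple[str, int, int, str]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def scan_filter(records: Iterable[Record],
                regions: Dict[str, List[Region]]) -> Iterator[str]:
    """Visit every record once and yield the names of those overlapping any region."""
    for chrom, start, end, name in records:
        for r_start, r_end in regions.get(chrom, ()):
            if overlaps(start, end, r_start, r_end):
                yield name
                break  # report each record at most once


records = [
    ("chr1", 0, 50, "r1"),
    ("chr1", 90, 140, "r2"),
    ("chr2", 10, 60, "r3"),
    ("chr1", 200, 250, "r4"),
]
regions = {"chr1": [(100, 200)]}
kept = list(scan_filter(records, regions))
```

Every record is visited regardless of how few regions there are, so the cost of this approach is driven by the size of the file rather than by the number of regions.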

    I can't think of a cleaner explanation, but the code looks like it is written to balance the cost of many random accesses (via the index) against the cost of retrieving many regions. If there are few desired regions relative to the genome size, it's faster to seek to each specific region than to scan the whole BAM file. If you're looking for many regions relative to the genome size, it's faster to scan through the file record by record and print the records that overlap your regions of interest.

    Depending on how many regions you're looking to retrieve, it may be worth specifying them directly on the command line if there aren't too many (this does require an indexed BAM file). If you have a lot of regions to retrieve (I've gone up to a few million before), then the -L flag may be the better choice.
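If you go the command-line route, the BED intervals have to be turned into samtools region strings. Below is a small Python sketch of that conversion, assuming a plain three-column BED input; the helper names `bed_to_regions` and `build_view_command` and the file name `in.bam` are made up for this example. BED coordinates are 0-based and half-open, while samtools region strings are 1-based and inclusive, so the start is shifted by one.

```python
from typing import Iterable, List


def bed_to_regions(bed_lines: Iterable[str]) -> List[str]:
    """Convert BED lines (0-based, half-open) to 'chrom:start-end' strings (1-based, inclusive)."""
    regions = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments, and BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append(f"{chrom}:{int(start) + 1}-{int(end)}")
    return regions


def build_view_command(bam_path: str, regions: List[str]) -> List[str]:
    """Build an argument list for 'samtools view' with explicit regions (needs an indexed BAM)."""
    return ["samtools", "view", "-b", bam_path, *regions]


bed = ["chr1\t100\t200", "# a comment", "", "chr2 0 50 some_name"]
cmd = build_view_command("in.bam", bed_to_regions(bed))
```

The resulting list can be handed to `subprocess.run`. With a very large number of regions the argument list may exceed the operating system's command-line length limit, which is another reason to fall back on -L in that case.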


    • #3
      Thanks MBekritsky, I've got it.
