  • genelab
    Member
    • Nov 2009
    • 27

    How to quickly index SAM records by read name?

    Assume I have a read named "afaNma_1", and this read has a record in a SAM-format file.

    I want to index the file and quickly retrieve the SAM record for read "afaNma_1". Can anyone tell me how I should do this?


    Thanks
  • Bio.X2Y
    Member
    • Apr 2010
    • 46

    #2
    I'm not aware of anything convenient for doing this, but someone else might be able to shed light.

    If you are comfortable with programming, I'd sort the file by read name, leaving the header records on top (samtools probably has something for this). Then write a program to pluck out your target record using a binary search algorithm (http://en.wikipedia.org/wiki/Binary_search_algorithm). Java has a RandomAccessFile class for quickly accessing arbitrary file bytes, and I'm sure other languages have their equivalents. The tricky part will be finding the start of the record that contains an arbitrary byte: you will have to work backwards until you hit a newline or the start of the file.

    I know this isn't creating an index, but it should be lightning fast for practical purposes.
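
A minimal Python sketch of the binary search described above, under stated assumptions: the file is uncompressed SAM text, header lines (starting with "@") sit on top, and the records are sorted by read name in plain byte order (for example with `LC_ALL=C sort`; the ordering produced by a dedicated name-sorting tool may differ from byte order, in which case the comparison below would need to match it). The function names are made up for illustration. Instead of scanning backwards for a newline, this variant seeks to an arbitrary byte and skips forward to the next line start, which finds the same record.

```python
import os


def find_body_start(f):
    """Return the byte offset of the first non-header line (headers start with '@')."""
    f.seek(0)
    while True:
        pos = f.tell()
        line = f.readline()
        if not line or not line.startswith(b"@"):
            return pos


def line_start_at_or_after(f, pos, body_start):
    """Return the offset of the first line that starts at or after byte `pos`."""
    if pos <= body_start:
        return body_start
    f.seek(pos - 1)
    f.readline()  # consume up to and including the next newline
    return f.tell()


def find_records(sam_path, read_name):
    """Return all SAM lines whose read name equals `read_name`.

    Assumes records are sorted by read name in plain byte order.
    """
    target = read_name.encode()
    with open(sam_path, "rb") as f:
        body_start = find_body_start(f)
        lo, hi = body_start, os.path.getsize(sam_path)
        # Find the smallest offset whose following line has name >= target.
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(line_start_at_or_after(f, mid, body_start))
            line = f.readline()
            if not line or line.split(b"\t", 1)[0] >= target:
                hi = mid
            else:
                lo = mid + 1
        # Collect every consecutive record with the target name (e.g. both mates).
        f.seek(line_start_at_or_after(f, lo, body_start))
        hits = []
        for line in iter(f.readline, b""):
            if line.split(b"\t", 1)[0] != target:
                break
            hits.append(line.rstrip(b"\r\n").decode())
        return hits
```

Calling `find_records("sorted.sam", "afaNma_1")` (the file name is a placeholder) returns a list of matching lines, or an empty list if the read is absent. Each lookup touches only about log2(file size) lines, so no separate index file is needed.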


    • brentp
      Member
      • Apr 2010
      • 72

      #3
      If you have Python and tokyo-cabinet, you can use this:
      brentp/bio-playground (miscellaneous scripts for bioinformatics/genomics that don't merit their own repo)
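
The general idea behind a key-value index can be sketched as follows. This is not the script from the repository above: it swaps tokyo-cabinet for Python's built-in `sqlite3` module so that it runs without extra dependencies, and the function and table names are invented for illustration. It stores the byte offset of every record keyed by read name, so the SAM file does not need to be sorted.

```python
import sqlite3


def iter_name_offsets(sam_path):
    """Yield (read_name, byte_offset) for every non-header line of a SAM file."""
    with open(sam_path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            if line.startswith(b"@"):  # skip header lines
                continue
            yield line.split(b"\t", 1)[0].decode(), offset


def build_index(sam_path, db_path):
    """Create (or rebuild) a name -> offset index in an SQLite file."""
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS reads")
    con.execute("CREATE TABLE reads (name TEXT, offset INTEGER)")
    con.executemany("INSERT INTO reads VALUES (?, ?)", iter_name_offsets(sam_path))
    con.execute("CREATE INDEX reads_name ON reads (name)")
    con.commit()
    con.close()


def fetch(sam_path, db_path, read_name):
    """Return all SAM lines for `read_name`, in file order."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT offset FROM reads WHERE name = ? ORDER BY offset", (read_name,)
    ).fetchall()
    con.close()
    records = []
    with open(sam_path, "rb") as f:
        for (offset,) in rows:
            f.seek(offset)  # jump straight to the stored record
            records.append(f.readline().rstrip(b"\r\n").decode())
    return records
```

Run `build_index` once per SAM file, then call `fetch` for each read name of interest. The index costs one full pass over the file plus disk space, but later lookups do not depend on the file's sort order.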


      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        I have some still-experimental code for indexing SAM/BAM files by read name in Biopython here: http://github.com/peterjc/biopython/...-sam-bam-index
