  • How to quickly index SAM records by read name?

    Assume I have a read named "afaNma_1", and this read has a record in a SAM-format file.

    I want to index the file and quickly retrieve the SAM record for read "afaNma_1". Can anyone tell me how I should do this?


    Thanks

  • #2
    I'm not aware of anything convenient for doing this, but someone else might be able to shed light.

    If you are comfortable with programming, I'd sort the file by name, leaving the header records on top (samtools probably has something for this). Then write a program to pluck out your target record using a binary search algorithm (http://en.wikipedia.org/wiki/Binary_search_algorithm). Java has a RandomAccessFile class for quickly accessing arbitrary file bytes, though I'm sure other languages have their equivalents. The tricky part will be finding the start of the record containing an arbitrary byte: you will have to work backwards to find a newline or the start-of-file.

    I know this isn't creating an index, but it should be lightning fast for practical purposes.
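The binary search described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the file is uncompressed SAM, header lines (starting with `@`) are on top, and the alignment lines are sorted by read name in plain byte order. The ordering a given tool produces when sorting by name may differ from plain byte order, so check that before relying on this. The function names `line_start` and `find_sam_records` are made up for this example.

```python
import os


def line_start(fh, pos, floor, chunk=4096):
    """Offset of the first byte of the line containing byte `pos`.

    Works backwards from `pos` looking for a newline, never going below
    `floor`, which must itself be the start of a line.
    """
    end = pos
    while end > floor:
        begin = max(floor, end - chunk)
        fh.seek(begin)
        idx = fh.read(end - begin).rfind(b"\n")
        if idx != -1:
            return begin + idx + 1
        end = begin
    return floor


def qname_of(line):
    """First tab-separated field (the read name) of a SAM line."""
    return line.rstrip(b"\r\n").split(b"\t", 1)[0]


def find_sam_records(path, read_name):
    """Return all SAM lines whose read name equals `read_name`.

    Assumption: alignment lines are sorted by read name in byte order.
    """
    target = read_name.encode()
    with open(path, "rb") as fh:
        # Skip the header to find where the alignment lines begin.
        body_start = 0
        while True:
            line = fh.readline()
            if not line.startswith(b"@"):
                break
            body_start = fh.tell()

        lo, hi = body_start, os.fstat(fh.fileno()).st_size
        # Invariant: lo and hi are line starts (or end-of-file); the first
        # record with name >= target starts somewhere in [lo, hi].
        while lo < hi:
            mid = (lo + hi) // 2
            start = line_start(fh, mid, lo)
            fh.seek(start)
            line = fh.readline()
            if qname_of(line) < target:
                lo = start + len(line)  # answer is after this line
            else:
                hi = start              # this line or an earlier one

        # Collect every consecutive record with the target name.
        fh.seek(lo)
        hits = []
        for line in fh:
            if qname_of(line) != target:
                break
            hits.append(line.decode().rstrip("\r\n"))
        return hits
```

Calling `find_sam_records("sorted.sam", "afaNma_1")` (the file name is a placeholder) returns a list, since a read name can appear on more than one line, for example with paired reads. Each lookup touches only a logarithmic number of lines, so no separate index file is needed.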


    • #3
      You can use this:
      miscellaneous scripts for bioinformatics/genomics that dont merit their own repo. - brentp/bio-playground

      if you have Python and tokyo-cabinet.


      • #4
        I have some still experimental code for SAM/BAM with indexing by name for Biopython here: http://github.com/peterjc/biopython/...-sam-bam-index
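The code linked in the posts above is not reproduced here. As a rough illustration of the general idea of indexing by name, the sketch below scans an uncompressed SAM file once and records the byte offset of every record under its read name, so later lookups can seek straight to the record. It keeps the index in a plain in-memory dict; storing it in a key-value database instead is a possible variation and is not shown. The function names `build_name_index` and `fetch_records` are hypothetical, and this does not handle BAM.

```python
def build_name_index(path):
    """Map each read name to the byte offsets of its records.

    Assumes an uncompressed SAM file; header lines start with '@'.
    """
    index = {}
    with open(path, "rb") as fh:
        while True:
            offset = fh.tell()
            line = fh.readline()
            if not line:
                break  # end of file
            if line.startswith(b"@") or not line.strip():
                continue  # skip header and blank lines
            qname = line.split(b"\t", 1)[0].decode()
            index.setdefault(qname, []).append(offset)
    return index


def fetch_records(path, index, read_name):
    """Return the SAM lines for `read_name` using a prebuilt index."""
    records = []
    with open(path, "rb") as fh:
        for offset in index.get(read_name, []):
            fh.seek(offset)
            records.append(fh.readline().decode().rstrip("\r\n"))
    return records
```

Unlike the binary search approach, this does not require the file to be sorted, at the cost of one full pass to build the index and memory proportional to the number of reads.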
