
  • Extracting Reference Sequence from a bam File

    I'm using the bamtools API in a program that should extract and print the reference sequence from a .bam file, but I can't figure out how to get at the reference sequence. Is there a way to read it directly from the BAM file, or is there a better way to do this?

    I've searched but haven't come up with anything.

    Thanks for any help,
    Andy

  • #2
    You can't.

    If you read the SAM/BAM file format definition, you'll see that the files don't actually contain the reference sequences. The BAM header contains only the SAM header (an optional chunk of embedded text), the number of references, and their names and lengths.

    You would normally have a FASTA file to accompany the SAM/BAM file.
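
To make concrete what the header does carry, here is a minimal Python sketch that pulls the reference names and lengths out of SAM header text (for example, the output of `samtools view -H`). The helper name `parse_sq_lines` and the example header are made up for illustration; this is not part of the bamtools API. It shows that only names (`SN`) and lengths (`LN`) are available, not the bases themselves.

```python
def parse_sq_lines(header_text):
    """Return [(name, length), ...] from the @SQ lines of SAM header text."""
    refs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @RG, @PG, @CO and anything else
        # Each field after the record type looks like TAG:VALUE
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        refs.append((fields["SN"], int(fields["LN"])))
    return refs


# Hypothetical header text, for illustration only
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:500\n"
)

for name, length in parse_sq_lines(example_header):
    print(name, length)
```

The sequence for each name still has to come from the accompanying FASTA file.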

    • #3
      Should've seen that, thanks.

      Do you have any advice on how to rapidly extract a subsequence from a fasta file using an index file?
      Last edited by andy11; 12-13-2010, 06:18 AM.

      • #4
        I've never understood the decision not to (optionally) bundle the reference into the BAM/SAM file. For most second-generation datasets the space taken up by the reference would be trivial, while the headaches avoided by not having to chase down the proper reference would be large. But, alas, I was not consulted. :-)

        Getting back to andy11's question, "how to rapidly extract a subsequence from a fasta file using an index file?", we will need a bit more information. Which program created the index file? I am assuming that you are not talking about the BAM/SAM index, since that has nothing to do with the reference file.
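
If the index turns out to be a `.fai` file of the kind written by `samtools faidx` (an assumption; the thread does not say which program made it), random access comes down to simple arithmetic. Each `.fai` line holds five tab-separated columns: name, sequence length, byte offset of the first base, bases per line, and bytes per line. The sketch below uses hypothetical helper names (`load_fai`, `fetch`) and 0-based, half-open coordinates; it is an illustration of the idea, not the samtools or bamtools implementation.

```python
def load_fai(fai_path):
    """Read a faidx-style index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as fh:
        for line in fh:
            name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
            index[name] = (int(length), int(offset), int(linebases), int(linewidth))
    return index


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) of sequence `name` (0-based, half-open)."""
    length, offset, linebases, linewidth = index[name]
    if not 0 <= start <= end <= length:
        raise ValueError("interval out of range")

    def byte_pos(p):
        # Whole lines before base p, plus the position within its line
        return offset + (p // linebases) * linewidth + p % linebases

    with open(fasta_path, "rb") as fh:
        fh.seek(byte_pos(start))
        raw = fh.read(byte_pos(end) - byte_pos(start))
    # Drop the line terminators that fall inside the requested range
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


# Usage (file names are placeholders):
# idx = load_fai("reference.fa.fai")
# print(fetch("reference.fa", idx, "chr1", 100, 150))
```

Only the requested bytes are read, so the cost does not depend on the size of the FASTA file. This relies on every line of a sequence, except possibly the last, having the same length.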

        • #5
          Originally posted by westerman:
          I've never understood the decision not to (optionally) bundle the reference into the BAM/SAM file. For most second-generation datasets the space taken up by the reference would be trivial, while the headaches avoided by not having to chase down the proper reference would be large. But, alas, I was not consulted. :-)

          I agree that optionally including the reference sequence could be very useful, especially for non-model organisms. SAM/BAM is clearly designed more for mapping and re-sequencing than for de novo assembly.

          • #6
            Many alignment files are far smaller than the compressed human reference genome (e.g. most of the 1000 Genomes alignments).

            Yes, for users it is always good to have more options, but for developers every extra option is a burden that has to be evaluated carefully; it would also have delayed the adoption of SAM.

            • #7
              Originally posted by andy11:
              Should've seen that, thanks.

              Do you have any advice on how to rapidly extract a subsequence from a fasta file using an index file?

              I think USEQ has a utility that takes a BED file of coordinates and extracts the corresponding sequences from a larger FASTA file.
              --
              bioinfosm
