
  • Understanding BAM format.

    Hi,

    I have this output in BAM format.

    NA06984-SRR006041.1145152 1040 1 113040605 57 325M * 0 0 TTGATCACTTCACACACATCTTCATCGATGAGGCTGGCCA
    CTGCATGGAGCCTGAGAGTCTGGTAGCTATAGCAGGTGAGGGACTCAGGTGGGGCTGCAGGTATACACCCTGTGTGGGTCAGAGAGGTTGCACCACTTACCTTTCTTCCCACACCTCTTCTGCTTCCCAGGGCTGATGGAAGTA
    AAGGAAACAGGTGATCCAGGAGGGCAGCTGGTGCTGGCAGGAGACCCTCGGCAGCTGGGGCCTGTGCTGCGTTCCCCACTGACCCAGAAGCATGGACTGGGATACTCACTGCTGGAGCGGCTGCTCACCTACAACTCCCTG 7
    99::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::88:::::::::::;;;;;;;;;;;;;::888:;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:9::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;::::;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;;;;;;;;;;::::::::::::::::::::::::: RG:Z:SRR006041 NM:i:0
    (This is data from the 1000 Genomes Project.)

    I'm building a pipeline to study variation (I get FASTQ sequence, index it, align it to the hg18 reference sequence, do a couple of format conversions to get BAM, call indels and SNPs, add them to a database, call larger variations, check whether they have been reported before, produce graphs and charts, display the alignment, and submit a report).

    I'm learning about the BWA aligner and the BAM format right now. I'm using pilot data on unaligned sequences from the 1000 Genomes Project (because I will have similar BAM outputs).

    I have to study and make sense of this BAM format. I've read this tutorial on understanding the SAM/BAM format, but it was of little help. Could someone give me further pointers?

    Thanks a lot!
    Joker!sAce
    Last edited by Joker!sAce; 02-28-2011, 07:15 AM.

  • #2
    What specific questions about the format do you have?


    • #3
      I understand that there are a lot of columns in this record.

      NA06984-SRR006041.1145152
      1040
      1
      113040605
      57
      325M
      *
      0
      0
      TTGATCACTTCACACACATCTTCATCGATGAGGCTGGCCACTGCATGGAGCCTGAGAGTCTGGTAGCTATAGCAGGTGAGGGACTCAGGTGGGGCTGCAGGTATACACCCTGTGTGGGTCAGAGAGGTTGCACCACTTACCTTTCTTCCCACACCTCTTCTGCTTCCCAGGGCTGATGGAAGTAAAGGAAACAGGTGATCCAGGAGGGCAGCTGGTGCTGGCAGGAGACCCTCGGCAGCTGGGGCCTGTGCTGCGTTCCCCACTGACCCAGAAGCATGGACTGGGATACTCACTGCTGGAGCGGCTGCTCACCTACAACTCCCTG
      799::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::88:::::::::::;;;;;;;;;;;;;::888:;;;;;;;;;;;;;;;;;;;;;;;;;;;;888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:9::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;::::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:::::::::::::::::::::::::
      RG:Z:SRR006041
      NM:i:0

      I'd like to know what they mean. I have a rough idea, but I'd like to be sure.


      • #4
        You'll get much better answers if you post specific questions which can't be easily answered from the SAM format documentation.
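
For reference, the record posted above can be lined up against the eleven mandatory columns named in the SAM format documentation. The following is only a minimal sketch: it assumes a tab-separated SAM text line (what `samtools view` prints), shortens SEQ and QUAL for readability, and uses a helper called `decode_flag` that is made up for this illustration and is not part of SAMtools.

```shell
#!/usr/bin/env bash
# One SAM record, tab-separated. SEQ and QUAL are shortened here for readability.
record=$'NA06984-SRR006041.1145152\t1040\t1\t113040605\t57\t325M\t*\t0\t0\tTTGATC\t799:::\tRG:Z:SRR006041\tNM:i:0'

# Names of the eleven mandatory columns, in order.
columns=(QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL)

# Split the record on tabs into an array.
IFS=$'\t' read -r -a fields <<< "$record"

# Print each mandatory column next to its name.
for i in "${!columns[@]}"; do
    printf '%-6s %s\n' "${columns[$i]}" "${fields[$i]}"
done

# Anything after column 11 is an optional TAG:TYPE:VALUE field.
printf 'TAG    %s\n' "${fields[@]:11}"

# Hypothetical helper: list which FLAG bits are set (bit 0 = 0x1 ... bit 11 = 0x800).
decode_flag() {
    local flag=$1 i out=()
    local bits=(paired proper_pair unmapped mate_unmapped reverse_strand
                mate_reverse_strand first_in_pair second_in_pair
                secondary qc_fail duplicate supplementary)
    for i in "${!bits[@]}"; do
        if (( flag & (1 << i) )); then
            out+=("${bits[$i]}")
        fi
    done
    echo "${out[*]}"
}

echo "FLAG ${fields[1]} -> $(decode_flag "${fields[1]}")"
```

For this record, FLAG 1040 is 1024 + 16, so the read is on the reverse strand and marked as a duplicate. RNAME `1` with POS 113040605 gives the reference sequence and the 1-based leftmost mapping position, MAPQ 57 is the Phred-scaled mapping quality, and CIGAR `325M` means 325 aligned bases with no insertions or deletions. The `*`, `0`, `0` in columns 7 to 9 mean no mate information is recorded. The optional tags give the read group (`RG`) and the edit distance to the reference (`NM`).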


        • #5
          My study is a divergence analysis of the p53 gene on the short arm of chromosome 17. I need to extract this part of the sequence.

          I understand that I can do this in two ways:
          1. Get raw fasta reads.
          2. Extract from the aligned (to hg18) data (in BAM format).

          How do I do the second option?


          • #6
            If you know the chromosomal coordinates for your gene (which you can find in the UCSC files or via the browser), then SAMtools can extract the reads in that region efficiently.


            • #7
              This sequence has been aligned to hg18. I know the chromosomal coordinates in hg18 (chr17:7,520,037-7,531,588; that's the TP53 tumor suppressor gene).

              How do I proceed from here?


              • #8
                samtools view aligned.bam chr17:7520037-7531588 > tp53.sam
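
Two details may matter when running the command above, so here is a hedged sketch rather than a definitive recipe. Region queries need a coordinate-sorted BAM with an index, and the reference name in the region must match the names in the BAM header. The record posted earlier in this thread has `1` in the reference column rather than `chr1`, so `17:...` may be needed instead of `chr17:...`. The helper `make_region` is made up for this illustration, and `aligned.bam` is simply the file name used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a region string from a reference name and a
# UCSC-style range, stripping the thousands separators.
make_region() {
    # $1 = reference name exactly as listed in the BAM header (e.g. 17 or chr17)
    # $2 = start-end range, commas allowed
    local ref=$1 range=${2//,/}
    printf '%s:%s\n' "$ref" "$range"
}

bam=aligned.bam                                   # file name from the post above
region=$(make_region 17 "7,520,037-7,531,588")    # assumes the header uses "17", not "chr17"

if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    # Check how the reference sequences are named in this BAM.
    samtools view -H "$bam" | grep '^@SQ' | head
    # Build the index (aligned.bam.bai); the BAM must be coordinate-sorted.
    samtools index "$bam"
    # Extract the alignments overlapping the region; -h keeps the header.
    samtools view -h "$bam" "$region" > tp53.sam
else
    echo "would run: samtools view -h $bam $region > tp53.sam"
fi
```

If the `@SQ` lines in the header show `SN:chr17`, pass `chr17` to `make_region` instead.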
