  • Why don't my SAM files list the chromosomes?

    I used the latest version of BWA. I ran it four different ways on the same paired-end data to see which gives the best alignment quality.

    The first approach used bwa mem. I ran it on a paired-end read file with the adaptor sequences trimmed off. I then trimmed the poor-quality bases from that same file and ran BWA again.

    The second approach used bwa aln and sampe. I ran it on the same two inputs as in the first approach.

    After this, I ran samtools on each SAM file produced: I converted it to BAM, sorted the BAM file, indexed the sorted BAM file, and finally ran idxstats to get the stats.
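For reference, the samtools steps described above could be written out roughly as follows. This is only a sketch: it assumes samtools 1.x command syntax (older 0.1.x releases used a different `sort` form), and the file name `mem_adaptor_trimmed.sam` and the helper `samtools_steps` are made up for illustration. It only builds and prints the commands; it does not run them.

```python
import shlex


def samtools_steps(sam_path):
    """Build the commands for SAM -> BAM -> sorted BAM -> index -> idxstats.

    Assumes samtools 1.x syntax; output names are derived from the input name.
    """
    stem = sam_path.rsplit(".", 1)[0]
    bam = f"{stem}.bam"
    sorted_bam = f"{stem}.sorted.bam"
    return [
        ["samtools", "view", "-b", "-o", bam, sam_path],   # SAM to BAM
        ["samtools", "sort", "-o", sorted_bam, bam],       # coordinate sort
        ["samtools", "index", sorted_bam],                 # writes the .bai index
        ["samtools", "idxstats", sorted_bam],              # per-reference counts
    ]


if __name__ == "__main__":
    # Hypothetical input name, one of the four SAM files from the post.
    for cmd in samtools_steps("mem_adaptor_trimmed.sam"):
        print(shlex.join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` if samtools is installed and on the PATH.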

    My questions:

    1. After aligning with bwa and then sorting and indexing with samtools, I checked each final BAM file by converting it back to SAM and viewing it in the terminal.

    I couldn't find the chromosome name, which I think should be in the third column. Why?


    Example from SAM file:
    Code:
    M00532:8:000000000-A17VF:1:1101:16380:1451      83      Serratia        3298780 29      229M1S  =       3298620 -389    TGTCGTTCGCCAACTTCAGCGTGCTCTGGACCTCAATGGCCTTTNTGCTCGCCGCGCCGCCGTTCAACTATTCCGAGGGAGTGATCGGGCTGTTCGGCCTGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCANCTGGCGGACAAAGGCAAGGCCGGNCTGACNACCACCGTCGGCCTGGTGTTNCTGCTGCTGTCCTGGATCCCTATCGCGTTCGCCAAN  D>ED>4'8?1*1*?*AEED>FEA?A1*A???:??A?8A8)8800#;DDDDDDDD?D8D;ECECA?E?C?CC;EDFEEEFFFEDDDDEE?:DDDDDA8)0)0.#####################################?44#EEEEEEFFFFFFFFFFHHFF@?4#HFD?5#HHHEHHHHHHHIHIHHFEA5#IIHHIHIIIHHIIIFFFFFBDDDDDDDD@@???<5#  XT:A:M  NM:i:49 SM:i:29 AM:i:29 XM:i:7  XO:i:0  XG:i:0  MD:Z:44C34G22T0G0G0G0C0G0C0C0G0C0C0G0G0G0G0C0G0C0T0G0G0C0C0G0C0T0T0C0G0C0G0C0G0C0C0G0G3T14T2A5G0T4C20G0T1A26A5

    2. What does the last line of the idxstats output mean?

    Serratia 5113802 307778 2900
    * 0 0 155004


    And just for clarification, the first line reads: reference sequence name, sequence length, # of mapped reads, and # of unmapped reads?

  • #2
    This comes down to how you built the index for BWA. What FASTA file(s) did you use? If you didn't build the index from FASTA sequences that are full chromosome references then you won't get alignments in terms of chromosomes.

    Also that last line of idxstats is probably just the number of unaligned reads. Typically unmapped reads have an '*' in the third column.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
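To make the "third column" point concrete, here is a small sketch that splits a SAM record into its first nine mandatory fields and decodes the FLAG. The values are taken from the example record in the first post, rebuilt with tabs (the forum paste shows spaces). Only the nine leading columns are included here, which is a shortcut for this example. The names `SamCore`, `parse_core_fields` and `decode_flag` are made up for illustration.

```python
from collections import namedtuple

# First nine mandatory SAM columns (SEQ, QUAL and tags are left out here).
SamCore = namedtuple("SamCore", "qname flag rname pos mapq cigar rnext pnext tlen")

# Flag bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def parse_core_fields(line):
    """Split a tab-delimited SAM record and return its first nine fields."""
    cols = line.rstrip("\n").split("\t")
    return SamCore(cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], int(cols[7]), int(cols[8]))


def decode_flag(flag):
    """Return the names of the flag bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Leading columns of the record from the first post.
record = "\t".join([
    "M00532:8:000000000-A17VF:1:1101:16380:1451", "83", "Serratia",
    "3298780", "29", "229M1S", "=", "3298620", "-389",
])

if __name__ == "__main__":
    core = parse_core_fields(record)
    print("reference name (column 3):", core.rname)
    print("flag bits:", decode_flag(core.flag))
```

In that record the third column is `Serratia`, i.e. whatever name the sequence has in the FASTA header used to build the index. An unmapped read with no placement would show `*` there instead, and would have the `unmapped` bit (0x4) set.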



    • #3
      Did you read the SAM format description?

      Yes, the third column of a sam file has the chromosome name.

      You've done something very wrong, though.

      MD:Z:44C34G22T0G0G0G0C0G0C0C0G0C0C0G0G0G0G0C0G0C0T0G0G0C0C0G0C0T0T0C0G0C0G0C0G0C0C0G0G3T14T2A5G0T4C20G0T1A26A5
      means that you used the wrong FASTQ file in the sampe step.
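For anyone who wants to inspect an MD tag like this rather than read it by eye, here is a minimal sketch that tallies matched bases, mismatched bases and deleted bases from the MD string. The helper name `summarize_md` is made up. The parsing follows the MD layout in the SAM optional-fields spec: numbers are runs of matches, single letters are mismatched reference bases, and `^` followed by letters marks deleted reference bases.

```python
import re

# MD value copied from the record in the first post (without the "MD:Z:" prefix).
MD = ("44C34G22T0G0G0G0C0G0C0C0G0C0C0G0G0G0G0C0G0C0T0G0G0C0C0G0C0T0T0C0G0C0G0C0G0C0C0G0G"
      "3T14T2A5G0T4C20G0T1A26A5")

# One token is either a match-run length, a deletion (^BASES), or a mismatched base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def summarize_md(md):
    """Return (matched, mismatched, deleted) base counts for an MD string."""
    matched = mismatched = deleted = 0
    for run, deletion, base in MD_TOKEN.findall(md):
        if run:
            matched += int(run)
        elif deletion:
            deleted += len(deletion)
        else:
            mismatched += 1
    return matched, mismatched, deleted


if __name__ == "__main__":
    matched, mismatched, deleted = summarize_md(MD)
    print("matched:", matched)
    print("mismatched:", mismatched)
    print("deleted:", deleted)
    print("reference bases covered:", matched + mismatched + deleted)
```

The mismatch count can be compared with the record's NM:i:49 tag, and the number of reference bases covered with the 229M in its CIGAR string.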



      • #4
        Also I recommend mem over the aln/sampe pipeline. It's simpler and it works better.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */



        • #5
          Originally posted by sdriscoll
          This comes down to how you built the index for BWA. What FASTA file(s) did you use? If you didn't build the index from FASTA sequences that are full chromosome references then you won't get alignments in terms of chromosomes.

          Also that last line of idxstats is probably just the number of unaligned reads. Typically unmapped reads have an '*' in the third column.
          I used db11.fasta

          I did build the index.

          And the last part of what you said makes no sense to me, because the first row already gives the name, sequence length, # of mapped reads, and # of unmapped reads. How does the second row (* 0 0 32694) describe the # of unmapped reads when the first row already lists the # of unmapped reads?



          • #6
            Originally posted by swbarnes2
            Did you read the SAM format description?

            Yes, the third column of a sam file has the chromosome name.

            You've done something very wrong, though.



            Means that you used the wrong fastq file in the sampe step.
            Could this have anything to do with the fact that Serratia marcescens is a bacterium with only one chromosome?



            • #7
              A read can be unmapped and still be associated with a chromosome if it hangs off the edge. You have 2900 such reads. The rest of the unmapped reads didn't map at all; that's the 155004.
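A quick way to see how the two rows add up is to parse the idxstats output and total the columns. This is a rough sketch using the two lines from the first post. The function name `parse_idxstats` is made up, and the real output is tab-delimited, so splitting on any whitespace is a convenience here.

```python
IDXSTATS = """\
Serratia 5113802 307778 2900
* 0 0 155004
"""


def parse_idxstats(text):
    """Parse idxstats output into dicts: name, length, mapped, unmapped."""
    rows = []
    for line in text.strip().splitlines():
        name, length, mapped, unmapped = line.split()
        rows.append({
            "name": name,
            "length": int(length),
            "mapped": int(mapped),
            "unmapped": int(unmapped),
        })
    return rows


if __name__ == "__main__":
    rows = parse_idxstats(IDXSTATS)
    mapped = sum(r["mapped"] for r in rows)
    # Unmapped reads that still carry a reference name (first row) ...
    placed_unmapped = sum(r["unmapped"] for r in rows if r["name"] != "*")
    # ... versus unmapped reads with no reference at all (the '*' row).
    unplaced_unmapped = sum(r["unmapped"] for r in rows if r["name"] == "*")
    print("mapped:", mapped)
    print("unmapped, assigned to a reference:", placed_unmapped)
    print("unmapped, no reference:", unplaced_unmapped)
    print("total reads:", mapped + placed_unmapped + unplaced_unmapped)
```

So the two "unmapped" numbers count different reads and do not overlap. As far as I know, another common way a read ends up flagged unmapped on a named reference is the SAM convention of storing an unmapped read at its mapped mate's reference name and position.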

              I use bwa and samtools on single-chromosome bacterial references all the time. You messed up your sampe command; that's why you have that nonsense MD part. That's the only mistake you appear to have made. Everything else looks normal, so I'm not sure what you think the problem is.
