  • large BAM, but very small mpileup file

    hi all,
    I ran samtools mpileup on 7 human exome-seq samples whose BAM files were generated with BWA. Since I processed the samples in a for loop, the outputs should be similar. However, for one of the samples, the mpileup file contains only ~1000 lines, with a few lines for each chromosome. The other samples' mpileup files look fine, with many more lines.

    I used samtools flagstat to check the BAM file and found that
    very few reads (only 34) were properly paired. I wonder if this is what causes the mpileup problem. More importantly, does it indicate that something went wrong in library preparation for the sequencing experiment?

    95045628 + 0 in total (QC-passed reads + QC-failed reads)
    31755322 + 0 duplicates
    95045628 + 0 mapped (100.00%:nan%)
    95045628 + 0 paired in sequencing
    47522831 + 0 read1
    47522797 + 0 read2
    34 + 0 properly paired (0.00%:nan%)
    ..

    In contrast, most reads are properly paired in the other samples, e.g.:
    120529538 + 0 in total (QC-passed reads + QC-failed reads)
    27401894 + 0 duplicates
    120529538 + 0 mapped (100.00%:nan%)
    120529538 + 0 paired in sequencing
    60469618 + 0 read1
    60059920 + 0 read2
    119251912 + 0 properly paired (98.94%:nan%)



    (I removed unmapped reads from all BAM files, so do not be surprised that the mapping rate is 100%.)
    Last edited by mrfox; 10-16-2012, 09:23 PM.
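
The comparison of the two flagstat reports above can be automated when many samples go through the same loop. The following is only a minimal sketch: the function name `properly_paired_fraction` and the 0.5 cutoff are assumptions for illustration, and the parser only assumes the `N + M label` line layout shown in the reports above.

```python
import re

def properly_paired_fraction(flagstat_text):
    """Return (properly paired reads) / (reads paired in sequencing)."""
    counts = {}
    for line in flagstat_text.splitlines():
        # Each flagstat line looks like "<QC-passed> + <QC-failed> <label>"
        m = re.match(r"\s*(\d+) \+ (\d+) (.+)", line)
        if not m:
            continue
        label = m.group(3)
        if label.startswith("paired in sequencing"):
            counts["paired"] = int(m.group(1))
        elif label.startswith("properly paired"):
            counts["proper"] = int(m.group(1))
    if not counts.get("paired"):
        return 0.0
    return counts.get("proper", 0) / counts["paired"]

# Lines copied from the two reports in the post above
bad = """95045628 + 0 paired in sequencing
34 + 0 properly paired (0.00%:nan%)"""
good = """120529538 + 0 paired in sequencing
119251912 + 0 properly paired (98.94%:nan%)"""

MIN_FRACTION = 0.5  # hypothetical cutoff, not a samtools default
for name, text in [("bad", bad), ("good", good)]:
    frac = properly_paired_fraction(text)
    status = "OK" if frac >= MIN_FRACTION else "CHECK PAIRING"
    print(name, f"{frac:.6f}", status)
```

Running such a check right after alignment would flag the problem sample before any downstream step such as mpileup.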

  • #2
    Originally posted by mrfox
    hi all,

    I used samtools flagstat to check the BAM file and found that
    very few reads (only 34) were properly paired. I wonder if this is what causes the mpileup problem.
    Possibly. But even if it is not the cause of the mpileup problem, the lack of pairing is indicative of a more basic problem that needs to be solved first.

    More importantly, does it indicate that something went wrong in library preparation for the sequencing experiment?
    Likely. You really should dig deeper into the data so that you can tell the lab prep people what went wrong. My gut feeling is that you have just a handful of different fragments that were amplified and are thus suffering from a lack of complexity. But it also could be that many of the fragments were degraded to a point where they map but do not pair. Or, similar to the first idea, perhaps you just sequenced highly repetitive areas; these can be mapped but pairing would be questionable. Or ... well, dig in and let us know!


    • #3
      Thanks for the hints, Westerman. I loaded two BAM files into IGV: the upper track is the good sample G, in which the majority of reads were properly paired, and the lower track is the bad sample B. The alignments are colored by pairing orientation. The region is a segment of chrM.



      The observation is that 1) the coverage of the two samples is similar, but 2) almost no reads were properly paired in the bad sample. In fact, when I hover over an individual read, I most often see "insert size = 0" and "Pair orientation=R1R2" or F1F2/F2F1, which looks weird.

      So how should we interpret this observation? Why were the reads not paired? Is the problem in library preparation?
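
The same information can be read directly from the FLAG column of the alignments. Below is a rough sketch that decodes the strand and mate-number bits defined in the SAM format; the helper name `pair_orientation` is made up for illustration, and the output is not meant to reproduce IGV's orientation labels, which also depend on the relative positions of the two mates.

```python
def pair_orientation(flag):
    """Summarise the strand and mate-number bits of a SAM FLAG value."""
    read_strand = "R" if flag & 0x10 else "F"  # 0x10: this read is on the reverse strand
    mate_strand = "R" if flag & 0x20 else "F"  # 0x20: the mate is on the reverse strand
    if flag & 0x40:        # 0x40: first read in the pair
        read_no, mate_no = "1", "2"
    elif flag & 0x80:      # 0x80: second read in the pair
        read_no, mate_no = "2", "1"
    else:
        read_no, mate_no = "?", "?"
    return {
        "this_read": read_strand + read_no,
        "mate": mate_strand + mate_no,
        "same_strand": read_strand == mate_strand,
        "properly_paired": bool(flag & 0x2),  # 0x2: aligner marked the pair as proper
    }

# 99 is a typical flag for a properly paired forward read 1 with a reverse mate;
# 65 and 113 are pairs whose mates map to the same strand.
for flag in (99, 65, 113):
    print(flag, pair_orientation(flag))
```

For a standard paired-end library, the two mates are expected on opposite strands, so a sample dominated by `same_strand` pairs deserves a closer look.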


      • #4
        Originally posted by mrfox

        The observation is that 1) the coverage of the two samples is similar, but 2) almost no reads were properly paired in the bad sample. In fact, when I hover over an individual read, I most often see "insert size = 0" and "Pair orientation=R1R2" or F1F2/F2F1, which looks weird.

        So how should we interpret this observation? Why were the reads not paired? Is the problem in library preparation?
        Is it possible that when making the .bam, you accidentally used read 1 twice, instead of read 1 and read 2? That would explain the insert sizes of 0, and both reads in the same direction.
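
One way to test this hypothesis on the input files is to compare the sequences of the first few thousand records of the two FASTQ files. This is a minimal sketch, assuming plain 4-line FASTQ records in the same order in both files; `fastq_records` and `identical_fraction` are hypothetical helper names, and the in-memory strings stand in for real files.

```python
import io
import itertools

def fastq_records(handle):
    """Yield (name, sequence) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield header.rstrip("\n")[1:], seq

def identical_fraction(r1_handle, r2_handle, n=10000):
    """Fraction of the first n record pairs whose sequences are identical."""
    same = total = 0
    pairs = zip(fastq_records(r1_handle), fastq_records(r2_handle))
    for (_, s1), (_, s2) in itertools.islice(pairs, n):
        total += 1
        same += s1 == s2
    return same / total if total else 0.0

# Tiny made-up records for demonstration only
r1 = "@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n"
r2 = "@read1/2\nGGCC\n+\nIIII\n@read2/2\nCATG\n+\nIIII\n"

print(identical_fraction(io.StringIO(r1), io.StringIO(r1)))  # read 1 used twice
print(identical_fraction(io.StringIO(r1), io.StringIO(r2)))  # genuine mates
```

With real data the handles would come from `open(path)` or, for compressed files, `gzip.open(path, "rt")`. A fraction close to 1 suggests that the same file was supplied for both mates.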


        • #5
          I agree with swbarnes -- probably something went wrong in your analysis. Alternatively, the two files are the same, e.g. R1 was copied to R2 or vice versa. Another possibility is that you have an R1 from one sample and an R2 from another.
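
The last possibility (mates taken from different samples) can often be spotted from the read names alone. The sketch below assumes that mates share a name up to an optional `/1` or `/2` suffix or a trailing comment field, which is common but not universal; `mate_key` and `mismatched_pairs` are hypothetical names and the headers are invented.

```python
def mate_key(header):
    """Reduce a FASTQ header to the part that both mates should share."""
    name = header.lstrip("@").split()[0]  # drop '@' and any comment field
    if name.endswith(("/1", "/2")):       # drop an old-style mate suffix
        name = name[:-2]
    return name

def mismatched_pairs(r1_headers, r2_headers):
    """Return the header pairs whose shared name parts disagree."""
    return [
        (a, b)
        for a, b in zip(r1_headers, r2_headers)
        if mate_key(a) != mate_key(b)
    ]

# Invented headers for demonstration only
r1_headers = ["@runA:1:1101:100:200/1", "@runA:1:1101:300:400/1"]
r2_same_sample = ["@runA:1:1101:100:200/2", "@runA:1:1101:300:400/2"]
r2_other_sample = ["@runB:2:2201:111:222/2", "@runB:2:2201:333:444/2"]

print(len(mismatched_pairs(r1_headers, r2_same_sample)))   # names agree
print(len(mismatched_pairs(r1_headers, r2_other_sample)))  # names disagree
```

Note that this check cannot detect R1 being copied to R2, since the names then match trivially; comparing the sequences themselves covers that case.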


          • #6
            I found the problem as well: I went back to check the BAM files created half a year ago and found that R2 had indeed been replaced by R1 by mistake. I should have checked everything from the very beginning. The problem is now solved. Thank you all for your help!
