
  • large BAM, but very small mpileup file

    hi all,
    I applied samtools mpileup to 7 exome-seq samples (human), whose BAM files were generated using BWA. Since I used a for loop to process the samples, the outputs should be similar. However, for one of the samples, the mpileup file contains only ~1000 lines, with a few lines for each chromosome. The other samples' mpileup files look good, with many more lines.

    I used samtools flagstat to check the BAM file and found that very few reads (only 34) were properly paired. I wonder if this is the reason for the mpileup problem. More importantly, does this indicate something wrong with the library preparation in the sequencing experiment?

    95045628 + 0 in total (QC-passed reads + QC-failed reads)
    31755322 + 0 duplicates
    95045628 + 0 mapped (100.00%:nan%)
    95045628 + 0 paired in sequencing
    47522831 + 0 read1
    47522797 + 0 read2
    34 + 0 properly paired (0.00%:nan%)
    ..

    In contrast, most reads are properly paired in the other samples, e.g.:
    120529538 + 0 in total (QC-passed reads + QC-failed reads)
    27401894 + 0 duplicates
    120529538 + 0 mapped (100.00%:nan%)
    120529538 + 0 paired in sequencing
    60469618 + 0 read1
    60059920 + 0 read2
    119251912 + 0 properly paired (98.94%:nan%)



    (In all BAM files, I removed unmapped reads, so do not be surprised that the mapping rate is 100%.)
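The comparison of the two flagstat reports above can be automated. The following is a minimal sketch, assuming the flagstat text has already been captured as a string; the names `parse_flagstat` and `properly_paired_fraction` are made up for illustration and are not part of samtools.

```python
import re

# Matches lines such as "34 + 0 properly paired (0.00%:nan%)".
# Group 3 is the category label without the trailing parenthesised part.
FLAGSTAT_LINE = re.compile(r"\s*(\d+) \+ (\d+) (.+?)(?: \(.*\))?\s*$")


def parse_flagstat(text):
    """Return {label: (qc_passed, qc_failed)} for each recognised line."""
    counts = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line)
        if match:
            passed, failed, label = match.groups()
            counts[label] = (int(passed), int(failed))
    return counts


def properly_paired_fraction(text):
    """Fraction of QC-passed paired reads that are flagged properly paired."""
    counts = parse_flagstat(text)
    paired = counts["paired in sequencing"][0]
    proper = counts["properly paired"][0]
    return proper / paired if paired else 0.0
```

Running this over every sample in the loop and flagging any sample whose fraction falls far below the others would have surfaced the odd sample before the mpileup step.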
    Last edited by mrfox; 10-16-2012, 09:23 PM.

  • #2
    Originally posted by mrfox
    hi all,

    I used samtools flagstat to check the BAM file and found that
    very few reads (only 34) were properly paired. I wonder if this is the reason for the mpileup problem.
    Possibly. But even if it is not the cause of the mpileup problem, the lack of pairing is indicative of a more basic problem that needs to be solved first.

    More importantly, does this indicate something wrong with the library preparation in the sequencing experiment?
    Likely. You really should dig deeper into the data so that you can tell the lab prep people what went wrong. My gut feeling is that you have just a handful of different fragments that were amplified and are thus suffering from a lack of complexity. But it also could be that many of the fragments were degraded to a point where they map but do not pair. Or, similar to the first idea, perhaps you just sequenced highly repetitive areas; these can be mapped but pairing would be questionable. Or ... well, dig in and let us know!



    • #3
      Thanks for the hints, Westerman. I loaded two BAM files into IGV: the upper track is the good sample G, in which the majority of reads were properly paired, and the lower track is the bad sample B. The alignments are colored by pairing orientation. The region is a segment of chrM.



      The observations are that 1) the coverage of the two samples is similar, but 2) almost no reads were properly paired in the bad sample. In fact, if I hover the mouse over an individual read, I most likely find "insert size = 0" and "Pair orientation=R1R2" or F1F2/F2F1, which looks weird.

      So how should we interpret the observation? Why were the reads not paired? Is the problem in library preparation?
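The same pattern seen in IGV (insert size 0, both mates on the same strand) can also be counted directly from the alignment records. Below is a rough sketch, assuming the records are available as SAM text lines (for example from `samtools view`); it relies only on the standard SAM columns, where FLAG is column 2 and TLEN is column 9. The function name `summarize_pairs` is hypothetical.

```python
def summarize_pairs(sam_lines):
    """Count paired records with TLEN == 0 and with both mates on one strand."""
    total = zero_tlen = same_strand = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if not flag & 0x1:  # 0x1: read is paired in sequencing
            continue
        total += 1
        if tlen == 0:
            zero_tlen += 1
        # 0x10: this read is on the reverse strand; 0x20: its mate is.
        if bool(flag & 0x10) == bool(flag & 0x20):
            same_strand += 1
    return {"paired": total, "zero_tlen": zero_tlen, "same_strand": same_strand}
```

If nearly all paired records in a sample land in both the `zero_tlen` and `same_strand` counts, that matches what the lower IGV track shows.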



      • #4
        Originally posted by mrfox

        The observations are that 1) the coverage of the two samples is similar, but 2) almost no reads were properly paired in the bad sample. In fact, if I hover the mouse over an individual read, I most likely find "insert size = 0" and "Pair orientation=R1R2" or F1F2/F2F1, which looks weird.

        So how should we interpret the observation? Why were the reads not paired? Is the problem in library preparation?
        Is it possible that when making the .bam, you accidentally used read 1 twice, instead of read 1 and read 2? That would explain the insert sizes of 0, and both reads in the same direction.



        • #5
          I agree with swbarnes: your analysis was probably wrong. Alternatively, the two files are the same, e.g. R1 was copied to R2 or vice versa. Another possibility is that you have an R1 from one sample and an R2 from another.
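One quick way to test the "R1 was copied to R2" hypothesis is to compare the first few thousand sequences of the two FASTQ files. This is only a sketch, assuming plain four-line FASTQ records; the function names and the file names in the usage comment are hypothetical. Genuine mates come from opposite ends of a fragment, so a fraction near 1.0 would point to a duplicated file.

```python
import gzip


def sequences(lines, limit=10000):
    """Yield the sequence line of the first `limit` four-line FASTQ records."""
    for index, line in enumerate(lines):
        if index // 4 >= limit:
            break
        if index % 4 == 1:  # second line of each record holds the bases
            yield line.strip()


def identical_fraction(r1_lines, r2_lines, limit=10000):
    """Fraction of record positions where R1 and R2 carry the same sequence."""
    total = same = 0
    for seq1, seq2 in zip(sequences(r1_lines, limit), sequences(r2_lines, limit)):
        total += 1
        if seq1 == seq2:
            same += 1
    return same / total if total else 0.0


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


# Usage (hypothetical file names):
# with open_fastq("sample_R1.fastq.gz") as r1, open_fastq("sample_R2.fastq.gz") as r2:
#     print(identical_fraction(r1, r2))
```

This does not catch the other case mentioned above, an R1 and R2 from different samples; comparing read names between the two files would be a separate check.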



          • #6
            I also realized this problem: I went back to check the BAM files created half a year ago and found that R2 had indeed been replaced by R1 by mistake. I should have checked everything from the very beginning. Now the problem is solved. Thank you all for your help!
