  • problems with Bioscope output

    Hello everyone,
    I mapped my SOLiD data with BioScope. The csfasta files contain around 142 million reads, and the pairing statistics file shows the same number. But the BAM output, mapped and unmapped reads together, holds just 119 million reads; I checked that with samtools. What happened to the other reads? Does BioScope filter any reads? I couldn't find anything about that.

    Furthermore, the pairing statistics show more mapped pairs than I can find in the BAM file. Could anybody explain that, please?

    I would be very grateful for any help or comments. Thanks.
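For anyone wanting to reproduce the input-side count, here is a minimal sketch of counting reads in csfasta text. It assumes the usual csfasta layout (one `>` header line per read, `#` lines as comments); the function name `count_csfasta_reads` is made up for this example. On the BAM side, `samtools view -c` reports the number of alignment records, which is the figure being compared against.

```python
from typing import Iterable


def count_csfasta_reads(lines: Iterable[str]) -> int:
    """Count reads in csfasta text.

    Assumes one '>' header line per read; '#' comment lines and the
    color-space sequence lines are ignored.
    """
    return sum(1 for line in lines if line.startswith(">"))


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file handle.
    sample = [
        "# Title: example\n",
        ">1_1_1_F3\n",
        "T0120\n",
        ">1_1_2_F3\n",
        "T3321\n",
    ]
    print(count_csfasta_reads(sample))
```

For a paired run, the same count would be taken over both csfasta files and summed before comparing with the BAM record count.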

  • #2
    Do you mean that you counted the reads in both the mapped and unmapped files, and the numbers do not add up?
    Which version of BioScope are you using? Can you post the ini file?
    http://kevin-gattaca.blogspot.com/



    • #3
      Hello Robby,

      You mentioned that the input csfasta files contain roughly 142 million reads. As you know, not all reads are mappable, so you should expect some losses in the BAM file. To be specific, the BAM file will only have an entry for an unmapped read if its sister read (the other half of the pair) is mapped. If neither read of a pair is mappable, the pair will not show up in the BAM file at all. This should explain the discrepancy you are observing.

      Regarding the differences between pairing statistics and bam file, could you post the stats from both approaches? It is difficult to troubleshoot without something more concrete.

      Thanks
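One way to check the rule described in this post is to tally records by their SAM flag bits. The sketch below reads SAM text (for example, the output of `samtools view`) and uses the standard flag bits 0x4 (read unmapped) and 0x8 (mate unmapped). The function name and category labels are made up for this example, and it says nothing about how BioScope itself decides which records to write.

```python
from collections import Counter
from typing import Iterable

UNMAPPED = 0x4        # SAM flag bit: this read is unmapped
MATE_UNMAPPED = 0x8   # SAM flag bit: the mate is unmapped


def classify_sam_records(lines: Iterable[str]) -> Counter:
    """Tally SAM records as mapped, unmapped with a mapped mate,
    or unmapped with an unmapped mate."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not flag & UNMAPPED:
            counts["mapped"] += 1
        elif flag & MATE_UNMAPPED:
            counts["unmapped_mate_unmapped"] += 1
        else:
            counts["unmapped_mate_mapped"] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Example use: samtools view file.bam | python this_script.py
    for category, n in sorted(classify_sam_records(sys.stdin).items()):
        print(category, n)
```

If the rule above holds for a given BAM file, the `unmapped_mate_unmapped` category should come out empty.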



      • #4
        Thanks for your help and for part of the solution. As hanktu mentioned, pairs in which both reads are mappable appear as two entries in the BAM file of mapped reads. If neither read of a pair is mappable, both reads appear in the BAM file of unmapped reads. But if one read of a pair is mappable and the other is not, one entry appears in the BAM file of mapped reads and none in the file of unmapped reads, so the unmapped read of that pair doesn't appear anywhere. The problem was that BioScope counts pairs and I counted reads.

        Nevertheless, 80-160 reads (or 40-80 pairs) are still missing. In the BioScope statistics they are counted as mapped. Do you have any ideas about that as well?
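The bookkeeping described in this post can be written down as simple arithmetic. The sketch below follows only the model stated here (two records for a fully mapped pair, two for a fully unmapped pair, one for a half-mapped pair); the function name and the example numbers are hypothetical, not BioScope output.

```python
from typing import NamedTuple


class BamCounts(NamedTuple):
    mapped_bam: int    # records expected in the BAM file of mapped reads
    unmapped_bam: int  # records expected in the BAM file of unmapped reads
    missing: int       # input reads that appear in neither file


def expected_bam_records(both_mapped: int, one_mapped: int, none_mapped: int) -> BamCounts:
    """Expected record counts under the model in the post above.

    Arguments are numbers of PAIRS in each category.
    """
    input_reads = 2 * (both_mapped + one_mapped + none_mapped)
    mapped_bam = 2 * both_mapped + one_mapped  # half-mapped pairs add one record
    unmapped_bam = 2 * none_mapped             # only fully unmapped pairs land here
    missing = input_reads - mapped_bam - unmapped_bam
    return BamCounts(mapped_bam, unmapped_bam, missing)


if __name__ == "__main__":
    # Hypothetical pair counts, for illustration only.
    print(expected_bam_records(both_mapped=50, one_mapped=20, none_mapped=30))
```

Under this model the number of missing reads equals the number of half-mapped pairs. With the round figures from the thread, 142 million input reads minus 119 million BAM records leaves about 23 million, which would then be roughly the number of half-mapped pairs.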



        • #5
          What version of BioScope?

          Hi Robby,

          What version of BioScope are you running? Can you copy and paste some lines from the pairing stats file as well as the numbers you get from the bam file?

          Thanks



          • #6
            Could this be caused by redundantly mapped reads? If you're creating the BAM file with 'primary' as the config setting (I think this is the default), then reads that map to many locations count as 'mappable' but will not appear in your BAM files unless they can be rescued by a primary/uniquely mapping mate.
