
  • Mysteries of the Bioscope pairing pipeline

    Hello to all,

    I'm currently trying to extract some reasonable data from the Bioscope pairing tool.

    We have paired-end reads from a library size-selected for 200 bp.

    We have not set a value for the parameters
    insert.start and insert.end in the pairing.ini file.

    The description in the Bioscope manual says: "The minimum(maximum) insert size to define a good mate. If a value is not set, the tool tries to measure the best value"

    My question is: where can I see this value afterwards? It would be quite interesting to know which measure defines a good mate/pair.

    Can somebody give a hint?

    Many thanks

  • #2
    The BAM file should have the insert ranges. But for a quick overview, my understanding is that the lower and upper ranges in the pairing.dat.freq file (not the full file) give the insert range. On the other hand, the recent LifeTech 'pairing_stats_n_clean_bam' program, which supposedly generates the 'official' statistics, gives a different (and smaller) range than the pairing.dat.freq file. Since the new program looks through the BAM file (and takes forever to do so!), I'd trust it more.


    • #3
      Thank you westerman. Yes, that was exactly my confusion. The pairing.stats file gives
      an insert range of 62-207 in the header, while the pairing.dat.freq file gives values from 35-207. If I want to take the 'official' numbers for AAA pairs from pairing.stats, which range do you think is used?

      What do you mean by 'new program'?

      best regards

      Julia


      • #4
        Ah, you have a 'pairing.stats' file. This indicates that you ran your analysis with Bioscope version 1.2, the version before LifeTech took away the stats file. In v.1.3 they did away with the stats file but, within the last couple of weeks, they issued a program called 'pairing_stats_n_clean_bam' which restores the stats file and also cleans up the mapped-reads BAM file (which, erroneously, has unmapped reads in it). You should not run the 'pairing_stats_n_clean_bam' program on files from v.1.2 and earlier.

        In your case just take the range from 'pairing.stats'.
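To check whether a "mapped" BAM still carries unmapped records, or to see which record count another tool might be reporting, a rough tally by SAM flag can help. This is only a sketch: it assumes the BAM has been converted to SAM text first, the function name is made up, and the flag bits are the standard SAM ones rather than anything Bioscope-specific.

```python
from collections import Counter

# Standard SAM flag bits of interest.
FLAG_BITS = {
    "unmapped": 0x4,
    "secondary": 0x100,
    "duplicate": 0x400,
    "supplementary": 0x800,
}


def tally_records(sam_lines):
    """Count SAM records in total and per flag category."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        counts["total"] += 1
        if not flag & FLAG_BITS["unmapped"]:
            counts["mapped"] += 1
        for name, bit in FLAG_BITS.items():
            if flag & bit:
                counts[name] += 1
    return counts


# Made-up records (name and flag only matter here), not real output.
EXAMPLE_FLAGS = [99, 147, 77, 141, 1123]
EXAMPLE_LINES = ["@HD\tVN:1.0"] + [
    "\t".join([f"r{i}", str(flag), "chr1", "100", "60", "50M",
               "=", "200", "0", "*", "*"])
    for i, flag in enumerate(EXAMPLE_FLAGS)
]

for key, value in sorted(tally_records(EXAMPLE_LINES).items()):
    print(key, value)
```

Comparing "total", "mapped" and "unmapped" against the numbers in the stats file should show fairly quickly which subset of records each tool is counting.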


        • #5
          Ah, most interesting! This might also explain another observation I made:

          I used Picard to remove duplicate reads in the BAM file. Picard reported a number of 'records' that did not match any of the numbers reported in pairing.stats. I was scratching my head about this, too. Maybe it's best to switch to v.1.3; too much muddle here.

          THX J


          • #6
            Hey, it's me again.

            In the meantime things cleared up. The faulty BAM files were introduced by v1.3. So it's better to stick with v1.2 right now. Right?

            I'll take the size range from pairing.stats! BTW, it matches the values in pairing.dat.freq. It did not before because I took the wrong file from another library :-( Sorry for the confusion.

            The problem with the 'records' count according to Picard still remains. But I will try to figure this out next week.

            I'm off for the weekend now.

            THX J


            • #7
              Originally posted by jbeck
              ... The faulty BAM files were introduced by v1.3. So it's better to stick with v1.2 right now. Right?
              You can use v.1.3 but should run the 'pairing_stats_n_clean_bam' program if you want a clean BAM file. Be aware that said program takes a long time to run. I hesitate to tell someone to not use the latest and greatest version since there should be bug fixes and speed-ups between v1.2 and v1.3.
