  • Mysteries of the Bioscope pairing pipeline

    Hello to all,

    I'm currently trying to extract some reasonable data from the Bioscope pairing tool.

    We have paired-end reads from a library size-selected for 200 bp.

    We have not set a value for the parameters
    insert.start and insert.end in the pairing.ini file.

    The description in the Bioscope manual says: "The minimum(maximum) insert size to define a good mate. If a value is not set, the tool tries to measure the best value"

    My question is: where can I see this value afterwards? It would be quite interesting to know what measure defines a good mate/pair.

    Can somebody give a hint?

    Many thanks

  • #2
    The BAM file should have the insert ranges. For a quick overview, though, my understanding is that the lower and upper bounds in the pairing.dat.freq file (not the full file) give the insert range. On the other hand, the recent LifeTech program 'pairing_stats_n_clean_bam', which supposedly generates the 'official' statistics, gives a different (and smaller) range than the pairing.dat.freq file. Since the new program looks through the BAM file (and takes forever to do so!), I'd trust it more.
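
For anyone who wants to check the range directly from the alignments, here is a minimal sketch. It is not what Bioscope or 'pairing_stats_n_clean_bam' does internally; it assumes the BAM has been dumped to SAM text, that the TLEN column (column 9) holds the insert size, and that the "properly paired" flag bit (0x2) marks good mates. The function name `insert_size_range` is made up for illustration.

```python
def insert_size_range(sam_lines):
    """Return (min, max) insert size over properly paired records, or None.

    Assumes standard SAM text: column 2 is FLAG, column 9 is TLEN.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & 0x4:
            continue  # read is unmapped
        if not flag & 0x2:
            continue  # not flagged as a proper pair
        if tlen > 0:
            # Only the leftmost mate has a positive TLEN, so each pair counts once.
            sizes.append(tlen)
    if not sizes:
        return None
    return min(sizes), max(sizes)
```

The input can be any iterable of SAM lines, for example `sys.stdin` when piping the output of `samtools view` into a small script. Whether the result agrees with pairing.dat.freq depends on how Bioscope sets the proper-pair flag, which I have not verified.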



    • #3
      Thank you westerman. Yes, that was exactly my confusion. The pairing.stats file gives "Insert range 62-207" in the header, while the pairing.dat.freq file gives values from 35-207. If I want to take the 'official' numbers for AAA pairs from pairing.stats, which range do you think is used?

      What do you mean by 'new program'?

      best regards

      Julia



      • #4
        Ah, you have a 'pairing.stats' file. This indicates that you ran your analysis with Bioscope v1.2, the version before LifeTech took away the stats file. In v1.3 they did away with the stats file but, within the last couple of weeks, they issued a program called 'pairing_stats_n_clean_bam' which restores the stats file and also cleans up the mapped-reads BAM file (which, erroneously, has unmapped reads in it). You should not run the 'pairing_stats_n_clean_bam' program on files from v1.2 and earlier.

        In your case just take the range from 'pairing.stats'.



        • #5
          Ah, most interesting! This might also explain another observation I made:

          I used Picard to remove duplicate reads in the BAM file. Picard reported a number of 'records' that did not match any of the numbers reported in pairing.stats. I was scratching my head about this, too. Maybe it's best to switch to v1.3; too much muddle here.

          THX J
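
One way to start tracking down a record-count mismatch like this is to tally the records by SAM flag and see which category accounts for the difference. The sketch below is only an assumption about what might be useful here: it does not reproduce how Picard or pairing.stats count anything, and the function name `tally_sam_flags` is hypothetical. It relies on the standard SAM flag bits 0x4 (unmapped), 0x100 (secondary alignment) and 0x400 (duplicate).

```python
from collections import Counter


def tally_sam_flags(sam_lines):
    """Count SAM records in a few flag categories.

    Categories overlap on purpose: a record can be, for example,
    both mapped_primary and duplicate.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        counts["total"] += 1
        if flag & 0x4:
            counts["unmapped"] += 1
        if flag & 0x100:
            counts["secondary"] += 1
        if flag & 0x400:
            counts["duplicate"] += 1
        if not flag & 0x4 and not flag & 0x100:
            counts["mapped_primary"] += 1
    return counts
```

Comparing "total", "mapped_primary" and "unmapped" against the numbers in pairing.stats and in the Picard log may show whether unmapped or secondary records explain the gap.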



          • #6
            Hey, it's me again.

            In the meantime things cleared up. The faulty BAM files were introduced by v1.3. So it's better to stick with v1.2 right now. Right?

            I'll take the size range from pairing.stats! BTW, it matches the values in pairing.dat.freq. It did not before because I took the wrong file from another library :-( Sorry for the confusion.

            The problem with the 'records' count according to Picard still remains. But I will try to figure this out next week.

            I'm off for the weekend now.

            THX J



            • #7
              Originally posted by jbeck:
              "... The faulty BAM files were introduced by v1.3. So it's better to stick with v1.2 right now. Right?"
              You can use v.1.3 but should run the 'pairing_stats_n_clean_bam' program if you want a clean BAM file. Be aware that said program takes a long time to run. I hesitate to tell someone to not use the latest and greatest version since there should be bug fixes and speed-ups between v1.2 and v1.3.
