  • Low pairing rate in SOLiD 4 paired-end transcriptome sequencing

    Dear all,

    We conducted a transcriptome sequencing project on zebrafish using the SOLiD 4 PE protocol (50x35). After analysis with the Bioscope 1.3.1 WTA pipeline, we found that of the 80% of reads that mapped, only 31% were in pairs located on the same chromosome, while more than 45% were in pairs whose ends mapped to different chromosomes. Is this rate normal, or did something go wrong in our experiment or analysis?
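A quick way to double-check a pipeline's summary is to export the alignments as SAM and count, for every mapped read in a pair, whether its mate landed on the same chromosome. The sketch below is a minimal, hypothetical example (the function name `mate_chromosome_stats` is illustrative, not part of Bioscope); it assumes uncompressed SAM text, counts reads rather than pairs, and skips secondary and supplementary alignments.

```python
from collections import Counter


def mate_chromosome_stats(sam_lines):
    """Count primary, mapped, paired reads by where their mate aligned.

    Only the FLAG (column 2), RNAME (column 3) and RNEXT (column 7) fields
    of each SAM alignment line are used; header lines are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag, rname, rnext = int(fields[1]), fields[2], fields[6]
        if not flag & 0x1:  # read was not paired in sequencing
            continue
        if flag & 0x4:  # the read itself is unmapped
            continue
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
            continue
        if flag & 0x8:  # mate is unmapped
            counts["mate_unmapped"] += 1
        elif rnext in ("=", rname):  # "=" means same reference as RNAME
            counts["same_chromosome"] += 1
        else:
            counts["different_chromosome"] += 1
    return counts


# Tiny hand-made example: one read whose mate is on the same chromosome,
# one whose mate is on another chromosome, and one whose mate did not map.
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t235\t*\t*",
    "r2\t65\tchr1\t500\t60\t50M\tchr5\t900\t0\t*\t*",
    "r3\t73\tchr2\t700\t60\t50M\t=\t700\t0\t*\t*",
]
counts = mate_chromosome_stats(example)
mapped = sum(counts.values())
for category, n in counts.items():
    print(f"{category}: {n} ({100 * n / mapped:.1f}% of mapped paired reads)")
```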

    Best wishes
    Chen
    Last edited by amurocw; 03-14-2011, 06:25 PM.

  • #2
    Perhaps you should confirm your alignment results with BFAST or NovoalignCS. Both tools can write out SAM/BAM format from which you could generate some useful statistics on proper pairs, etc.
    The scenario you are describing sounds like a sample with a huge number of structural variations.
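Once the alignments are in SAM format, the proper-pair statistics mentioned above come straight from the FLAG bits (0x1 paired, 0x2 proper pair, 0x4 unmapped, 0x8 mate unmapped). `samtools flagstat` reports similar categories for a BAM file; the sketch below is only a hypothetical pure-Python tally, assuming uncompressed SAM text, that reports the proper-pair count against two different denominators. Which pairs count as "proper" is decided by the aligner when it sets 0x2, not by this code.

```python
def pair_summary(sam_lines):
    """Tally paired-read categories from SAM FLAG bits (primary alignments only)."""
    total = mapped = both_mapped = proper = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        flag = int(line.split("\t")[1])
        if flag & 0x900 or not flag & 0x1:
            continue  # skip secondary/supplementary and unpaired reads
        total += 1
        if flag & 0x4:
            continue  # this read is unmapped
        mapped += 1
        if not flag & 0x8:  # mate is mapped too
            both_mapped += 1
            if flag & 0x2:  # aligner marked the pair as proper
                proper += 1
    return {
        "total": total,
        "mapped": mapped,
        "both_mapped": both_mapped,
        "proper": proper,
        "proper_of_total": proper / total if total else 0.0,
        "proper_of_mapped": proper / mapped if mapped else 0.0,
    }


# Hand-made example: one proper pair (r1) and one pair with an unmapped end (r2).
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t235\t*\t*",
    "r1\t147\tchr1\t300\t60\t35M\t=\t100\t-235\t*\t*",
    "r2\t73\tchr2\t700\t60\t50M\t=\t700\t0\t*\t*",
    "r2\t133\tchr2\t700\t0\t*\t=\t700\t0\t*\t*",
]
summary = pair_summary(example)
print(summary)
```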



    • #3
      In reply to zee:
      Thanks for your suggestion. I will try BFAST again.

      However, TopHat gave nearly the same pairing rates. Although the zebrafish genome assembly is still preliminary, that alone cannot explain such a low pairing rate.

      Zee, do you have any SOLiD 4 PE data? What does the pairing rate look like?



      • #4
        Stats from a recent partial SOLiD 4 PE run mapped to Arabidopsis. 1st read is F3, 2nd read is F5. As usual the SOLiD gives all reads -- high quality or not -- and relies on the mapping to discard poorer reads.
        Note that "proper paired" reads (i.e., same chromosome, within a good insert distance) are 39% of the total reads and 66% of the total number of mapped reads.


        81398098 the read is paired in sequencing
        40699049 Total first read (50.00% total read)
        40699049 Total second read (50.00% total read)
        33107767 the query sequence itself is unmapped (40.67% total read)
        10726487 Unmapped first read (26.36% total first read)
        22381280 Unmapped second read (54.99% total second read)
        48290331 Total mapped reads (59.33% total read)
        29972562 mapped first read (62.07% total mapped, 73.64% total first read)
        18317769 mapped second read (37.93% total mapped, 45.01% total second read)
        35022988 both reads mapped (72.53% total mapped, 43.03% total read)
        31774598 the read is mapped in a proper pair (65.80% total mapped, 39.04% total reads)
        33107767 singletons (mates unmapped) (40.67%)
        22613871 strand of the query is reverse
        22614187 strand of the mate is reverse
        0 the alignment is not primary
        0 the read fails platform/vendor quality checks
        274708 the read is either a PCR or an optical duplicate
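Because the percentages in this table use different denominators (total reads, total mapped reads, or the reads of one end), they are easy to misread. The short sketch below recomputes a few lines from the raw counts posted above, checks that they are internally consistent, and shows that the proper-pair count is about 39% of all reads but about 66% of mapped reads. The variable names and the `pct` helper are illustrative; only the numbers come from the post.

```python
# Counts copied from the Arabidopsis SOLiD 4 PE resequencing run above.
total = 81398098
first = second = 40699049
unmapped_first, unmapped_second = 10726487, 22381280
both_mapped, proper = 35022988, 31774598

unmapped = unmapped_first + unmapped_second
mapped = total - unmapped
mapped_first = first - unmapped_first
mapped_second = second - unmapped_second

# The derived lines in the table should follow from the raw counts.
assert first + second == total
assert unmapped == 33107767 and mapped == 48290331
assert mapped_first + mapped_second == mapped


def pct(part, whole):
    """Format part/whole as a percentage with two decimals."""
    return f"{100 * part / whole:.2f}%"


print("mapped, of total reads:", pct(mapped, total))
print("proper pair, of total reads:", pct(proper, total))
print("proper pair, of mapped reads:", pct(proper, mapped))
```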



        • #5
          In reply to westerman:

          Hi Westerman,
          Are your results from a transcriptomics or a resequencing study? Which mapping tool did you use for the analysis?

          Thanks in advance.
          Best regards,

          S.



          • #6
            In reply to Sheila:
            Those statistics were from a resequencing run, so perhaps they are not as applicable to a transcriptome study. The tool was LifeTech's Bioscope software.

            Here are some statistics from a recent transcriptome run mapped to maize. This was a partial run, hence the "small" number of reads. I'm still amazed by the numbers we get from NGS machines compared to the Sanger methods of 5 years ago.

            You can see that the mapping went well enough at ~50%. As is normal with SOLiD, these figures come from the raw reads without any quality filtering, so that percentage is more or less expected, although we have seen better. We had very little RNA, and we suspect that this may be contributing.


            11742104 the read is paired in sequencing
            5871052 Total first read (50.00% total read)
            5871052 Total second read (50.00% total read)

            4766137 the query sequence itself is unmapped (40.59% total read)
            2530271 Unmapped first read (43.10% total first read)
            2235866 Unmapped second read (38.08% total second read)

            6975967 Total mapped reads (59.41% total read)
            3340781 mapped first read (47.89% total mapped, 56.90% total first read)
            3635186 mapped second read (52.11% total mapped, 61.92% total second read)
            5301952 both reads mapped (76.00% total mapped, 45.15% total read)
            2385930 the read is mapped in a proper pair (34.20% total mapped, 20.32% total reads)

            4766137 singletons (mates unmapped) (40.59%)
            2640283 strand of the query is reverse
            2640358 strand of the mate is reverse

            0 the alignment is not primary
            0 the read fails platform/vendor quality checks
            0 the read is either a PCR or an optical duplicate

            402 Mean Insert Size
            50 - 14999 Insert Size Range
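The mean insert size and range at the bottom of this table can be approximated from the TLEN column of a SAM file. A minimal sketch, assuming uncompressed SAM text and assuming that Bioscope's "insert size" corresponds roughly to SAM TLEN for properly paired reads (its exact definition may differ). Each pair is counted once by keeping only the positive TLEN. For a transcriptome mapped to a genome, an intron lying between the two ends also ends up inside TLEN, which is one plausible reason for a range reaching into the thousands.

```python
from statistics import mean


def insert_size_summary(sam_lines):
    """Return (mean, min, max) of TLEN over proper pairs, or None if there are none."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # Proper pair (0x2), read mapped and primary (none of 0x4, 0x100, 0x800).
        is_proper = bool(flag & 0x2) and not (flag & 0x904)
        if is_proper and tlen > 0:  # positive TLEN: count each pair once
            sizes.append(tlen)
    if not sizes:
        return None
    return mean(sizes), min(sizes), max(sizes)


# Two hand-made proper pairs (F3 = 50 bp, F5 = 35 bp) with TLEN 235 and 435.
example = [
    "p1\t99\tchr1\t100\t60\t50M\t=\t300\t235\t*\t*",
    "p1\t147\tchr1\t300\t60\t35M\t=\t100\t-235\t*\t*",
    "p2\t99\tchr3\t1000\t60\t50M\t=\t1400\t435\t*\t*",
    "p2\t147\tchr3\t1400\t60\t35M\t=\t1000\t-435\t*\t*",
]
result = insert_size_summary(example)
print(result)
```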



            • #7
              In reply to westerman:

              Hi Westerman,
              Thanks very much for the info!
              One last thing: what was the read length in this experiment? Did you trim any of your reads before mapping?

              Our statistics for resequencing studies are very similar to yours, but our transcriptome results are still not as close to your maize results in terms of the number of properly paired reads.

              Regards,

              S.
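The difference in properly paired reads can be made concrete with the counts westerman posted. The sketch below is arithmetic only (the `runs` layout and `proper_pair_rates` name are illustrative) and expresses the proper-pair count against three denominators for both runs. On these figures, roughly 91% of Arabidopsis resequencing reads whose mate also mapped were properly paired, versus roughly 45% for the maize transcriptome run, even though the overall mapping rates of the two runs are almost identical (59.33% vs 59.41%). The runs differ in organism and library type, so this only describes the posted numbers.

```python
# Counts posted earlier in this thread (Bioscope output, SOLiD 4 PE).
runs = {
    "Arabidopsis resequencing": {
        "total": 81398098, "mapped": 48290331,
        "both_mapped": 35022988, "proper": 31774598,
    },
    "Maize transcriptome": {
        "total": 11742104, "mapped": 6975967,
        "both_mapped": 5301952, "proper": 2385930,
    },
}


def proper_pair_rates(counts):
    """Proper-pair reads as a fraction of three different denominators."""
    proper = counts["proper"]
    return {
        "of_total": proper / counts["total"],
        "of_mapped": proper / counts["mapped"],
        "of_both_mapped": proper / counts["both_mapped"],
    }


for name, counts in runs.items():
    rates = proper_pair_rates(counts)
    summary = ", ".join(f"{key} {value:.2%}" for key, value in rates.items())
    print(f"{name}: {summary}")
```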



              • #8
                In reply to Sheila:
                F3 is 50 bases, F5 is 35 bases. No trimming.
