  • matching unmapped paired SOLiD reads

    I know that the read order of xxx_F3.csfasta and xxx_F3_QV.qual are the same, but I can't find any information that describes how the read order of xxx_F3.csfasta and xxx_R3.csfasta are related. I'd appreciate any pointers to documentation that describe the order relationship.

  • #2
    Originally posted by smarkel
    I know that the read order of xxx_F3.csfasta and xxx_F3_QV.qual are the same, but I can't find any information that describes how the read order of xxx_F3.csfasta and xxx_R3.csfasta are related. I'd appreciate any pointers to documentation that describe the order relationship.
    In practice, they will be sorted by read name, where the three numbers indicate panel, x-position and y-position. If a read has no mate, it will be present in only one of the files. See the solid2fastq programs/scripts, such as those in BFAST or MAQ, which rely on these properties.
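Because both files are sorted the same way and a mate may be missing, the two files can be paired with a single merge-join pass instead of matching line n to line n. The sketch below is an illustration, not the BFAST/MAQ solid2fastq code: it assumes names of the form `>panel_x_y_F3` / `>panel_x_y_R3`, assumes both files are sorted numerically on those three integers (left-most slowest), and the helper names `bead_key` and `pair_sorted` are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Assumed name pattern: ">1_6_55_F3" or "1_6_55_R3" (leading '>' optional).
_NAME = re.compile(r"^>?(\d+)_(\d+)_(\d+)_(F3|R3)$")


def bead_key(name: str) -> Tuple[int, int, int]:
    """Return the three integers of a read name as a numeric sort key."""
    m = _NAME.match(name.strip())
    if m is None:
        raise ValueError(f"unexpected read name: {name!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def pair_sorted(
    f3_names: Iterable[str], r3_names: Iterable[str]
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Merge-join two name streams that are both sorted by bead_key.

    Yields (f3, r3) for mates, and (f3, None) or (None, r3) when the mate is absent.
    """
    f_it, r_it = iter(f3_names), iter(r3_names)
    f, r = next(f_it, None), next(r_it, None)
    while f is not None or r is not None:
        if r is None or (f is not None and bead_key(f) < bead_key(r)):
            yield f, None  # F3 read whose R3 mate is missing
            f = next(f_it, None)
        elif f is None or bead_key(r) < bead_key(f):
            yield None, r  # R3 read whose F3 mate is missing
            r = next(r_it, None)
        else:
            yield f, r  # same bead in both files: a proper pair
            f, r = next(f_it, None), next(r_it, None)
```

With made-up names in the thread's style, `pair_sorted([">1_6_55_F3", ">1_6_64_F3"], [">1_6_55_R3", ">1_6_69_R3"])` yields the pair for bead 1_6_55, then `(">1_6_64_F3", None)` and `(None, ">1_6_69_R3")`. It only needs one record from each file in memory at a time, which is why sorted input matters.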



    • #3
      The files I'm using are from AB's site (http://solidsoftwaretools.com/gf/project/ecoli2x50/) and aren't sorted. I looked at Rosalind_20080729_2_Chris5_F3.csfasta.zip and Rosalind_20080729_2_Chris5_R3.csfasta.zip. My understanding is that the sorting happens when the reads are mapped. Maybe the posted files aren't representative.



      • #4
        Originally posted by smarkel
        The files I'm using are from AB's site (http://solidsoftwaretools.com/gf/project/ecoli2x50/) and aren't sorted. I looked at Rosalind_20080729_2_Chris5_F3.csfasta.zip and Rosalind_20080729_2_Chris5_R3.csfasta.zip. My understanding is that the sorting happens when the reads are mapped. Maybe the posted files aren't representative.
        I wasn't too descriptive in my explanation.

        Take a look at these reads from real data:
        Code:
        >1_6_55_F3
        T01100000000201002010120012300200011000100.01101131
        >1_6_64_F3
        T01203010110102003000000101100111000010100.01100131
        >1_6_69_F3
        T01031320200103032011110221111112110020111.11111131
        >1_6_97_F3
        I claim they are sorted based on read name. They have the form:
        >%d_%d_%d_F3
        where each %d stands for an integer. The names are sorted numerically: the right-most integer varies fastest, then the middle integer, then the left-most integer. The equivalent read (the mate) in the R3 file will be named
        >%d_%d_%d_R3

        The "Rosalind" file follows the same pattern:
        Code:
        >469_26_42_F3
        T12113310031232112221003120021221223320222122212122
        >469_26_379_F3
        T31202223003310000130302323312223212011000010033200
        >469_26_540_F3
        T11012313031030123033113130100223110001231232303210
        >469_26_560_F3
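The ordering in these excerpts is numeric rather than plain string order (42 comes before 379). A minimal sketch of a sort key and of deriving a mate's name, assuming the `>%d_%d_%d_F3` / `>%d_%d_%d_R3` header format shown above; `sort_key` and `mate_header` are hypothetical helper names, not part of any SOLiD tool.

```python
import re

# Assumed header format, taken from the examples above.
_HEADER = re.compile(r"^>(\d+)_(\d+)_(\d+)_(F3|R3)$")


def _match(header: str):
    m = _HEADER.match(header.strip())
    if m is None:
        raise ValueError(f"unexpected csfasta header: {header!r}")
    return m


def sort_key(header: str) -> tuple:
    """Numeric key: left-most integer varies slowest, right-most fastest."""
    m = _match(header)
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def mate_header(header: str) -> str:
    """Name the mate would carry in the other file (swap the F3/R3 tag)."""
    m = _match(header)
    other = "R3" if m.group(4) == "F3" else "F3"
    return f">{m.group(1)}_{m.group(2)}_{m.group(3)}_{other}"


headers = [">469_26_540_F3", ">469_26_42_F3", ">469_26_560_F3", ">469_26_379_F3"]
print(sorted(headers, key=sort_key))  # numeric order: 42, 379, 540, 560
print(sorted(headers))                # plain string order puts 379 before 42
print(mate_header(">469_26_42_F3"))   # >469_26_42_R3
```

Sorting the names as plain strings would not reproduce the file order, so any tool that merges the F3 and R3 files needs to compare the integers.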



        • #5
          Thank you. Yes, you're right. Both files are individually sorted. I should have worded my original question differently. I expected xxx_F3.csfasta and xxx_R3.csfasta to have the same number of entries, with the nth entry in the F3 file being the mate of the nth entry in the R3 file. What does it mean for a paired read's mate to not exist?



          • #6
            Originally posted by smarkel
            Thank you. Yes, you're right. Both files are individually sorted. I should have worded my original question differently. I expected xxx_F3.csfasta and xxx_R3.csfasta to have the same number of entries, with the nth entry in the F3 file being the mate of the nth entry in the R3 file. What does it mean for a paired read's mate to not exist?
            It means that, given some quality threshold, a proper mate at that location could not be identified. I use "unpaired" reads without any problem. After >10 slides those extra reads really add up!
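To see how many extra reads a run contributes, one option is to count bead IDs that appear in both files versus only one. This is a sketch, assuming csfasta header lines look like `>1_6_55_F3` and that any other lines (colour-space sequences, or `#` comment lines) can be skipped; it uses sets, which is simpler than a merge-join but holds every bead ID in memory. `bead_ids` and `pairing_summary` are hypothetical names.

```python
from typing import Iterable, Set, Tuple


def bead_ids(lines: Iterable[str], tag: str) -> Set[str]:
    """Collect bead IDs (read name without '>' and the _F3/_R3 tag).

    Only header lines starting with '>' and ending in '_<tag>' are kept;
    '#' comment lines and colour-space sequence lines are skipped.
    """
    suffix = "_" + tag
    ids: Set[str] = set()
    for line in lines:
        line = line.strip()
        if line.startswith(">") and line.endswith(suffix):
            ids.add(line[1:-len(suffix)])
    return ids


def pairing_summary(f3_lines: Iterable[str], r3_lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (paired beads, F3-only reads, R3-only reads)."""
    f3 = bead_ids(f3_lines, "F3")
    r3 = bead_ids(r3_lines, "R3")
    return len(f3 & r3), len(f3 - r3), len(r3 - f3)
```

Open file objects can be passed directly, e.g. `with open("xxx_F3.csfasta") as f, open("xxx_R3.csfasta") as r: print(pairing_summary(f, r))`, where the file names are placeholders. The second and third numbers are the unpaired reads that would otherwise be thrown away.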



            • #7
              Thank you for the explanation.
