  • matching unmapped paired SOLiD reads

    I know that the read order of xxx_F3.csfasta and xxx_F3_QV.qual are the same, but I can't find any information that describes how the read order of xxx_F3.csfasta and xxx_R3.csfasta are related. I'd appreciate any pointers to documentation that describe the order relationship.

  • #2
    Originally posted by smarkel
    I know that the read order of xxx_F3.csfasta and xxx_F3_QV.qual are the same, but I can't find any information that describes how the read order of xxx_F3.csfasta and xxx_R3.csfasta are related. I'd appreciate any pointers to documentation that describe the order relationship.
    In practice, they will be sorted based on read name, with the numbers indicating panel, x-position, y-position. If a read has no mate, then it will only be present in one of the files. See solid2fastq programs/scripts like the ones in BFAST or MAQ that use the above properties.
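
A minimal Python sketch of the pairing idea described above. It assumes both files are sorted numerically by the three integers in the read name (panel, x, y) and that each read takes one header line and one sequence line. The function names (`read_key`, `records`, `pair_reads`) are hypothetical and are not taken from BFAST or MAQ, whose solid2fastq implementations may differ. The R3 sequences in the demo are placeholders, not real data.

```python
import io

def read_key(header):
    """Turn '>1_6_55_F3' into (1, 6, 55); the F3/R3 tag is ignored."""
    panel, x, y = header.lstrip(">").strip().split("_")[:3]
    return int(panel), int(x), int(y)

def records(handle):
    """Yield (key, header, sequence) tuples, skipping blank and '#' comment lines."""
    header = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            header = line
        elif header is not None:
            yield read_key(header), header, line
            header = None

def pair_reads(f3, r3):
    """Merge two sorted streams; a missing mate is reported as None."""
    f3_iter, r3_iter = records(f3), records(r3)
    a, b = next(f3_iter, None), next(r3_iter, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a[0] < b[0]):
            yield a, None          # F3 read with no R3 mate
            a = next(f3_iter, None)
        elif a is None or b[0] < a[0]:
            yield None, b          # R3 read with no F3 mate
            b = next(r3_iter, None)
        else:
            yield a, b             # same panel/x/y: a proper pair
            a, b = next(f3_iter, None), next(r3_iter, None)

if __name__ == "__main__":
    # Placeholder sequences, for illustration only.
    f3 = io.StringIO(">1_6_55_F3\nT0110\n>1_6_64_F3\nT0120\n>1_6_69_F3\nT0103\n")
    r3 = io.StringIO(">1_6_55_R3\nG0123\n>1_6_69_R3\nG3210\n>1_6_70_R3\nG0000\n")
    for left, right in pair_reads(f3, r3):
        print(left[1] if left else "-", right[1] if right else "-")
```

Because the merge only ever looks at the current record of each file, it does not need to load either file into memory, which matters for multi-slide runs.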

    • #3
      The files I'm using are from AB's site (http://solidsoftwaretools.com/gf/project/ecoli2x50/) and aren't sorted. I looked at Rosalind_20080729_2_Chris5_F3.csfasta.zip and Rosalind_20080729_2_Chris5_R3.csfasta.zip. My understanding is that the sorting happens when the reads are mapped. Maybe the posted files aren't representative.

      • #4
        Originally posted by smarkel
        The files I'm using are from AB's site (http://solidsoftwaretools.com/gf/project/ecoli2x50/) and aren't sorted. I looked at Rosalind_20080729_2_Chris5_F3.csfasta.zip and Rosalind_20080729_2_Chris5_R3.csfasta.zip. My understanding is that the sorting happens when the reads are mapped. Maybe the posted files aren't representative.
        I wasn't too descriptive in my explanation.

        Take a look at these reads from real data:
        Code:
        >1_6_55_F3
        T01100000000201002010120012300200011000100.01101131
        >1_6_64_F3
        T01203010110102003000000101100111000010100.01100131
        >1_6_69_F3
        T01031320200103032011110221111112110020111.11111131
        >1_6_97_F3
        I claim they are sorted based on read name. They have the form:
        >%d_%d_%d_F3
        where %d stands for some integer. The names are sorted numerically by the left-most integer first, then the middle integer, then the right-most integer, so the right-most integer changes fastest. The equivalent read (the mate) in the R3 file will be
        >%d_%d_%d_R3

        The "Rosalind" file follows the same pattern:
        Code:
        >469_26_42_F3
        T12113310031232112221003120021221223320222122212122
        >469_26_379_F3
        T31202223003310000130302323312223212011000010033200
        >469_26_540_F3
        T11012313031030123033113130100223110001231232303210
        >469_26_560_F3
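
A short Python sketch for checking the sort claim on a list of headers. It assumes the three integers compare numerically as a (left, middle, right) tuple, which is consistent with both excerpts above. The helper names `read_key` and `first_out_of_order` are hypothetical.

```python
def read_key(header):
    """Turn '>469_26_42_F3' into (469, 26, 42); the F3/R3 tag is ignored."""
    panel, x, y = header.lstrip(">").strip().split("_")[:3]
    return int(panel), int(x), int(y)

def first_out_of_order(headers):
    """Return the first header that sorts before its predecessor, or None."""
    previous = None
    for header in headers:
        key = read_key(header)
        if previous is not None and key < previous:
            return header
        previous = key
    return None

# Headers copied from the "Rosalind" excerpt.
headers = [">469_26_42_F3", ">469_26_379_F3", ">469_26_540_F3", ">469_26_560_F3"]

print(first_out_of_order(headers))   # None: the excerpt is in numeric order
print(sorted(headers) == headers)    # False: a plain string sort puts 379 before 42
```

The second check is the practical point: the order is numeric, so comparing read names as plain strings will not reproduce it.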

        • #5
          Thank you. Yes, you're right. Both files are individually sorted. I should have worded my original question differently. I expected xxx_F3.csfasta and xxx_R3.csfasta to have the same number of entries, with the nth entry in the F3 file being the mate of the nth entry in the R3 file. What does it mean for a paired read's mate to not exist?

          • #6
            Originally posted by smarkel
            Thank you. Yes, you're right. Both files are individually sorted. I should have worded my original question differently. I expected xxx_F3.csfasta and xxx_R3.csfasta to have the same number of entries, with the nth entry in the F3 file being the mate of the nth entry in the R3 file. What does it mean for a paired read's mate to not exist?
            It means that, given some quality threshold, a proper mate at that location could not be identified. I use "unpaired" reads without any problem. After >10 slides those extra reads really add up!

            • #7
              Thank you for the explanation.
