  • bloosnail
    Member
    • Jul 2015
    • 17

    Comparing read lengths of duplicates

    Hi All,

    I can normally find answers by searching, but I couldn't this time, so I made an account. I am wondering how to compare samples with different read lengths: we have five 75bp samples and fifteen 50bp samples of paired-end human ATAC-seq reads, with one sample duplicated between the two platforms. We've compared alignment statistics and peaks between the duplicates and have done biclustering across all samples. There are differences, and the biclustering looks good, but we are unsure exactly how to interpret the results.

    It seems to be undisputed that longer reads should give better results when looking for differential regions. I feel this type of analysis must have been done before, but I could not find any empirical evidence, either on these forums or by Googling, that confirms this conclusion and quantifies the difference. Any help is appreciated.
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    I think the easiest way to ensure consistency is to trim the 75bp reads to 50bp...

    • bloosnail
      Member
      • Jul 2015
      • 17

      #3
      Brian, are you saying to trim the 75bp reads down to 50bp and then compare these trimmed reads to the ones that were originally 50bp?

      To clarify, the goal at this step is not to use both platforms together, but to measure the quantitative differences between them. Using the duplicated 50bp and 75bp samples seems like a good way to do this, but we are unsure exactly how to go about it. In the future, we plan to use 75bp reads rather than 50bp reads, but we want to be certain this is a reasonable course of action.

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Ah, I see - you're trying to decide whether moving to 75bp gives you a better result. That's very hard to do unless you know the correct answer. If you did know the correct answer, I would generate results with 75bp reads, generate results with 75bp reads trimmed to 50bp (which should induce less noise than using real 50bp reads from another run, assuming the only difference is read length), and see which is closer to the truth. But if you don't know the correct answer... I generally recommend generating synthetic data for which you do know the correct answer.
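
One way to score "closer to the truth" is the base-level Jaccard index between a called peak set and the known-correct regions. The sketch below is only an illustration: it assumes peaks on a single chromosome given as half-open `(start, end)` tuples, and the function names and toy coordinates are hypothetical, not taken from any real dataset or tool.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(a, b):
    """Bases in both sets divided by bases in either set."""
    union = covered(list(a) + list(b))
    intersection = covered(a) + covered(b) - union
    return intersection / union if union else 1.0


# Toy, made-up coordinates purely to show the call pattern.
truth = [(100, 200), (500, 600)]
peaks_full_length = [(100, 200), (500, 600)]
peaks_trimmed = [(100, 200)]

score_full = jaccard(peaks_full_length, truth)
score_trimmed = jaccard(peaks_trimmed, truth)
```

With synthetic data, whichever read length gives the higher score against the truth set would be the one closer to the correct answer under this particular metric; other metrics (for example, peak-level precision and recall) may rank results differently.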

        Otherwise, you can only assume that wherever 50bp results differ from 75bp results, the 75bp results are correct and 50bp are wrong. That's probably a safe assumption, if everything is done correctly (bear in mind that adapters and low-quality tails can make longer reads yield inferior alignments), but not provable. However, when doing that, noise could overwhelm the actual signal, which is why I suggest using the same reads and just trimming to different lengths.
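
A minimal sketch of the trimming step, assuming uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines). The function name `trim_fastq` and the in-memory example are hypothetical; dedicated trimming tools do the same job and handle compression and edge cases.

```python
import io


def trim_fastq(in_handle, out_handle, max_len=50):
    """Copy FASTQ records, keeping only the first max_len bases and qualities.

    Returns the number of records written.
    """
    n_records = 0
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of input
        seq = in_handle.readline().rstrip("\n")
        plus = in_handle.readline()
        qual = in_handle.readline().rstrip("\n")
        out_handle.write(header)
        out_handle.write(seq[:max_len] + "\n")
        out_handle.write(plus)
        out_handle.write(qual[:max_len] + "\n")
        n_records += 1
    return n_records


# Example with one made-up 75bp record held in memory.
source = io.StringIO("@read1\n" + "A" * 75 + "\n+\n" + "I" * 75 + "\n")
trimmed = io.StringIO()
n_written = trim_fastq(source, trimmed, max_len=50)
```

For paired reads, the same function would be applied to the read 1 and read 2 files separately, so the pairs stay in the same order.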

        • bloosnail
          Member
          • Jul 2015
          • 17

          #5
          Thank you for your informative response. What you said makes a lot of sense, and it seems like a good choice to proceed with the original plan of just using the longer-read platform.

          I am wondering, though: are 75bp reads trimmed down to 50bp more accurate than reads originally produced as 50bp (more of a sequencing process question)? Or does base quality degrade at a similar rate regardless of read length?

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            In general, the rate of quality decay depends only on read position, not on read length. But there are a couple of exceptions:

            For single-ended reads, there should be no difference except in the last base; the last base tends to be substantially lower-quality than the second-to-last, so the 50th base of a 75bp read would be more accurate than the 50th base of a 50bp read, but the other bases should be the same.

            There might be a slight reduction in quality for read 2 with 2x75bp reads compared to 2x50bp reads, because the cluster has been on the machine longer when read 2 begins.

            But those differences should be extremely minor on a modern Illumina machine.
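
To check this on your own data, one option is to compute the mean quality at each read position for each run and compare the profiles, particularly at position 50. The sketch below assumes Phred+33 encoded quality strings (the usual encoding on current Illumina machines); the function name and the tiny example input are hypothetical.

```python
def mean_quality_by_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    quality_strings: iterable of FASTQ quality lines (no trailing newline).
    offset: ASCII offset of the encoding; 33 is assumed here.
    """
    totals = []
    counts = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up 2bp quality strings: 'I' is Q40 and '5' is Q20 in Phred+33.
profile = mean_quality_by_position(["II", "I5"])
```

Running this on the quality lines of the 50bp run and the 75bp run gives two profiles that can be compared position by position; reads of different lengths are handled because each position is averaged only over the reads that reach it.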

            • bloosnail
              Member
              • Jul 2015
              • 17

              #7
              Great, thanks a lot! Very helpful.
