  • Overlap of paired-end reads

    Hi,
    we are starting to analyze a PE 100 run of a resequencing project. Unfortunately the library inserts are too short, so the majority of read pairs overlap. In this study we are interested in SNPs, INDELs and big rearrangements.
    What would be the best option: trim the 100 bp reads to, for example, 75 bp, or use the overlapping 100 bp reads as they are? Do we lose the advantages of PE if the reads overlap?
    Thanks in advance for your help.

  • #2
    Originally posted by suludana:
    Hi,
    we are starting to analyze a PE 100 run of a resequencing project. Unfortunately the library inserts are too short, so the majority of read pairs overlap. In this study we are interested in SNPs, INDELs and big rearrangements.
    What would be the best option: trim the 100 bp reads to, for example, 75 bp, or use the overlapping 100 bp reads as they are? Do we lose the advantages of PE if the reads overlap?
    They should work fine as is, since you're using them for alignment.

    If it were for de novo assembly, I'd suggest merging them into longer single-end reads (at least the pairs with a single strong overlap), but I'm not sure there's any advantage to this for alignment.
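    For the de novo case, merging a pair means finding where the end of read 1 matches the start of the reverse complement of read 2 and joining the two into one fragment. The sketch below assumes exact matching, a 10 bp minimum overlap and no quality handling; dedicated mergers (FLASH, PEAR and similar) tolerate mismatches and use base qualities, so treat this only as an illustration of the idea.

```python
# Minimal sketch of merging one overlapping read pair into a single fragment.
# Assumptions: exact-match overlap, no base qualities, min_overlap of 10 bp.
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Return the merged fragment if read1's suffix equals revcomp(read2)'s prefix."""
    r2rc = revcomp(read2)
    # Try the longest overlap first so a strong overlap beats a short chance match.
    for olen in range(min(len(read1), len(r2rc)), min_overlap - 1, -1):
        if read1[-olen:] == r2rc[:olen]:
            return read1 + r2rc[olen:]
    return None  # no acceptable overlap: keep the reads as an ordinary pair
```

    For example, with `frag = "ACGTTGCAAGGCTTACCGATGGCAT"`, calling `merge_pair(frag[:15], revcomp(frag[-15:]), min_overlap=5)` recovers the full 25 bp fragment from two 15 bp mates that overlap by 5 bp.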

    You should definitely check for adapter contamination, though: there is a fine line between a 'clean' overlapping pair and a read that runs into the 'opposite' adapter at its end.
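    When the insert is shorter than the read length, each read runs past the end of the fragment and into the adapter. A rough sketch of 3' trimming is below; it assumes the 13 bp prefix `AGATCGGAAGAGC` shared by common Illumina TruSeq adapters (swap in your own kit's sequence) and exact matching only, whereas tools such as cutadapt or Trimmomatic also allow sequencing errors in the adapter.

```python
# Minimal sketch of 3' adapter trimming for read-through in short-insert pairs.
# Assumption: adapter prefix "AGATCGGAAGAGC"; exact matching, no error tolerance.

ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(read: str, adapter: str = ADAPTER, min_partial: int = 3) -> str:
    """Cut the read at the first full adapter, or at a partial adapter at its 3' end."""
    # Full adapter inside the read: the insert is shorter than the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Partial adapter at the very end: the read ran only a few bases past the insert.
    for plen in range(min(len(adapter) - 1, len(read)), min_partial - 1, -1):
        if read.endswith(adapter[:plen]):
            return read[:-plen]
    return read  # no adapter detected
```

    The `min_partial` cut-off is a trade-off: a very short partial match at the read end happens by chance fairly often, so trimming on 1 or 2 bases would remove genuine sequence.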

    • #3
      For rearrangements it is better to have a gap between the PE reads: the bigger the gap, the bigger the fragment that has both ends sequenced. If you are looking for bigger rearrangements, mate pairs would be a better bet.
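      A back-of-envelope way to see why the gap matters: sequence coverage depends on read length, but the number of pairs whose unsequenced gap contains a given breakpoint depends on fragment length minus both read lengths. The sketch assumes uniformly placed fragments of one fixed length and counts a pair only if the breakpoint falls strictly between the two reads; the pair count and genome size you plug in are up to you.

```python
# Back-of-envelope comparison: read coverage vs. pairs with a breakpoint in the gap.
# Assumptions: fixed fragment length, uniform fragment placement, breakpoint must
# lie in the unsequenced gap between mates for the pair to map discordantly.
from typing import Tuple


def coverages(n_pairs: float, genome_size: float,
              read_len: int, frag_len: int) -> Tuple[float, float]:
    """Return (sequence coverage, expected pairs whose gap contains a given base)."""
    seq_cov = n_pairs * 2 * read_len / genome_size
    gap = max(frag_len - 2 * read_len, 0)  # unsequenced bases between the mates
    gap_cov = n_pairs * gap / genome_size
    return seq_cov, gap_cov
```

      With 2 x 100 bp reads, a 150 bp fragment gives a gap of zero, so no pair can span a breakpoint without one read crossing it, while a 500 bp fragment leaves a 300 bp gap per pair at the same sequence coverage.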

      • #4
        Definitely don't trim them. If you're looking for SNPs, any that you find that lie within the overlapping region will have been sequenced twice for that fragment, improving your accuracy.
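        One way to use the double read-out in the overlap is a per-base consensus of read 1 and the reverse complement of read 2. The rule below is an assumed heuristic, not any particular caller's model: agreeing bases get their Phred scores summed (capped at an assumed 60), and disagreeing bases keep the higher-quality call with the quality difference as its score. Note that both mates come from the same molecule, so this only helps with sequencing errors; errors introduced before sequencing (e.g. during PCR) appear in both reads.

```python
# Minimal sketch: consensus over the overlap of read 1 and revcomp(read 2),
# both already placed on the same fragment coordinates.
# Assumptions: the summing/difference rule and the MAX_Q cap are illustrative.
from typing import List, Tuple

MAX_Q = 60  # assumed cap on the combined Phred quality


def overlap_consensus(bases1: str, quals1: List[int],
                      bases2: str, quals2: List[int]) -> Tuple[str, List[int]]:
    """Combine two overlapping base calls (with Phred qualities) position by position."""
    out_bases, out_quals = [], []
    for b1, q1, b2, q2 in zip(bases1, quals1, bases2, quals2):
        if b1 == b2:
            out_bases.append(b1)  # agreement: more confident call
            out_quals.append(min(q1 + q2, MAX_Q))
        elif q1 >= q2:
            out_bases.append(b1)  # disagreement: keep better call, lower confidence
            out_quals.append(q1 - q2)
        else:
            out_bases.append(b2)
            out_quals.append(q2 - q1)
    return "".join(out_bases), out_quals
```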

        • #5
          No problem with overlap

          I'll also second (or rather third) that: there's no need to trim, and the overlap will if anything increase accuracy.

          • #6
            Originally posted by niceday:
            For rearrangements it is better to have a gap between the PE reads: the bigger the gap, the bigger the fragment that has both ends sequenced. If you are looking for bigger rearrangements, mate pairs would be a better bet.
            All true. However, the gap needs to be in the actual library, not created artificially in the data set by throwing away part of the reads, which is what the original poster is suggesting.

            I suppose there might be some software that works better with shortened reads, but I would be concerned about false positives due to potentially poorer mapping of the shorter reads.

            • #7
              Originally posted by niceday:
              For rearrangements it is better to have a gap between the PE reads: the bigger the gap, the bigger the fragment that has both ends sequenced. If you are looking for bigger rearrangements, mate pairs would be a better bet.
              Agreed, longer pairs would help (though mate pairs are a pain on several levels).

              But given that the library is already sequenced, I think you'll get the best results by using the data as is.

              • #8
                Thanks for all your comments.
                If I were looking only for SNPs, I would have no doubt: I would use the overlapping 100 bp reads. But I am also interested in big INDELs and rearrangements. In this case a real PE library (with a gap between the reads) would be better, right?

                • #9
                  Originally posted by suludana:
                  But I am also interested in big INDELs and rearrangements. In this case a real PE library (with a gap between the reads) would be better, right?
                  Yes, you'll need to resequence with a longer-insert paired-end library, or even mate pairs (if you need to find nasty rearrangements bordered by long repeats).

                  • #10
                    Thanks for your comments, but my question is: Do I lose the advantages of the PE if the reads overlap?

                    • #11
                      What is the advantage of PE? It depends on whether your questions get answered. The standard PE benefits will still be there; however, can you accurately study rearrangements?

                      • #12
                        Originally posted by suludana:
                        Thanks for your comments, but my question is: Do I lose the advantages of the PE if the reads overlap?
                        It's not so much a question of overlap as a question of paired distance.

                        Longer PE distance increases the likelihood of a given read pair spanning a given re-arrangement. More spanning read pairs 'disagreeing' when aligned to the reference within a specific area increases the confidence that something 'interesting' is going on there.

                        In the simple case, you'll see a pile of 'unhappy' pairs (which should span the region of the re-arrangement), and a lack of alignment / low agreement with consensus at the borders of the re-arrangement. Without PE, you'd just get the latter. And if it's a repeat rich region, you might not even get that - hence the importance of having long paired data as an indicator in such regions.
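                        The 'pile of unhappy pairs' can be sketched as: flag each pair as discordant if its mates land on different chromosomes, have an unexpected orientation, or are much farther apart or closer together than the library's insert-size distribution allows, then count flagged pairs per window. Everything here is an assumption for illustration: the `Pair` record is hypothetical, positions are leftmost mapped coordinates (so the span is only approximate), the expected orientation is the usual forward/reverse Illumina PE layout, and the `k`, `window` and `min_pairs` thresholds are placeholders. Real SV callers work from the SAM/BAM fields and use far more careful statistics.

```python
# Minimal sketch: flag discordant pairs and report windows where they pile up.
# Assumptions: hypothetical Pair record, FR orientation, placeholder thresholds.
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Pair:
    chrom1: str
    pos1: int      # leftmost mapped coordinate of mate 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def is_discordant(p: Pair, mean_insert: float, sd_insert: float, k: float = 4.0) -> bool:
    if p.chrom1 != p.chrom2:
        return True  # mates on different chromosomes: translocation candidate
    left, right = sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)])
    if (left[1], right[1]) != ("+", "-"):
        return True  # unexpected orientation: inversion- or duplication-like
    span = right[0] - left[0]  # approximate distance between mates
    return abs(span - mean_insert) > k * sd_insert  # too far (deletion) or too close


def discordant_hotspots(pairs: Iterable[Pair], mean_insert: float, sd_insert: float,
                        window: int = 1000, min_pairs: int = 3) -> Dict[Tuple[str, int], int]:
    """Count discordant pairs per (chromosome, window) of mate 1; keep dense windows."""
    counts: Counter = Counter()
    for p in pairs:
        if is_discordant(p, mean_insert, sd_insert):
            counts[(p.chrom1, p.pos1 // window)] += 1
    return {key: n for key, n in counts.items() if n >= min_pairs}
```

                        The `min_pairs` filter is the 'more spanning pairs, more confidence' idea from the post: a single odd pair is easily a mapping artefact, a cluster of them in one window is worth a look.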

                        • #13
                          Identification of indels by many variant callers requires that the novel junction lie in the unsequenced region, so that the end reads can be accurately mapped to the reference. Overlapping reads that contain an indel will not meet this criterion. An anchored split-read mapper, such as Pindel, will be required for the overlapping reads. Alternatively, you can create an artificial 'gap' by aligning some portion (say, the first 50 bp) of each end, then screening the data for ends that map aberrantly (too far apart, or to different chromosomes). As a bonus, the novel junction will be present in the gap, so you can identify it at base-pair resolution.
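                          Because each read starts at one end of the fragment, keeping only its first N bases keeps the outer ends and opens a gap of (fragment length - 2N) between the mates; for example, a 180 bp fragment cut to 2 x 50 bp leaves an 80 bp gap. A minimal sketch of that truncation step on plain 4-line FASTQ is below, run separately on the R1 and R2 files; N = 50 follows the post's example, and file handling plus the subsequent realignment and screening are left out.

```python
# Minimal sketch of the 'artificial gap' step: cut sequence and quality lines of a
# 4-line FASTQ to their first n characters. Assumes no multi-line FASTQ records.
from typing import Iterable, Iterator


def truncate_fastq(lines: Iterable[str], n: int = 50) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality strings cut to the first n bases."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # record line 2 = sequence, line 4 = qualities
            line = line[:n]
        yield line + "\n"
```

                          Truncating both the sequence and the quality line keeps each record valid, since FASTQ requires the two to have the same length.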
