
  • What is the optimal read length to quantify splice variants: 50, 76 or 100 bp?

    Does anyone know what the best read length is for quantifying splice variants from RNA-seq data on an Illumina HiSeq? The reference genome has been sequenced, so assembly is not much of a problem.
    On the one hand, the longest possible read length will improve identification of splice variants. On the other hand, with a shorter read length more fragments can be sequenced for a similar price, which improves quantification.
    Is there such a thing as an optimal read length in this case?

    Thanks for any input!

  • #2
    The longer the read, the more likely it is to span a splice junction. Also, the marginal cost per base of longer reads is lower (e.g., 2N reads at 50 bp cost more than N reads at 100 bp), and you'd have to sequence more than twice as many shorter reads to obtain the same number of mappable junctions. So the longer reads are cheaper as well.
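A rough way to check the claim in #2 is to count reads that span a given junction with a minimum anchor on each side. This is a minimal sketch under simplifying assumptions: uniform read start positions, a junction far from the transcript ends, and the same total number of sequenced bases for every read length. The 2,000-nt transcript, the 100,000-base budget, the 20-base minimum anchor, and the function name `expected_junction_reads` are all hypothetical. A read of length L spans the junction from L - 2a + 1 of its T - L + 1 possible start positions:

```python
def expected_junction_reads(read_len: int, n_reads: int,
                            transcript_len: int, min_anchor: int) -> float:
    """Expected reads covering one junction with >= min_anchor bases on each side.

    Hypothetical model: reads start uniformly at random on the transcript and
    the junction is at least read_len bases from either transcript end.
    """
    # Start positions that leave >= min_anchor bases on both sides of the junction.
    spanning_starts = max(0, read_len - 2 * min_anchor + 1)
    # All valid start positions for a read of this length on the transcript.
    total_starts = transcript_len - read_len + 1
    return n_reads * spanning_starts / total_starts


BUDGET_BASES = 100_000   # hypothetical bases landing on this transcript
TRANSCRIPT_LEN = 2_000   # hypothetical transcript length (nt)
MIN_ANCHOR = 20          # hypothetical minimum mappable segment (bp)

for read_len in (50, 76, 100):
    n_reads = BUDGET_BASES // read_len  # same total bases for every read length
    hits = expected_junction_reads(read_len, n_reads, TRANSCRIPT_LEN, MIN_ANCHOR)
    print(f"{read_len} bp: {n_reads} reads, {hits:.1f} junction-spanning reads")
# 50 bp: 2000 reads, 11.3 junction-spanning reads
# 76 bp: 1315 reads, 25.3 junction-spanning reads
# 100 bp: 1000 reads, 32.1 junction-spanning reads
```

Under these assumptions a single 100-bp read is about 5.5 times (61/11) as likely as a 50-bp read to span the junction, so you would need roughly 5.5 times as many 50-bp reads for the same junction count, consistent with the "more than twice" in #2. The model ignores mapping errors and per-base pricing.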



    • #3
      Originally posted by Joke van Vugt
      Is there such a thing as an optimal read length in this case?

      Probably. I do not have any data, just a rough guess.

      A read split evenly across a splice junction has half of its bases on one side of the junction and half on the other. Both segments need to be mapped to the reference, so a 50-base read means mapping two 25-base partial reads. If the junction falls off-center, the shorter segment is even smaller. Then there are sequencing errors and/or SNVs relative to your reference. Some rescue is possible (e.g., both segments must map to the same chromosome, they should lie within a reasonable distance of each other, depth of coverage can compensate for many mismatches, splice-site characteristics can be taken into account), but I am simply not fond of mapping 25-mers, let alone 20-mers. So for me, 50-bp reads are not good for splice variants.

      76-base reads would have 38 bases on either side. Even allowing for offsets and sequencing errors, the worst-case partial read to be mapped is around a 28-mer. That is much more comfortable.

      100-base reads are, of course, even better, but if you are concerned about cost, don't use them.
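The segment arithmetic in #3 can be sketched as follows. The 10-base offset is an assumption chosen because it reproduces the ~28-mer worst case quoted for 76-bp reads; the mismatch bound is a simple pigeonhole argument added here for illustration, not something stated in the post. Both function names are hypothetical.

```python
import math


def shorter_segment(read_len: int, offset: int) -> int:
    """Bases on the shorter side of a junction sitting `offset` bases off-center."""
    return read_len // 2 - offset


def guaranteed_exact_run(segment: int, mismatches: int) -> int:
    """Longest mismatch-free stretch guaranteed in a segment with `mismatches` errors.

    The (segment - mismatches) correct bases are split into at most
    (mismatches + 1) runs, so by pigeonhole at least one run has this many bases.
    """
    return math.ceil((segment - mismatches) / (mismatches + 1))


OFFSET = 10  # assumed junction offset; reproduces the ~28-mer for 76-bp reads

for read_len in (50, 76, 100):
    centered = shorter_segment(read_len, 0)
    worst = shorter_segment(read_len, OFFSET)
    print(read_len, centered, worst, guaranteed_exact_run(worst, 1))
# 50 25 15 7
# 76 38 28 14
# 100 50 40 20
```

A single mismatch in the worst-case 76-bp segment still leaves an exact stretch of at least 14 bases, while the 50-bp case drops to 7, which illustrates why the short side of a 50-bp read is uncomfortable to map.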



      • #4
        Originally posted by HESmith
        The longer the read, the more likely it is to span a splice junction. Also, the marginal cost-per-base of longer reads is less (i.e., 2X reads @ 50bp is more expensive than 1X @ 100bp), and you'd have to sequence more than twice the number of shorter reads to obtain the same number of mappable junctions. So, the longer reads are cheaper as well.
        This is very useful! Thanks!



        • #5
          Originally posted by Joke van Vugt
          Does anyone know what the best read length is to quantify splice variants from RNA seq data using an Illumina HiSeq. The reference genome has been sequenced so assembly is not too much of a problem.
          On one hand the longest possible read length will increase identification of splice variants. However, with a shorter read length, more fragments can be sequenced (for a similar price), which increases quantification.
          Is there such a thing as an optimal read length in this case?
          If you're aiming to identify novel variants, longer is clearly better, and doubly so for de novo assembly.

          For quantifying known variants, it's a bit more complex, at least in theory. You want the maximum number of reads that hit only one variant. Longer reads are moderately more likely to cover an alternative splice point, but once you have enough reads to confirm a splice, the rest are wasted. And every read that fails to identify a specific splice variant is effectively wasted too.

          I guess the priority should thus be on total bases, and with current pricing that most likely means 100-bp reads. I don't think you can apply paired reads easily, but if you can do it for less than twice the cost, I would probably consider it.
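One way to make "reads that hit only one variant" concrete is to call a read informative when its aligned blocks are compatible with exactly one annotated isoform. This is a hypothetical model: the function names, the half-open interval representation, and the two-isoform exon-skipping example are all assumptions, not taken from the thread or any particular tool. Real quantifiers typically assign ambiguous reads probabilistically rather than discarding them.

```python
from typing import Dict, List, Optional, Tuple

# Half-open [start, end) intervals in genomic coordinates (assumed representation).
Interval = Tuple[int, int]


def compatible(blocks: List[Interval], exons: List[Interval]) -> bool:
    """Return True if a read's aligned blocks fit an isoform's exon chain.

    Each block must lie inside a single exon, and every gap between consecutive
    blocks must be exactly a junction of this isoform: block k ends where exon i
    ends and block k+1 starts where exon i+1 starts.
    """
    exon_idx = []
    for start, end in blocks:
        hits = [i for i, (es, ee) in enumerate(exons) if es <= start and end <= ee]
        if not hits:
            return False  # block falls (partly) outside every exon
        exon_idx.append(hits[0])
    for k in range(len(blocks) - 1):
        i, j = exon_idx[k], exon_idx[k + 1]
        if j != i + 1:
            return False  # read skips an exon this isoform contains
        if blocks[k][1] != exons[i][1] or blocks[k + 1][0] != exons[j][0]:
            return False  # gap does not match this isoform's junction
    return True


def unique_isoform(blocks: List[Interval],
                   isoforms: Dict[str, List[Interval]]) -> Optional[str]:
    """Name of the only compatible isoform, or None if zero or several fit."""
    hits = [name for name, exons in isoforms.items() if compatible(blocks, exons)]
    return hits[0] if len(hits) == 1 else None


# Hypothetical exon-skipping event: "A" includes the middle exon, "B" skips it.
isoforms = {
    "A": [(0, 100), (200, 300), (400, 500)],
    "B": [(0, 100), (400, 500)],
}

reads = {
    "spans E1-E2 junction": [(50, 100), (200, 250)],  # only A
    "spans E1-E3 junction": [(60, 100), (400, 410)],  # only B
    "inside first exon": [(10, 60)],                  # both, uninformative
    "inside last exon": [(420, 470)],                 # both, uninformative
}

for label, blocks in reads.items():
    print(label, "->", unique_isoform(blocks, isoforms))
# spans E1-E2 junction -> A
# spans E1-E3 junction -> B
# inside first exon -> None
# inside last exon -> None
```

In this toy case only junction-spanning reads (or reads inside the skipped exon) separate the isoforms; reads lying wholly within shared exons count toward coverage but cannot tell the variants apart.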



          • #6
            To follow up on this, http://www.biomedcentral.com/1471-2105/12/323 discusses the trade-off between read length, paired-end sequencing, and sequencing more reads.
