  • Why doesn't pbtranscript.py classify call reads of insert for films with 2 to 5 reads?

    I've processed 8 PacBio cells corresponding to 3 different Iso-Seq libraries with the pbtranscript.py pipeline.
    In the classify step of the procedure, about 20% of the films (ZMWs) do not produce a read of insert (RoI).
    When I compare the number of reads per film for films that do and do not yield an RoI, I get the following result.



    This shows that films with 2 to 5 reads produce no RoI, or far fewer RoIs than other films.

    Any idea why?

  • #2
    I would like some clarification on what you mean by "not producing a RoI".

    The Iso-Seq classify steps are:

    --- Use the CCS algorithm (which is generic and used for many things in addition to Iso-Seq) to generate RoI reads (in the future, they may be called CCS reads again, sorry for all the naming changes!)

    --- Look at the RoI reads to identify 5' and 3' cDNA primers on the ends. The step then "classifies" those RoI reads as full-length (has both the 5' and 3' primers and a polyA tail) or non-full-length (missing at least one of these).
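The full-length rule described above can be written down as a small sketch. This is only an illustration of the rule as stated in this post, not the pbtranscript implementation; the `RoiEvidence` class and its field names are hypothetical, and how the primers and polyA tail are actually detected is not covered here.

```python
from dataclasses import dataclass


@dataclass
class RoiEvidence:
    """Hypothetical container for what was found on one RoI read."""
    has_5p_primer: bool
    has_3p_primer: bool
    has_polya_tail: bool


def classify(evidence):
    """Apply the rule from the post: all three features present means full-length."""
    if evidence.has_5p_primer and evidence.has_3p_primer and evidence.has_polya_tail:
        return "full-length"
    # Missing at least one of the three features.
    return "non-full-length"
```

Note that a ZMW with no RoI at all never reaches this step, which is why cases (a) and (b) below need to be told apart.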


    When you say "no RoI", do you mean:
    (a) there was no RoI/CCS read for that ZMW.
    or
    (b) it was not full-length

    Also, are all the libraries the same size? What is the avg. transcript length in these libraries?

    I'm not entirely sure how I would explain what you observe (since I've not seen this myself). I did a survey of number of passes vs. full-length RoI detection a while back; the result differs from what you see and is closer to what I'd expect:




    Also for reference, here is a tutorial on using classify. It explains the parameters in detail:


    And another wiki to explain what to expect from classify output:



    • #3
      "no RoI" means (a) there was no RoI/CCS read for that ZMW.

      I've simply compared the ZMW names in the initial subreads file with the names in the RoI file.
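A minimal sketch of that kind of comparison, for anyone who wants to reproduce it. It assumes read names of the form `movie/zmw/start_end` for subreads and `movie/zmw/ccs` for RoIs, so that the first two slash-separated fields identify the ZMW; the function names are made up for this example, and FASTA input is an assumption as well.

```python
from collections import Counter


def zmw_id(read_name):
    """Return 'movie/zmw' from a name such as 'movie/zmw/0_1500' or 'movie/zmw/ccs'."""
    movie, zmw = read_name.split()[0].split("/")[:2]
    return f"{movie}/{zmw}"


def fasta_names(path):
    """Yield the header of each FASTA record, without the leading '>'."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].strip()


def roi_yield_by_subread_count(subread_names, roi_names):
    """Map subreads-per-ZMW -> (ZMWs with an RoI, ZMWs without an RoI)."""
    subreads_per_zmw = Counter(zmw_id(name) for name in subread_names)
    zmws_with_roi = {zmw_id(name) for name in roi_names}
    table = {}
    for zmw, n_subreads in subreads_per_zmw.items():
        row = table.setdefault(n_subreads, [0, 0])
        row[0 if zmw in zmws_with_roi else 1] += 1
    return {n: tuple(row) for n, row in sorted(table.items())}
```

Called as `roi_yield_by_subread_count(fasta_names("subreads.fasta"), fasta_names("roi.fasta"))` (both file names are placeholders), it returns one row per subread count, which makes a drop in RoI yield at 2 to 5 subreads easy to spot.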

      The libraries are of three sizes (1-2 kb, 2-3 kb, 3-6 kb). The average lengths are 2 kb, 2.5 kb and 3.2 kb, respectively.



      • #4
        Your reads are likely being filtered out by one of the filtering criteria, which can be set as options to the command.

        If using CCS2, you should see a report such as ccs_report.csv that gives a breakdown of which reads were filtered and why. If using a more recent version of CCS1, the program prints a report when it finishes that indicates the yield loss due to the various filters. If you can post either of these results here, I can give more guidance.
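To pull the largest sources of loss out of such a report before posting it, something like the following may help. It is a sketch only: the exact layout of ccs_report.csv is not described in this thread, so the code assumes each data row starts with a filter label followed by an integer count, and it silently skips any row that does not fit that pattern. The function name is made up for this example.

```python
import csv


def filter_counts(lines):
    """Return (label, count) pairs sorted by count, largest first.

    Assumed layout (not verified): 'label,count[,anything else]' per data row.
    Rows with fewer than two fields or a non-integer count are skipped.
    """
    counts = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue
        try:
            count = int(row[1].strip())
        except ValueError:
            continue
        counts.append((row[0].strip(), count))
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

With a real file this would be used as `with open("ccs_report.csv", newline="") as fh: print(filter_counts(fh))`.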
