
  • Why doesn't pbtranscript.py classify call reads of inserts for films with 2 to 5 reads?

    I've processed 8 PacBio cells, corresponding to 3 different IsoSeq libraries, with the pbtranscript.py pipeline.
    In the classify step of the procedure, about 20% of the films (ZMW) do not produce a read of insert (RoI).
    When I check the number of reads per film for films that do and do not give an RoI, I get the following result.



    This shows that films with 2 to 5 reads produce no RoI, or far fewer RoIs than other films.

    Any idea why?
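
For readers who want to reproduce this kind of tally, here is a minimal sketch. It assumes the usual PacBio naming scheme, `movie/holeNumber/start_end` for subreads and `movie/holeNumber/ccs` for RoI reads, and that the read names have already been pulled out of the files. The helper names and the toy read names are hypothetical, not part of pbtranscript.

```python
from collections import Counter


def zmw_of(read_name):
    """Drop the last '/'-separated field, leaving 'movie/holeNumber'."""
    return read_name.rsplit("/", 1)[0]


def roi_yield_by_subread_count(subread_names, roi_names):
    """Map number of subreads -> (ZMWs with an RoI, total ZMWs)."""
    subreads_per_zmw = Counter(zmw_of(n) for n in subread_names)
    zmws_with_roi = {zmw_of(n) for n in roi_names}

    totals = Counter()
    with_roi = Counter()
    for zmw, n_subreads in subreads_per_zmw.items():
        totals[n_subreads] += 1
        if zmw in zmws_with_roi:
            with_roi[n_subreads] += 1
    return {n: (with_roi[n], totals[n]) for n in sorted(totals)}


# Toy input (made-up names): two ZMWs with 1 subread, two with 3 subreads.
subreads = [
    "m1/1/0_900",
    "m1/2/0_500", "m1/2/550_1040", "m1/2/1090_1600",
    "m1/3/0_480", "m1/3/530_1010", "m1/3/1060_1550",
    "m1/4/0_700",
]
rois = ["m1/1/ccs", "m1/3/ccs"]
table = roi_yield_by_subread_count(subreads, rois)
```

Each entry of `table` gives, for one subread count, how many ZMWs produced an RoI out of how many were observed, which is the comparison described in the post.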

  • #2
    I would like some clarification on what you mean by "not producing a RoI".

    The Iso-Seq classify steps are:

    --- using the CCS algorithm (which is generic and used for many things in addition to Iso-Seq) to generate RoI reads (in the future, they may be called CCS reads again; sorry for all the naming changes!)

    --- looking at the RoI reads to identify the 5' and 3' cDNA primers on the ends. It then "classifies" those RoI reads into full-length (has both the 5' and 3' primers and a polyA tail) and non-full-length (missing at least one of these criteria).
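
The full-length rule in the second step can be written as a small decision function. This is only an illustration of the rule as stated above, not the pbtranscript implementation: primer and polyA detection are taken as already done, and the function and argument names are hypothetical.

```python
def classify_roi(has_5p_primer, has_3p_primer, has_polya_tail):
    """Label an RoI read from three already-computed detection flags.

    Full-length requires all three features; anything else is
    non-full-length.
    """
    if has_5p_primer and has_3p_primer and has_polya_tail:
        return "full-length"
    return "non-full-length"


labels = [
    classify_roi(True, True, True),    # all criteria met
    classify_roi(True, False, True),   # 3' primer missing
    classify_roi(True, True, False),   # polyA tail missing
]
```

Note that this rule applies only to ZMWs that already have an RoI read; a ZMW with no RoI never reaches this step.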


    When you say "no RoI", do you mean:
    (a) there was no RoI/CCS read for that ZMW.
    or
    (b) it was not full-length

    Also, are all the libraries the same size? What is the avg. transcript length in these libraries?

    I'm not entirely sure how to explain what you observe, since I've not seen this myself. A while back I did a survey of number of passes vs. full-length RoI detection; the result differs from what you see and is closer to what I'd expect:
    [GitHub link]




    Also for reference, here is a tutorial on using classify. It explains the parameters in detail:
    [GitHub link]


    And another wiki to explain what to expect from classify output:
    [GitHub link]



    • #3
      "no RoI" means (a) there was no RoI/CCS read for that ZMW.

      I've simply compared the ZMW names in the initial subreads file with the names in the RoI file.

      The libraries are of three sizes (1-2kb, 2-3kb, 3-6kb). The average lengths are respectively 2kb, 2.5kb and 3.2kb.
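
The name comparison described in this post can be sketched as a set difference on ZMW identifiers. This assumes subread names of the form `movie/holeNumber/start_end` and RoI names of the form `movie/holeNumber/ccs`; the function names and the example read names are made up for illustration.

```python
def zmw_id(read_name):
    """Return 'movie/holeNumber' from a PacBio-style read name."""
    movie, hole = read_name.split("/")[:2]
    return f"{movie}/{hole}"


def zmws_without_roi(subread_names, roi_names):
    """ZMWs that appear among the subreads but have no RoI read."""
    subread_zmws = {zmw_id(name) for name in subread_names}
    roi_zmws = {zmw_id(name) for name in roi_names}
    return subread_zmws - roi_zmws


# Toy input: ZMW 11 has a subread but no RoI.
subread_names = ["m1/10/0_500", "m1/10/550_1040", "m1/11/0_900", "m1/12/0_300"]
roi_names = ["m1/10/ccs", "m1/12/ccs"]
missing = zmws_without_roi(subread_names, roi_names)
```

Here `missing` holds the ZMWs corresponding to case (a), where no RoI/CCS read exists at all.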



      • #4
        Your reads are likely being filtered out by one of the filtering criteria (which can be set as options to the command).

        If you are using CCS2, you should see a report such as ccs_report.csv that gives a breakdown of which reads were filtered and why. If you are using a more recent version of CCS1, the program prints a report when it finishes running that indicates the yield loss due to the various filters. If you can post either of these results here, I can give more guidance.
