SEQanswers

Old 01-29-2016, 04:37 AM   #1
cklopp
Member
 
Location: Toulouse France

Join Date: Sep 2009
Posts: 12
Why doesn't pbtranscript.py classify call reads of inserts for films with 2 to 5 rds

I've processed eight PacBio cells corresponding to three different Iso-Seq libraries with the pbtranscript.py pipeline.
In the classify step, about 20% of the films (ZMWs) do not produce a read of insert (RoI).
When I compare the number of reads per film for films that do and do not yield a RoI, I get the following result.



This shows that films with 2 to 5 reads produce no RoI, or far fewer RoIs than other films.

Any idea why?
Old 03-01-2016, 01:44 PM   #2
Magdoll
Member
 
Location: Bay Area

Join Date: Aug 2011
Posts: 30

I would like some clarification on what you mean by "not producing a RoI".

The Iso-Seq classify steps are:

--- using the CCS algorithm (which is generic and used for many things in addition to Iso-Seq) to generate RoI reads (in the future, they may be called CCS reads again; sorry for all the naming changes!)

--- looking at the RoI reads to identify the 5' and 3' cDNA primers on the ends. It then "classifies" those RoI reads into full-length (has the 5' primer, the 3' primer and a polyA tail) and non-full-length (missing at least one of these).
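To make the full-length rule above concrete, here is a minimal Python sketch. It is not the pbtranscript.py code: the RoiFeatures class and the classify_roi function are hypothetical names, and the primer and polyA detection itself is assumed to have been done elsewhere.

```python
from dataclasses import dataclass


@dataclass
class RoiFeatures:
    """Hypothetical per-RoI flags; how they are detected is not shown here."""
    has_5p_primer: bool
    has_3p_primer: bool
    has_polya_tail: bool


def classify_roi(features):
    """Apply the rule from the post: full-length needs all three features."""
    if (features.has_5p_primer
            and features.has_3p_primer
            and features.has_polya_tail):
        return "full-length"
    # Missing at least one of the three criteria.
    return "non-full-length"


complete = RoiFeatures(True, True, True)
no_polya = RoiFeatures(True, True, False)
print(classify_roi(complete), classify_roi(no_polya))
```

Note that this step only runs on ZMWs that already have a RoI, so a ZMW with no RoI at all never reaches it.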


When you say "no RoI", do you mean:
(a) there was no RoI/CCS read for that ZMW.
or
(b) it was not full-length

Also, are all the libraries the same size? What is the avg. transcript length in these libraries?

I'm not entirely sure how to explain what you observe, since I've not seen it myself. A while back I did a survey of # of passes vs RoI full-length detection; the result differs from what you see and is closer to what I'd expect:
https://github.com/PacificBioscience...ngth-detection



Also for reference, here is a tutorial on using classify. It explains the parameters in detail:
https://github.com/PacificBioscience...l-length-reads

And another wiki to explain what to expect from classify output:
https://github.com/PacificBioscience...Cluster-Output
Old 03-02-2016, 11:34 PM   #3
cklopp
Member
 
Location: Toulouse France

Join Date: Sep 2009
Posts: 12

"no RoI" means (a) there was no RoI/CCS read for that ZMW.

I've simply compared the ZMW names in the initial subreads file with the names in the RoI file.

The libraries are of three sizes (1-2 kb, 2-3 kb, 3-6 kb). The average lengths are 2 kb, 2.5 kb and 3.2 kb, respectively.
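For anyone who wants to reproduce the ZMW name comparison described above, a rough Python sketch follows. It assumes read names of the form movie/holeNumber/start_end for subreads and movie/holeNumber/ccs for RoIs; the function names and the toy read names are made up for illustration, and reading the names out of the actual files is left out.

```python
from collections import Counter


def zmw_id(read_name):
    """Return 'movie/holeNumber' from a read name.

    Assumes names like 'movie/1234/0_500' (subread) or 'movie/1234/ccs'
    (RoI); anything after the first whitespace is ignored.
    """
    movie, hole = read_name.split()[0].split("/")[:2]
    return f"{movie}/{hole}"


def passes_by_roi_status(subread_names, roi_names):
    """Tally subreads per ZMW, split by whether the ZMW has a RoI.

    Returns two Counters mapping subread count -> number of ZMWs.
    """
    subreads_per_zmw = Counter(zmw_id(n) for n in subread_names)
    zmws_with_roi = {zmw_id(n) for n in roi_names}
    with_roi, without_roi = Counter(), Counter()
    for zmw, n_subreads in subreads_per_zmw.items():
        target = with_roi if zmw in zmws_with_roi else without_roi
        target[n_subreads] += 1
    return with_roi, without_roi


# Toy names, invented for illustration only.
subreads = [
    "m1/10/0_900", "m1/10/950_1800",
    "m1/11/0_2000",
    "m1/12/0_700", "m1/12/750_1400", "m1/12/1450_2100",
]
rois = ["m1/10/ccs", "m1/11/ccs"]

with_roi, without_roi = passes_by_roi_status(subreads, rois)
print("with RoI:", dict(with_roi))
print("without RoI:", dict(without_roi))
```

Plotting the two tallies side by side gives the kind of reads-per-ZMW comparison discussed in the first post.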
Old 03-03-2016, 10:44 AM   #4
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19

Your reads are likely being filtered out by one of the criteria the command applies (these can be set as options to the command).

If you are using CCS2, you should see a report such as ccs_report.csv that gives a breakdown of which reads were filtered and why. If you are using a more recent version of CCS1, the program prints a report when it finishes that indicates the yield loss due to the various filters. If you can post either of these results here, I can give more guidance.