SEQanswers

Old 09-05-2014, 01:32 AM   #1
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98
How to create CCS from subreads without smrtcell data?

Hi everyone,

I have a small problem at the moment.

From our collaborators I got
- 1 assembled genome
- the related PacBio subreads

I did NOT get the smrtcell data (I could ask, but I'd rather not if there's an easy way around it).

For my genome submission I now need to calculate the coverage of my genome. Mapping the subreads to the genome directly will not give an accurate result, so I'd like to create CCS reads from the subreads.
I assume the "RS_subreads.1" protocol in SMRT Portal would do something like this, but I cannot even test that: without the related smrtcell data I cannot import the subreads.

Does anyone have any idea how I could solve this without handling the smrtcell data?
Old 09-05-2014, 03:22 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,053

https://github.com/PacificBiosciences/pbdagcon may be able to do this, but you will have to generate blasr alignments for your reads.

It should be a (relatively) trivial task for the sequence provider to run the "RS_ReadsOfInsert" protocol on the SMRTcells and generate the data you need. Try asking them.
Old 09-05-2014, 07:33 AM   #3
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

Thanks !

The collaborators are a bit complicated to deal with, so I'd like to work around them.

BLASR alignments are not a problem.
Installing pbdagcon is, though.
The organization of the folders seems to be messed up: the build doesn't find the header/cpp files where it expects them (most likely because some folders are not included in the default git download, or in the clone), and it just doesn't compile. I've moved the files around for a while and edited in another compiler flag (it was complaining about some conversion), but I'm not getting anywhere.

Maybe I'm doing something wrong, though.
Running make in the top-level directory of the download doesn't do anything, and in the cpp directory the problems begin.

Has anyone tested if the download compiles on another machine?
Old 09-05-2014, 12:26 PM   #4
mjhsieh
Member
 
Location: USA

Join Date: Jan 2013
Posts: 10

Quote:
Originally Posted by bastianwur View Post
Has anyone tested if the download compiles on another machine?
You can try downloading it again from https://github.com/PacificBiosciences/pbdagcon since it just got updated.
Old 09-05-2014, 12:49 PM   #5
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 68

Quote:
Originally Posted by bastianwur View Post
Hi everyone,

I have a small problem at the moment.

From our collaborators I got
- 1 assembled genome
- the related PacBio subreads

I did NOT get the smrtcell data (I could ask, but I'd rather not if there's an easy way around it).

For my genome submission I now need to calculate the coverage of my genome. Mapping the subreads to the genome directly will not give an accurate result, so I'd like to create CCS reads from the subreads.
I assume the "RS_subreads.1" protocol in SMRT Portal would do something like this, but I cannot even test that: without the related smrtcell data I cannot import the subreads.

Does anyone have any idea how I could solve this without handling the smrtcell data?
If you want high-quality CCS, you need to start from the SMRTCell data. Without it, you are running the consensus-calling algorithm Quiver without the quality value data (InsertionQV, DeletionQV, SubstitutionQV, and MergeQV) it needs to generate highly accurate consensus calls.
Old 09-05-2014, 01:30 PM   #6
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 322

If I am correct in understanding that you want the coverage of single molecules (inserts rather than subread coverage), why not just select the longest subread from each read and map those against the genome? The accuracy gain from computing a consensus of the subreads from one insert (with either pbdagcon or CCS) will not make a significant difference to the mapping, and the consensus is best calculated from all the subreads using Quiver anyway.
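As a quick sanity check on the arithmetic: once one representative subread per molecule is mapped, mean coverage is just total mapped bases divided by genome length. A minimal sketch (the numbers are made up for illustration):

```python
# Minimal sketch: mean per-base coverage from mapped subread lengths.
# coverage = total mapped bases / genome size
def mean_coverage(mapped_lengths, genome_size):
    return sum(mapped_lengths) / float(genome_size)

# e.g. 20,000 molecules averaging 6 kb mapped on a 4 Mb genome:
print(mean_coverage([6000] * 20000, 4000000))  # -> 30.0
```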
Old 09-07-2014, 11:38 PM   #7
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

Quote:
Originally Posted by mjhsieh View Post
You can try downloading it again from https://github.com/PacificBiosciences/pbdagcon since it just got updated.
Thanks, it builds now.

Quote:
Originally Posted by gconcepcion View Post
If you want high quality CCS, you need to start from SMRTCell data. Without the SMRTCell data, you are running the consensus calling algorithm quiver without the necessary quality value data (InsertionQV, DeletionQV, SubstitutionQV, and MergeQV) to generate highly accurate consensus calls.
Mmhh...okay, I'll consider that if I don't get good enough results.

Quote:
Originally Posted by rhall View Post
If I am correct in understanding that you want the coverage of single molecules (inserts rather than subread coverage), why not just select the longest subread from each read and map those against the genome? The accuracy gain from computing a consensus of the subreads from one insert (with either pbdagcon or CCS) will not make a significant difference to the mapping, and the consensus is best calculated from all the subreads using Quiver anyway.
That...actually makes sense, thanks.
Maybe I'll see if it makes a difference.
Old 09-07-2014, 11:41 PM   #8
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Quote:
Originally Posted by rhall View Post
If I am correct in understanding that you want the coverage of single molecules (inserts rather than subread coverage), why not just select the longest subread from each read and map those against the genome?
And the read names will tell you which reads are subreads from the same ZMW ('well'). See https://github.com/PacificBioscience...#readexplained and scroll down a bit to the part that says

Code:
<movieName>/<ZMW number>/<subread start>_<subread end>
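For instance, grouping subreads by molecule from this naming scheme could look like the following sketch (the read names in the example are made up):

```python
# Sketch: parse '<movieName>/<ZMW number>/<start>_<end>' read names
# and group subreads that belong to the same ZMW (i.e. the same molecule).
from collections import defaultdict

def parse_subread_name(name):
    movie, zmw, span = name.split("/")
    start, end = span.split("_")
    return movie, int(zmw), int(start), int(end)

def group_by_zmw(names):
    groups = defaultdict(list)
    for name in names:
        movie, zmw, start, end = parse_subread_name(name)
        groups[(movie, zmw)].append((start, end))
    return dict(groups)

# Two subreads from ZMW 7, one from ZMW 9 (hypothetical movie name):
names = ["m140905_x/7/0_5000", "m140905_x/7/5100_10200", "m140905_x/9/0_3000"]
print(group_by_zmw(names))
```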
Old 09-14-2014, 11:49 PM   #9
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

Thanks, I've already dug through the PacBio website.

Quote:
Originally Posted by GenoMax View Post
https://github.com/PacificBiosciences/pbdagcon may be able to do this but you will have to generate blasr alignments for your reads.
Quote:
Originally Posted by bastianwur View Post
BLASR alignments are not a problem.
I might have been too fast with this ^^.
What exactly do I need to map to what?
Right now it seems I'd need a separate alignment file for every subread...or am I wrong? I can do that, but I'd rather get around it.
(computer scientists are lazy people, right? ^^)


Unrelated: there is a tremendous difference between bowtie2 and blasr alignments for the longest subreads.
The first maps 5%, the second maps 50%.
(The library is highly contaminated with E. coli + vectors, roughly up to 50%, so that fits.)
Old 09-15-2014, 08:33 AM   #10
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Just make a hash table keyed by read name and store a representative subread of each read in it. Then dump all of them to a single fasta/fastq file and map that, so you get one sam file from which you can calculate coverage.
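A minimal sketch of that approach, assuming the subreads are available as (name, sequence) pairs with PacBio-style names; the helper names here are made up, not from any PacBio tool:

```python
# Keep the longest subread per molecule in a dict keyed by movie/ZMW,
# then write one representative sequence per read to a single FASTA.
def longest_subreads(records):
    """records: iterable of (name, seq) pairs where name looks like
    '<movie>/<zmw>/<start>_<end>'. Returns {(movie, zmw): (name, seq)}."""
    best = {}
    for name, seq in records:
        movie, zmw, _span = name.split("/")
        key = (movie, zmw)
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (name, seq)
    return best

def write_fasta(best, handle):
    """Dump the representatives to a single FASTA stream."""
    for name, seq in best.values():
        handle.write(">{}\n{}\n".format(name, seq))
```

Mapping the resulting file once then yields a single SAM to calculate coverage from.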

bowtie2 is not designed for high error rates; there's no point in using it with raw PacBio data.
Old 09-22-2014, 01:37 AM   #11
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

Quote:
Originally Posted by Brian Bushnell View Post
Just make a hash table for each read's name, and store a representative subread of each read in it. Then dump all of it to a single fasta/fastq file, and map that, so you get one sam file from which you can calculate coverage.
Did that to get the values above.

But yeah, I guess I'll stick with that, since I'm running out of time.

Thanks.