SEQanswers

Old 01-22-2016, 03:13 PM   #1
mido1951
Senior Member
 
Location: Tunisia

Join Date: May 2014
Posts: 123
Find error rate in raw reads

Hi,
How can I find the overall error rate of my raw reads, and the separate rates of substitutions, deletions, and insertions?
Thanks
Old 01-22-2016, 04:08 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080
Look at @Brian's post (#18) here. You will need either an existing reference or an assembly of your own data to use as one.
Old 01-22-2016, 04:18 PM   #3
mido1951
Senior Member
 
Location: Tunisia

Join Date: May 2014
Posts: 123
I used BBMap for mapping. Is it a good tool?
This is my result:
Code:
   ------------------   Results   ------------------

Genome:                 1
Key Length:             13
Max Indel:              16000
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             108490  (52257243 bases)

Mapping:                4967.598 seconds.
Reads/sec:              21.84
kBases/sec:             10.52


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  29.4433%           31943        29.9910%           15672446
unambiguous:             29.0607%           31528        29.6165%           15476766
ambiguous:                0.3825%             415         0.3745%             195680
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        0.0046%               5         0.0002%                126
semiperfect site:         0.0046%               5         0.0002%                126

Match Rate:                   NA               NA        85.4575%           14050523
Error Rate:              40.9037%           31938        14.5421%            2390951
Sub Rate:                40.9024%           31937         7.2366%            1189814
Del Rate:                40.7756%           31838         4.6777%             769083
Ins Rate:                40.7103%           31787         2.6278%             432054
N Rate:                   0.0013%               1         0.0003%                 55

Total time:             5126.959 seconds.
In this example, is the overall error rate 40.9037% (pct reads) or 14.5421% (pct bases)?
And which one should I use for Sub, Del, and Ins?
Thanks
Old 01-22-2016, 04:27 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080
BBMap is a great tool but it needs to be applied appropriately.

You need to provide more information to get a more useful answer: what kind of data this is, what you are mapping against, and the command-line options you used. BTW: only ~30% of your reads are mapping, which is low to begin with (if you are mapping against a reference).
Old 01-22-2016, 04:33 PM   #5
mido1951
Senior Member
 
Location: Tunisia

Join Date: May 2014
Posts: 123
Yes, I am mapping my raw reads to the reference, and I used the default parameters for BBMap.
In this example, is the overall error rate 40.9037% (pct reads) or 14.5421% (pct bases)?
And which one should I use for Sub, Del, and Ins?
Thanks
Old 01-22-2016, 05:29 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080
It appears that you did not read post #18 in the thread I linked above, which explains how to plot the rates you are looking for.

If you are working with MinION or PacBio data (long reads), then you should be using mapPacBio.sh instead of bbmap.sh. The error rate may not be meaningful as it stands now.

To calculate the error rate (for long reads) you may have to do something like this:

Code:
$ mapPacBio.sh in=your_reads.fa ref=ref.fa mhist=mhist.txt qhist=qhist.txt maxlen=2000
@Brian is likely to swing by this thread later tonight (and may have specific suggestions). The example above was for PacBio data, but I assume it will work for MinION data too.
Old 01-22-2016, 06:51 PM   #7
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Well, I'm not really sure what those reads are, at an average length of ~481bp. Probably PacBio, though, considering the ~14.5% error rate.
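
The ~481bp figure follows from the "Reads Used" line of the results in post #3: total bases divided by the number of reads. A minimal sketch of that arithmetic (the function name is made up for illustration):

```python
def mean_read_length(num_reads: int, num_bases: int) -> float:
    """Average read length in bases, from the 'Reads Used' line."""
    if num_reads <= 0:
        raise ValueError("num_reads must be positive")
    return num_bases / num_reads


# Values from "Reads Used: 108490 (52257243 bases)" in post #3.
avg = mean_read_length(108490, 52257243)
print(f"{avg:.1f} bp")
```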

As GenoMax said, you should map PacBio (or MinION) reads with mapPacBio.sh. The usage and algorithm are the same as bbmap.sh, but it is designed for the PacBio error model.

The error rates you want are in the "pct bases" column.
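
If you want to pull those numbers out of the summary automatically, something like the following might do. It is only a sketch: it assumes the four-column layout shown in post #3 (pct reads, num reads, pct bases, num bases), and the regular expression and function name are my own, not part of BBTools.

```python
import re

# One summary row: label, pct reads, num reads, pct bases, num bases.
# The "Match Rate" row is skipped on purpose; its first columns are "NA".
RATE_ROW = re.compile(
    r"^(Error|Sub|Del|Ins) Rate:\s+([\d.]+)%\s+(\d+)\s+([\d.]+)%\s+(\d+)$"
)


def per_base_rates(summary: str) -> dict:
    """Return {row label: 'pct bases' value} for the error-rate rows."""
    rates = {}
    for line in summary.splitlines():
        m = RATE_ROW.match(line.strip())
        if m:
            rates[m.group(1)] = float(m.group(4))  # 4th group = pct bases
    return rates


# Rows copied from the results in post #3.
SUMMARY = """\
Match Rate:                   NA               NA        85.4575%           14050523
Error Rate:              40.9037%           31938        14.5421%            2390951
Sub Rate:                40.9024%           31937         7.2366%            1189814
Del Rate:                40.7756%           31838         4.6777%             769083
Ins Rate:                40.7103%           31787         2.6278%             432054
"""

rates = per_base_rates(SUMMARY)
for name, pct in rates.items():
    print(f"{name}: {pct}% of bases")
```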
Old 01-22-2016, 06:55 PM   #8
mido1951
Senior Member
 
Location: Tunisia

Join Date: May 2014
Posts: 123
I am working with MinION reads with an average length of ~5000bp.
Should I use mapPacBio.sh for these?

Sorry, to confirm: for the error rate, should I use the "pct bases" column?
Thanks
Old 01-23-2016, 09:42 AM   #9
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by mido1951 View Post
I am working with MinION reads with an average length of ~5000bp.
Should I use mapPacBio.sh for these?
Yes. And you may need to use higher-than-default sensitivity, if the data is particularly low quality; you can adjust sensitivity with the "minid" flag. E.g. "minid=0.5" will try to map reads down to 50% identity.
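
As an illustration of where "minid" goes, here is a small sketch that only assembles the command line and does not run anything. The flags (in, ref, minid, mhist, qhist, maxlen) are the ones already quoted in this thread; the helper name and the file names are placeholders, and the helper deliberately accepts only the fractional form of minid used above.

```python
import shlex


def mappacbio_command(reads, ref, minid=None, **extra):
    """Assemble a mapPacBio.sh command line as a list of arguments.

    Hypothetical helper: it builds key=value arguments and executes nothing.
    """
    args = ["mapPacBio.sh", f"in={reads}", f"ref={ref}"]
    if minid is not None:
        if not 0.0 < minid <= 1.0:
            raise ValueError("pass minid as a fraction, e.g. 0.5 for 50%")
        args.append(f"minid={minid}")
    # Any further key=value options are appended in the order given.
    args.extend(f"{key}={value}" for key, value in extra.items())
    return args


cmd = mappacbio_command(
    "your_reads.fa", "ref.fa", minid=0.5,
    mhist="mhist.txt", qhist="qhist.txt", maxlen=2000,
)
print(shlex.join(cmd))
```

Paste the printed line into a shell (with your own file names) to actually run the mapping.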

Quote:
Sorry, to confirm: for the error rate, should I use the "pct bases" column?
Thanks
That's correct.
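
As a sanity check on reading the "pct bases" column, the "num bases" counts from post #3 can be recombined by hand. This is a rough sketch, and the denominator is an assumption inferred from the numbers rather than from BBMap documentation: each percentage is taken to be that row's count divided by the sum of the Match, Sub, Del, Ins, and N counts.

```python
# "num bases" column from the results in post #3.
NUM_BASES = {
    "Match": 14050523,
    "Sub": 1189814,
    "Del": 769083,
    "Ins": 432054,
    "N": 55,
}

# The Error row is the sum of the three error types.
error_bases = NUM_BASES["Sub"] + NUM_BASES["Del"] + NUM_BASES["Ins"]

# Assumed denominator: the sum of all five counts.
total_bases = sum(NUM_BASES.values())


def pct_of_bases(count: int) -> float:
    """Percentage of the assumed total represented by `count`."""
    return 100.0 * count / total_bases


print(f"Error: {error_bases} bases, {pct_of_bases(error_bases):.4f}%")
for name in ("Sub", "Del", "Ins"):
    print(f"{name}: {pct_of_bases(NUM_BASES[name]):.4f}%")
```

The Sub, Del, and Ins counts add up to exactly the reported Error count of 2390951, and the reported per-base rates add up the same way (7.2366% + 4.6777% + 2.6278% = 14.5421%), so the overall per-base error rate is just the sum of the three.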
Old 01-23-2016, 10:01 AM   #10
mido1951
Senior Member
 
Location: Tunisia

Join Date: May 2014
Posts: 123
I tried mapPacBio.sh and mapPacBio8k.sh, but they take a long time to run compared to bbmap.sh.
Is that expected?
Old 01-23-2016, 11:18 AM   #11
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Yep, mapPacBio is slower, because it supports higher sensitivity and longer reads. Note that "mapPacBio8k.sh" is not in the latest release, so you may be using an older version that might also be somewhat slower.