SEQanswers

Old 09-11-2014, 12:46 PM   #1
sam789123
Junior Member
 
Location: asian

Join Date: Aug 2013
Posts: 3
How to verify that a large number of corrected reads are correct?

Hi all,

I wrote a program to correct errors in paired-end reads.
But I am finding it difficult to verify the corrected reads.
So far I can only check a few corrected reads at a time, by BLASTing them and looking at the identities.
Is there a way to assess a large number of corrected reads at once?

Thanks.
Old 09-11-2014, 02:14 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

BBMap can produce output that is helpful for this kind of thing:

Code:
Pairing data:           pct reads       num reads       pct bases          num bases

mated pairs:            100.0000%            1000       100.0000%             300000
bad pairs:                0.0000%               0         0.0000%                  0
insert size avg:          270.70


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                 100.0000%            1000       100.0000%             150000
unambiguous:             98.4000%             984        98.4000%             147600
ambiguous:                1.6000%              16         1.6000%               2400
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:       24.4000%             244        24.4000%              36600
semiperfect site:        24.4000%             244        24.4000%              36600
rescued:                  0.0000%               0

Match Rate:                   NA               NA        98.2567%             147385
Error Rate:              73.3000%             733         1.6680%               2502
Sub Rate:                73.3000%             733         1.6680%               2502
Del Rate:                 0.0000%               0         0.0000%                  0
Ins Rate:                 0.0000%               0         0.0000%                  0
N Rate:                  11.3000%             113         0.0753%                113


Read 2 data:            pct reads       num reads       pct bases          num bases

mapped:                 100.0000%            1000       100.0000%             150000
unambiguous:             98.7000%             987        98.7000%             148050
ambiguous:                1.3000%              13         1.3000%               1950
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:       22.0000%             220        22.0000%              33000
semiperfect site:        22.0000%             220        22.0000%              33000
rescued:                  0.0000%               0

Match Rate:                   NA               NA        98.2627%             147394
Error Rate:              75.0000%             750         1.6660%               2499
Sub Rate:                75.0000%             750         1.6660%               2499
Del Rate:                 0.0000%               0         0.0000%                  0
Ins Rate:                 0.0000%               0         0.0000%                  0
N Rate:                  10.7000%             107         0.0713%                107
That gives you the exact number of reads and bases with errors.
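
If you want to compare those numbers across many runs (for example, before and after correction), you could pull them out of the summary with a small script. The following is only a rough sketch in Python, not part of BBMap: the function names `parse_bbmap_stats` and `_to_number` are made up for illustration, and it assumes the summary text keeps the same "name: value value ..." layout shown above.

```python
def _to_number(token):
    """Convert one summary token: 'NA' -> None, '73.3000%' -> 73.3, '733' -> 733."""
    if token == "NA":
        return None
    if token.endswith("%"):
        return float(token[:-1])
    return float(token) if "." in token else int(token)


def parse_bbmap_stats(text):
    """Return {section: {metric: [values...]}} from a summary laid out as above.

    Assumption: a line whose values start with 'pct reads' is a section
    header (e.g. 'Read 1 data'); every other 'name: values' line is a metric.
    """
    stats = {}
    section = None
    for raw in text.splitlines():
        if ":" not in raw:
            continue
        name, rest = raw.split(":", 1)
        fields = rest.split()
        if not fields:
            continue  # e.g. a bare "Code:" line
        if fields[:2] == ["pct", "reads"]:
            section = name.strip()
            stats[section] = {}
        elif section is not None:
            try:
                stats[section][name.strip()] = [_to_number(f) for f in fields]
            except ValueError:
                continue  # skip lines that do not hold plain numbers
    return stats


# A few lines copied from the Read 1 block above.
SAMPLE = """\
Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                 100.0000%            1000       100.0000%             150000
Match Rate:                   NA               NA        98.2567%             147385
Error Rate:              73.3000%             733         1.6680%               2502
"""

stats = parse_bbmap_stats(SAMPLE)
pct_reads, num_reads, pct_bases, num_bases = stats["Read 1 data"]["Error Rate"]
print(f"{num_reads} reads ({pct_reads}%) have errors; "
      f"{num_bases} bases ({pct_bases}%) are errors")
```

Running the same parser on the summary for the uncorrected reads and for the corrected reads lets you compare the "Error Rate" lines directly.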

It also has a few other useful flags - such as ehist and mhist - that produce histograms showing error rate distribution.
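
A histogram like that can be boiled down to a couple of numbers. This is a hedged sketch only: it assumes a two-column, whitespace-separated file (errors per read, then read count) with `#` header lines, which may not match the actual ehist layout, so check the real file first. The function name `summarize_error_histogram` and the example values are hypothetical.

```python
def summarize_error_histogram(lines):
    """Summarize an errors-per-read histogram.

    Assumed input: lines of '<errors> <count>', with '#' header lines ignored.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        errors, count = line.split()[:2]
        counts[int(errors)] = counts.get(int(errors), 0) + int(count)

    total = sum(counts.values())
    if total == 0:
        return {"reads": 0, "error_free_fraction": 0.0, "mean_errors": 0.0}
    return {
        "reads": total,
        "error_free_fraction": counts.get(0, 0) / total,
        "mean_errors": sum(e * c for e, c in counts.items()) / total,
    }


# Hypothetical values, for illustration only (not real BBMap output).
example = ["#Errors\tCount", "0\t244", "1\t300", "2\t456"]
print(summarize_error_histogram(example))
```

If correction is working, the error-free fraction should go up and the mean errors per read should go down when you compare the histogram for the original reads with the one for the corrected reads.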