![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
tool for mapping mobile elements? | ryanmcg | Bioinformatics | 3 | 01-16-2013 10:02 PM |
Tool to take mapping to gene list | jarthur | Bioinformatics | 1 | 01-16-2013 04:31 PM |
best RNAseq mapping tool for bacteria | sissi | Bioinformatics | 6 | 03-05-2012 04:48 PM |
mapping tool for resequencing | glacerda | Bioinformatics | 0 | 04-01-2011 08:22 AM |
Is a Gene mapping tool available? | ritzriya | Bioinformatics | 0 | 02-15-2011 01:45 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Senior Member
Location: Tunisia Join Date: May 2014
Posts: 123
|
![]()
Hello,
I wanted to know if bbmap gives me the following informations because I have not found in the tool's description. Does BBmap only give me statistics on the mapping of reads on the reference genome? Can it give me information if the reads cover the entire reference genome ? thanks |
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: Tunisia Join Date: May 2014
Posts: 123
|
![]()
so BBmap do not do that?
i must use BEDToos? Thanks |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: Bethesda MD Join Date: Oct 2009
Posts: 509
|
![]()
Try BBMap's 'covstats' command - I think it provides a summary of coverage.
|
![]() |
![]() |
![]() |
#5 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
![]()
Yep - you can use "covstats=covstats.txt covhist=covhist.txt" to print the coverage information of the whole genome as well as on a per-scaffold basis, and also give a distribution of base coverage over the genome. For more information see the "Coverage output parameters" section of bbmap.sh. Once the reads are mapped, using the sam or bam file, you can do the same thing with pileup.sh.
|
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: Tunisia Join Date: May 2014
Posts: 123
|
![]()
thank you for your response.
Can you explain me this output please. Average coverage: 25,49 Standard deviation: 6,38 Percent scaffolds with any coverage: 100,00 Percent of reference bases covered: 99,85 and in covstats.txt: Code:
#ID Avg_fold Length Ref_GC Covered_percent Covered_bases Plus_reads Minus_reads Read_GC Median_fold Std_Dev gi|556503834|ref|NC_000913.3| Escherichia coli str. K-12 substr. MG1655, complete genome 26,6251 4641652 0,5079 99,8479 4634590 123276 118657 0,5042 26 6,38 |
![]() |
![]() |
![]() |
#7 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]()
Did you inspect the alignment using IGV or a genome browser of some kind? It should be reasonably clear what those numbers mean. Looks like you have a small fraction of the genome (0.15%) that does not have at least one read covering it.
Differences like this could be due to your strain being slightly different than the reference sequence. |
![]() |
![]() |
![]() |
#8 |
Senior Member
Location: Tunisia Join Date: May 2014
Posts: 123
|
![]()
I did a mapping of the long reads on a reference genome using bbmap (with covstats option). I wanted to know if all the reference genome is covred by the long reads.
I do not mean what the different words mean: "Average coverage, Standard deviation, Percent scaffolds with any coverage, Percent of reference bases covered".?? How can I know if the whole reference genomeis covered by long reads? Thanks. |
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]()
Since the Percent of reference bases covered: 99,85 is not 100% there are at least 0.15% of bases that are not covered by one read, long or otherwise.
Last edited by GenoMax; 06-08-2017 at 05:02 AM. |
![]() |
![]() |
![]() |
#10 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
![]()
Bacteria have circular genomes, represented in fasta files as linear with a breakpoint somewhere. Mapping (and thus coverage calculation) is less accurate at the ends due to the artificial break; BBMap will place a read spanning the break on either the left end or the right end, but not both. You might want to look at the ends to see if the uncovered bases are there.
|
![]() |
![]() |
![]() |
#11 |
Senior Member
Location: Tunisia Join Date: May 2014
Posts: 123
|
![]()
hello Brian,
Can you explain me the algorithm of BBmap please? Thanks |
![]() |
![]() |
![]() |
#12 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
![]()
That would require weeks of work and is out of the scope of this forum... but I suggest you read this paper:
http://bib.irb.hr/datoteka/773712.Ig..._diplomski.pdf |
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: Tunisia Join Date: May 2014
Posts: 123
|
![]()
ok. thank you for the link.
I have one more question to the output file of BBmap. This is an output file: Code:
Read 1 data: pct reads num reads pct bases num bases mapped: 99,7746% 2365043 99,8117% 1168097245 unambiguous: 96,5758% 2289219 96,6551% 1131155596 ambiguous: 3,1988% 75824 3,1566% 36941649 low-Q discards: 0,0000% 0 0,0000% 0 perfect best site: 36,0494% 854511 36,4545% 426626981 semiperfect site: 36,0505% 854537 36,4556% 426639736 Match Rate: NA NA 99,5357% 1166772129 Error Rate: 63,8415% 1510509 0,4634% 5431521 Sub Rate: 13,2884% 314407 0,0677% 794173 Del Rate: 56,8578% 1345272 0,3512% 4117267 Ins Rate: 15,1778% 359111 0,0444% 520081 N Rate: 0,0128% 302 0,0009% 10862 Average coverage: 96,08 Standard deviation: 26,28 Percent scaffolds with any coverage: 100,00 Percent of reference bases covered: 99,28 and if "pct reads" is the percent of mapped reads. are there "num reads" are the number of mapped reads? because if t'is the number of mapped reads i haven't 2365043 reads? and why the Error rate for "pct reads" is high? thanks for your help. Last edited by mido1951; 06-11-2017 at 09:34 AM. |
![]() |
![]() |
![]() |
#14 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
![]()
"perfect" means every base in the read matched the reference (no mismatches, indels, or Ns). "semiperfect" is similar but it allows Ns in the reference.
The "num reads" column in "mapped" row indicates the number of reads that were mapped. Don't worry about the high error rate. That just indicates that 63.8% of your reads had at least one error (mismatching base, indel, etc). The more important number is the % of bases, which is only 0.46%, so the reads have roughly 99.5% identity to the reference. |
![]() |
![]() |
![]() |
#15 |
Senior Member
Location: Tunisia Join Date: May 2014
Posts: 123
|
![]() |
![]() |
![]() |
![]() |
#16 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
![]()
Oh - I see the confusion. BBMap is not capable of mapping unlimited-length reads. It can only map reads up to 600bp. mapPacBio.sh is a version of BBMap that supports longer reads, up to 6000bp. Reads longer than that will be shredded into shorter pieces which are mapped and tracked individually. So it's not ideal for long read platforms for Nanopore when you need full-length alignments.
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|