You can reduce kmer length and increase sensitivity - change "k=13" to "k=11" and add "slow". But ultimately, since BBMap is a global aligner, it does not like large structural rearrangements when aligning long queries, since they do not fit into the model of "match/substitution/insertion/deletion" reported in single alignments.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
High error rate with BBmap?
Hi Brian,
while trying to call variants on a human MT DNA I noticed the unusually high error rate in mapping with bbmap (37.10 and 36.92 tested) compared to bwa (0.7.12-r1039). Both were run with the default parameters, and below is the output of samtools stats (after sorting and indexing bams). IGV shows many more inverted pairs with bbmap mapping, and this came as a surprise.
I can provide details, however I prefer to share them offline due to the sensitivity of some of the data. Any hints on how to track down the issue?
Many thanks!
BBmap
Code:BBM: raw total sequences: 537764 BBM: filtered sequences: 0 BBM: sequences: 537764 BBM: is sorted: 1 BBM: 1st fragments: 268882 BBM: last fragments: 268882 BBM: reads mapped: 256366 BBM: reads mapped and paired: 230042 # paired-end technology bit set + both mates mapped BBM: reads unmapped: 281398 BBM: reads properly paired: 228748 # proper-pair bit set BBM: reads paired: 537764 # paired-end technology bit set BBM: reads duplicated: 0 # PCR or optical duplicate bit set BBM: reads MQ0: 0 # mapped and MQ=0 BBM: reads QC failed: 4282 BBM: non-primary alignments: 0 BBM: total length: 67353071 # ignores clipping BBM: bases mapped: 32229667 # ignores clipping BBM: bases mapped (cigar): 29307 # more accurate BBM: bases trimmed: 0 BBM: bases duplicated: 0 BBM: mismatches: 1672477 # from NM fields BBM: error rate: 5.706749e+01 # mismatches / bases mapped (cigar) BBM: average length: 125 BBM: maximum length: 151 BBM: average quality: 30.4 BBM: insert size average: 255.7 BBM: insert size standard deviation: 117.0 BBM: inward oriented pairs: 60121 BBM: outward oriented pairs: 2977 BBM: pairs with other orientation: 48 BBM: pairs on different chromosomes: 0
Code:BWA: raw total sequences: 537764 BWA: filtered sequences: 0 BWA: sequences: 537764 BWA: is sorted: 1 BWA: 1st fragments: 268882 BWA: last fragments: 268882 BWA: reads mapped: 266324 BWA: reads mapped and paired: 248064 # paired-end technology bit set + both mates mapped BWA: reads unmapped: 271440 BWA: reads properly paired: 245908 # proper-pair bit set BWA: reads paired: 537764 # paired-end technology bit set BWA: reads duplicated: 0 # PCR or optical duplicate bit set BWA: reads MQ0: 178 # mapped and MQ=0 BWA: reads QC failed: 0 BWA: non-primary alignments: 0 BWA: total length: 67353071 # ignores clipping BWA: bases mapped: 33736838 # ignores clipping BWA: bases mapped (cigar): 32160973 # more accurate BWA: bases trimmed: 0 BWA: bases duplicated: 0 BWA: mismatches: 603462 # from NM fields BWA: error rate: 1.876380e-02 # mismatches / bases mapped (cigar) BWA: average length: 125 BWA: maximum length: 151 BWA: average quality: 30.4 BWA: insert size average: 384.2 BWA: insert size standard deviation: 1040.7 BWA: inward oriented pairs: 67383 BWA: outward oriented pairs: 3645 BWA: pairs with other orientation: 112 BWA: pairs on different chromosomes: 0
Comment
-
I noticed this line:
BBM: bases mapped (cigar): 29307 # more accurate
Also, I find it odd that the insert size average and stdev differ so much. Often the mode or median is more stable as it is less influenced by outliers.
By inverted pairs, do you mean both reads map to the same side of the reference? If not, I wonder if the fact that the reference is very small and circular could play some role in the differences.
Anyway, please feel free to email me both bam files and the reference, and I'll look at them. It's possible that the circularity, or the fact that bwa looks for local rather than global alignments, is the primary factor behind the difference.
Comment
-
-
Hi Kristian,
Is this a Nextera long-mate pair library? Those need special processing before they can be mapped. Or... can you give me any more information about the library construction, and the trimming methodology? The library has an extremely high error rate (particularly with read 2), less than half of the reads map to the mito, and it appears that both adapters and transposase are still present.... also, I'm measuring the median insert size as 159 (BBMap) or 133 (BBMerge), so there are a lot of pairs with insert size shorter than the sequenced read length; those might be displayed differently in IGV depending on whether the adapter portion was soft-clipped (which bwa would do by default) or not (bbmap does not soft-clip by default).
I adapter-trimmed the reads and error-corrected them, but still under 50% map. I'm not really sure what's wrong with the library. But, I don't see anything unusual about the pairing orientations. I get 45.5670% properly paired with "rcs=f" (require correct strand = false) and 45.5481% with "rcs=t", so only 0.02% map in the wrong orientation.Last edited by Brian Bushnell; 04-17-2017, 02:38 PM.
Comment
-
Hi Brian,
Originally posted by Brian Bushnell View PostIs this a Nextera long-mate pair library?Last edited by Kristian; 04-18-2017, 05:21 AM.
Comment
-
I was testing something with BBMap today and I realized I had forgotten something about how to use it and couldn't figure it out. I was mapping RNA-Seq and I thought that it would report spliced alignment cigars (using sam 1.3) as /[0-9]+M[0-9]+N[0-9]+M/ but it was reporting the introns as deletions with the [D] cigar value. Is that right or is there a way to get it to not do that?
Thanks-/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Comment
-
Hi Brian,
Is there anyway to use bbmap (or any other of your tools) to map a read to a reference file and then trim anything to the left of the reference sequence?
For example
My reference is
XXXXXXXXXXXXXXXXXX
And my reads are
NNNNXXXXXXXXXXXXXXXXXX
NNNXXXXXXXXXXXXXXXXXX
NNXXXXXXXXXXXXXXXXXX
I want them to be
XXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXX
I basically want to just trim anything of the left of the reads that doesn't match my reference? Thanks in advance.
Comment
Latest Articles
Collapse
-
by seqadmin
The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...-
Channel: Articles
11-06-2024, 07:24 PM -
-
by seqadmin
Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...-
Channel: Articles
10-18-2024, 07:11 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 11-08-2024, 11:09 AM
|
0 responses
128 views
0 likes
|
Last Post
by seqadmin
11-08-2024, 11:09 AM
|
||
Started by seqadmin, 11-08-2024, 06:13 AM
|
0 responses
96 views
0 likes
|
Last Post
by seqadmin
11-08-2024, 06:13 AM
|
||
Started by seqadmin, 11-01-2024, 06:09 AM
|
0 responses
67 views
0 likes
|
Last Post
by seqadmin
11-01-2024, 06:09 AM
|
||
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks
by seqadmin
Started by seqadmin, 10-30-2024, 05:31 AM
|
0 responses
25 views
0 likes
|
Last Post
by seqadmin
10-30-2024, 05:31 AM
|
Comment