
**Brian Bushnell** · 12-09-2016, 10:00 AM

I think you may be over-complicating it. The point of mapping is to place reads at their origin; discarding reads with low identity to the reference incurs reference bias. So reference bias is generally an artifact of mapping programs that have insufficient sensitivity. I suggest you try BBMap, which has very high sensitivity (meaning it can align reads with low identity to the reference).

In your specific case, since you have fasta files for two different mouse strains, I suggest using BBSplit to allocate the reads to the different strains, then using BBMap to map to each strain independently.

**kintany** · 12-09-2016, 02:12 PM

Brian, thank you for the answer!

Reference bias is generally not about sensitivity. It is allelic mapping bias: a read carrying the alternative allele of a variant has at least one mismatch, and thus has a lower probability of aligning correctly than a read carrying the reference allele. And this would be true regardless of the overall sensitivity of the aligner.

BBMap sounds good, thank you. You wrote:

"I suggest using BBSplit to allocate the reads to the different strains".

What do you mean by "allocate"? You are suggesting mapping the reads to the individual genomes, right? And is pileup.sh then able to calculate coverage for the two alleles? Thank you!

**Brian Bushnell** · 12-09-2016, 03:13 PM

Originally posted by kintany

> Reference bias is generally not about sensitivity. It is allelic mapping bias: a read carrying the alternative allele of a variant has at least one mismatch, and thus has a lower probability of aligning correctly than a read carrying the reference allele.

I'm not sure I agree with that. Basically, a perfect aligner would map all reads somewhere. So even if a read has some mismatches, it should get mapped to its origin, as long as the sensitivity is sufficient. In rare cases, a variant or error would make a read match some other location better, which would incur reference bias; but in my experience, the leading cause of reference bias is mapper insensitivity (meaning reads that don't match the reference simply don't get mapped), rather than coincidental matches to other parts of the genome due to mutations or errors.
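To make the two positions concrete, here is a toy calculation, not a model of BBMap or any real aligner. It assumes a hypothetical aligner that maps a read only if it has at most `max_mismatches` mismatches to the reference, binomially distributed sequencing errors, and a heterozygous SNP that costs alt-allele reads exactly one extra mismatch. The function names, read length, and error rate are all illustrative assumptions.

```python
from math import comb

def p_at_most(max_errors, read_len, err_rate):
    """P(number of sequencing errors <= max_errors) under a binomial model."""
    if max_errors < 0:
        return 0.0
    return sum(
        comb(read_len, i) * err_rate**i * (1 - err_rate) ** (read_len - i)
        for i in range(min(max_errors, read_len) + 1)
    )

def ref_fraction(max_mismatches, read_len=100, err_rate=0.01):
    """Expected fraction of mapped reads carrying the reference allele at a
    heterozygous SNP, for a toy aligner that tolerates max_mismatches."""
    # Reference-allele reads only have sequencing errors to absorb.
    p_ref = p_at_most(max_mismatches, read_len, err_rate)
    # Alt-allele reads spend one mismatch on the SNP itself.
    p_alt = p_at_most(max_mismatches - 1, read_len, err_rate)
    return p_ref / (p_ref + p_alt)

# An unbiased result would be 0.5; larger values mean more reference bias.
for k in (0, 1, 2, 5, 10):
    print(k, round(ref_fraction(k), 4))
```

Under these assumptions the alt allele is always at a disadvantage (the ratio never drops below 0.5), but the size of the effect shrinks quickly as the tolerated mismatch count grows.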

> BBMap sounds good, thank you. You wrote:
>
> "I suggest using BBSplit to allocate the reads to the different strains".
>
> What do you mean by "allocate"? You are suggesting mapping the reads to the individual genomes, right? And is pileup.sh then able to calculate coverage for the two alleles? Thank you!

If you give BBSplit multiple reference fastas, it will take a single input fastq (or two paired fastqs) and produce multiple output fastqs, one per reference. The outputs will be the reads that best match each reference. You can specify what should be done with reads matching multiple references equally well with the "ambig2" flag (ambig2=toss, ambig2=all, etc).
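As a rough illustration of what "allocate" means here, the sketch below bins reads by their best-matching reference. It is not BBSplit's algorithm: the per-read mismatch counts are assumed to be precomputed, the names `allocate`, `strainA`, and `strainB` are hypothetical, and only two tie policies loosely modeled on the "toss" and "all" settings mentioned above are shown.

```python
def allocate(scores, ambig="toss"):
    """Bin reads by best-matching reference.

    scores: {read_name: {ref_name: mismatch_count}} (hypothetical input).
    ambig:  tie policy; "toss" drops tied reads, "all" copies a tied read
            to every reference it matches equally well.
    """
    if ambig not in ("toss", "all"):
        raise ValueError(f"unsupported ambig policy: {ambig!r}")
    # One output bin per reference seen in the input.
    bins = {}
    for per_ref in scores.values():
        for ref in per_ref:
            bins.setdefault(ref, [])
    for read, per_ref in scores.items():
        best = min(per_ref.values())
        winners = [ref for ref, mm in per_ref.items() if mm == best]
        if len(winners) == 1 or ambig == "all":
            for ref in winners:
                bins[ref].append(read)
        # Otherwise the read is tied and the policy is "toss": drop it.
    return bins

scores = {
    "read1": {"strainA": 0, "strainB": 2},  # clearly strainA
    "read2": {"strainA": 3, "strainB": 1},  # clearly strainB
    "read3": {"strainA": 1, "strainB": 1},  # matches both equally well
}
print(allocate(scores, ambig="toss"))
print(allocate(scores, ambig="all"))
```

In this toy, each bin plays the role of one per-reference output fastq; `read3` is the kind of read whose fate the ambiguity setting decides.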

As for Pileup - all it does is calculate the coverage according to a sam/bam file. So, for example, if all reads were mapped correctly:

Code:

pileup.sh in=mapped.sam out=stats.txt

That would tell you the coverage on a per-scaffold basis. It does not have any understanding of multiple alleles, but it will correctly report the coverage of a sam/bam file that was mapped to multiple concatenated references representing different alleles.
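For intuition only, here is a minimal sketch of per-scaffold average coverage when the reference is a concatenation of two strains. It does not reproduce pileup.sh or its output format: the input is a hypothetical list of (scaffold, aligned bases) pairs rather than a parsed sam/bam file, and the scaffold names and lengths are made up.

```python
from collections import defaultdict

def mean_coverage(alignments, scaffold_lengths):
    """Average depth per scaffold: total aligned bases / scaffold length.

    alignments:       iterable of (scaffold_name, aligned_bases) pairs.
    scaffold_lengths: {scaffold_name: length_in_bases}.
    """
    bases = defaultdict(int)
    for scaffold, aligned_bases in alignments:
        bases[scaffold] += aligned_bases
    # Scaffolds with no alignments are reported with coverage 0.0.
    return {name: bases[name] / length for name, length in scaffold_lengths.items()}

# Hypothetical concatenated reference: the same chromosome from two strains.
lengths = {"strainA_chr1": 1000, "strainB_chr1": 1000}
alignments = [
    ("strainA_chr1", 100),
    ("strainA_chr1", 100),
    ("strainA_chr1", 100),
    ("strainB_chr1", 100),
]
print(mean_coverage(alignments, lengths))
```

The two alleles appear as two ordinary scaffolds, so comparing them comes down to comparing two rows of a per-scaffold report.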

Allelic Imbalance with GSNAP