**Brian Bushnell** · 08-19-2016, 01:29 PM

You can capture long deletions when mapping with BBMap; it places them in a single gapped alignment rather than as two alignments, allowing you to analyze with the same tools you use for short indels. Just add the flag "maxindel=200000" (the default is 16000).
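For anyone following along, here is a minimal sketch of how that command line might be assembled from Python. The `in=`, `ref=`, `out=`, and `maxindel=` parameters are standard BBMap options; the file names and the helper `bbmap_command` are hypothetical placeholders, not anything from this thread.

```python
import shlex


def bbmap_command(reads, ref, out, maxindel=200000):
    """Build a bbmap.sh argument list with a raised maxindel.

    File names are supplied by the caller; the ones used below are
    hypothetical examples.
    """
    return [
        "bbmap.sh",
        f"in={reads}",
        f"ref={ref}",
        f"out={out}",
        f"maxindel={maxindel}",  # default is 16000 per the post above
    ]


cmd = bbmap_command("reads.fq", "reference.fa", "mapped.sam")
print(shlex.join(cmd))
# To actually run it (requires BBMap on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list keeps each `key=value` pair intact and avoids shell-quoting problems if a path contains spaces.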

**JWolters** · 08-24-2016, 11:24 AM

Thanks for the tip!

I will give it a shot as that would be much simpler than having to map separately to each genome, undergo the analysis, and then synthesize the results.

**JWolters** · 09-01-2016, 12:44 PM

Hate to double post, but I thought this might be better posted in this context rather than in the main BBMap support thread.

When I align my sequencing reads from the pool to the mixed reference, I get a lot of gap ('-') alleles being reported in regions that are actually almost completely conserved in the two parental genomes. The conserved parental allele is still found, but almost half of the alleles at these sites are reported as a gap allele.
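One way to put a number on "almost half" is to count deletion placeholders per site. The sketch below assumes the site comes from a `samtools mpileup` read-bases column, where `*` marks a base deleted in that read; the thread does not say which tool reported the '-' alleles, so treat this as an illustration only. The function name `gap_fraction` and the example string are hypothetical.

```python
def gap_fraction(bases):
    """Fraction of reads at one pileup site that show a deletion placeholder ('*').

    `bases` is the read-bases column of one mpileup line (assumed well-formed).
    """
    gaps = 0
    total = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read-start marker; the next character encodes mapping quality.
            i += 2
            continue
        if c == "$":
            # Read-end marker, not a base call.
            i += 1
            continue
        if c in "+-":
            # Indel annotation: sign, length digits, then that many bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        if c in "<>":
            # Reference skip (e.g. spliced alignment); not counted here.
            i += 1
            continue
        total += 1
        if c == "*":
            gaps += 1
        i += 1
    return gaps / total if total else 0.0


# Hypothetical column: seven reads, two of which carry a deletion at this site.
print(gap_fraction("..,*,*^F.$-2AT"))
```

Sites where this fraction is high in both pools, and also when mapping to a single parental genome, are the ones worth inspecting by eye.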

This does not happen when mapping with BWA to one of the parental genomes (though that has its own set of issues).

This occurs even when I toss ambiguous mappings and reduce the max indel size to 5000, which is more realistic for these small (~85 kb) genomes.

Unfortunately, this results in extensive false positives in the final results. Any insight into why this might be happening?

**Brian Bushnell** · 09-01-2016, 03:43 PM

How do you know these are false positives? And, can you explain in more detail what you are doing? Like, what do you mean by the parent genomes, for example, and where the reads came from... only viruses have genomes ~85kb, but they don't have parents that I'm aware of. Does BBMap yield the same gaps when mapped to the parent genomes? I guess, a more thorough explanation of the experiment would be helpful.

**JWolters** · 09-02-2016, 07:22 AM

Apologies for the lack of clarity on my part; I have limited experience with short-read alignment. Thank you for your super fast replies.

Originally posted by Brian Bushnell:

> Like, what do you mean by the parent genomes, for example, and where the reads came from... only viruses have genomes ~85kb, but they don't have parents that I'm aware of.

The "parental" genomes are yeast mitochondrial genomes, which explains the short (but long compared to metazoan) genome lengths of ~85 kb, though they do vary in size quite a bit between strains. These genomes are fairly problematic for alignment in some ways because of large, repetitive, AT-rich intergenic regions interspersed with short (30 bp), repetitive, GC-rich regions, but the repeats are generally much shorter than the read length. I can imagine the low complexity giving rise to issues in the AT-rich areas.

I refer to them as parents because I conducted an experiment in which two haploid strains with identical nuclear genomes but different mtDNAs are allowed to mate, which, in yeast, can produce recombinant mtDNAs. I have the full mitochondrial genomes of each parent (and I am reasonably confident of their accuracy).

Originally posted by Brian Bushnell:

> I guess, a more thorough explanation of the experiment would be helpful.

I sequenced one pool in which I selected for diploid cells after mating the two parental haploids, and sequenced a second pool in which an aliquot of the first pool had undergone a selection. The reads come from these two pools and each pool is being mapped to the same reference separately.

My end goal is to analyze the difference in allele frequencies between the selected pool and the unselected pool to help map the genetic basis of phenotypic differences between the two parental mtDNAs.
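As a rough illustration of that comparison, the sketch below computes per-allele frequencies at one site in each pool and then the shift between them. The allele counts are made-up numbers, and the helper names `allele_frequencies` and `frequency_shift` are hypothetical; a real analysis would also need to account for coverage and sampling noise.

```python
def allele_frequencies(counts):
    """Convert per-allele read counts at one site into frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def frequency_shift(unselected, selected):
    """Per-allele frequency in the selected pool minus the unselected pool."""
    f0 = allele_frequencies(unselected)
    f1 = allele_frequencies(selected)
    alleles = sorted(set(f0) | set(f1))
    return {a: f1.get(a, 0.0) - f0.get(a, 0.0) for a in alleles}


# Made-up counts at a single site, for illustration only.
unselected_counts = {"A": 50, "G": 50}
selected_counts = {"A": 80, "G": 20}
shift = frequency_shift(unselected_counts, selected_counts)
print(shift)
```

A positive value means the allele became more common after selection; alleles missing from one pool are treated as frequency zero there.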

Originally posted by Brian Bushnell:

> How do you know these are false positives? And, can you explain in more detail what you are doing? Does BBMap yield the same gaps when mapped to the parent genomes?

I am not totally certain these are false positives, and I can think of a couple of biological reasons for this to occur. I think it's more straightforward to explain what I do know that makes me suspect these alignments are spurious.

1. The parental references are nearly invariant in this region. The region does contain introns known to be absent in many strains, and thus I cannot rule out such a change without further investigation.
2. The alignments are showing gaps mapping across exons of COX1, a critical respiratory gene. The second pool was grown in conditions requiring respiration to grow, but the alignment of these reads also shows a high frequency of '-' alleles in this region spanning across exons.

I am currently mapping the reads to one of the parental genomes and will update this post, or add one below, as soon as I have that information.

Edit: The '-' alleles are also appearing when aligning directly to one of the parental genomes.

Frankly, not nearly enough is known about the recombination dynamics of yeast mtDNA. If these alignments are not spurious that could prove to be extremely interesting.

Are gaps in split alignments reported in mpileup output?