I'm analyzing murine tumor tissues (RNA-seq) for retroviral insertions. I have made a custom multi-FASTA genome, to which I have added the retroviral genome as an additional chromosome.
When I map with BWA I get an enormous background, and many reads have a mapping quality of zero because they map to many places across the murine chromosomes. Thinking about it, this background is to be expected: at least for the human genome, an estimated 8-10% is of retroviral origin.
Now, is it possible in some way to mask the endogenous retroviral sequences in the reference genome? I'm a newbie in bioinformatics, but essentially what I want to do is take the retroviral genome and let some algorithm search across mm9 in windows of (perhaps) 20-40 bp, replacing every region that matches my input retroviral sequence with Ns.
Does anyone have some input on this? Any other considerations would be highly appreciated. Alternatively, could I map against a strain-specific murine genome, or a transcriptome? If so, where would I get one?
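To make the idea concrete, here is a rough Python sketch of the kind of window-based masking I have in mind. It is only a toy illustration under my own assumptions: the function names `reverse_complement` and `mask_matches` and the parameter `k` are made up for this example, it only catches exact matches of length k (no mismatches), and it works on plain strings held in memory, so it is probably far too slow and too strict for all of mm9 as written.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def mask_matches(reference, virus, k=20):
    """Replace every k-bp window of `reference` that exactly matches a
    k-bp window of `virus` (on either strand) with N characters."""
    virus = virus.upper()

    # Collect all k-mers of the viral genome and of its reverse complement.
    kmers = set()
    for strand in (virus, reverse_complement(virus)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])

    # Slide a k-bp window across the reference and mask every exact hit.
    masked = list(reference)
    ref_upper = reference.upper()
    for i in range(len(ref_upper) - k + 1):
        if ref_upper[i:i + k] in kmers:
            masked[i:i + k] = "N" * k
    return "".join(masked)
```

For example, `mask_matches("TTTTGATTACACCCC", "GATTACA", k=4)` gives `"TTTTNNNNNNNCCCC"`. Overlapping hits simply merge into one longer run of Ns, and sequence that does not match the virus is left untouched.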
TIA