Hi all,
When you download the human reference genome, it usually includes many different unplaced supercontigs, such as GL000217, GL000218, etc.
I am wondering whether they should be included in the reference genome when mapping, for example with TopHat. I have noticed that a lot of people exclude them, but I wonder what the reason is.
I ask because when I include them, around 50% of my reads map to supercontig GL000220, which apparently is full of ribosomal sequences (I used rRNA removal, not poly(A) selection). Has anybody run into something similar?
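One way to check how mapped reads are spread across contigs is to summarize the output of `samtools idxstats`, which has four tab-separated columns: reference name, sequence length, mapped reads and unmapped reads (plus a final `*` row for unplaced unmapped reads). The sketch below assumes that text has already been captured; the function name is hypothetical. Counts are per alignment record, so multi-mapping reads may be counted more than once.

```python
def mapped_fraction_by_contig(idxstats_text):
    """Return {contig: fraction of all mapped records} from `samtools idxstats` text.

    Assumes the standard four tab-separated columns:
    name, length, mapped, unmapped.
    """
    counts = {}
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # unmapped reads with no reference; not a contig
            continue
        counts[name] = int(mapped)
    total = sum(counts.values())
    # Avoid division by zero when nothing mapped at all
    return {name: n / total for name, n in counts.items()} if total else {}
```

Sorting the result by value would show at a glance whether a single supercontig such as GL000220 dominates.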
When I include only the known chromosomes (i.e. 1-22, X, Y and MT) in the reference, slightly more reads map to them, but the number of unmapped reads increases significantly.
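Building a reference restricted to the known chromosomes amounts to keeping only the matching FASTA records. A minimal sketch, assuming Ensembl/GRCh37-style sequence IDs (`1`..`22`, `X`, `Y`, `MT`; UCSC-style files use `chr1`, `chrM`, etc. instead). `PRIMARY` and `filter_fasta` are hypothetical names. The aligner index would then need to be rebuilt from the filtered file.

```python
import sys

# Hypothetical set of IDs to keep; assumes Ensembl-style names, adjust for UCSC ("chr1", "chrM", ...)
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def filter_fasta(lines, keep=PRIMARY):
    """Yield only lines belonging to FASTA records whose ID (first word after '>') is in `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            keeping = seq_id in keep
        if keeping:
            yield line


if __name__ == "__main__":
    # usage: python filter_fasta.py < genome.fa > primary.fa
    sys.stdout.writelines(filter_fasta(sys.stdin))
```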
I am leaning towards including them to get a more 'correct' overview of the mapping, and then excluding them from further downstream analysis.
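Mapping against the full reference and then dropping alignments to unplaced contigs could look like the sketch below, assuming plain-text SAM input and the same Ensembl-style names. Function and variable names are hypothetical. Header lines are passed through unchanged so that any mate references (RNEXT) to supercontigs still resolve. Unmapped records (RNAME `*`) are dropped too. With a sorted, indexed BAM, `samtools view` can alternatively restrict output to a list of region names.

```python
import sys

# Hypothetical set of reference names to keep (Ensembl-style; adjust for "chr"-prefixed names)
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def keep_primary_alignments(sam_lines, keep=PRIMARY):
    """Pass SAM header lines through; keep alignment lines whose RNAME (column 3) is in `keep`."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        # Drops supercontig hits and unmapped records (RNAME "*")
        if len(fields) > 2 and fields[2] in keep:
            yield line


if __name__ == "__main__":
    # usage: samtools view -h in.bam | python keep_primary.py > primary.sam
    sys.stdout.writelines(keep_primary_alignments(sys.stdin))
```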
Any thoughts? All input appreciated.
blanco