
**kmcarr** · 10-27-2011, 05:15 AM

Originally posted by ocs:

Hello,

Currently I'm doing some RNA-seq analysis using hg19 as the reference genome. I wondered whether it is better to exclude chrM and the chrUn* sequences from the reference, or whether excluding them would cause bias.

I'm asking because I'm using htseq-count to get the coverage, and the program gives me this warning many times:

Code:

because chromosome 'chrM', to which it has been aligned, did not appear in the GFF file

(almost always for chrM and rarely for some chrUn_*).

But I used the same GFF file for TopHat and for htseq-count, so if a read is aligned to chrM, chrM must be contained in the GFF file. I don't know why htseq-count complains about this.

Any hints on that?

Thanks in advance,
Oliver

Oliver,

TopHat (or any alignment program) aligns reads to a reference sequence, typically provided as a FASTA file and indexed by the alignment program. The GFF file is a feature annotation file, describing features (e.g. genes) relative to the reference sequence. TopHat aligned your reads to the reference genome provided by the FASTA file, which included the sequences for chrM and chrUn_*. However, the GFF file associated with this genome contains no feature annotations for these sequences, so htseq-count warns when it meets reads aligned to them. This is a fairly typical occurrence with TopHat/htseq-count, and it is safe to ignore these warnings.
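The mismatch described above can be checked by comparing the sequence names in the FASTA with the names in column 1 of the annotation. This is a minimal sketch using only standard shell tools. The function name `unannotated_seqs` and the file names in the usage line are placeholders for illustration, not part of TopHat or htseq-count.

```shell
# List sequences that are present in a FASTA reference but have no
# features in a GFF/GTF annotation. Hypothetical helper name.
unannotated_seqs() {
    fasta=$1
    annot=$2
    awk -F'\t' '
        # First file (annotation): remember column 1 of every non-comment line.
        FILENAME == ARGV[1] {
            if ($0 !~ /^#/ && $1 != "") seen[$1] = 1
            next
        }
        # Second file (FASTA): inspect header lines only.
        /^>/ {
            name = substr($1, 2)      # drop the leading ">"
            sub(/[ ].*/, "", name)    # keep the name up to the first space
            if (!(name in seen)) print name
        }
    ' "$annot" "$fasta"
}

# Usage (placeholder file names):
#   unannotated_seqs genome.fa genes.gtf
```

Any name this prints is a sequence that reads can align to but that has no features to count against, which is the situation the warning reports.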

**ocs** · 10-27-2011, 05:28 AM

OK, thanks, that makes sense. I had misunderstood something there ...

**swbarnes2** · 10-27-2011, 07:50 AM

If your sample really contains sequence from the mitochondria, you ought to keep chrM in the FASTA, so those reads can align to where they really belong. If you leave it out, the aligner might wrongly place those reads somewhere else, which will hurt the accuracy of your alignment.

Probably, the software is just trying to warn you about something it thinks is strange. It's trying to warn you that you might have a messed-up GFF, because it has no gene annotation for one of your chromosomes. But if that's how it's supposed to be, then you should just carry on, because you know better than the computer.

If you don't like that warning, you could always filter the BAM so that the chrM reads are removed. If you have no GFF annotation for that chromosome, you don't need those reads for counting.
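The filtering step could be sketched as below. It works on SAM text, where column 3 holds the reference name of each alignment, and uses `samtools view` only to convert from and to BAM. The function name `drop_ref` and the BAM file names are placeholders. It is a sketch for the simple case: for paired-end data, a mate aligned elsewhere may still name chrM in its mate-reference column, and the `@SQ` header line for chrM is left in place.

```shell
# Drop alignments to one reference sequence from SAM text on stdin.
# Header lines (starting with "@") pass through unchanged.
# Hypothetical helper name.
drop_ref() {
    awk -F'\t' -v ref="$1" '/^@/ || $3 != ref'
}

# Usage with samtools (placeholder file names):
#   samtools view -h aligned.bam | drop_ref chrM | samtools view -b -o no_chrM.bam -
```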

**NextGenSeq** · 10-27-2011, 08:10 AM

Actually, the mitochondrial genome is very important if you are doing human mutation screening. There are a lot of hereditary mitochondrial diseases. Mutations in mitochondrial DNA have also been reported in cancer.

**ocs** · 11-01-2011, 02:18 AM

For some reason I thought that when I provide an annotation file to TopHat, it only aligns to the annotated genes (and I overlooked the fact that it only cares about the splice junctions). Actually that makes no sense; I got mixed up there ;-)
Thanks to all for your answers!

**arrchi** · 11-02-2011, 05:30 AM

Hi ocs,

Did you use the latest version of Cufflinks, and do you still see genes on chrM? I used the latest version of Cufflinks and I want to have the genes on chrM, but they are all gone. I don't know why.

**ocs** · 11-02-2011, 09:18 AM

I have been using Cufflinks version 1.1.0 and TopHat version 1.3.3. I have reads aligned to chrM, but there are no annotated genes there. Which reference file did you use?

**arrchi** · 11-02-2011, 10:05 AM

Thanks for your reply. I used the human reference genome. When I repeated the same experiment with an older version of Cufflinks, I did get genes on chrM.

**arrchi** · 11-02-2011, 10:08 AM

Are you using the human reference genome too? How many genes on chrM did you get?

**ocs** · 11-02-2011, 10:21 AM

Originally posted by arrchi:

Thanks for your reply. I used the human reference genome. When I repeated the same experiment with an older version of Cufflinks, I did get genes on chrM.

Hello arrchi,

First, you have to be more specific. There are several reference genome files for the human genome (Ensembl, NCBI, UCSC: http://tophat.cbcb.umd.edu/igenomes.html). I used hg19 (UCSC), which I mentioned in my very first post. I also used the annotation file from the iGenomes package (genes.gtf). Which reference and which annotation did you use?

Second, I don't understand what you actually want: you say you want to have genes on chrM, and then that you do have genes on chrM?

If you have a look at the annotation file (in my case genes.gtf from the iGenomes hg19 package), you will find no annotated genes on chrM:

Code:

$ grep chrM genes.gtf | wc -l
0
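The same check can be extended to every sequence name at once by counting the annotation lines per value of column 1. This is a minimal sketch with standard tools, and the function name `features_per_seq` is a placeholder. Note that it counts all feature lines (exons, CDS and so on), not genes.

```shell
# Count annotation lines per sequence name (column 1) in a GFF/GTF file,
# skipping comment lines. Output: name<TAB>count, sorted by name.
# Hypothetical helper name.
features_per_seq() {
    awk -F'\t' '!/^#/ && NF > 1 { n[$1]++ }
                END { for (s in n) print s "\t" n[s] }' "$1" | LC_ALL=C sort
}

# Usage (placeholder file name):
#   features_per_seq genes.gtf
```

A sequence that is in the FASTA but missing from this listing has no features, so reads aligned to it cannot be counted.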

Exclude chrM, chrUn* from reference // htseq-count warning on chrM
