  • Exclude chrM, chrUn* from reference // htseq-count warning on chrM

    Hello,

    Currently I'm doing some RNA-seq analysis using hg19 as the reference genome. I wondered whether it is better to exclude chrM and the chrUn* sequences from the reference, or whether doing so would cause bias.

    I'm asking because I'm using htseq-count to get the coverage and the program gives me lots of this warning:
    Code:
    because chromosome 'chrM', to which it has been aligned, did not appear in the GFF file
    (only for chrM and, rarely, for some chrUn_* sequences).

    But I used the same GFF file for TopHat and for htseq-count. So when a read is aligned to chrM, chrM must be contained in the GFF file. I don't know why htseq-count complains about this.

    Any hints on that?

    Thanks in advance,
    Oliver

  • #2
    Oliver,

    TopHat (or any alignment program) aligns reads to a reference sequence, typically provided as a FASTA file and indexed by the alignment program. The GFF file is a feature annotation file, describing features (e.g. genes) relative to the reference sequence. TopHat aligned your reads to the reference genome provided by the FASTA file, which included the sequences for chrM and chrUn_*. However, the GFF file associated with this genome contains no feature annotations for these sequences. This is a fairly typical occurrence with TopHat/htseq-count. It is safe to ignore these warnings.
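To see which reference sequences have no annotation at all, one option is to compare the sequence names in the FASTA with column 1 of the GFF/GTF. This is only a minimal sketch: the files `genome.fa` and `genes.gtf` below are tiny made-up stand-ins written to a temporary directory, and the helper names `fasta_names` and `gtf_names` are hypothetical.

```shell
#!/bin/sh
# Sketch: list sequences present in the reference FASTA but absent from the annotation.
# All file contents below are made-up demo data, not the real hg19 files.
export LC_ALL=C   # keep sort and comm on the same collation order

workdir=$(mktemp -d)

# Tiny stand-in for the reference FASTA (three sequences).
printf '>chr1\nACGT\n>chrM\nGATC\n>chrUn_gl000220\nTTAA\n' > "$workdir/genome.fa"

# Tiny stand-in for the annotation (a feature on chr1 only); columns are tab-separated.
printf 'chr1\tdemo\texon\t1\t4\t.\t+\t.\tgene_id "g1";\n' > "$workdir/genes.gtf"

# Sequence names from FASTA headers: drop '>' and anything after the first space.
fasta_names() {
    grep '^>' "$1" | sed 's/^>//' | cut -d ' ' -f 1 | sort -u
}

# Sequence names from column 1 of the GTF/GFF, skipping comment lines.
gtf_names() {
    grep -v '^#' "$1" | cut -f 1 | sort -u
}

fasta_names "$workdir/genome.fa" > "$workdir/fasta.txt"
gtf_names "$workdir/genes.gtf" > "$workdir/gtf.txt"

# comm -23 keeps the lines that appear only in the first (sorted) file.
missing=$(comm -23 "$workdir/fasta.txt" "$workdir/gtf.txt")
echo "$missing"
```

Any name printed here is a sequence that reads can align to but that htseq-count has no features for, which is the situation the warning describes.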


    • #3
      OK, thanks, that makes sense. I had misunderstood something there ...


      • #4
        If your sample really contains DNA from the mitochondria, you ought to include that sequence in the FASTA, so those reads can align to where they really belong. If you leave it out, the aligner might wrongly place those reads somewhere else, which will hurt the accuracy of your alignment.

        Probably, the software is just trying to warn you about something it thinks is strange: you might have a messed-up GFF, because it has no gene annotation for one of your chromosomes. But if that's how it's supposed to be, then you should just carry on, because you know better than the computer.

        If you don't like that warning, you could always filter the BAM so that the chrM reads are removed. If you have no GFF annotation for that chromosome, you don't need them for anything.
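A minimal sketch of such a filter, working on SAM text with awk. The demo SAM file and the function name `drop_chrM` are made up for illustration; for a real BAM one would presumably pipe through samtools as shown in the comment, which assumes samtools is installed.

```shell
#!/bin/sh
# Sketch: drop reads aligned to chrM from SAM text, keeping the header.
# The SAM records below are made-up demo data.
workdir=$(mktemp -d)

printf '@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrM\tLN:16571\n' > "$workdir/demo.sam"
printf 'r1\t0\tchr1\t10\t50\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$workdir/demo.sam"
printf 'r2\t0\tchrM\t20\t50\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$workdir/demo.sam"
printf 'r3\t0\tchr1\t30\t50\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$workdir/demo.sam"

# Keep header lines (starting with '@') and every read whose
# reference name (column 3, RNAME) is not chrM.
drop_chrM() {
    awk -F '\t' '/^@/ || $3 != "chrM"' "$1"
}

drop_chrM "$workdir/demo.sam" > "$workdir/filtered.sam"

# Count the alignment records (non-header lines) that survived.
kept=$(grep -vc '^@' "$workdir/filtered.sam")
echo "$kept"

# With a real BAM (assumption: samtools is available), the same awk filter
# could sit between two samtools calls:
#   samtools view -h in.bam | awk -F '\t' '/^@/ || $3 != "chrM"' | samtools view -b -o out.bam -
```

The filter only looks at each read's own reference name, so a read on another chromosome whose mate maps to chrM is kept. The @SQ header line for chrM is also left in place.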


        • #5
          Actually, the mitochondrial genome is very important if you are doing human mutation screening. There are a lot of hereditary mitochondrial diseases. Mutations in mitochondrial DNA have also been reported in cancer.


          • #6
            For some reason I thought that when I provide an annotation file to TopHat, it only aligns to the annotated genes (I overlooked that it only cares about the splice junctions). Actually that makes no sense, I got mixed up there ;-)
            Thanks to all for your answers!


            • #7
              Hi ocs,

              Did you use the latest version of Cufflinks? And do you still see genes on chrM? I used the latest version of Cufflinks and I want to have the genes on chrM, but the genes are all gone. I don't know why.


              • #8
                I had been using Cufflinks version 1.1.0 and TopHat version 1.3.3, and I have reads aligned to chrM but there are no annotated genes. Which reference file did you use?
                Last edited by ocs; 11-02-2011, 10:22 AM. Reason: Had to correct myself, I see no genes on chrM.


                • #9
                  Thanks for your reply. I used the human genome reference. I repeated the same experiment with an older version of Cufflinks, and there I have genes on chrM.


                  • #10
                    Are you using the human genome reference too? How many genes on chrM did you get?


                    • #11
                      Hello arrchi,

                      first, you have to be more specific. There are several reference genome files for the human genome (Ensembl, NCBI, UCSC: http://tophat.cbcb.umd.edu/igenomes.html). I used hg19 (UCSC) which I mentioned in my very first post. I also used the annotation file from the iGenome (genes.gtf). Which one did you use and which annotation?

                      Second, I don't know what you actually want - you want to have genes on chrM and then you have genes on chrM?

                      If you have a look at the annotation file (in my case genes.gtf from the iGenome hg19 package) you will find no annotated genes on chrM:
                      Code:
                      $ grep chrM genes.gtf | wc -l
                      0
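Note that a plain `grep chrM` matches the string anywhere on the line, not just in the sequence-name column. A slightly stricter check, sketched here on a made-up three-line `genes.gtf` (the file contents and the function name `count_features` are hypothetical), counts features per sequence name from column 1 only:

```shell
#!/bin/sh
# Sketch: count annotation lines per sequence name (column 1 of a GTF).
# The GTF content below is made-up demo data, not the iGenomes file.
export LC_ALL=C

workdir=$(mktemp -d)
printf 'chr1\tdemo\texon\t1\t4\t.\t+\t.\tgene_id "g1";\n' > "$workdir/genes.gtf"
printf 'chr1\tdemo\texon\t10\t20\t.\t+\t.\tgene_id "g1";\n' >> "$workdir/genes.gtf"
# This feature sits on chr2, but its (invented) gene_id contains the text "chrM".
printf 'chr2\tdemo\texon\t5\t9\t.\t-\t.\tgene_id "chrM_pseudo";\n' >> "$workdir/genes.gtf"

# Print "name<TAB>count" for every sequence name in column 1.
count_features() {
    grep -v '^#' "$1" | cut -f 1 | sort | uniq -c | awk '{print $2 "\t" $1}'
}

counts=$(count_features "$workdir/genes.gtf")
echo "$counts"

# Lines whose first column is exactly chrM (n+0 prints 0 when there are none).
chrM_features=$(awk -F '\t' '$1 == "chrM" {n++} END {print n+0}' "$workdir/genes.gtf")

# Lines containing "chrM" anywhere, which is what a plain grep counts.
grep_hits=$(grep -c 'chrM' "$workdir/genes.gtf")

echo "column match: $chrM_features, plain grep: $grep_hits"
```

For a file where the plain grep already returns 0, as above, both checks agree; the column match only matters when the string shows up in an attribute.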
