  • Exclude chrM, chrUn* from reference // htseq-count warning on chrM

    Hello,

    I'm currently doing some RNA-seq analysis using hg19 as the reference genome. I wondered whether it is better to exclude chrM and the chrUn* sequences from the reference, or whether excluding them would introduce bias.

    I'm asking because I'm using htseq-count to get the coverage, and the program gives me many warnings like this:
    Code:
    because chromosome 'chrM', to which it has been aligned, did not appear in the GFF file
    (only for chrM and, rarely, for some chrUn_* sequences).

    But I used the same GFF file for TopHat and for htseq-count. So when a read is aligned to chrM, chrM must be contained in the GFF file. I don't know why htseq-count complains about this.

    Any hints on that?

    Thanks in advance,
    Oliver

  • #2
    Oliver,

    TopHat (or any alignment program) aligns reads to a reference sequence, typically provided as a FASTA file and indexed by the alignment program. The GFF file is a feature annotation file, describing features (e.g. genes) relative to the reference sequence. TopHat aligned your reads to the reference genome provided by the FASTA file, which included the sequences for chrM and chrUn_*. However, the GFF file associated with this genome contains no feature annotations for these sequences. This is a fairly typical occurrence with TopHat/htseq-count. It is safe to ignore these warnings.
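To see which reference sequences have no annotation at all, the FASTA header names can be compared with column 1 of the GFF/GTF. The following is only a minimal sketch: the function names and the toy records are made up for illustration, and it assumes plain-text, tab-separated input lines.

```python
def fasta_sequence_names(fasta_lines):
    """Collect sequence names from FASTA header lines ('>name description')."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotated_sequence_names(gff_lines):
    """Collect the sequence names found in column 1 of GFF/GTF records."""
    names = set()
    for line in gff_lines:
        # Skip blank lines and comment/pragma lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unannotated_sequences(fasta_lines, gff_lines):
    """Sequences present in the reference but carrying no annotation record."""
    return sorted(fasta_sequence_names(fasta_lines) - annotated_sequence_names(gff_lines))


# Hypothetical toy data standing in for the real genome and annotation files.
fasta = [">chr1", "ACGT", ">chrM", "GATC", ">chrUn_gl000211", "TTAA"]
gff = ['chr1\tunknown\texon\t1\t4\t.\t+\t.\tgene_id "G1"; transcript_id "T1";']

print(unannotated_sequences(fasta, gff))
```

Reads aligned to any sequence in the printed list are the ones that would be expected to trigger the warning.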


    • #3
      OK, thanks, that makes sense. I had misunderstood something there.


      • #4
        If your DNA sample really contains DNA from the mitochondria, you ought to put that in the FASTA, so those reads can align to where they really belong. If you leave it out, the aligner might wrongly place those reads somewhere else, which will reduce the accuracy of your alignment.

        Probably, the software is just trying to warn you about something it thinks is strange. It's trying to warn you that you might have a broken GFF, because it has no gene annotation for one of your chromosomes. But if that's how it's supposed to be, then you should just carry on, because you know better than the computer.

        If you don't like that warning, you could always filter the BAM so that the chrM reads are removed. If you have no GFF annotation for that chromosome, you don't need them for counting.
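The filtering suggested above can be sketched on SAM text records, where column 3 holds the reference sequence name. This is an illustrative sketch only: `drop_reads_on` is a hypothetical helper name, the records are toy data, and it assumes the BAM has already been converted to SAM text.

```python
def drop_reads_on(sam_lines, unwanted):
    """Yield SAM lines, skipping alignments whose reference name is in `unwanted`."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.split("\t")
        # Column 3 (index 2) is RNAME, the sequence the read was aligned to.
        if len(fields) > 2 and fields[2] in unwanted:
            continue
        yield line


# Hypothetical toy records: two header lines and two alignments.
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chrM\tLN:16571",
    "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchrM\t200\t50\t4M\t*\t0\t0\tACGT\tIIII",
]

kept = list(drop_reads_on(sam, {"chrM"}))
for record in kept:
    print(record)
```

In practice the input lines could come from something like `samtools view -h` piped into a script. Dropping the reads only silences the warning; it does not change the counts for annotated genes.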


        • #5
          Actually, the mitochondrial genome is very important if you are doing human mutation screening. There are a lot of hereditary mitochondrial diseases. Mutations in mitochondrial DNA have also been reported in cancer.


          • #6
            For some reason I thought that when I provide an annotation file to TopHat, it only aligns to the annotated genes (overlooking the fact that it only cares about the splice junctions). Actually, that makes no sense; I simply got confused ;-)
            Thanks to all for your answers!


            • #7
              Hi ocs,

              Did you use the latest version of Cufflinks? And do you still see genes on chrM? I used the latest version of Cufflinks and I want to have the genes on chrM, but they are all gone, and I don't know why.


              • #8
                I had been using cufflinks version 1.1.0 and TopHat version 1.3.3 and I have reads aligned to chrM but there are no annotated genes. Which reference file did you use?
                Last edited by ocs; 11-02-2011, 10:22 AM. Reason: Had to correct myself, I see no genes on chrM.


                • #9
                  Thanks for your reply. I used the human genome reference. I repeated the same experiment with an older version of Cufflinks, and there I do have genes on chrM.


                  • #10
                    Are you using the human genome reference too? How many genes on chrM did you get?


                    • #11
                      Originally posted by arrchi
                      Thanks for your reply. I used the human genome reference. I repeated the same experiment with an older version of Cufflinks, and there I do have genes on chrM.
                      Hello arrchi,

                      First, you have to be more specific. There are several reference genome files for the human genome (Ensembl, NCBI, UCSC: http://tophat.cbcb.umd.edu/igenomes.html). I used hg19 (UCSC), which I mentioned in my very first post. I also used the annotation file from the iGenome package (genes.gtf). Which reference did you use, and which annotation?

                      Second, I don't understand what you actually want: you want to have genes on chrM, and now you do have genes on chrM?

                      If you have a look at the annotation file (in my case genes.gtf from the iGenome hg19 package) you will find no annotated genes on chrM:
                      Code:
                      $ grep chrM genes.gtf | wc -l
                      0
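The same check can be made per sequence by counting annotation records on column 1 only. Note that a plain `grep chrM` would also match "chrM" anywhere else in a line, for example inside an attribute value. This is a small sketch with made-up records; `records_per_sequence` and the gene IDs are hypothetical.

```python
from collections import Counter


def records_per_sequence(gtf_lines):
    """Count annotation records per sequence name (column 1 of a GTF/GFF file)."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts


# Hypothetical toy records; the second one mentions "chrM" only in an attribute.
gtf = [
    'chr1\tunknown\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tunknown\texon\t200\t300\t.\t+\t.\tgene_id "chrM_like"; transcript_id "T2";',
    'chr2\tunknown\texon\t1\t50\t.\t-\t.\tgene_id "G3"; transcript_id "T3";',
]

counts = records_per_sequence(gtf)
print(counts["chr1"], counts["chr2"], counts["chrM"])
```

A count of zero for chrM here corresponds to the empty `grep` result shown above.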
