  • Issues with DE genes vs. DE transcripts

    I’ve used DESeq, DESeq2 and edgeR for RNAseq DEG analysis (reads mapped to the mouse transcriptome).

    Some small issues are really annoying; I thought they only happened with microarrays in the old days.

    For example, I pulled out two RefSeq IDs from my DEGs: NM_001025559 and NM_001025560 (both with FDR < 0.05 in DESeq, DESeq2 and edgeR).
    After I annotated them with MGI gene symbol, description, Ensembl gene ID and Entrez Gene ID, it turned out that both RefSeq IDs mapped to exactly the same MGI gene symbol, description, Ensembl gene ID and Entrez Gene ID.
    I searched for these two RefSeq IDs manually at NCBI and found that they are simply two different transcript variants of the same gene.

    I know that for later network, enrichment and pathway analyses I will mostly need a list of DE genes, not DE transcripts.
    What’s a reasonable way to deal with this?

    Thanks for your suggestions.

  • #2
    Just quantitate over genes, rather than transcripts. This is simplest with Ensembl's annotation files.
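If re-aligning is not an option, one rough way to get gene-level numbers from transcript-level counts is to sum the counts of all transcripts that map to the same gene before running DESeq2 or edgeR. The Python sketch below assumes you already have per-sample transcript counts and a transcript-to-gene table (for example from biomaRt); `GENE_X` and the counts are hypothetical placeholders. One caveat: if a read was counted against several isoforms of the same gene (likely when aligning to a transcriptome full of near-identical variants), summing counts it more than once, which is one reason gene-level counting from genome alignments is usually preferred.

```python
def transcript_to_gene_counts(tx_counts, tx2gene):
    """Sum per-transcript counts into per-gene counts.

    tx_counts: dict of transcript ID -> list of counts (one entry per sample)
    tx2gene:   dict of transcript ID -> gene ID
    Transcripts with no gene mapping are dropped.
    """
    gene_counts = {}
    for tx, counts in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            continue
        if gene not in gene_counts:
            gene_counts[gene] = [0] * len(counts)
        # Add this transcript's counts to the gene total, sample by sample.
        gene_counts[gene] = [a + b for a, b in zip(gene_counts[gene], counts)]
    return gene_counts


# Hypothetical counts for two samples; GENE_X stands in for the shared gene.
tx_counts = {"NM_001025559": [120, 30], "NM_001025560": [80, 10]}
tx2gene = {"NM_001025559": "GENE_X", "NM_001025560": "GENE_X"}
print(transcript_to_gene_counts(tx_counts, tx2gene))  # {'GENE_X': [200, 40]}
```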



    • #3
      Thanks for your reply. Could you shed more light on this? Do you mean I should use Ensembl's annotation file for my reference genome/transcriptome?
      At which step are you suggesting I change things?

      I used the UCSC file refMrna.fa as the reference transcriptome.
      Then I used BWA for alignment and a Perl script to count the reads.

      Finally, I used the biomaRt package to map my RefSeq IDs to MGI symbols, etc.:

      ensembl <- useMart("ensembl")
      ensembl <- useDataset("mmusculus_gene_ensembl", mart = ensembl)

      Thanks.
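For the downstream network, enrichment and pathway analyses, the DE transcript list itself can also be collapsed to unique genes once each RefSeq ID is annotated. A minimal Python sketch, assuming the biomaRt output has been exported as (RefSeq ID, gene ID) pairs; `GENE_X` and `TX_UNMAPPED` are hypothetical placeholders. When one RefSeq ID maps to several genes, this sketch simply keeps the first mapping it sees, and it ignores fold-change direction, so isoforms of the same gene that change in opposite directions would need a separate decision.

```python
def collapse_to_genes(de_transcripts, mapping_rows):
    """Map DE transcript IDs to unique gene IDs, keeping first-seen order.

    mapping_rows: iterable of (refseq_id, gene_id) pairs.
    Returns (genes, unmapped_transcripts).
    """
    tx2gene = {}
    for refseq_id, gene_id in mapping_rows:
        # If one RefSeq ID maps to several genes, keep the first mapping seen.
        tx2gene.setdefault(refseq_id, gene_id)

    genes, seen, unmapped = [], set(), []
    for tx in de_transcripts:
        gene = tx2gene.get(tx)
        if gene is None:
            unmapped.append(tx)
        elif gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes, unmapped


# The two RefSeq IDs from the question; GENE_X and TX_UNMAPPED are hypothetical placeholders.
de_transcripts = ["NM_001025559", "NM_001025560", "TX_UNMAPPED"]
mapping_rows = [("NM_001025559", "GENE_X"), ("NM_001025560", "GENE_X")]
print(collapse_to_genes(de_transcripts, mapping_rows))  # (['GENE_X'], ['TX_UNMAPPED'])
```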



      • #4
        Ah, ditch UCSC and transcriptome alignments. The best method for RNAseq data is to use STAR or HISAT (or TopHat2 if you enjoy wasting time) and align to the genome. These tools can be supplied with an annotation file (GTF or GFF format). The resulting SAM/BAM file can then be processed with featureCounts to produce gene-level counts. This is the process I personally use for my mouse datasets, and it works quite well. I recommend Ensembl's reference sequence and annotation files; they're more convenient than UCSC's.
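To illustrate what gene-level counting means here, the following toy Python sketch groups GTF exon records by `gene_id` and assigns each aligned read to the single gene whose exons it overlaps, skipping reads that overlap exons of more than one gene. This is only an approximation of featureCounts' default gene-level (meta-feature) behavior, not its actual implementation: it ignores strand, splicing, paired-end fragments and mapping quality, and it uses a brute-force overlap search instead of an index. Reads are given as hypothetical (chromosome, start, end) tuples with 1-based inclusive coordinates, and the GTF lines are made up for the example. A read in an exon shared by two transcripts of `GENE_X` is counted once for the gene, which is what avoids the duplicate-isoform problem from the original question.

```python
from collections import defaultdict


def parse_gtf_exons(gtf_lines):
    """Collect exon intervals per gene_id from GTF lines (1-based, inclusive)."""
    exons = defaultdict(list)  # gene_id -> list of (chrom, start, end)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end, attrs = fields[0], int(fields[3]), int(fields[4]), fields[8]
        gene_id = None
        for attr in attrs.split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id "):
                gene_id = attr.split(" ", 1)[1].strip('"')
        if gene_id is not None:
            exons[gene_id].append((chrom, start, end))
    return exons


def count_reads_per_gene(reads, exons):
    """Count reads overlapping exactly one gene; reads hitting several genes are ambiguous."""
    counts = {gene: 0 for gene in exons}
    ambiguous = 0
    for chrom, rstart, rend in reads:
        # Set of genes with at least one exon overlapping this read.
        hits = {
            gene
            for gene, intervals in exons.items()
            for (c, s, e) in intervals
            if c == chrom and rstart <= e and rend >= s
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif len(hits) > 1:
            ambiguous += 1
    return counts, ambiguous


# Made-up annotation: GENE_X has two transcripts sharing exon 100-200; GENE_Y overlaps GENE_X's last exon.
gtf = [
    "\t".join(["1", "demo", "exon", "100", "200", ".", "+", ".", 'gene_id "GENE_X"; transcript_id "TX1";']),
    "\t".join(["1", "demo", "exon", "100", "200", ".", "+", ".", 'gene_id "GENE_X"; transcript_id "TX2";']),
    "\t".join(["1", "demo", "exon", "300", "400", ".", "+", ".", 'gene_id "GENE_X"; transcript_id "TX2";']),
    "\t".join(["1", "demo", "exon", "380", "500", ".", "-", ".", 'gene_id "GENE_Y"; transcript_id "TX3";']),
]
reads = [("1", 150, 199), ("1", 310, 349), ("1", 390, 429), ("1", 450, 489), ("2", 10, 49)]
counts, ambiguous = count_reads_per_gene(reads, parse_gtf_exons(gtf))
print(counts, ambiguous)  # {'GENE_X': 2, 'GENE_Y': 1} 1
```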



        • #5
          I thought STAR was designed for aligning long reads, and mine are short reads, but I might be wrong.
          Regarding the Ensembl reference for mouse RNAseq, is Mus_musculus.GRCm38.79.gtf.gz the right file to use for now?

          Thanks.



          • #6
            STAR works great with short reads, even small RNAs (e.g. miRNAs).

            Edit: Yes, that's the correct file. Get the FASTA file too, since chromosome names differ between Ensembl and UCSC.
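The chromosome-naming point is easy to check before building an index: Ensembl names mouse chromosomes `1`, `2`, `X`, while UCSC uses `chr1`, `chr2`, `chrX`, so an Ensembl GTF paired with a UCSC FASTA matches none of its sequences. A minimal Python sketch that reports GTF sequence names missing from the FASTA headers; the example records are made up, and it assumes the whole files have been read into lists of lines.

```python
def fasta_seqnames(fasta_lines):
    """Sequence names from FASTA headers: the text after '>' up to the first whitespace."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].split():
            names.add(line[1:].split()[0])
    return names


def gtf_seqnames(gtf_lines):
    """Distinct values of the first (seqname) column of non-comment GTF lines."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }


def missing_from_genome(fasta_lines, gtf_lines):
    """GTF seqnames that have no matching sequence in the FASTA file."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_seqnames(fasta_lines))


# Made-up records: UCSC-style FASTA headers vs. Ensembl-style GTF seqnames.
ucsc_fasta = [">chr1", "ACGTACGT", ">chrX some description", "GGCCGGCC"]
ensembl_gtf = [
    "#!genome-build GRCm38",
    "1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id \"G1\";",
    "X\tensembl\texon\t50\t90\t.\t-\t.\tgene_id \"G2\";",
]
print(missing_from_genome(ucsc_fasta, ensembl_gtf))  # ['1', 'X']
```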



            • #7
              Thanks dpryan. I finally did it with STAR and featureCounts.
              I have posted some follow-up questions in a separate thread.
