  • DEG analysis without gff/gtf file

    My goal is to see differentially expressed genes across different time points.
    However, I want to map all reads based solely on sequence and not on where they map, because I am not certain that my annotation of the genome is correct or complete. So I do not want to use an annotation.

    In this case, after running TopHat without the "-G" option,
    what approaches could be used in the next step other than HTSeq or Cufflinks/Cuffdiff?

    I have been told that Cufflinks/Cuffdiff is not very powerful for detecting DEGs, and have been advised to use HTSeq with edgeR or DESeq. However, HTSeq requires a GFF file as input, so I need to take another approach. Could you please give me tips about which other programs could be used in my case?

    Thanks in advance.
    Last edited by syintel87; 01-07-2013, 09:06 AM.

  • #2
    Hi syintel87,

    I have recently been looking for a pipeline for RNA-Seq analysis and had the same question as you. As far as I know, in all cases (whether de novo assembly or reference-based mapping) you are going to need a GFF3/GTF file.

    Bernardo

    • #3
      how to get DEG without gtf/gff?

      Is there a way to achieve my goal, which is to see differentially expressed genes across different time points, without a gff/gtf file?

      If I use the annotation file, reads will only be assigned to annotated genes. This will exclude any reads that map to genes that have yet to be annotated.

      • #4
        Originally posted by syintel87:
        Is there a way to achieve my goal, which is to see differentially expressed genes across different time points, without a gff/gtf file?

        If I use the annotation file, reads will only be assigned to annotated genes. This will exclude any reads that map to genes that have yet to be annotated.
        Well, at some point programs like rQuant, rDiff, DESeq or Cuffdiff are going to need a file with transcripts in order to quantify them in the *.bam files.

        Maybe there are other GTF/GFF3-independent tools that I still don't know about.


        Bernardo

        • #5
          Even if you do not use Cuffdiff for the DE analysis, you can run Cufflinks on your samples to get sample-specific .gtf files. These annotations (which can contain novel transcripts/genes) can be merged afterwards with a reference .gtf file that you prefer (e.g. Ensembl's) using Cuffmerge, and you can use the resulting merged .gtf file for the DESeq/edgeR analyses.
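
The workflow described above can be sketched as a small Python helper that only builds the command lines, without running anything. This is a rough outline under several assumptions: the sample names, file paths, thread count and output directory names are hypothetical; the reads are assumed to be unstranded (`-s no`) and the BAM files position-sorted (`-r pos`), as TopHat output usually is. The `-g` reference annotation passed to Cuffmerge is optional and can be dropped if you do not trust any existing annotation.

```python
import shlex


def build_commands(sample_bams, reference_gtf, genome_fasta, threads=4):
    """Build Cufflinks -> Cuffmerge -> htseq-count command lines.

    sample_bams maps a (hypothetical) sample name to its TopHat BAM file.
    Returns the list of per-sample assembly GTFs and the list of commands.
    """
    assemblies = []
    commands = []

    # 1. Assemble transcripts per sample; Cufflinks writes transcripts.gtf
    #    into the directory given with -o.
    for sample, bam in sample_bams.items():
        out_dir = f"cufflinks_{sample}"
        commands.append(["cufflinks", "-p", str(threads), "-o", out_dir, bam])
        assemblies.append(f"{out_dir}/transcripts.gtf")

    # 2. Merge all assemblies (plus the reference annotation) into one GTF.
    #    Cuffmerge expects a text file listing one assembly GTF per line;
    #    here it is assumed to be called assemblies.txt.
    commands.append([
        "cuffmerge", "-g", reference_gtf, "-s", genome_fasta,
        "-p", str(threads), "-o", "merged_asm", "assemblies.txt",
    ])

    # 3. Count reads per gene against the merged GTF for every sample.
    #    htseq-count prints to stdout, so redirect it to a file when running.
    for sample, bam in sample_bams.items():
        commands.append([
            "htseq-count", "-f", "bam", "-r", "pos", "-s", "no",
            bam, "merged_asm/merged.gtf",
        ])
    return assemblies, commands


if __name__ == "__main__":
    bams = {"t0": "t0/accepted_hits.bam", "t1": "t1/accepted_hits.bam"}
    gtf_list, cmds = build_commands(bams, "reference.gtf", "genome.fa")
    print("\n".join(gtf_list))  # contents for assemblies.txt
    for cmd in cmds:
        print(shlex.join(cmd))
```

The per-sample count tables produced in step 3 are what DESeq or edgeR would take as input. Because every sample is counted against the same merged .gtf, the gene identifiers line up across samples.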

          • #6
            Originally posted by adumitri:
            Even if you do not use Cuffdiff for the DE analysis, you can run Cufflinks on your samples to get sample-specific .gtf files. These annotations (which can contain novel transcripts/genes) can be merged afterwards with a reference .gtf file that you prefer (e.g. Ensembl's) using Cuffmerge, and you can use the resulting merged .gtf file for the DESeq/edgeR analyses.
            Oh!!! That is so helpful!!!
            Thank you so much!!!
            That merged GTF file is exactly what I want to have!!!

            • #7
              Originally posted by syintel87:
              Is there a way to achieve my goal, which is to see differentially expressed genes across different time points, without a gff/gtf file?
              Well... it may sound silly, but to identify *differentially expressed* genes you need to identify *genes*.
              Either you provide them as known data in the form of an annotation file (GTF/GFF/BED/etc.) or you'll have to infer them from the reads, which is a very challenging task if you expect complete gene models. You typically get differentially expressed "genomic regions", aka "transcribed fragments" (transfrags), "transcriptionally active regions" (TARs), etc., and not complete "genes".
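
To make the idea of inferring transcribed regions from the reads concrete, here is a toy sketch in plain Python. It is not how Cufflinks works internally; it simply reports the stretches of one chromosome where read depth reaches a threshold. The function name, the half-open `(start, end)` read intervals and the `min_coverage` parameter are all assumptions made for illustration, and splicing and strand are ignored.

```python
from collections import defaultdict


def call_transfrags(read_intervals, min_coverage=2):
    """Return (start, end) regions covered by at least min_coverage reads.

    read_intervals: half-open (start, end) alignments on one chromosome.
    """
    # Net change in depth at each position where a read starts or ends.
    deltas = defaultdict(int)
    for start, end in read_intervals:
        deltas[start] += 1
        deltas[end] -= 1

    regions = []
    depth = 0
    region_start = None
    for pos in sorted(deltas):
        new_depth = depth + deltas[pos]
        if depth < min_coverage <= new_depth:
            region_start = pos  # depth rises to the threshold: open a region
        elif new_depth < min_coverage <= depth:
            regions.append((region_start, pos))  # depth drops: close it
        depth = new_depth
    return regions


if __name__ == "__main__":
    reads = [(0, 10), (5, 15), (8, 20), (30, 40)]
    print(call_transfrags(reads, min_coverage=2))
    print(call_transfrags(reads, min_coverage=1))
```

Regions found this way are only fragments of transcription. Nothing in the coverage alone says which fragments belong to the same gene, which is why complete gene models are so much harder to obtain.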

              As adumitri indicated, you can use Cufflinks (or BEDTools) to extract those transcribed regions from the mapped reads and merge them with some reference annotation so that you can probe known and unknown regions.
              I would just recommend merging the reads from all the samples together, along with the reference annotation, so that the statistical method you choose next considers the exact same set of regions across samples/conditions. You should then find differentially expressed regions. Deciding whether two transcribed regions belong to the same gene/transcript is another question.
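
The point about using the exact same set of regions for every sample can be illustrated with a minimal Python sketch. It assumes regions and reads are half-open `(start, end)` intervals on a single chromosome; the sample names and coordinates are made up, and in practice Cuffmerge or BEDTools would do this on real GTF/BAM files.

```python
def merge_regions(*region_sets):
    """Union several lists of (start, end) regions into one sorted,
    non-overlapping list. Overlapping or touching regions are fused."""
    merged = []
    for start, end in sorted(r for regions in region_sets for r in regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_reads(regions, reads):
    """Count, for each region, the reads that overlap it by at least 1 bp."""
    return [
        sum(1 for r_start, r_end in reads if r_start < end and r_end > start)
        for start, end in regions
    ]


if __name__ == "__main__":
    reference = [(100, 200)]                # hypothetical known annotation
    regions_a = [(150, 250), (400, 500)]    # regions inferred from sample A
    regions_b = [(420, 520)]                # regions inferred from sample B
    common = merge_regions(reference, regions_a, regions_b)

    reads = {
        "A": [(110, 160), (240, 290), (410, 460)],
        "B": [(450, 500), (500, 550), (600, 650)],
    }
    # One row per sample, one column per region in `common`.
    table = {sample: count_reads(common, r) for sample, r in reads.items()}
    print(common)
    print(table)
```

Each sample gets a count for every region in the common set, including zeros, which is the shape of table that DESeq or edgeR expects. A region found in only one sample still appears as a column for all samples.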
