  • CuffDiff Question

    Here's a line from a CuffDiff output (v 1.2.0) where I had used tophat and the -g (RABT option) and supplied an annotation gtf file.

    TCONS_00039273 XLOC_029459 Rpl10 X:160336635-160338842 525L1C2 614L5C2 NOTEST 0 0 0 0 1 1 no

    Rpl10 is listed in the gene column and no test was performed.

    However when I examine the lines which record significant tests I see an entry

    TCONS_00348425 XLOC_081856 - chrX:160336617-160339570 429L2S2 614L5C2 OK 0.641861 11.5201 4.16575 -7.16177 7.96474e-13 2.90794e-09 yes

    In this case a region chrX:160336617-160339570 was tested and IS significant.

    chrX:160336617-160339570 Significant
    X:160336635-160338842 NOTEST

    The NOTEST is nested within the transcript range that CuffDiff thinks IS significant.

    Can anyone explain this? If allowed to select its own regions, CuffDiff can find significant expression differences, but it fails to score any significant differences for any region with an entry in the gene column. If I use tophat input with the -G option and force it to use a known set of transcripts, I get no significant differences. This is consistent with the -g behavior in that the named regions do not test significant.

    Second part: what's the best way to check whether all my CuffDiff significant regions actually overlap known genes? Obviously I can use a graphical viewer for one or two, but I have thousands to examine. Is there open source code that will do this?

    S
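
For the second part of the question, a dedicated interval tool such as BEDTools is the usual open source route. As a rough illustration of the underlying check, here is a minimal Python sketch, assuming the locus column looks like `chrX:160336617-160339570` and the annotation is a standard 9-column GTF with `gene_name` or `gene_id` attributes. All function names are made up for this example, and the GTF line in the demo is hypothetical apart from the coordinates quoted above. It strips a leading "chr" on both sides so that `chrX` and `X` compare equal.

```python
import re
from collections import defaultdict

GENE_NAME = re.compile(r'gene_name "([^"]+)"')
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def normalize_chrom(name):
    """Drop a leading 'chr' so 'chrX' and 'X' compare equal."""
    return name[3:] if name.startswith("chr") else name


def parse_locus(locus):
    """Split a locus string such as 'chrX:160336617-160339570'."""
    chrom, span = locus.rsplit(":", 1)
    start, end = span.split("-")
    return normalize_chrom(chrom), int(start), int(end)


def load_gene_intervals(gtf_lines):
    """Map chromosome -> list of (start, end, gene) from GTF feature lines."""
    genes = defaultdict(list)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue  # skip comments and malformed lines
        match = GENE_NAME.search(fields[8]) or GENE_ID.search(fields[8])
        if match:
            genes[normalize_chrom(fields[0])].append(
                (int(fields[3]), int(fields[4]), match.group(1)))
    return genes


def overlapping_genes(locus, genes):
    """Return sorted gene names whose features overlap the locus."""
    chrom, start, end = parse_locus(locus)
    return sorted({name for s, e, name in genes.get(chrom, [])
                   if s <= end and start <= e})


if __name__ == "__main__":
    # Hypothetical GTF line; only the coordinates come from the post above.
    gtf = ['X\tensembl\texon\t160336635\t160338842\t.\t+\t.\t'
           'gene_id "G1"; gene_name "Rpl10";\n']
    genes = load_gene_intervals(gtf)
    print(overlapping_genes("chrX:160336617-160339570", genes))
```

This is a plain linear scan per locus, which should be tolerable for a few thousand loci but is not tuned for speed. A locus that returns an empty list has no annotated feature on that chromosome within its range.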

  • #2
    Hello,

    You may want to check that your chromosome names are the same in both your GTF file and the genome fasta you mapped to. It looks like you have both chrX and just plain X (unless you were just abbreviating to save time). The software may not think that your two regions overlap at all, because it thinks they are on two separate contigs.
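
One quick way to run that check is to list the sequence names on each side and compare them. The sketch below is only an illustration: the function names are invented here, and it assumes the aligner takes the first whitespace-delimited word of each FASTA header as the chromosome name.

```python
def fasta_names(fasta_lines):
    """Collect the first word of every '>' header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gtf_names(gtf_lines):
    """Collect column 1 (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_lines, gtf_lines):
    """Return (names only in the GTF, names only in the FASTA)."""
    fasta, gtf = fasta_names(fasta_lines), gtf_names(gtf_lines)
    return sorted(gtf - fasta), sorted(fasta - gtf)
```

If both returned lists are empty, the two files agree on naming. A result like `(['X'], ['chrX'])` points to exactly the prefix mismatch described here.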



    • #3
      awk editing and bowtie re-indexing did not help

      I looked at my ebwt index with bowtie and indeed it uses "chr" prefixes for all
      chromosomes (e.g. chr1, chrX, chrM, chrUn, etc.).

      The ENSEMBL GTF file references all chromosomes without the "chr" prefix
      (e.g. 1, X, M, Un, etc.).

      I used an awk script to alter the ENSEMBL GTF by adding "chr" to every entry.
      This causes Cufflinks to fail to read the loci. I was not certain that my awk script preserved tab-delimited spacing, so I passed the file through MS Excel (saving as tab-delimited text) and tried again, but this time Cufflinks would not read the GTF file.
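
As an alternative to the awk and Excel route, the prefix can be added while touching only the first column, so every tab and the attribute column pass through unchanged. This is a minimal sketch rather than a tested fix for this particular file: the function names and file paths are placeholders, and it assumes the first column is the chromosome name, as in a standard GTF.

```python
def add_chr_prefix(line, prefix="chr"):
    """Prefix the seqname (column 1) of one GTF line; leave the rest as-is."""
    if not line.strip() or line.startswith("#"):
        return line  # keep blank lines and comment/header lines untouched
    seqname, tab, rest = line.partition("\t")
    if not tab or seqname.startswith(prefix):
        return line  # not tab-delimited, or already prefixed
    return prefix + seqname + tab + rest


def convert_gtf(in_path, out_path):
    """Rewrite a GTF file line by line (paths are placeholders)."""
    # newline="" keeps the original line endings exactly as they were
    with open(in_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        for line in src:
            dst.write(add_chr_prefix(line))
```

Because only the text before the first tab is changed, the number of tab-separated fields in each line is the same before and after, which is an easy thing to verify on the output.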

      I downloaded the rat chromosomes, changed the chromosome names from
      chrX, chr1, etc. to X, 1 and tried to build a new index. The trouble was that the non-numeric chromosome names were converted to numerals. (I am working with the rat genome, so I have chr1-20 plus X, M and Un.) My bowtie-build run
      with the modified chr names gave me index names 1-20 but then "created"
      "0" (zero), 21 and 22 rather than X, M and Un.

      How do I make an index which has all my chromosomes but calls them
      1-20,M,X,Un?
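
It is not clear from the description why the names came out as 0, 21 and 22. One thing that can be ruled out is the renaming step itself: the sketch below rewrites only the FASTA header lines, dropping the "chr" prefix and leaving sequence lines alone. The function names and paths are placeholders for illustration, and it assumes each header starts with `>chr` followed directly by the chromosome label.

```python
def strip_chr_from_header(line, prefix="chr"):
    """Turn '>chrX ...' into '>X ...'; leave every other line unchanged."""
    marker = ">" + prefix
    # Only strip when something remains after the prefix, so '>chr' alone
    # never becomes an empty name.
    if line.startswith(marker) and line[len(marker):].strip():
        return ">" + line[len(marker):]
    return line


def rename_fasta(in_path, out_path):
    """Rewrite a FASTA file with the prefix removed (paths are placeholders)."""
    with open(in_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        for line in src:
            dst.write(strip_chr_from_header(line))
```

After rebuilding the index from the renamed FASTA files, inspecting the new index the same way as the original one should show whether X, M and Un survived as names.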



      • #4
        Try downloading the Ensembl reference tarball from the cufflinks website. I have used the UCSC version for the entire pipeline and have not run into any issues. The directory includes bowtie indexes for tophat as well as the gtf files you need and the whole genome reference file.
        Linky: http://cufflinks.cbcb.umd.edu/igenomes.html



        • #5
          Perfect. Thanks.
