Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Differential expression using TopHat/Cufflinks, analysis by CumeRbund

    I am trying to do differential expression analysis for 6 samples with no replicates

    The protocol used by me after following several threads and http://www.nature.com/nprot/journal/....2012.016.html
    is as follows

    1) after steps like alignment, etc

    2) I assembled transcripts for each sample
    cufflinks -p 8 -o <output directory> accepted_hits.bam

    5) Ran Cuffmerge to create a single merged transcriptome annotation:
    cuffmerge -g Homo_sapiens.GRCh37.75.gtf -s Homo_sapiens.GRCh37.75.dna.primary_assembly.fa -p 8 assemblies.txt

    6) ran cuffdiff using following options:
    --library-type fr-unstranded
    --dispersion-method blind -L C1,C2,C3,C4,C5,C6
    -b ./index/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
    -u ./merged_asm/merged.gtf

    7) used CumeRbund for analysis


    However, I was unable to get Ensemble id and instead got XLOC id for the genes.


    following the thread http://seqanswers.com/forums/showthread.php?t=18357
    I tried following two options

    1) featureNames(sigGenes)

    tracking_id gene_short_name
    1 XLOC_002638 HES4,RP11-54O7.17
    2 XLOC_005270 <NA>
    3 XLOC_005288 <NA>
    4 XLOC_005368 <NA>
    5 XLOC_007367 RP11-47A8.5
    6 XLOC_007664 <NA>
    7 XLOC_007703 <NA>

    But the names for some of XLOC were missing

    2) then i tired other method and i got error

    > names<-featureNames(sigGenes)
    > row.names(names)=names$tracking_id
    > sigGenesNames <-as.matrix(names)
    > sigGenesNames <- sigGenesNames [,-1]
    > sigGenesData<-diffData(sigGenes)
    > row.names(sigGenesData)= sigGenesData$gene_id
    Error in `row.names<-.data.frame`(`*tmp*`, value = c("XLOC_002638", "XLOC_002638", :
    duplicate 'row.names' are not allowed
    In addition: Warning message:
    non-unique values when setting 'row.names': ‘XLOC_002638’, ‘XLOC_005270’, [... truncated]


    Hence, I started to look into the ways to get Ensemble ids and the threads lead to lot of confusion and queries.

    1. should i use gtf file crated by cuffmerge or cuffcompare as input for cuffdiff ?

    2. while creating database using cummerbund should we use genome and gtf file?

    3. If the answer of 2nd question is yes, then should I use gtf from Ensemble or cuffmerge or cuffcomapre?

    4. how to get Ensemble id instead of XLOC id.

    5. Is there some error in my protocol?



    Thanks in advance.

  • #2
    Hello everybody,

    I have a line from genes.fpkm_tracking from cuffdiff output:

    XLOC_000001 - - XLOC_000001 NM_008866 TSS1 chr1:4807892-4846735 - - 18.3357 13.5137 23.1577 OK 18.0846 13.4121 22.7572 OK

    Here the aggregated control sample value is 18.3357

    However, from the 3 control transcript.gtf files I have these 3 values:

    ../Nsg1-Hip_tophat/cufflinks_output_gtf/genes.fpkm_tracking:NM_008866 - - NM_008866 - - chr1:4807892-4846735 - - 16.1344 13.3363 18.9325 OK
    ../Nsg2-Hip_tophat/cufflinks_output_gtf/genes.fpkm_tracking:NM_008866 - - NM_008866 - - chr1:4807892-4846735 - - 15.4661 12.8039 18.1283 OK
    ../Nsg3-Hip_tophat/cufflinks_output_gtf/genes.fpkm_tracking:NM_008866 - - NM_008866 - - chr1:4807892-4846735 - - 13.0078 10.6329 15.3827 OK

    Here the aggregete value of 18.3 is larger than the individual values 16.1, 15.5, and 13.0. How does cuffdiff calculate the aggregate values from the 3 samples?
    Does it normalize? How does it do so?

    I've tried finding this in the Trapnell, 2012 Nature Protocol paper, but it doesn't say.

    Thanks, csmatyi

    Comment

    Latest Articles

    Collapse

    • seqadmin
      Current Approaches to Protein Sequencing
      by seqadmin


      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
      04-04-2024, 04:25 PM
    • seqadmin
      Strategies for Sequencing Challenging Samples
      by seqadmin


      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
      03-22-2024, 06:39 AM

    ad_right_rmr

    Collapse

    News

    Collapse

    Topics Statistics Last Post
    Started by seqadmin, 04-11-2024, 12:08 PM
    0 responses
    18 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 04-10-2024, 10:19 PM
    0 responses
    22 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 04-10-2024, 09:21 AM
    0 responses
    17 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 04-04-2024, 09:00 AM
    0 responses
    49 views
    0 likes
    Last Post seqadmin  
    Working...
    X