  • cuffcompare or cuffmerge for cuffdiff

    Hi all,
    This is an old topic in our community (see here and here).
    Although C. Trapnell recommends the cufflinks -> cuffmerge -> cuffdiff flow for differential expression analysis (here and in this new paper), I must bring it up again because there is too much confusion.

    I have 3 paired-end samples and two goals:
    [1] discover new isoforms and their structure
    [2] differential gene and transcript expression analysis, and their structure
    TopHat + Cufflinks runs without problems on all 3 samples.

    For these two aims, I use cuffcompare to analyze the transfrags that Cufflinks assembled, and cuffdiff to analyze differential expression.

    One flow:
    cuffcompare -o compare -s genomic_seq.fa -r known.gtf transcriptA.gtf transcriptB.gtf transcriptC.gtf
    cuffmerge -g known.gtf -s genomic_seq.fa 3_assembly_GTF_list.txt
    cuffdiff -o -b genomic_seq.fa -L A,B,C -u -p 6 merged.gtf A.bam B.bam C.bam
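A minimal Python sketch of the inputs for this first flow, assuming cuffmerge reads its assembly list as one GTF path per line. It only builds the command lines (nothing is executed) and reuses the flags from the commands above. The output directory name `cuffdiff_out` is hypothetical; note that cuffdiff's `-o` expects a directory name right after it, which the posted command does not give.

```python
import tempfile
from pathlib import Path

def write_assembly_list(gtf_paths, list_path):
    """Write the per-sample GTFs one per line (the list format cuffmerge takes)."""
    Path(list_path).write_text("".join(f"{p}\n" for p in gtf_paths))
    return str(list_path)

def cuffmerge_argv(ref_gtf, genome_fa, list_path):
    # Same flags as the post: -g reference annotation, -s genome FASTA.
    return ["cuffmerge", "-g", ref_gtf, "-s", genome_fa, list_path]

def cuffdiff_argv(out_dir, genome_fa, labels, threads, gtf, bams):
    # -o expects a directory name; "cuffdiff_out" in the demo is hypothetical.
    return ["cuffdiff", "-o", out_dir, "-b", genome_fa, "-L", ",".join(labels),
            "-u", "-p", str(threads), gtf, *bams]

with tempfile.TemporaryDirectory() as tmp:
    lst = write_assembly_list(["transcriptA.gtf", "transcriptB.gtf", "transcriptC.gtf"],
                              Path(tmp) / "3_assembly_GTF_list.txt")
    print(Path(lst).read_text(), end="")
    print(" ".join(cuffmerge_argv("known.gtf", "genomic_seq.fa", lst)))

print(" ".join(cuffdiff_argv("cuffdiff_out", "genomic_seq.fa", ["A", "B", "C"], 6,
                             "merged.gtf", ["A.bam", "B.bam", "C.bam"])))
```

Passing these lists to `subprocess.run` would execute them; that step is left out of the sketch.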
    I use cuffcompare because it outputs .refmap and .tmap files for each sample, which let me extract each Cufflinks transcript's reference gene and region from the Cufflinks result transcripts.gtf, like this:
    For transcript ENSMUST00000048860
    In sample A:
    Gene_name Transcript_id Class_code Cufflinks_transcript_id FPKM Coverage Transcript_length Ref_Transcript_length Chromosome Strand Start End Exon_num Exon_start-Exon_end;ditto
    Mreg ENSMUST00000048860 c Sample_A.442.1 9.998678 41.470300 243 2493 1 . 72205812 72206054 1 72205812-72206054;
    Mreg ENSMUST00000048860 = Sample_A.444.1 25.753304 108.941130 1695 2493 1 - 72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;
    In sample B:
    Mreg ENSMUST00000048860 j Sample_B.478.1 0.355742 1.460597 1682 2493 1 - 72206370 72243058 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72243016-72243058;
    Mreg ENSMUST00000048860 = Sample_B.478.2 1.652110 6.783196 1742 2493 1 - 72206370 72258693 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258693;
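To sanity-check rows like these, here is a small Python sketch that parses the exon column and recomputes the transcript length. It assumes the column order of the table above (a hand-made summary, not a Cufflinks output format) and 1-based inclusive GTF coordinates.

```python
def parse_exons(field):
    """Turn '72206430-72207593;72208896-72209059;' into [(start, end), ...]."""
    exons = []
    for chunk in field.strip().strip(";").split(";"):
        start, end = chunk.split("-")
        exons.append((int(start), int(end)))
    return exons

def transcript_length(exons):
    # 1-based inclusive coordinates: each exon covers end - start + 1 bases.
    return sum(end - start + 1 for start, end in exons)

row = ("Mreg ENSMUST00000048860 = Sample_A.444.1 25.753304 108.941130 1695 2493 1 - "
       "72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;"
       "72238617-72238776;72258591-72258706;")
fields = row.split()          # columns as in the Gene_name ... Exon_start-Exon_end header
exons = parse_exons(fields[13])
assert len(exons) == int(fields[12])  # Exon_num agrees with the exon list
print(fields[3], transcript_length(exons), "vs reported", fields[6])
# prints: Sample_A.444.1 1695 vs reported 1695
```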
    And in compare.tracking:
    TCONS_00001024 XLOC_000542 Mreg|ENSMUST00000048860 c q1:Sample_A.442|Sample_A.442.1|100|9.998678|4.225939|15.771418|41.470300|- - -
    TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 q2:Sample_B.478|Sample_B.478.2|100|1.652110|1.194891|2.109330|6.783196|1742 -
    TCONS_00002413 XLOC_000542 Mreg|ENSMUST00000048860 j - q2:Sample_B.478|Sample_B.478.1|22|0.355742|0.086913|0.624571|1.460597|- -
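The .tracking lines can be split mechanically. This Python sketch assumes whitespace-separated columns laid out as in the lines above: TCONS id, XLOC id, `gene|transcript` of the reference, class code, then one column per sample that is either `-` or `qN:gene|transcript|FMI|FPKM|conf_lo|conf_hi|coverage|length` (field names as I read the format; treat them as an assumption).

```python
def parse_tracking_line(line):
    """Split one compare.tracking line into a dict (layout assumed from the lines above)."""
    cols = line.split()
    tcons, xloc, ref, code = cols[:4]
    ref_gene, _, ref_tx = ref.partition("|")
    queries = {}
    for col in cols[4:]:
        if col == "-":
            continue  # this sample has no transfrag for this TCONS
        label, _, rest = col.partition(":")
        fields = rest.split("|")  # gene|transcript|FMI|FPKM|conf_lo|conf_hi|cov|len
        queries[label] = {"transcript": fields[1], "fpkm": float(fields[3])}
    return {"tcons": tcons, "xloc": xloc, "ref_gene": ref_gene,
            "ref_transcript": ref_tx, "class_code": code, "queries": queries}

line = ("TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = "
        "q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 "
        "q2:Sample_B.478|Sample_B.478.2|100|1.652110|1.194891|2.109330|6.783196|1742 -")
rec = parse_tracking_line(line)
print(rec["tcons"], rec["class_code"], {q: v["transcript"] for q, v in rec["queries"].items()})
# prints: TCONS_00001025 = {'q1': 'Sample_A.444.1', 'q2': 'Sample_B.478.2'}
```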
    And this gene in the cuffdiff result (processed):
    Tracking_id Gene_id Gene_name Class_code Nearest_ref_id TSS Locus Sample_1 Sample_2 FPKM_1 FPKM_2 Foldchange log2(fold_change) test_stat p_value q_value Significant
    TCONS_00004275 XLOC_001277 Mreg j ENSMUST00000048860 TSS2418 1:72205806-72258881 sample_A sample_B 8.99002 0.315128 0.0350531 -4.83432 3.98775 6.67042e-05 0.00727599 yes
    Say I focus on ENSMUST00000048860 because of its fold change in the cuffdiff result. I then need to go back to the cuffcompare result and find the Cufflinks-assembled transcripts matched to this known transcript, to decide whether each assembled transcript is known (class code = or c) or novel (class code j).
    But the cuffdiff ID TCONS_00004275 does not match any of the cuffcompare TCONS IDs, and the locus 1:72205806-72258881 also differs. So I cannot find the structure nearest to ENSMUST00000048860 in sample A and sample B. Is it Sample_A.442.1, Sample_A.444.1, or something else?
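One way to recover candidates when the cuffdiff TCONS id does not occur in compare.tracking is to fall back to the Nearest_ref_id column and collect every tracking entry for that reference transcript, then label each by the rule above (`=` or `c` known, `j` novel). A rough Python sketch with the tracking lines inlined; the helper names are hypothetical, and any other class code is simply labeled "other".

```python
KNOWN = {"=", "c"}  # per the post: "=" or "c" counts as known, "j" as novel

def load_tracking(lines):
    """Reduce compare.tracking lines to TCONS id, reference transcript, class code, sample transcripts."""
    entries = []
    for line in lines:
        cols = line.split()
        entries.append({
            "tcons": cols[0],
            "ref_tx": cols[2].partition("|")[2],
            "code": cols[3],
            # "qN:gene|transcript|..." -> transcript id; "-" means absent in that sample
            "samples": [c.split(":", 1)[1].split("|")[1] for c in cols[4:] if c != "-"],
        })
    return entries

def locate(entries, tracking_id, nearest_ref_id):
    """Exact TCONS match first; otherwise every entry for the reference transcript."""
    hits = [e for e in entries if e["tcons"] == tracking_id]
    if hits:
        return "tcons", hits
    return "nearest_ref", [e for e in entries if e["ref_tx"] == nearest_ref_id]

def classify(entry):
    if entry["code"] in KNOWN:
        return "known"
    return "novel" if entry["code"] == "j" else "other"

TRACKING = [
    "TCONS_00001024 XLOC_000542 Mreg|ENSMUST00000048860 c q1:Sample_A.442|Sample_A.442.1|100|9.998678|4.225939|15.771418|41.470300|- - -",
    "TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 q2:Sample_B.478|Sample_B.478.2|100|1.652110|1.194891|2.109330|6.783196|1742 -",
    "TCONS_00002413 XLOC_000542 Mreg|ENSMUST00000048860 j - q2:Sample_B.478|Sample_B.478.1|22|0.355742|0.086913|0.624571|1.460597|- -",
]
entries = load_tracking(TRACKING)
mode, hits = locate(entries, "TCONS_00004275", "ENSMUST00000048860")
print("matched by", mode)
for e in hits:
    print(e["tcons"], e["code"], classify(e), e["samples"])
```

This narrows the question to a short candidate list; it does not by itself say which candidate the merged TCONS_00004275 model corresponds to.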


    So I changed the workflow (without cuffmerge):

    cuffcompare -o compare -s genomic_seq.fa -r known.gtf transcriptA.gtf transcriptB.gtf transcriptC.gtf
    cuffdiff -o -b genomic_seq.fa -L A,B,C -u -p 6 combined.gtf A.bam B.bam C.bam
    Again using ENSMUST00000048860 as the example,
    the cuffcompare result (processed):
    In sample A:
    Mreg ENSMUST00000048860 c Sample_A.443.1 9.998678 41.470300 243 2493 1 . 72205812 72206054 1 72205812-72206054;
    Mreg ENSMUST00000048860 = Sample_A.444.1 25.753304 108.941130 1695 2493 1 - 72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;
    In sample B:
    Mreg ENSMUST00000048860 j Sample_B.479.1 0.355742 1.460597 1682 2493 1 - 72206370 72243058 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72243016-72243058;
    Mreg ENSMUST00000048860 = Sample_B.479.2 1.652110 6.783196 1742 2493 1 - 72206370 72258693 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258693;
    Another confusion: I ran the same cufflinks + cuffcompare programs, but the Cufflinks IDs differ. Sample_A.442.1 and Sample_A.444.1 became Sample_A.443.1 and Sample_A.444.1, and likewise in sample B (Sample_B.478.x became Sample_B.479.x).
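Since the Cufflinks numbering (Sample_A.442.1 vs Sample_A.443.1) is not stable between these runs, transcripts can instead be matched by structure. A Python sketch, assuming rows in the same column order as the tables above, that keys each transcript by chromosome, strand and exact exon chain:

```python
def structure_key(row):
    """(chromosome, strand, exon chain) for a row laid out like the tables above."""
    f = row.split()
    exons = tuple(tuple(int(x) for x in ex.split("-")) for ex in f[13].strip(";").split(";"))
    return (f[8], f[9], exons)

def match_runs(run1_rows, run2_rows):
    """Map each Cufflinks id in run 1 to the id with the identical structure in run 2 (or None)."""
    by_key = {structure_key(r): r.split()[3] for r in run2_rows}
    return {r.split()[3]: by_key.get(structure_key(r)) for r in run1_rows}

A_EXONS = "72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;"
run1 = [
    "Mreg ENSMUST00000048860 c Sample_A.442.1 9.998678 41.470300 243 2493 1 . 72205812 72206054 1 72205812-72206054;",
    "Mreg ENSMUST00000048860 = Sample_A.444.1 25.753304 108.941130 1695 2493 1 - 72206430 72258706 5 " + A_EXONS,
]
run2 = [
    "Mreg ENSMUST00000048860 c Sample_A.443.1 9.998678 41.470300 243 2493 1 . 72205812 72206054 1 72205812-72206054;",
    "Mreg ENSMUST00000048860 = Sample_A.444.1 25.753304 108.941130 1695 2493 1 - 72206430 72258706 5 " + A_EXONS,
]
print(match_runs(run1, run2))
# prints: {'Sample_A.442.1': 'Sample_A.443.1', 'Sample_A.444.1': 'Sample_A.444.1'}
```

Exact exon-chain equality is deliberately strict; two assemblies of the same isoform with slightly different ends would not match here.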

    In compare.tracking:
    TCONS_00001024 XLOC_000542 Mreg|ENSMUST00000048860 c q1:Sample_A.443|Sample_A.443.1|100|9.998678|4.225939|15.771418|41.470300|- - -
    TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 q2:Sample_B.479|Sample_B.479.2|100|1.652110|1.194891|2.109330|6.783196|1742 -
    TCONS_00002413 XLOC_000542 Mreg|ENSMUST00000048860 j - q2:Sample_B.479|Sample_B.479.1|22|0.355742|0.086913|0.624571|1.460597|- -
    TCONS_00003426 XLOC_000542 Mreg|ENSMUST00000048860 c - - q3:Sample_C.463|Sample_C.463.1|100|2.478294|1.853823|3.102766|10.598946|-
    TCONS_00003427 XLOC_000542 Mreg|ENSMUST00000048860 c - - q3:Sample_C.464|Sample_C.464.1|100|2.878927|1.557985|4.199870|11.712125|-
    And this gene in the cuffdiff result (processed):
    TCONS_00001025 XLOC_000542 Mreg = ENSMUST00000048860 TSS1655 1:72206327-72258693 sample_A sample_B 18.4693 1.30708 0.0707704 -3.82072 3.82424 0.000131174 0.0148275 yes

    Here the cuffdiff ID TCONS_00001025 for ENSMUST00000048860 is the same as one of the cuffcompare TCONS IDs, and I know it maps to Sample_A.444.1 and Sample_B.479.2. Then I can find the structures of Sample_A.444.1 and Sample_B.479.2:
    Strand Start End Exon_num Exon_start-Exon_end;ditto
    - 72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;
    - 72206370 72258693 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258693;
    Then I can do the next analysis.
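The two structures above share every intron and differ only at their outer ends (72206430 vs 72206370 on the left, 72258706 vs 72258693 on the right). That is consistent with both carrying class code `=`, which the Cufflinks manual describes as a complete match of the intron chain. A small Python check:

```python
def introns(exons):
    """Introns between consecutive exons, in 1-based inclusive coordinates."""
    exons = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

sample_a = [(72206430, 72207593), (72208896, 72209059), (72210646, 72210736),
            (72238617, 72238776), (72258591, 72258706)]  # Sample_A.444.1
sample_b = [(72206370, 72207593), (72208896, 72209059), (72210646, 72210736),
            (72238617, 72238776), (72258591, 72258693)]  # Sample_B.479.2

print("same intron chain:", introns(sample_a) == introns(sample_b))
print("left end offset:", sample_a[0][0] - sample_b[0][0],
      "right end offset:", sample_a[-1][1] - sample_b[-1][1])
# prints: same intron chain: True
#         left end offset: 60 right end offset: 13
```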

    But the cuffdiff results from these two flows are very different for this transcript, ENSMUST00000048860:
    cuffdiff result (processed):
    Tracking_id Gene_id Gene_name Class_code Nearest_ref_id TSS Locus Sample_1 Sample_2 FPKM_1 FPKM_2 Foldchange log2(fold_change) test_stat p_value q_value Significant
    TCONS_00004275 XLOC_001277 Mreg j ENSMUST00000048860 TSS2418 1:72205806-72258881 sample_A sample_B 8.99002 0.315128 0.0350531 -4.83432 3.98775 6.67042e-05 0.00727599 yes
    TCONS_00001025 XLOC_000542 Mreg = ENSMUST00000048860 TSS1655 1:72206327-72258693 sample_A sample_B 18.4693 1.30708 0.0707704 -3.82072 3.82424 0.000131174 0.0148275 yes
    The class code, FPKM and fold change all differ, and there are other differences between the two pipelines as well: no known transcript is reported identically in the two cuffdiff results.
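Judging from the numbers in these two rows, Foldchange is FPKM_2 / FPKM_1 and log2(fold_change) is its base-2 logarithm. A quick Python check reproduces both columns, which suggests the two flows agree on the direction (lower in sample_B) and disagree mainly because the FPKMs are assigned to different transcript models (1:72205806-72258881 vs 1:72206327-72258693).

```python
import math

def fold_change(fpkm_1, fpkm_2):
    """Ratio sample_2 / sample_1 and its log2, as the table's columns appear to be defined."""
    ratio = fpkm_2 / fpkm_1
    return ratio, math.log2(ratio)

rows = {
    "TCONS_00004275 (cuffmerge flow)": (8.99002, 0.315128),
    "TCONS_00001025 (cuffcompare flow)": (18.4693, 1.30708),
}
for name, (fpkm_1, fpkm_2) in rows.items():
    ratio, lfc = fold_change(fpkm_1, fpkm_2)
    print(f"{name}: ratio={ratio:.6g} log2={lfc:.5f}")
```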
    I want to know which cuffdiff result is more credible, and which workflow can meet the needs of my analysis.

    Thanks
    Shen
    Last edited by upper; 05-08-2012, 12:39 AM.

  • #2
    Hi Shen,

    When you use cuffmerge, do you see any skipped regions with mouse?
    Have you checked the transcript lengths of the new assembly?
    I have seen extremely long merged genes in my output which were in fact several different RefSeq IDs.
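To look for such loci, a rough heuristic in Python: group the exon lines of merged.gtf by gene_id, then flag genes whose genomic span exceeds a threshold or that carry more than one distinct gene_name. The 500 kb threshold is an arbitrary assumption, and so is the presence of a `gene_name` attribute (check your merged.gtf; otherwise pass another attribute via `name_attr`). The demo records are made up.

```python
import re
from collections import defaultdict

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')  # key "value" pairs in GTF column 9

def flag_long_loci(gtf_lines, max_span=500_000, name_attr="gene_name"):
    """Return gene_ids whose exon span exceeds max_span or that carry several name_attr values."""
    spans = {}
    names = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        gid = attrs.get("gene_id")
        if gid is None:
            continue
        start, end = int(cols[3]), int(cols[4])
        lo, hi = spans.get(gid, (start, end))
        spans[gid] = (min(lo, start), max(hi, end))
        if name_attr in attrs:
            names[gid].add(attrs[name_attr])
    return sorted(g for g, (lo, hi) in spans.items()
                  if hi - lo + 1 > max_span or len(names[g]) > 1)

def gtf_exon(chrom, start, end, gene_id, tx_id, gene_name):
    # Hypothetical toy record for the demo, not real data.
    attrs = f'gene_id "{gene_id}"; transcript_id "{tx_id}"; gene_name "{gene_name}";'
    return "\t".join([chrom, "Cufflinks", "exon", str(start), str(end), ".", "+", ".", attrs])

demo = [
    gtf_exon("1", 100, 200, "XLOC_000001", "TCONS_00000001", "GeneA"),
    gtf_exon("1", 300, 400, "XLOC_000001", "TCONS_00000002", "GeneB"),
    gtf_exon("1", 1000, 1200, "XLOC_000002", "TCONS_00000003", "GeneC"),
    gtf_exon("2", 1, 100, "XLOC_000003", "TCONS_00000004", "GeneD"),
    gtf_exon("2", 1_999_900, 2_000_000, "XLOC_000003", "TCONS_00000004", "GeneD"),
]
print(flag_long_loci(demo))
# prints: ['XLOC_000001', 'XLOC_000003']
```

Overlapping genes on opposite strands can legitimately share a locus, so flagged loci still need a look in a genome browser.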



    • #3
      Originally posted by Kcornelius
      Hi Shen,

      When you use cuffmerge, do you see any skipped regions with mouse?
      Have you checked the transcript lengths of the new assembly?
      I have seen extremely long merged genes in my output which were in fact several different RefSeq IDs.
      Hi Kcornelius,

      Do you mean the merged.gtf that cuffmerge outputs?
      I checked some transcripts, but almost all are in known regions. Can you show an example?



      • #4
        Sure,

        I have posted one in a related thread:

        http://seqanswers.com/forums/showthread.php?t=19533



        • #5
          Originally posted by Kcornelius
          Sure,

          I have posted one in a related thread:

          http://seqanswers.com/forums/showthread.php?t=19533
          Hi Kcornelius,

          Yes, I found such long transcripts, and it seems multiple loci are merged into a single locus when using cuffmerge!!

          I am totally confused about whether to use cuffmerge or cuffcompare to merge assemblies from different experimental conditions.



          • #6
            I think one alternative is to merge all the BAM files together and run cufflinks once.
            Or, do in silico normalization of the large FASTQ files before running TopHat.
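One common form of in silico normalization is digital normalization: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a cutoff, so highly covered regions stop accumulating reads. A toy single-end Python sketch of that idea, not the implementation of any particular tool; it ignores reverse complements and read pairing, and the `k` and `cutoff` values are arbitrary.

```python
from collections import Counter
from statistics import median

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def digital_normalize(reads, k=20, cutoff=20):
    """Keep a read only while the median count of its k-mers (over kept reads) is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        kms = kmers(read, k)
        if not kms:
            continue  # read shorter than k: dropped in this sketch
        if median(counts[km] for km in kms) < cutoff:
            kept.append(read)
            counts.update(kms)  # only kept reads contribute to the counts
    return kept

reads = ["ACGTACGTTGCAACGTAGGC"] * 5 + ["TTTTGGGGCCCCAAAATTTT"]
print(len(digital_normalize(reads, k=5, cutoff=2)))
# prints: 3  (two copies of the repeated read, plus the distinct one)
```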



            • #7
              Originally posted by ravipatel4
              Hi Kcornelius,

              Yes, I found such long transcripts, and it seems multiple loci are merged into a single locus when using cuffmerge!!

              I am totally confused about whether to use cuffmerge or cuffcompare to merge assemblies from different experimental conditions.
              Did you get the long merged loci even after specifying a reference GFF to cuffmerge and also using reference-based assemblies from cufflinks (providing -g gff to cufflinks)?
