Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • issues found in using cufflinks/cuffcompare/cuffdiff

    I already wrote to Cole for the following questions. Post here in case others have the relevant questions (or answer).

    -----------------------
    Hi, Cole

    THanks first for the great tools you designed.

    I have found several issues during using the tools. I'm not sure it's reported by others already or not. Just let you know in case:

    1. Cufflinks -M mask.gtf seems not really mask reads mapped to mask.gtf file, when -G is set.
    I have checked the results from the following command:
    cufflinks -o \$HOME/scratch/HD_$inputfile/cufflinks_allknown -j 0.05 -p 8 -Q 0 -G $BOWTIE_INDEXES/hg19.gtf -M $BOWTIE_INDEXES/hg19.rRNA_MtRNA.gtf -r $BOWTIE_INDEXES/hg19.fa \$HOME/scratch/HD_$inputfile/tophat.mm2.accepted_hits.hg19.sam

    In the output file ../cufflinks_allknown/transcripts.gtf, I still can see rRNA and mitochondria RNAs there. Shouldn't they be masked out from the output gtf when -M is set?

    2. What's the suggested parameters for detecting novel transcripts using cufflinks?

    I think detecting novel transcripts is one of the popular tasks for RAN-seq processing. But I found only very short "transcripts" were assembled by cufflinks if without setting option -G. I compared the output from cufflinks (without -G) with annotated transcript in UCSC, and it's obviously seen that only a small fraction of reads are used to assemble the 'transcripts' (and they are shorter than the annotated ones). OK, if I switched to use -G in cufflinks, the output will of course show FPKM for each annotated transcript in the GTF; in this case, I cannot know how many reads are used to calculated that FPKM value (I can't find way to track this. Educate me if there is a way, please) So, by comparing these two results, I have an intuitive concern: even I use -G option, it might use only a small fraction of reads for calculating the FPKM? Can the fraction of used reads be increased if I turn on the -G option? I am slightly doubting this

    3. Option of "-o <outprefix>" in Cuffcompare should not contain '.' in the outprefix path.

    For example, if I set "cuffcompare -o /home/projects/HD.neuro/results/cuffcompare ...", the output of cuffcompare will be output to
    /home/projects/HD.combined.gtf
    /home/projects/HD.loci
    /home/projects/HD.tracking

    I guess this is because you wrongly stop until the first '.' in the path.

    4. How is the Test status NOTEST decided?

    In my cuffdiff output file gene_exp.diff, I saw many comparisons with test status NOTEST. It says "not enough alignments for testing" in the manual, but in my result file, I saw some genes with comparable FPKM values in both samples, but they are marked as NOTEST; while other genes with 0 FPKM in both sample, and they are marked as OK (e.g. the 2nd record in below example)

    Another confusion for the 'yes/no' significant column, It says "depending on whether p is greater then the FDR after Benjamini-Hochberg correction for multiple-testing". However, for cases with 0 uncorrected p-value, some are marked as 'yes', some are marked as 'no'. I am confused which one I should use to select differentially expression genes. The last column ?

    $ tail gene_exp.diff
    XLOC_048056 J01415.24 chrM:8293-8364 HD_B4430 HD_B4189 NOTEST 0 0 0 0 1 no
    XLOC_048057 J01415.19,J01415.2,J01415.21,J01415.23,J01415.5,J01415.8,MT-ATP6,MT-CO3,MT-ND3,MT-ND4,MT-ND4L,MT-ND5 chrM:8364-14742 HD_B4430 HD_B4189 OK 0 0 0 0 1no
    XLOC_048058 J01415.15,MT-CYB chrM:14745-15953 HD_B4430 HD_B4189 NOTEST 0 0 0 0 1 no
    XLOC_048059 J01415.22 chrM:3305-4400 HD_B4430 HD_B4189 NOTEST 3695.47 2047.85 -0.590317 21.4283 0 no
    XLOC_048060 J01415.1 chrM:5585-5655 HD_B4430 HD_B4189 NOTEST 1949.56 1234.71 -0.456771 12.5587 0 no
    XLOC_048061 J01415.18 chrM:5655-5729 HD_B4430 HD_B4189 NOTEST 970.36 667.756 -0.373745 7.43324 1.05915e-13 no
    XLOC_048062 J01415.11,J01415.14 chrM:5759-5891 HD_B4430 HD_B4189 OK 7858.87 5882.96 -0.289583 16.7969 0 yes
    XLOC_048063 J01415.13 chrM:5902-7514 HD_B4430 HD_B4189 OK 14129.9 10391.4 -0.30732 23.7808 0 yes
    XLOC_048064 J01415.16,MT-ND6 chrM:8364-14742 HD_B4430 HD_B4189 OK 55722.5 25892.8 -0.766417 77.8562 0 yes
    XLOC_048065 J01415.20 chrM:15954-16023 HD_B4430 HD_B4189 OK 14668.5 5896.61 -0.911327 59.1021 0 yes

    So many questions, I know. Sorry for that.

    Regards,

    Xianjun

  • #2
    I hope Cole have time for these questions

    Comment


    • #3
      annotated rRNA/MtRNA .gtf file

      Greetings sterding-

      I noticed that you have an annotated human rRNA_MtRNA file. I am trying to acquire such a file so that I may use the -M option when running cufflinks. Could you please advise on how to acquire/generate the file? Any assistance you could offer would be greatly appreciated.

      Many thanks

      Comment


      • #4
        I simply extracted the record with keywords 'rRNA' or 'Mt' from hg19.gtf file by using command 'grep'.

        Comment


        • #5
          Many, many, thanks sterding for your quick reply! Works great!

          Comment


          • #6
            Hi sterding,

            Is it better you using -g instead of -G in your situation?

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Advancing Precision Medicine for Rare Diseases in Children
              by seqadmin




              Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
              12-16-2024, 07:57 AM
            • seqadmin
              Recent Advances in Sequencing Technologies
              by seqadmin



              Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

              Long-Read Sequencing
              Long-read sequencing has seen remarkable advancements,...
              12-02-2024, 01:49 PM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 12-17-2024, 10:28 AM
            0 responses
            23 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 12-13-2024, 08:24 AM
            0 responses
            42 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 12-12-2024, 07:41 AM
            0 responses
            28 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 12-11-2024, 07:45 AM
            0 responses
            42 views
            0 likes
            Last Post seqadmin  
            Working...
            X