
  • #31
    Originally posted by fangquan View Post
    Hi Dario,

    You are right. But if you don't go through the compare step, you can still get some results from cuffdiff, like this:

    Performed 3204 isoform-level transcription difference tests
    Performed 0 tss-level transcription difference tests
    Performed 3179 gene-level transcription difference tests
    Performed 0 CDS-level transcription difference tests
    Performed 0 splicing tests
    Performed 0 promoter preference tests
    Performing 0 relative CDS output tests


    It's no surprise there are some zero files because "Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use."


    fangquan
    Hi, fangquan.
    Could you tell me how you solved the problem? I am facing almost the same puzzle. I tried both merged.gtf from cuffmerge and combined.gtf from cuffcompare as input, but cuffdiff performed 0 splicing / promoter preference / relative CDS output tests every time. Thanks.
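
A quick way to see whether a GTF carries the attributes the quoted manual sentence refers to is to count them. The sketch below assumes the attributes in question are `tss_id` and `p_id` (the names the Cufflinks manual lists) and that the ninth GTF column uses the usual `key "value";` layout. The function name and the sample records are made up for illustration.

```python
import re

# Attribute names assumed from the Cufflinks manual; adjust if your version differs.
ATTRS = ("tss_id", "p_id")

def count_gtf_attributes(gtf_lines, attrs=ATTRS):
    """Count distinct transcripts overall and those carrying each attribute."""
    transcripts = set()
    carrying = {a: set() for a in attrs}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = dict(re.findall(r'(\w+) "([^"]*)"', fields[8]))
        tid = attributes.get("transcript_id")
        if tid is None:
            continue
        transcripts.add(tid)
        for a in attrs:
            if a in attributes:
                carrying[a].add(tid)
    result = {"transcripts": len(transcripts)}
    result.update({a: len(ids) for a, ids in carrying.items()})
    return result

# Hypothetical records: one transcript with both attributes, one with neither.
example = [
    "\t".join(["chr1", "Cufflinks", "exon", "100", "200", ".", "+", ".",
               'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; tss_id "TSS1"; p_id "P1";']),
    "\t".join(["chr1", "Cufflinks", "exon", "500", "700", ".", "+", ".",
               'gene_id "XLOC_000002"; transcript_id "TCONS_00000002";']),
]
print(count_gtf_attributes(example))
```

For a real file, pass an open file handle, e.g. `count_gtf_attributes(open("merged.gtf"))`. If the `tss_id` or `p_id` counts are zero, that is one possible reason for the corresponding tests being skipped, though not necessarily the only one.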



    • #32
      Hi all,

      I have run tophat2/cufflinks2.1.1/cuffmerge successfully. But when I run cuffdiff2 with the merged GTF file, all the output *fpkm_tracking files have zero FPKM values, and cuffdiff2 prints the following messages:

      Code:
      [11:41:15] Loading reference annotation and sequence.
      Warning: No conditions are replicated, switching to 'blind' dispersion method
      [11:42:42] Inspecting maps and determining fragment length distributions.
      Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct parameters (--frag-len-mean and --frag-len-std-dev) be provided.
      Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct parameters (--frag-len-mean and --frag-len-std-dev) be provided.
      [11:53:13] Modeling fragment count overdispersion.
      > Map Properties:
      >       Normalized Map Mass: 0.50
      >       Raw Map Mass: 0.12
      >       Number of Multi-Reads: 70 (with 71 total hits)
      >       Fragment Length Distribution: Truncated Gaussian (default)
      >                     Default Mean: 200
      >                  Default Std Dev: 80
      > Map Properties:
      >       Normalized Map Mass: 0.50
      >       Raw Map Mass: 1.00
      >       Number of Multi-Reads: 154 (with 158 total hits)
      >       Fragment Length Distribution: Truncated Gaussian (default)
      >                     Default Mean: 200
      >                  Default Std Dev: 80
      [11:55:34] Calculating preliminary abundance estimates
      I have also tested the GTF file produced by cuffcompare, and the results are the same. Could anyone tell me the reason?

      Thank you.



      • #33
        I have found the reason for this problem: the coordinates in the BAM
        files are not consistent with the GTF.
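
One simple check for this kind of mismatch is to compare the reference sequence names in the BAM header with the first column of the GTF. This is only a sketch: it assumes you have the header as text (for example from `samtools view -H`), it compares names only and not positions, and the function names and sample lines are invented for illustration.

```python
def sam_header_refs(header_lines):
    """Collect reference names from the SN: field of @SQ header lines."""
    refs = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    refs.add(field[3:])
    return refs

def gtf_refs(gtf_lines):
    """Collect sequence names from the first column of GTF records."""
    refs = set()
    for line in gtf_lines:
        if line.strip() and not line.startswith("#"):
            refs.add(line.split("\t", 1)[0])
    return refs

def compare_refs(header_lines, gtf_lines):
    """Report which names are shared and which appear on one side only."""
    bam, gtf = sam_header_refs(header_lines), gtf_refs(gtf_lines)
    return {"shared": bam & gtf, "gtf_only": gtf - bam, "bam_only": bam - gtf}

# Hypothetical mismatch: UCSC-style names in the BAM, Ensembl-style in the GTF.
header = ["@HD\tVN:1.0\tSO:coordinate", "@SQ\tSN:chr1\tLN:1000", "@SQ\tSN:chr2\tLN:2000"]
gtf = ['1\tsrc\texon\t10\t50\t.\t+\t.\tgene_id "g1"; transcript_id "t1";']
print(compare_refs(header, gtf))
```

An empty `shared` set, as in this made-up case, would be consistent with every gene coming out at 0 FPKM.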

        Thank you.



        • #34
          So, does it mean that all reference files must come from the same source (Ensembl or UCSC)?
          Is it okay to ignore this warning (Warning: No conditions are replicated, switching to 'blind' dispersion method) and just let cuffdiff continue? What impact will it have?
          I ignored the warning and my cuffdiff run finished. My output has everything, such as genes and differential expression data, without errors. I used Ensembl references throughout but still got the same message.

          Looking forward to your kind reply.

          Thank you.




          • #35
            If the GTF file is incompatible you will know, as your cufflinks output will show "0 FPKM" for every gene. As mentioned in the posts above, get it from iGenomes just to be safe.

            If cuffdiff fell back to the 'blind' setting, that means it is assuming you only have 1 replicate per treatment. The manual says of this method:

            "This method works well when you expect the samples to have very few differentially expressed genes. If there are many differentially expressed genes, Cuffdiff will construct an overly conservative model and you may not get any significant calls. In this case, you will need more replicates in your experiment."

            Check the last paragraph in the cufflinks manual.

            If you have more than 1 replicate but it is still running blind, it could be that you didn't comma-separate your replicates correctly.
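
To make the comma-separation point concrete: cuffdiff takes one argument per condition, with the replicate files of that condition joined by commas and the conditions separated by spaces. The sketch below only builds the argument list. The file names, labels and helper name are hypothetical; `-o` and `-L` are the output directory and labels options.

```python
def build_cuffdiff_args(gtf, conditions, output_dir="cuffdiff_out"):
    """Build a cuffdiff argument list.

    conditions maps a label to its list of replicate BAM files; replicates of
    one condition are joined with commas so they form a single argument.
    """
    labels = ",".join(conditions)
    groups = [",".join(bams) for bams in conditions.values()]
    return ["cuffdiff", "-o", output_dir, "-L", labels, gtf] + groups

# Hypothetical design: 2 replicates in one condition, 3 in the other.
args = build_cuffdiff_args(
    "merged.gtf",
    {
        "sensitive": ["s1.bam", "s2.bam"],
        "resistant": ["r1.bam", "r2.bam", "r3.bam"],
    },
)
print(" ".join(args))
```

If each BAM file were passed as its own space-separated argument instead, cuffdiff would treat every file as a separate unreplicated condition, which would match the 'blind' warning.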

            Hope this helps



            • #36
              Thank you so much for your expert comments.
              I still have some questions, but I am searching for the answers in previous posts. I will post them here if I cannot find the answers.

              However, there is something I would like to ask you now. I am sorry to trouble you, but I really need to move on. Please answer if possible. Thank you:
              1. After the tophat alignment, I ran cufflinks on the .bam file produced by tophat, and cufflinks stated "Warning: doesnt appear to be a .bam file, trying .sam...OK.." and then continued. Do you think this might have something to do with cuffdiff going blind?
              2. Can you please check this cuffdiff output? I used the iGenomes (Ensembl) reference.
              Warning: couldn't find fasta record for 'HSCHR9_3_CTG35'!
              This contig will not be bias corrected.
              Warning: No conditions are replicated, switching to 'blind' dispersion method
              [17:12:12] Inspecting maps and determining fragment length distributions.
              [17:25:54] Modeling fragment count overdispersion.
              > Map Properties:
              > Normalized Map Mass: 21977740.46
              > Raw Map Mass: 23001324.33
              > Number of Multi-Reads: 493145 (with 1171488 total hits)
              > Fragment Length Distribution: Empirical (learned)
              > Estimated Mean: 233.43
              > Estimated Std Dev: 32.95
              > Map Properties:
              > Normalized Map Mass: 21977740.46
              > Raw Map Mass: 20859001.11
              > Number of Multi-Reads: 430276 (with 1094508 total hits)
              > Fragment Length Distribution: Empirical (learned)
              > Estimated Mean: 242.99
              > Estimated Std Dev: 19.76
              [17:27:41] Calculating preliminary abundance estimates
              > Processed 38664 loci. [*************************] 100%
              [19:01:04] Learning bias parameters.
              [19:24:10] Testing for differential expression and regulation in locus.
              > Processed 38664 loci. [*************************] 100%
              Performed 61095 isoform-level transcription difference tests
              Performed 41310 tss-level transcription difference tests
              Performed 18315 gene-level transcription difference tests
              Performed 28507 CDS-level transcription difference tests
              Performed 0 splicing tests
              Performed 0 promoter preference tests
              Performing 0 relative CDS output tests
              Writing isoform-level FPKM tracking
              Writing TSS group-level FPKM tracking
              Writing gene-level FPKM tracking
              Writing CDS-level FPKM tracking
              Writing isoform-level count tracking
              Writing TSS group-level count tracking
              Writing gene-level count tracking
              Writing CDS-level count tracking
              Writing isoform-level read group tracking
              Writing TSS group-level read group tracking
              Writing gene-level read group tracking
              Writing CDS-level read group tracking
              Writing read group info
              Writing run info

              3. For the cuffdiff of 5 samples:
              3.1 Without merging:
              CuffSet instance with:
              5 samples
              62149 genes
              273794 isoforms
              146887 TSS
              82429 CDS
              621490 promoters
              1468870 splicing
              192450 relCDS
              diff_expressed_gene_significant: 3183
              3.2 I divided the data into 2 categories: in the first I merged two .gtf files (1 + 2), and in the second three .gtf files (3 + 4 + 5). Then I ran cuffdiff and got the following details from cummeRbund:
              CuffSet instance with:
              2 samples
              62149 genes
              273794 isoforms
              146887 TSS
              82429 CDS
              62149 promoters
              146887 splicing
              19245 relCDS
              diff_expressed_gene_significant: 95
              (FPKM expression plot image attached.)
              Does this indicate a good cuffdiff run in your experience (even though the blind method was used)?
              4. I think 3.1 was run with 1 replicate per sample, while in 3.2 the first category has 2 replicates and the second has 3. Is that right?
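
A side note on the two summaries in 3.1 and 3.2: the promoters, splicing and relCDS counts differ by exactly a factor of 10, which equals the number of sample pairs among 5 samples. The sketch below just checks that arithmetic on the numbers quoted above. Reading these counts as "one record per pairwise comparison" is an inference from the numbers, not something confirmed by the cummeRbund documentation.

```python
from math import comb

def pairwise_comparisons(n_samples):
    """Number of unordered sample pairs among n_samples."""
    return comb(n_samples, 2)

# Counts copied from the two CuffSet summaries above, keyed by sample count.
reported = {
    5: {"promoters": 621490, "splicing": 1468870, "relCDS": 192450},
    2: {"promoters": 62149, "splicing": 146887, "relCDS": 19245},
}

# Divide each count by the number of pairs for that run.
per_pair = {
    n: {name: count // pairwise_comparisons(n) for name, count in counts.items()}
    for n, counts in reported.items()
}
print(pairwise_comparisons(5), pairwise_comparisons(2))
print(per_pair[5] == per_pair[2])
```

So the larger counts in 3.1 would reflect more comparisons rather than more features. The per-pair promoters and splicing figures also match the gene and TSS counts in both summaries.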

              Thank you in advance.


              • #37
                Hi Charitra,
                I'm learning on the job like many and am not an expert.
                1. This isn't a problem. I've experienced it too, but I am not sure why that message pops up.
                2. Could it be that there is no FASTA record because it is a pseudogene? Not sure about this.
                3. I am not quite sure what you are asking, as I don't know the design of your experiment. Nevertheless, when you run them as 5 separate samples you get 3183 differentially expressed genes, and this number drops drastically to 95 when you run them as replicates. This indicates that there is a lot of variation. You really need to have replicates to draw any conclusions about your data.
                4. Do your barplots represent 2 different genes? Either way, the FPKM values are low, and I would not concentrate on genes with very low FPKMs (unless of course you have prior knowledge and a reason to). Also, the error bars overlap terribly, so there is no significance.

                Hope this helps.



                • #38
                  Dear nsl,
                  Thank you so much for your expert comments. I got the point, and thank you again for your help.
                  I would like to add some details on your comments no. 3 and 4:
                  3. My first two samples (1 and 2) belong to the sensitive group, so I merged them. Samples 3, 4 and 5 belong to the resistant group, so I merged them. Now I have two conditions, Sensitive vs Resistant. I then ran cuffdiff and got 93 differentially expressed genes. I have some questions now:
                  a) Sensitive and Resistant have 2 and 3 replicates, respectively. Is that true in this case?
                  b) If so (2 replicates in sensitive and 3 in resistant), should I specify the number of replicates when running cuffdiff/cuffmerge (as you may remember, it was switching to the blind method)?
                  c) Do cuffmerge/cuffdiff detect replicates automatically and otherwise switch to blind (Warning: No conditions are replicated, switching to 'blind' dispersion method), or must a command be provided indicating the number of replicates?
                  4. In the attachment, XLOC_006036 is the cuffdiff ID, because cuffdiff does not give the name of the gene. So it is a single gene named CYP2C9 with cuffdiff ID XLOC_006036. What FPKM value would you consider good enough, or too low, to call differential expression, just from your point of view / experience?

                  The most important question for me: I think there are not enough replicates, as there should be at least 3, and the experiments are already done. Is there any way to get something significant out of these data? What would you recommend?

                  Thank you in advance.




                  • #39
                    Please, could somebody answer my question?
                    My RNA-seq (PE) was conducted on 2 samples (antibiotic resistant and sensitive) without thinking about replication.
                    Is it possible to publish the differential gene expression and splicing results in a journal? Most researchers have said it is not possible.
                    I would like an answer from this forum. What do you think I should do?
                    Many thanks



                    • #40
                      jp,

                      I'm afraid that is a fact. Without replication you would not get a stand-alone publication.



                      • #41
                        One more thing:
                        what if I try to get duplicates (1 more sequencing run for each of the two samples, as a biological replicate)? Would duplicates be acceptable as a minimum or not?




                        • #42
                          jp,

                          I've been dealing with NGS data for a short 3 years and am not an expert. I started in 2010 with 1 replicate and, after being exposed to SEQanswers and other bioinformatics communities, realized the folly of my ways... we would never rely on no replication for bench work, and the same goes for this stuff. I went on to have 4 replicates each, and did one set at a different time. I see quite a bit of variation in the samples that I did 6 months later. However, I am dealing with a very dynamic stage in development, and the variation may be showing the actual biology. Long story short: 2 reps are better than 1. But also be mindful of the biology you are going after. Cells, tissues and developmental stages can all show true variation at the RNA level, and the last thing you want is false positives due to library prep and sample handling.



                          • #43
                            Dear nsl,
                            Thank you for your valuable advice. Your knowledge and experience are much greater than mine, and I really appreciate your help. It would be very kind of you if you could answer a few more of my questions below.
                            What is your opinion on the following:
                            1. Which library size is better for human samples to study differential expression, transcript discovery and splicing with PE Illumina sequencing: 150 bp or 50 bp (longer or shorter)?
                            2. What if I go for single-cell sequencing?
                            3. If single-cell sequencing is better, can it be done on the same sequencer (PE Illumina 2000/2500)?
                            4. If possible, please write something about the differences in procedure between single-cell and normal PE sequencing (just a few points will be okay).
                            5. May I get your contact number so that I can call you by prior appointment? My e-mail ID is (med dot rdgmc at g mail dot com).
                            I have read a lot but always get confused, so your opinion will help me a lot.
                            My English is not good enough, sorry.

                            Thanks in advance


