Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • narges
    Member
    • Aug 2012
    • 29

    quality value of fastq file and tophat

    Hi all,
    First forgive me if my question is trivial. I have an Illumina rna-seq dataset of two samples (the fastq files).
    I have mapped them using tophat and the result of mapping so satisfactory, like nearly 70% of the fragments are mapped and then I used cuffdiff in order to know just the differentially expressed genes like below:
    cuffdiff -c 0 -o output -p 12 -u /Genes/genes.gtf -L F,M sample1/accepted_hits.bam sample2/accepted_hits.bam

    But the problem is that there are too few significant differentially expressed genes in the gene_exp.diff result file and I know that the correct number is really more than this.
    according to my search through the net, some people believe it might be due to the inconsistency between the quality value version of my fastq flies and tophat version I have used. But I do not know how should I check it or can it be the real reason of this strange result or not?
    I have used tophat 1.4.0 and this is the quality value I have gained by applying "ShortRead"R package on my fastq file:
    > fastqs <- readFastq("sample1.fastq")
    > qualities <- quality (fastqs)
    > head(qualities)
    class: FastqQuality
    quality:
    A BStringSet instance of length 6
    width seq
    [1] 46 B@@BABA5!.68A<9>67:<.43540+7(7B%%%%%%%%%%%%%%%
    [2] 46 BCBA8<8@A5==9::A61?70;B=((/0:?)5:&0(:=5%%%%%%%
    [3] 46 A6AA7ABBBBAA;B=B%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    [4] 46 BAA=>AABB=:BAA>>>%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    [5] 46 BCBBBBA>62@B<0ABBA798-=2?82@?B%%%%%%%%%%%%%%%%
    [6] 46 BA@0@/,:B<AA;A7B=<3&5B59?.%%%%%%%%%%%%%%%%%%%%

    Any suggestion would be appreciated.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    You might try quality trimming your reads. Those quality scores are phred+33, so the %'s at the ends of the reads are low quality and are probably decreasing mapping efficiency. Regarding the number of differentially regulated genes, there are a couple of comments. Firstly, with only two samples you would not expect to find a huge amount simply due to the inability to properly judge biological variability. Secondly, how do you know that the correct number is higher? The only way to know that would be to do the experiment that you're currently doing.

    Comment

    • narges
      Member
      • Aug 2012
      • 29

      #3
      Originally posted by dpryan View Post
      Firstly, with only two samples you would not expect to find a huge amount simply due to the inability to properly judge biological variability. Secondly, how do you know that the correct number is higher? The only way to know that would be to do the experiment that you're currently doing.
      Dear dpryan, first thank you so much for your always useful answers.
      Actually, I also have increased number of biological replicates up to 12 and still there is no significant differentially expressed genes. I know there should be around 480 differentially expressed genes because this dataset has been under analysis using R packages.

      Comment

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Assuming you mean DESeq or edgeR by "R packages", I would tend to believe them over whatever cuffdiff tells you. You didn't mention what version of the cufflinks suite you're using, so perhaps try upgrading (or maybe downgrading) it. There have been a number of threads on here regarding the changes in output produced by various version of cufflinks.

        Oh, one note, I just noticed that you're using "-c 0". You're going to seriously decrease the number of DE genes by using that (more tests == crappier p-values for everything).
        Last edited by dpryan; 09-16-2012, 04:45 AM. Reason: mention -c 0 usage.

        Comment

        • narges
          Member
          • Aug 2012
          • 29

          #5
          Originally posted by dpryan View Post
          Assuming you mean DESeq or edgeR by "R packages", I would tend to believe them over whatever cuffdiff tells you. You didn't mention what version of the cufflinks suite you're using, so perhaps try upgrading (or maybe downgrading) it. There have been a number of threads on here regarding the changes in output produced by various version of cufflinks.
          .
          Yes the R packages are exactly what you mentioned.
          About the cufflinks version, first I used the latest version :2.0.2-beta
          and I gained only 8 DE genes with pvalue <0.01. But then after reading this thread :
          Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

          I decided to use cufflinks/1.3.0 and I used -c 0 .
          So I will omit this -c 0 parameter but do you think still I should work with cufflinks/1.3.0 to be less conservative or not?

          Comment

          Latest Articles

          Collapse

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by SEQadmin2, 06-05-2026, 10:09 AM
          0 responses
          12 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-04-2026, 08:59 AM
          0 responses
          23 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-02-2026, 12:03 PM
          0 responses
          28 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-02-2026, 11:40 AM
          0 responses
          22 views
          0 reactions
          Last Post SEQadmin2  
          Working...