Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • quality value of fastq file and tophat

    Hi all,
    First forgive me if my question is trivial. I have an Illumina rna-seq dataset of two samples (the fastq files).
    I have mapped them using tophat and the result of mapping so satisfactory, like nearly 70% of the fragments are mapped and then I used cuffdiff in order to know just the differentially expressed genes like below:
    cuffdiff -c 0 -o output -p 12 -u /Genes/genes.gtf -L F,M sample1/accepted_hits.bam sample2/accepted_hits.bam

    But the problem is that there are too few significant differentially expressed genes in the gene_exp.diff result file and I know that the correct number is really more than this.
    according to my search through the net, some people believe it might be due to the inconsistency between the quality value version of my fastq flies and tophat version I have used. But I do not know how should I check it or can it be the real reason of this strange result or not?
    I have used tophat 1.4.0 and this is the quality value I have gained by applying "ShortRead"R package on my fastq file:
    > fastqs <- readFastq("sample1.fastq")
    > qualities <- quality (fastqs)
    > head(qualities)
    class: FastqQuality
    quality:
    A BStringSet instance of length 6
    width seq
    [1] 46 B@@BABA5!.68A<9>67:<.43540+7(7B%%%%%%%%%%%%%%%
    [2] 46 BCBA8<8@A5==9::A61?70;B=((/0:?)5:&0(:=5%%%%%%%
    [3] 46 A6AA7ABBBBAA;B=B%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    [4] 46 BAA=>AABB=:BAA>>>%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    [5] 46 BCBBBBA>62@B<0ABBA798-=2?82@?B%%%%%%%%%%%%%%%%
    [6] 46 BA@0@/,:B<AA;A7B=<3&5B59?.%%%%%%%%%%%%%%%%%%%%

    Any suggestion would be appreciated.

  • #2
    You might try quality trimming your reads. Those quality scores are phred+33, so the %'s at the ends of the reads are low quality and are probably decreasing mapping efficiency. Regarding the number of differentially regulated genes, there are a couple of comments. Firstly, with only two samples you would not expect to find a huge amount simply due to the inability to properly judge biological variability. Secondly, how do you know that the correct number is higher? The only way to know that would be to do the experiment that you're currently doing.

    Comment


    • #3
      Originally posted by dpryan View Post
      Firstly, with only two samples you would not expect to find a huge amount simply due to the inability to properly judge biological variability. Secondly, how do you know that the correct number is higher? The only way to know that would be to do the experiment that you're currently doing.
      Dear dpryan, first thank you so much for your always useful answers.
      Actually, I also have increased number of biological replicates up to 12 and still there is no significant differentially expressed genes. I know there should be around 480 differentially expressed genes because this dataset has been under analysis using R packages.

      Comment


      • #4
        Assuming you mean DESeq or edgeR by "R packages", I would tend to believe them over whatever cuffdiff tells you. You didn't mention what version of the cufflinks suite you're using, so perhaps try upgrading (or maybe downgrading) it. There have been a number of threads on here regarding the changes in output produced by various version of cufflinks.

        Oh, one note, I just noticed that you're using "-c 0". You're going to seriously decrease the number of DE genes by using that (more tests == crappier p-values for everything).
        Last edited by dpryan; 09-16-2012, 04:45 AM. Reason: mention -c 0 usage.

        Comment


        • #5
          Originally posted by dpryan View Post
          Assuming you mean DESeq or edgeR by "R packages", I would tend to believe them over whatever cuffdiff tells you. You didn't mention what version of the cufflinks suite you're using, so perhaps try upgrading (or maybe downgrading) it. There have been a number of threads on here regarding the changes in output produced by various version of cufflinks.
          .
          Yes the R packages are exactly what you mentioned.
          About the cufflinks version, first I used the latest version :2.0.2-beta
          and I gained only 8 DE genes with pvalue <0.01. But then after reading this thread :
          Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

          I decided to use cufflinks/1.3.0 and I used -c 0 .
          So I will omit this -c 0 parameter but do you think still I should work with cufflinks/1.3.0 to be less conservative or not?

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM
          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          30 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          32 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 09:21 AM
          0 responses
          28 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-04-2024, 09:00 AM
          0 responses
          53 views
          0 likes
          Last Post seqadmin  
          Working...
          X