![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
For MAQ: Is there a Tool to convert sanger-format fastq file to illumina-fotmat fastq | byb121 | Bioinformatics | 6 | 12-20-2013 02:26 AM |
Problem with quality in fastq file | ThePresident | Bioinformatics | 11 | 08-10-2012 05:54 AM |
Ideas on collecting quality scores per base in an illumina fastq file | brachysclereid | Bioinformatics | 11 | 12-05-2011 02:00 PM |
mapping quality score in tophat sam file | jjpurwar | Bioinformatics | 5 | 06-22-2011 12:55 PM |
Reduce file size after Illumina FASTQ to Sanger FASTQ conversion? | jjw14 | Illumina/Solexa | 2 | 06-01-2010 05:35 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Finland Join Date: Aug 2012
Posts: 29
|
![]()
Hi all,
First forgive me if my question is trivial. I have an Illumina rna-seq dataset of two samples (the fastq files). I have mapped them using tophat and the result of mapping so satisfactory, like nearly 70% of the fragments are mapped and then I used cuffdiff in order to know just the differentially expressed genes like below: cuffdiff -c 0 -o output -p 12 -u /Genes/genes.gtf -L F,M sample1/accepted_hits.bam sample2/accepted_hits.bam But the problem is that there are too few significant differentially expressed genes in the gene_exp.diff result file and I know that the correct number is really more than this. according to my search through the net, some people believe it might be due to the inconsistency between the quality value version of my fastq flies and tophat version I have used. But I do not know how should I check it or can it be the real reason of this strange result or not? I have used tophat 1.4.0 and this is the quality value I have gained by applying "ShortRead"R package on my fastq file: > fastqs <- readFastq("sample1.fastq") > qualities <- quality (fastqs) > head(qualities) class: FastqQuality quality: A BStringSet instance of length 6 width seq [1] 46 B@@BABA5!.68A<9>67:<.43540+7(7B%%%%%%%%%%%%%%% [2] 46 BCBA8<8@A5==9::A61?70;B=((/0:?)5:&0(:=5%%%%%%% [3] 46 A6AA7ABBBBAA;B=B%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% [4] 46 BAA=>AABB=:BAA>>>%%%%%%%%%%%%%%%%%%%%%%%%%%%%% [5] 46 BCBBBBA>62@B<0ABBA798-=2?82@?B%%%%%%%%%%%%%%%% [6] 46 BA@0@/,:B<AA;A7B=<3&5B59?.%%%%%%%%%%%%%%%%%%%% Any suggestion would be appreciated. |
![]() |
![]() |
![]() |
#2 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
You might try quality trimming your reads. Those quality scores are phred+33, so the %'s at the ends of the reads are low quality and are probably decreasing mapping efficiency. Regarding the number of differentially regulated genes, there are a couple of comments. Firstly, with only two samples you would not expect to find a huge amount simply due to the inability to properly judge biological variability. Secondly, how do you know that the correct number is higher? The only way to know that would be to do the experiment that you're currently doing.
|
![]() |
![]() |
![]() |
#3 | |
Member
Location: Finland Join Date: Aug 2012
Posts: 29
|
![]() Quote:
Actually, I also have increased number of biological replicates up to 12 and still there is no significant differentially expressed genes. I know there should be around 480 differentially expressed genes because this dataset has been under analysis using R packages. ![]() |
|
![]() |
![]() |
![]() |
#4 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
Assuming you mean DESeq or edgeR by "R packages", I would tend to believe them over whatever cuffdiff tells you. You didn't mention what version of the cufflinks suite you're using, so perhaps try upgrading (or maybe downgrading) it. There have been a number of threads on here regarding the changes in output produced by various version of cufflinks.
Oh, one note, I just noticed that you're using "-c 0". You're going to seriously decrease the number of DE genes by using that (more tests == crappier p-values for everything). Last edited by dpryan; 09-16-2012 at 05:45 AM. Reason: mention -c 0 usage. |
![]() |
![]() |
![]() |
#5 | |
Member
Location: Finland Join Date: Aug 2012
Posts: 29
|
![]() Quote:
About the cufflinks version, first I used the latest version :2.0.2-beta and I gained only 8 DE genes with pvalue <0.01. But then after reading this thread : http://seqanswers.com/forums/showthr...t=17662&page=2 I decided to use cufflinks/1.3.0 and I used -c 0 . So I will omit this -c 0 parameter but do you think still I should work with cufflinks/1.3.0 to be less conservative or not? |
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|