SEQanswers

Old 09-15-2009, 12:51 AM   #1
xile
Member
 
Location: china

Join Date: Sep 2008
Posts: 14
The DGE result can't confirm the microarray data

Hi,

This is my first time using home-made reagents to prepare a DGE tag profiling library for sequencing. The sequencing results do not seem to agree well with the microarray results, although both come from the same samples (see sample1.JPG). About eighty percent of the reads map to the Arabidopsis cDNA sequences, but only about fifty percent map to the tag table built from the restriction site nearest the 3' terminus of each cDNA. I made the tag table myself. Is this normal? Could you give me some advice on sample preparation?

Maybe incomplete enzyme digestion causes the low proportion of reads mapping to the tag table. But I don't think it is the main reason for the low consistency with the microarray data.

Thanks.
Old 09-15-2009, 01:27 AM   #2
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269

Could you please specify how the "tag count" values were obtained? It seems that you are working with SAGE tags, is that correct?
Is there a published study where SAGE-based DGE is compared with microarray data (and with a good correlation)? It could help to identify what could differ from your experiment.
Old 09-15-2009, 01:50 AM   #3
xile
Member
 
Location: china

Join Date: Sep 2008
Posts: 14

Quote:
Originally Posted by steven View Post
Could you please specify how the "tag count" values were obtained? It seems that you are working with SAGE tags, is that correct?
Is there a published study where SAGE-based DGE is compared with microarray data (and with a good correlation)? It could help to identify what could differ from your experiment.
Thanks.
We followed the Illumina protocol "Preparing Samples for Digital Gene Expression-Tag Profiling with NlaIII" to generate a unique 16 bp sequence for each transcript, anchored at the recognition site of the restriction enzyme NlaIII. The 16 bp of sequence data combined with the known NlaIII restriction site ("CATG") gives the unique 20 bp tag used for annotation. The expression level of each transcript is measured by the number of times its tag is detected. The tags in my table come from the restriction site nearest the 3' terminus of each Arabidopsis cDNA.

There is an official introduction in the attachment rnaDGETagProfiling.pdf.
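
For readers who want to reproduce the tag table described above, here is a minimal Python sketch, not the actual script used in this thread. It assumes the cDNA sequences are already loaded into a dict keyed by transcript ID (the IDs and the function names `three_prime_tag` and `build_tag_table` are hypothetical), and that a site too close to the 3' end to yield a full 20 bp tag is skipped in favour of the next site upstream.

```python
def three_prime_tag(cdna, site="CATG", tag_len=16):
    """Return site + tag_len bases for the 3'-most site that yields a full tag.

    Returns None when the sequence has no usable site.
    """
    seq = cdna.upper()
    full = len(site) + tag_len
    pos = seq.rfind(site)
    while pos != -1:
        if pos + full <= len(seq):
            return seq[pos:pos + full]
        # Site too close to the 3' end: look for the next site upstream.
        pos = seq.rfind(site, 0, pos + len(site) - 1)
    return None


def build_tag_table(cdnas, site="CATG", tag_len=16):
    """Map each 20 bp tag to the list of transcript IDs that produce it."""
    table = {}
    for transcript_id, seq in cdnas.items():
        tag = three_prime_tag(seq, site, tag_len)
        if tag is not None:
            table.setdefault(tag, []).append(transcript_id)
    return table


# Toy sequences, invented for illustration only.
cdnas = {
    "t1": "AAA" + "CATG" + "T" * 18 + "CATG" + "ACGTACGTACGTACGT" + "AA",
    "t2": "CATG" + "A" * 16 + "CATG" + "AA",   # last site has no room for a tag
    "t3": "ACGTACGT",                          # no NlaIII site at all
}
tag_table = build_tag_table(cdnas)
```

Keeping a list of transcript IDs per tag makes ambiguous tags (shared by several transcripts) visible instead of silently overwriting them.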
Old 09-15-2009, 02:38 AM   #4
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269

Thanks for the info!
Indeed, partial digestion can explain some of the tags that do not match your index. Including the penultimate restriction site of each transcript in the index could help recover some tags. Also, since the annotation of the ends of plant genes is less reliable than, for instance, in human (lack of a polyA site consensus), allowing for potentially longer transcripts by virtually extending the 3' UTRs a little could also help (as done here).
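
As a rough illustration of the penultimate-site idea, the following Python sketch collects tags for the last few CATG sites of a transcript instead of only the 3'-most one. The function name `site_tags`, the `n_sites` parameter, and the toy sequence are assumptions made for this example. The 16 bp tag length follows the protocol description earlier in the thread.

```python
def site_tags(cdna, n_sites=2, site="CATG", tag_len=16):
    """Return up to n_sites full-length tags, ordered from the 3' end inward."""
    seq = cdna.upper()
    full = len(site) + tag_len
    tags = []
    pos = seq.rfind(site)
    while pos != -1 and len(tags) < n_sites:
        if pos + full <= len(seq):
            tags.append(seq[pos:pos + full])
        # Move to the next site upstream of the current one.
        pos = seq.rfind(site, 0, pos + len(site) - 1)
    return tags


# Toy transcript with two NlaIII sites, invented for illustration only.
toy = "GG" + "CATG" + "A" * 16 + "CATG" + "C" * 16 + "TT"
both = site_tags(toy, n_sites=2)
last_only = site_tags(toy, n_sites=1)
```

Adding the second entry of each list to the index would let reads from partially digested molecules be assigned to a transcript rather than discarded.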

However, I am afraid I cannot help with the lack of array/DGE correlation.
Do you by chance have technical or biological replicates of one of these experiments?
Does the protocol only retain polyA+ long RNAs? No possible contamination by sRNAs?
Maybe something wrong with the index table (genome/annotation version..)? Did you double check the position of a few tags "manually"?
good luck..
Old 09-15-2009, 03:29 AM   #5
BENM
Member
 
Location: PRC

Join Date: May 2009
Posts: 33

Maybe you can find the answer in this paper: H. Alexander Ebhardt et al., "Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications", Nucleic Acids Research, 2009, 1–10, doi:10.1093/nar/gkp093. I am not a biologist, so I don't know what the key point is in this matter.
Old 09-15-2009, 05:16 AM   #6
xile
Member
 
Location: china

Join Date: Sep 2008
Posts: 14

Quote:
Originally Posted by steven View Post
Thanks for the info!
Indeed, partial digestion can explain some of the tags that do not match your index. Including the penultimate restriction site of each transcript in the index could help recover some tags. Also, since the annotation of the ends of plant genes is less reliable than, for instance, in human (lack of a polyA site consensus), allowing for potentially longer transcripts by virtually extending the 3' UTRs a little could also help (as done here).

However, I am afraid I cannot help with the lack of array/DGE correlation.
Do you by chance have technical or biological replicates of one of these experiments?
Does the protocol only retain polyA+ long RNAs? No possible contamination by sRNAs?
Maybe something wrong with the index table (genome/annotation version..)? Did you double check the position of a few tags "manually"?
good luck..
Thanks for your advice.
Another sample, from a different treatment on the same flow cell, seems to be as bad as this one.
The protocol only retains polyA+ long RNAs.

Last edited by xile; 09-15-2009 at 05:34 AM.
Old 09-15-2009, 05:29 AM   #7
xile
Member
 
Location: china

Join Date: Sep 2008
Posts: 14

Quote:
Originally Posted by BENM View Post
Maybe you can find the answer in this paper: H. Alexander Ebhardt et al., "Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications", Nucleic Acids Research, 2009, 1–10, doi:10.1093/nar/gkp093. I am not a biologist, so I don't know what the key point is in this matter.
Thanks. I think the proportion of RNA modifications is lower.
Old 09-15-2009, 08:19 AM   #8
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Also, when you annotate a tag to a transcript, do you make sure that it is the 3'-most tag? We saw that although the 3'-most tag is the major one, partial digestion causes other tags to be present as well, and all of them need to be summed to get the final tag count for the gene.
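
A minimal Python sketch of that summing step, under the assumption that the observed tag counts and a tag-to-gene lookup are already available as plain dicts. The function name `gene_counts`, the gene IDs, and the tags are all made up for illustration.

```python
from collections import defaultdict


def gene_counts(tag_counts, tag_to_gene):
    """Sum the counts of every tag assigned to the same gene.

    Returns (per-gene totals, number of reads whose tag is not in the lookup).
    """
    totals = defaultdict(int)
    unmapped = 0
    for tag, count in tag_counts.items():
        gene = tag_to_gene.get(tag)
        if gene is None:
            unmapped += count
        else:
            totals[gene] += count
    return dict(totals), unmapped


# Hypothetical data: geneA has a 3'-most tag and a tag from an upstream site.
tag_to_gene = {"CATGAAAA": "geneA", "CATGCCCC": "geneA", "CATGGGGG": "geneB"}
tag_counts = {"CATGAAAA": 80, "CATGCCCC": 20, "CATGGGGG": 5, "CATGTTTT": 7}
totals, unmapped = gene_counts(tag_counts, tag_to_gene)
```

Tracking the unmapped total separately gives a quick check on how many reads the index fails to explain.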
__________________
--
bioinfosm
Old 09-15-2009, 09:11 PM   #9
xile
Member
 
Location: china

Join Date: Sep 2008
Posts: 14

Quote:
Originally Posted by bioinfosm View Post
Also, when you annotate a tag to a transcript, do you make sure that it is the 3'-most tag? We saw that although the 3'-most tag is the major one, partial digestion causes other tags to be present as well, and all of them need to be summed to get the final tag count for the gene.
I saw that about 20% of the total reads are not the 3'-most tag.
Old 09-16-2009, 03:48 AM   #10
jwaage
Member
 
Location: Copenhagen, DK

Join Date: Sep 2008
Posts: 16

Try log2/log10 transforming the read counts and intensities, and then do the plotting / linear modelling...
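
A short Python sketch of the log2 transform followed by a Pearson correlation, using only the standard library. The pseudocount of 1 (to avoid taking the log of zero counts), the function name `log2_pearson`, and the toy numbers are assumptions for this example, not values from the thread.

```python
import math


def log2_pearson(counts, intensities, pseudocount=1.0):
    """Return (r, r squared) for log2-transformed tag counts vs array intensities."""
    x = [math.log2(c + pseudocount) for c in counts]
    y = [math.log2(v + pseudocount) for v in intensities]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r


# Toy values chosen so the log2 values fall exactly on a straight line.
r, r2 = log2_pearson([1, 3, 7, 15], [3, 7, 15, 31])
```

Returning both r and r squared helps avoid confusion when comparing a reported "correlation coefficient" with an R2 from a linear fit.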
Old 09-16-2009, 09:27 AM   #11
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Quote:
Originally Posted by xile View Post
I saw that about 20% of the total reads are not the 3'-most tag.
Does including/excluding them make a difference to the correlation plot?
__________________
--
bioinfosm
Old 09-16-2009, 06:43 PM   #12
xile
Member
 
Location: china

Join Date: Sep 2008
Posts: 14

Quote:
Originally Posted by jwaage View Post
Try log2/log10 transforming the read counts and intensities, and then do the plotting / linear modelling...
Thanks. I redrew the plot (log.JPG), and it looks better. Is it normal? I have no experience with this kind of DGE tag data.

Last edited by xile; 09-16-2009 at 06:49 PM.
Old 09-16-2009, 06:44 PM   #13
xile
Member
 
Location: china

Join Date: Sep 2008
Posts: 14

Quote:
Originally Posted by bioinfosm View Post
Does including/excluding them make a difference to the correlation plot?
Thanks. The difference is not significant.
Old 09-17-2009, 02:19 AM   #14
jwaage
Member
 
Location: Copenhagen, DK

Join Date: Sep 2008
Posts: 16

Your graph looks decent. What's your correlation coefficient (R2)? In our experience (and in the literature) it is typically between 55 and 70, depending on tissue and experimental setup.
Old 09-17-2009, 11:15 PM   #15
xile
Member
 
Location: china

Join Date: Sep 2008
Posts: 14

Quote:
Originally Posted by jwaage View Post
Your graph looks decent. What's your correlation coefficient (R2)? In our experience (and in the literature) it is typically between 55 and 70, depending on tissue and experimental setup.
After log2 transformation, the correlation coefficient is 0.6637376.