SEQanswers

12-07-2011, 07:08 AM   #1
mrfox
Senior Member

Location: USA

Join Date: Aug 2010
Posts: 103
Exome sequencing: Illumina? SOLiD? Read length? Pair-Ended?

Hi All,

My collaborators are interested in detecting SNPs in some cancer samples. Exome sequencing seems to be a good start, but we don't know much about exome sequencing and analysis. We would appreciate any advice on the following questions:

1) Should we use the Illumina or the SOLiD platform? We would like to use the one with better sequencing QUALITY.
2) What read length should we use? Is longer always better?
3) I am not sure whether paired-end information is useful for SNP detection, but I guess we had better use paired-end reads.
4) Could you recommend good software for identifying potential SVs from exome-seq data?

Thank you very much.
12-07-2011, 08:35 AM   #2
Bukowski
Senior Member

Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 388

1) I don't think it matters, but more tools support Illumina data and more people use it.
2) Yes; we do 100bp PE on a HiSeq 2000, for instance.
3) Yes, but you will find it more useful for detecting indels. Lots of tools expect paired-end data and there is no reason not to use it.
4) After alignment, Samtools and GATK are both popular tools for calling SNPs. SNVMix might be more appropriate for cancer samples. For annotation, try ANNOVAR, SnpEff or Ensembl's VEP.

Consider doing a paired tumour/normal study if possible.
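
To illustrate why a paired design helps: a variant seen in the tumour but absent from the matched normal is a candidate somatic mutation, while one present in both is most likely germline. Below is a minimal Python sketch of that comparison. The function name, the per-site read counts and all thresholds are hypothetical and chosen only for illustration; this is not how SNVMix, GATK or Samtools work internally.

```python
def classify_site(tumour_alt, tumour_depth, normal_alt, normal_depth,
                  min_depth=10, min_tumour_vaf=0.10, max_normal_vaf=0.02):
    """Label one site by comparing alternate-allele fractions in tumour and normal.

    All thresholds are illustrative assumptions, not recommended values.
    """
    # Without enough reads in both samples, no comparison is attempted.
    if tumour_depth < min_depth or normal_depth < min_depth:
        return "low_coverage"

    tumour_vaf = tumour_alt / tumour_depth  # variant allele fraction in tumour
    normal_vaf = normal_alt / normal_depth  # variant allele fraction in normal

    # Too little evidence for the variant in the tumour.
    if tumour_vaf < min_tumour_vaf:
        return "not_called"
    # Present in tumour, (almost) absent in normal: candidate somatic variant.
    if normal_vaf <= max_normal_vaf:
        return "somatic"
    # Present in both samples: most likely inherited.
    return "germline"


if __name__ == "__main__":
    # Hypothetical counts: (tumour_alt, tumour_depth, normal_alt, normal_depth)
    for counts in [(12, 40, 0, 35), (20, 40, 18, 36), (1, 5, 0, 30)]:
        print(counts, classify_site(*counts))
```

Without the normal sample, the first two sites would look the same: both simply show a variant in the tumour.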

12-07-2011, 08:42 AM   #3
mrfox
Senior Member

Location: USA

Join Date: Aug 2010
Posts: 103

Thank you for your advice, Bukowski! One more question: if we want to call CNVs from the exome-seq data, what tool do you recommend? I know CNV detection is more challenging with exome data alone than with whole-genome data.
12-07-2011, 08:43 AM   #4
Bukowski
Senior Member

Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 388

I would probably look at ExomeCNV for that:

http://cran.r-project.org/web/packag...CNV/index.html

And I'm pretty sure that will require paired tumour/normal data, but check.
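
As a rough illustration of why matched data matters for exome CNV calling: capture efficiency varies a lot from exon to exon, so raw depth alone says little, but the tumour/normal depth ratio per exon cancels much of that bias. The Python sketch below shows only that general idea. It is not ExomeCNV's implementation; the function name, the exon labels and the pseudocount are assumptions made for the example.

```python
import math


def log2_depth_ratios(tumour_depth, normal_depth, pseudocount=1.0):
    """Return {exon: log2(tumour/normal)} after scaling by each sample's total depth.

    tumour_depth and normal_depth map exon IDs to mean read depth.
    The pseudocount (an illustrative choice) avoids division by zero.
    """
    tumour_total = sum(tumour_depth.values())
    normal_total = sum(normal_depth.values())
    if tumour_total <= 0 or normal_total <= 0:
        raise ValueError("both samples need non-zero total depth")

    ratios = {}
    for exon, n_depth in normal_depth.items():
        t_depth = tumour_depth.get(exon, 0.0)
        # Scale by total depth so that sequencing more of one sample
        # does not look like a genome-wide gain.
        t_scaled = (t_depth + pseudocount) / tumour_total
        n_scaled = (n_depth + pseudocount) / normal_total
        ratios[exon] = math.log2(t_scaled / n_scaled)
    return ratios


if __name__ == "__main__":
    # Hypothetical mean depths per exon.
    tumour = {"exon1": 100, "exon2": 200, "exon3": 100}
    normal = {"exon1": 100, "exon2": 100, "exon3": 200}
    for exon, ratio in log2_depth_ratios(tumour, normal).items():
        print(exon, round(ratio, 2))
```

Values well above zero hint at a gain and values well below zero at a loss; a real tool also has to account for tumour purity, noise and segmentation across neighbouring exons.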
01-22-2012, 08:44 AM   #5
Jayu
Member

Location: Ahmedabad

Join Date: Mar 2011
Posts: 14

Can anyone tell me the pipeline for exome sequencing data analysis?
01-23-2012, 12:44 AM   #6
gringer
David Eccles (gringer)

Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Quote:
Originally Posted by Jayu View Post
Can anyone tell me the pipeline for exome sequencing data analysis?
That depends on how you want to do the analysis.

Depending on how paranoid or pedantic you are, you can do a readjustment of read sequences based on the original intensity data. After that, you can do some pre-filtering or trimming of reads to exclude unlikely sequences.

Your happiness with the current exon boundary annotation of your genome will determine if you can go straight to mapping, or if there needs to be some sort of assisted (or possibly de novo) assembly first.

If you care about isoforms, you will need to use a tool that can identify and distinguish different isoforms and estimate isoform proportions. This may be better achieved by mapping to the genome with something that can split reads across very large gaps (something like TopHat). Otherwise you could map to the transcriptome, bearing in mind that isoform identification is much more difficult in that case.

Once you have reads (or estimated reads), they need to be normalised to account for sampling variation and other types of random and systematic error. After that you can finally get around to the actual data analysis, which will generally be up to the researcher.
01-23-2012, 01:18 AM   #7
Bukowski
Senior Member

Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 388

That sounds an awful lot like a recipe for RNA-Seq analysis not exome analysis. The poster (who shouldn't be tacking questions on to other people's threads) might be interested in http://seqanswers.com/wiki/How-to/exome_analysis
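
For orientation, the core of a simple exome SNP-calling run (align, sort and index, call variants) could be sketched as below. This Python helper only builds the shell command strings and does not run anything. It assumes BWA for alignment and bcftools for calling, neither of which is named in this thread, uses the command syntax of current samtools/bcftools releases, and expects the reference to have been indexed with `bwa index` already. The function name and file names are hypothetical, and exome-specific steps such as duplicate marking or restricting calls to the capture targets are left out.

```python
import shlex


def exome_snp_commands(reference, fastq1, fastq2, sample):
    """Build shell commands for a bare-bones paired-end SNP-calling run.

    Sketch only: tool choice and options are assumptions, not the
    pipeline described on the linked wiki page.
    """
    ref, fq1, fq2 = (shlex.quote(p) for p in (reference, fastq1, fastq2))
    sam = shlex.quote(f"{sample}.sam")
    bam = shlex.quote(f"{sample}.sorted.bam")
    vcf = shlex.quote(f"{sample}.vcf.gz")
    return [
        # 1. Align the paired-end reads to the reference genome.
        f"bwa mem {ref} {fq1} {fq2} > {sam}",
        # 2. Sort alignments by coordinate and index the result.
        f"samtools sort -o {bam} {sam}",
        f"samtools index {bam}",
        # 3. Pile up reads and call variant sites only, as compressed VCF.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Oz -o {vcf}",
    ]


if __name__ == "__main__":
    for cmd in exome_snp_commands("ref.fa", "s_1.fq", "s_2.fq", "sample1"):
        print(cmd)
```

The resulting VCF would then go to an annotation tool such as those mentioned earlier in the thread.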
01-23-2012, 01:25 AM   #8
gringer
David Eccles (gringer)

Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Quote:
Originally Posted by Bukowski View Post
That sounds an awful lot like a recipe for RNA-Seq analysis not exome analysis.
Er, yes. Sorry, I got a little carried away there....
01-23-2012, 01:30 AM   #9
Aman Mahajan
Member

Location: India

Join Date: Jan 2012
Posts: 22

I have a question not related to the thread though..

I assembled my Illumina data using SOAP, and now I want to carry out expression analysis using the Rseq tool. It accepts only SAM format, so I downloaded SAMtools to convert my SOAP output to SAM. Can anyone tell me how to run the conversion? The tutorial has been of no use so far!
01-23-2012, 01:32 AM   #10
gringer
David Eccles (gringer)

Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Quote:
I have a question not related to the thread though..
This was just recently posted in this thread:

Quote:
The poster (who shouldn't be tacking questions on to other people's threads)
Please try to do what this comment suggests and start new threads for unrelated questions. It makes searching the forums much easier for other future browsers of questions and answers.
01-23-2012, 01:36 AM   #11
Aman Mahajan
Member

Location: India

Join Date: Jan 2012
Posts: 22

This is actually my 1st post, can't figure out how to start a new thread. I'll try and post it there. Thanks. If this has been answered before, kindly pass on the link to the thread.
01-23-2012, 01:45 AM   #12
gringer
David Eccles (gringer)

Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Quote:
This is actually my 1st post, can't figure out how to start a new thread
From the SEQanswers home page, click on the red 'Forums' link at the left, then click on the forum name, then click on the 'New Thread' button. You can also click on the link at the top of a thread page (SEQanswers > Bioinformatics > Bioinformatics) to go to the forum page.