SEQanswers

03-21-2011, 08:08 PM   #1
ssnowfox
Junior Member
 
Location: AZ

Join Date: Feb 2011
Posts: 8
How to deal with multi-sample NGS data?

Hello everyone, I'm new to NGS. My boss gave me the fq files for about 180 different samples, and I need to do alignment and SNP/indel calling for these samples. I need help with two questions:

1. Is there a more efficient way to do the alignment and variant calling for those 180 samples, or should I analyze them one by one?

2. Each sample may produce its own VCF file. How can I combine the calls from all 180 samples and get the frequency of each SNP?

Please help. Thanks a lot.
03-21-2011, 10:33 PM   #2
zhanglu295
Junior Member
 
Location: HongKong

Join Date: Mar 2011
Posts: 8

You can use either of these methods:
1. The samtools mpileup function: http://samtools.sourceforge.net/mpileup.shtml
2. SOAPsnp. You can email the developers to request the multiple-individual version.

If you analyze the 180 samples together, the software produces a single combined VCF file.
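
As a rough sketch of analyzing the samples together, the Python snippet below assembles one pipeline that passes every sample's BAM file to samtools mpileup in a single call. The reference name, BAM names, and output name are hypothetical. The flags follow the samtools 0.1.x usage on the mpileup page linked above; later releases moved calling to `bcftools mpileup` and `bcftools call`, so check the documentation of the version you have installed before running anything like this.

```python
import shlex


def build_joint_call_command(reference, bam_files, out_bcf):
    """Return a shell pipeline string that calls variants on all BAMs together."""
    if not bam_files:
        raise ValueError("need at least one BAM file")
    # samtools 0.1.x style: mpileup computes likelihoods, bcftools view calls variants.
    mpileup = ["samtools", "mpileup", "-uf", reference] + list(bam_files)
    call = ["bcftools", "view", "-bvcg", "-"]
    return "{} | {} > {}".format(
        " ".join(shlex.quote(arg) for arg in mpileup),
        " ".join(shlex.quote(arg) for arg in call),
        shlex.quote(out_bcf),
    )


# Hypothetical file names; a real run would list all 180 BAM files.
bams = ["sample{:03d}.bam".format(i) for i in range(1, 4)]
command = build_joint_call_command("ref.fa", bams, "var.raw.bcf")
print(command)
```

The snippet only prints the command; it does not run it. In that samtools generation the result is a BCF file, which `bcftools view` can print as VCF.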
03-22-2011, 07:12 AM   #3
ssnowfox
Junior Member
 
Location: AZ

Join Date: Feb 2011
Posts: 8

In reply to zhanglu295 (post #2):
Oh, that's great. I'll try what you advise. Thank you very much.
03-22-2011, 07:59 AM   #4
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78

You can also use the GATK.

In case you're new to the whole thing: I recommend making a template for each analysis step you take and then running a script that replaces the placeholders with your sample info. I do hope you have some compute power at your disposal; 180 samples may take a while to analyse :P

I also recommend incorporating some decent logging so you can easily find where things went wrong. Include versioning (the version of each tool used, but also the version of your complete analysis pipeline); that will help too. It might be a bigger job than you expect! Also think in advance about how you would like to structure files and directories, and which intermediate files are worth keeping.

In case you're not new to the whole thing: perhaps it helps others.
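
A minimal sketch of the template-plus-logging idea, in Python. The tool names `my_aligner` and `my_caller`, their flags, the sample IDs, and the version label are all made up for illustration; substitute the real commands of whatever aligner and caller you use.

```python
import logging
from string import Template

PIPELINE_VERSION = "0.1.0"  # hypothetical label for the pipeline as a whole

# One template per analysis step. "my_aligner" and "my_caller" are
# placeholder tool names, not real programs.
STEP_TEMPLATES = [
    ("align", Template("my_aligner --ref ${ref} --reads ${sample}.fq --out ${sample}.bam")),
    ("call", Template("my_caller --ref ${ref} --bam ${sample}.bam --out ${sample}.vcf")),
]


def render_commands(sample, ref):
    """Fill every step template for one sample and log what was generated."""
    commands = []
    for step, template in STEP_TEMPLATES:
        # substitute() raises KeyError if a placeholder is left unfilled.
        command = template.substitute(sample=sample, ref=ref)
        logging.info("pipeline=%s sample=%s step=%s cmd=%s",
                     PIPELINE_VERSION, sample, step, command)
        commands.append((step, command))
    return commands


logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
for sample_id in ["S001", "S002"]:
    render_commands(sample_id, "ref.fa")
```

Logging the pipeline version next to every generated command makes it possible to tell later which version of the pipeline produced which output file.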
03-22-2011, 08:08 AM   #5
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

Has anyone looked at using generic databases for this kind of question? It seems a lot of people are doing large-scale exon or whole-genome analysis these days.
03-22-2011, 09:40 AM   #6
ssnowfox
Junior Member
 
Location: AZ

Join Date: Feb 2011
Posts: 8

In reply to Bruins (post #4):
Thanks, the suggestion is very useful. I have had the unforgettable experience of debugging a pipeline. It really cost me plenty of time.
03-22-2011, 12:16 PM   #7
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

If those fq files are from mammalian samples, the alignment alone is going to take forever.

It would be worth spending some time asking around and looking around yourself to see whether someone has already done the alignments. Even if it takes you a week to find them, you will probably save yourself a lot of time.

And yes, you can give a pile of .bam files to samtools' mpileup command, and it will give you a combined .vcf file. The lines look something like this (with 11 samples; in the file it is all on one line, of course):

chr3 23987415 . A C 999 . DP=6832;AF1=0.475;CI95=0.2727,0.6364;DP4=315,231,3814,1978;MQ=37;FQ=28.2;PV4=0.00017,0,1.5e-147,1
GT:PL:GQ
0/0:0,107,0:3
1/1:76,255,0:75
1/1:112,255,0:99
1/1:71,255,0:70
0/0:0,172,29:30
0/0:0,181,18:19
0/0:0,158,43:44
0/0:0,188,12:13
1/1:16,255,0:15
1/1:5,236,0:6
0/0:0,224,16:17

Learning what all that means is a whole other project.
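
As a starting point, here is a small Python sketch that reads the example record above and counts the alternate allele across the 11 samples. In the FORMAT column, GT is the genotype, PL the phred-scaled genotype likelihoods, and GQ the genotype quality. The function name is my own, and the record is rebuilt as one tab-separated line, which is how it appears in a real file.

```python
# The example record from above, joined back into one tab-separated line.
RECORD = "\t".join([
    "chr3", "23987415", ".", "A", "C", "999", ".",
    "DP=6832;AF1=0.475;CI95=0.2727,0.6364;DP4=315,231,3814,1978;"
    "MQ=37;FQ=28.2;PV4=0.00017,0,1.5e-147,1",
    "GT:PL:GQ",
    "0/0:0,107,0:3", "1/1:76,255,0:75", "1/1:112,255,0:99",
    "1/1:71,255,0:70", "0/0:0,172,29:30", "0/0:0,181,18:19",
    "0/0:0,158,43:44", "0/0:0,188,12:13", "1/1:16,255,0:15",
    "1/1:5,236,0:6", "0/0:0,224,16:17",
])


def alt_allele_frequency(vcf_line):
    """Fraction of called alleles that are non-reference, from the GT fields."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
    alt_count = 0
    total = 0
    for sample_field in fields[9:]:
        genotype = sample_field.split(":")[gt_index]
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":  # missing call, skip it
                continue
            total += 1
            if allele != "0":
                alt_count += 1
    return alt_count / total if total else None


freq = alt_allele_frequency(RECORD)
print("ALT allele frequency from genotypes: {:.4f}".format(freq))
```

Counting the called genotypes gives 10 of 22 alleles (about 0.45). That is close to, but not the same as, the AF1=0.475 in the INFO column, because AF1 is the caller's own likelihood-based estimate of the allele frequency rather than a count of hard genotype calls.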
03-22-2011, 01:49 PM   #8
ssnowfox
Junior Member
 
Location: AZ

Join Date: Feb 2011
Posts: 8

In reply to swbarnes2 (post #7):
I'm afraid I have to do the alignment myself. This information is really wonderful; we need those frequency data to conduct follow-up genotyping in larger samples. Is the genotype order the same as the order of the input BAMs?

Thanks
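
On the column-order question, one way to check rather than assume is to read the `#CHROM` header line of the combined VCF. In the VCF format, the sample names follow the FORMAT column in the same order as the per-sample genotype fields. The Python sketch below uses made-up sample names and a shortened record; whether those names match the BAM file names depends on how the caller labels samples.

```python
def sample_columns(vcf_lines):
    """Return sample names in column order, taken from the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed; everything after FORMAT is a sample.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


# Hypothetical two-sample VCF content, for illustration only.
lines = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB",
    "chr3\t23987415\t.\tA\tC\t999\t.\tDP=10\tGT\t0/0\t1/1",
]

names = sample_columns(lines)
record = lines[2].split("\t")
genotypes = dict(zip(names, record[9:]))  # pair each sample name with its field
print(genotypes)
```

Pairing names with genotype fields this way avoids relying on the order in which the BAM files were given on the command line.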

Tags
frequency, multi-sample
