
  • How to deal with multi-sample NGS data?

    Hello everyone, I'm new to NGS. My boss gave me the FASTQ files for about 180 different samples, and I need to do alignment and SNP/indel calling for them. I need help with two questions:

    1. Is there a more efficient way to do the alignment and variant calling for those 180 samples, or should I analyze them one by one?

    2. Each sample may produce a single VCF file. How can I combine the calls from all 180 samples and get the frequency of each SNP?

    Please help, thanks a lot.

  • #2
    You can use either of these methods:
    1. The samtools mpileup function: http://samtools.sourceforge.net/mpileup.shtml
    2. SOAPsnp. You can email the authors to request the multiple-individual version.

    You can analyze the 180 samples together, and the software will then create a single combined VCF file.
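
A minimal sketch of what "analyzing the samples together" can look like: all BAM files go into one mpileup invocation, so one combined VCF comes out. The file names and the helper `build_joint_call_command` are hypothetical. The command shape assumes a current bcftools install (`bcftools mpileup ... | bcftools call -mv`) rather than the older samtools/bcftools pipeline described on the linked page. The script only builds and prints the command; it does not run it.

```python
import shlex


def build_joint_call_command(reference, bam_paths, out_vcf):
    """Return a shell pipeline that calls variants on all BAMs jointly.

    Assumption: bcftools 1.x is installed; all paths are placeholders.
    """
    if not bam_paths:
        raise ValueError("need at least one BAM file")
    bams = " ".join(shlex.quote(p) for p in bam_paths)
    return (
        f"bcftools mpileup -f {shlex.quote(reference)} {bams} "
        f"| bcftools call -mv -Ov -o {shlex.quote(out_vcf)}"
    )


# Hypothetical sample names; in practice these would come from a sample sheet.
bam_files = [f"sample{i:03d}.bam" for i in range(1, 181)]
command = build_joint_call_command("ref.fa", bam_files, "all_samples.vcf")
print(command[:80] + " ...")
```

Alignment still has to be run per sample; only the calling step is joint.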



    • #3
      Originally posted by zhanglu295
      Oh, that's great, I'll try as you advise. Thank you very much.



      • #4
        You can also use the GATK.

        In case you're new to the whole thing: I recommend making templates for each of the analysis steps you take, and then running a script to replace the placeholders with your sample info. I do hope you have some compute power at your disposal; 180 samples may take a while to analyse :P
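
The template idea could be sketched roughly like this with Python's standard `string.Template`. The `bwa mem` command line, the `_R1.fq`/`_R2.fq` naming, and the sample IDs are assumptions for illustration only; the thread does not say which aligner or file layout is in use.

```python
from string import Template

# Hypothetical command template; $ref and ${sample} are the placeholders.
# The braces are needed so "_R1" is not read as part of the placeholder name.
ALIGN_TEMPLATE = Template(
    "bwa mem $ref ${sample}_R1.fq ${sample}_R2.fq > ${sample}.sam"
)


def render_commands(template, samples, **common):
    """Fill the template once per sample; raises KeyError on a missing placeholder."""
    return [template.substitute(sample=s, **common) for s in samples]


samples = ["S001", "S002", "S003"]  # made-up sample IDs
commands = render_commands(ALIGN_TEMPLATE, samples, ref="ref.fa")
for cmd in commands:
    print(cmd)
```

Using `substitute` rather than `safe_substitute` means a forgotten placeholder fails loudly instead of producing a broken command.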

        I also recommend incorporating some decent logging so you can easily find where things went wrong. Include versioning (the version of each tool used, but also the version of your complete analysis pipeline); that will help too. It might be a bigger job than you expect! Also think in advance about how you would like to structure files and directories, and which intermediate files are worth keeping.
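
A minimal sketch of logging with version information, under stated assumptions: `PIPELINE_VERSION`, the log file location, and the sample/step names are hypothetical, and the Python interpreter stands in for a real tool only because it is guaranteed to be installed. In a real pipeline the argument list would name the actual tool's version command.

```python
import logging
import os
import subprocess
import sys
import tempfile

PIPELINE_VERSION = "0.1.0"  # hypothetical version of your own pipeline


def tool_version(argv):
    """Run a tool's version command and return the first line it prints."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "unknown"


def make_logger(log_path):
    """Create a logger that appends timestamped lines to log_path."""
    logger = logging.getLogger("pipeline")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


log_file = os.path.join(tempfile.mkdtemp(), "pipeline.log")
logger = make_logger(log_file)
logger.info("pipeline version %s", PIPELINE_VERSION)
logger.info("interpreter version: %s", tool_version([sys.executable, "--version"]))
logger.info("sample %s: step %s started", "S001", "alignment")
```

Writing the versions at the top of every run's log makes it possible to tell later which results came from which tool and pipeline version.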

        In case you're not new to the whole thing: perhaps it helps others



        • #5
          Has anyone looked at using generic databases for this kind of question? It seems a lot of people are doing large-scale exon or whole-genome analysis these days.
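
As one possible reading of "generic database", here is a toy sketch using SQLite from the Python standard library. The table layout, sample names, and genotype rows are all made up for illustration, and diploid samples are assumed; nothing here comes from a real data set or an established schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genotype ("
    "chrom TEXT, pos INTEGER, sample TEXT, alt_alleles INTEGER)"
)

# Made-up rows: alt_alleles is 0, 1 or 2 copies of the ALT allele.
rows = [
    ("chr3", 23987415, "S01", 0),
    ("chr3", 23987415, "S02", 2),
    ("chr3", 23987415, "S03", 2),
    ("chr3", 23987415, "S04", 1),
]
conn.executemany("INSERT INTO genotype VALUES (?, ?, ?, ?)", rows)


def alt_allele_frequency(conn, chrom, pos):
    """ALT allele frequency at one site, assuming diploid samples."""
    total_alt, n_samples = conn.execute(
        "SELECT SUM(alt_alleles), COUNT(*) FROM genotype "
        "WHERE chrom = ? AND pos = ?",
        (chrom, pos),
    ).fetchone()
    if not n_samples:
        return None  # site not present in the table
    return total_alt / (2 * n_samples)


freq = alt_allele_frequency(conn, "chr3", 23987415)
print(freq)
```

Whether a relational layout like this scales to whole-genome call sets is a separate question that the sketch does not address.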



          • #6
            Originally posted by Bruins
            Thanks, the suggestion is very useful. I have had the unforgettable experience of debugging a pipeline. That really cost me plenty of time.



            • #7
              If those fq's are from mammalian samples, the alignment alone is going to take forever.

              It would be worth spending some time asking around, and looking around yourself, to see if someone has already done the alignments. Even if it takes you a week to find them, you will probably save yourself a lot of time.

              And yes, you can give a pile of .bams to samtools' mpileup command, and it will give you a combined .vcf file. The lines look something like this (with 11 samples, all on one line, of course):

              chr3 23987415 . A C 999 . DP=6832;AF1=0.475;CI95=0.2727,0.6364;DP4=315,231,3814,1978;MQ=37;FQ=28.2;PV4=0.00017,0,1.5e-147,1
              GT:PL:GQ
              0/0:0,107,0:3
              1/1:76,255,0:75
              1/1:112,255,0:99
              1/1:71,255,0:70
              0/0:0,172,29:30
              0/0:0,181,18:19
              0/0:0,158,43:44
              0/0:0,188,12:13
              1/1:16,255,0:15
              1/1:5,236,0:6
              0/0:0,224,16:17

              Learning what all that means is a whole other project.
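
As a starting point for reading such a line, here is a rough sketch that splits the record above into its columns and counts alleles across the 11 genotypes. The function names are hypothetical. A real VCF is tab-separated; splitting on any whitespace is an assumption that happens to work for the line as pasted here. In the FORMAT column, GT is the genotype, PL the phred-scaled genotype likelihoods, and GQ the genotype quality.

```python
VCF_LINE = (
    "chr3 23987415 . A C 999 . "
    "DP=6832;AF1=0.475;CI95=0.2727,0.6364;DP4=315,231,3814,1978;"
    "MQ=37;FQ=28.2;PV4=0.00017,0,1.5e-147,1 "
    "GT:PL:GQ "
    "0/0:0,107,0:3 1/1:76,255,0:75 1/1:112,255,0:99 1/1:71,255,0:70 "
    "0/0:0,172,29:30 0/0:0,181,18:19 0/0:0,158,43:44 0/0:0,188,12:13 "
    "1/1:16,255,0:15 1/1:5,236,0:6 0/0:0,224,16:17"
)


def parse_multisample_line(line):
    """Split one VCF record into fixed columns, INFO keys and per-sample dicts."""
    fields = line.split()
    chrom, pos, _id, ref, alt, _qual, _filter, info, fmt = fields[:9]
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value if value else True  # flag keys have no value
    keys = fmt.split(":")
    samples = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "info": info_map, "samples": samples}


def alt_allele_frequency(samples):
    """Fraction of called alleles that are non-reference, from the GT fields."""
    alt = total = 0
    for sample in samples:
        for allele in sample["GT"].replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call
            total += 1
            if allele != "0":
                alt += 1
    return alt / total if total else None


record = parse_multisample_line(VCF_LINE)
freq = alt_allele_frequency(record["samples"])
print(record["info"]["AF1"], freq)
```

The frequency counted from the hard genotype calls need not equal the AF1 value in the INFO column, since AF1 is an estimate made by the caller from the likelihoods rather than a tally of the GT fields.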



              • #8
                Originally posted by swbarnes2
                I'm afraid I have to do the alignment myself. This information is really wonderful; we need those frequency data to conduct follow-up genotyping in larger samples. Is the genotype order the same as the order of the input BAMs?

                Thanks
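
One way to check the column order rather than assume it: in a VCF file, the `#CHROM` header line names every genotype column after FORMAT, and the genotypes in each record follow that order. The sketch below reads those names; the header lines and sample IDs are made up, and `sample_names` is a hypothetical helper. How the caller derives the names (for example from read-group sample tags or from file names) is worth verifying against the tool's own documentation.

```python
def sample_names(vcf_lines):
    """Return the sample column names from the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed; sample names start at column 10.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


# Made-up header for illustration; a real one would be read from the file.
header = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS001\tS002\tS003",
]
names = sample_names(header)
print(names)
```

Pairing these names with the per-sample fields of each record gives an explicit sample-to-genotype mapping, so nothing depends on remembering the input order.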
