  • How to deal with multi-sample NGS data?

    Hello everyone, I'm new to NGS. My boss gave me the FASTQ (.fq) files for about 180 different samples, and I need to do alignment and SNP/indel calling for them. I need help with two questions:

    1. Is there a more efficient way to do the alignment and variant calling for these 180 samples, or should I analyze them one by one?

    2. Each sample may produce its own VCF file. How can I combine the calls from all 180 samples and get the frequency of each SNP?

    Please help. Thanks a lot!

  • #2
    You can use either of these methods:
    1. The samtools mpileup function: http://samtools.sourceforge.net/mpileup.shtml
    2. SOAPSNP. You can email its developers to request the multiple-individual version.

    You can analyze all 180 samples together, and the software will then produce a single combined VCF file.
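The joint-calling idea above can be sketched as follows. In recent samtools/bcftools releases the multi-sample pileup step is done by `bcftools mpileup`, so this minimal Python sketch assumes bcftools is installed, the BAMs are coordinate-sorted and indexed, and each BAM carries its own sample name in its @RG SM tag (which is how the samples are told apart). The file names and helper functions are hypothetical. It builds one pipeline over all BAMs, so one combined VCF comes out instead of 180 separate ones.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def write_bam_list(bams: List[str], path: str) -> None:
    """Write one BAM path per line, the format `bcftools mpileup -b` reads."""
    Path(path).write_text("".join(bam + "\n" for bam in bams))


def joint_call_commands(reference: str, bam_list: str, out_vcf: str) -> List[List[str]]:
    """Return the two commands of a joint (multi-sample) calling pipeline."""
    # -Ou: uncompressed BCF on stdout, the cheapest format for piping.
    mpileup = ["bcftools", "mpileup", "-f", reference, "-b", bam_list, "-Ou"]
    # -m: multiallelic caller, -v: variant sites only, -Oz: bgzipped VCF output.
    call = ["bcftools", "call", "-mv", "-Oz", "-o", out_vcf]
    return [mpileup, call]


def as_shell(cmds: List[List[str]]) -> str:
    """Render the commands as one shell pipeline string (for logs or scripts)."""
    return " | ".join(" ".join(shlex.quote(arg) for arg in cmd) for cmd in cmds)


def run_pipeline(cmds: List[List[str]]) -> None:
    """Run `mpileup | call` without a shell; raise if either step fails."""
    producer = subprocess.Popen(cmds[0], stdout=subprocess.PIPE)
    consumer = subprocess.Popen(cmds[1], stdin=producer.stdout)
    producer.stdout.close()  # lets the producer see SIGPIPE if the consumer dies
    consumer.wait()
    producer.wait()
    if producer.returncode != 0 or consumer.returncode != 0:
        raise RuntimeError("pipeline failed: " + as_shell(cmds))
```

For example, `as_shell(joint_call_commands("ref.fa", "bams.txt", "all.vcf.gz"))` renders as `bcftools mpileup -f ref.fa -b bams.txt -Ou | bcftools call -mv -Oz -o all.vcf.gz`. Passing a BAM list file rather than 180 paths on the command line keeps the command short and makes the sample set easy to record.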

    • #3
      In reply to zhanglu295 (#2):
      Oh, that's great; I'll try your advice. Thank you very much.

      • #4
        You can also use the GATK.

        In case you're new to the whole thing: I recommend making templates for each analysis step and then running a script that replaces the placeholders with your sample info. I do hope you have some compute power at your disposal; 180 samples may take a while to analyze :P
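The template approach can be sketched with Python's `string.Template`. This is a minimal illustration, not a recommended pipeline: the step list, the `<sample>_1.fq`/`<sample>_2.fq` naming, and the function names are hypothetical, and bwa and samtools are assumed to be installed.

```python
from string import Template
from typing import List

# Hypothetical step templates; assumes paired reads named <sample>_1.fq and
# <sample>_2.fq. ${sample}, ${ref} and ${threads} are the placeholders.
STEP_TEMPLATES = [
    ("align", Template("bwa mem -t ${threads} ${ref} ${sample}_1.fq ${sample}_2.fq > ${sample}.sam")),
    ("sort", Template("samtools sort -o ${sample}.sorted.bam ${sample}.sam")),
    ("index", Template("samtools index ${sample}.sorted.bam")),
]


def render_steps(sample: str, ref: str, threads: int = 4) -> List[str]:
    """Fill every template for one sample, in pipeline order."""
    values = {"sample": sample, "ref": ref, "threads": threads}
    # substitute() raises KeyError on an unfilled placeholder, which catches
    # template typos before anything runs.
    return [template.substitute(values) for _, template in STEP_TEMPLATES]


def write_sample_script(sample: str, ref: str, threads: int = 4) -> str:
    """Return a shell script for one sample that stops at the first failing step."""
    lines = ["#!/bin/sh", "set -e"] + render_steps(sample, ref, threads)
    return "\n".join(lines) + "\n"
```

Calling `write_sample_script` once per sample ID (e.g. hypothetical IDs S001 to S180) gives one script per sample, which can then be run in parallel or submitted to a cluster scheduler.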

        I also recommend incorporating decent logging so you can easily find where things went wrong. Include versioning (the version of each tool used, but also the version of your complete analysis pipeline); that will help too. It might be a bigger job than you expect! Also think in advance about how you want to structure files and directories, and which intermediate files are worth keeping.
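A minimal sketch of the logging and versioning advice, assuming a Python driver script: every sample gets the same directory layout, and every step is logged as one JSON line that records the pipeline version and the tool versions. The directory names, `PIPELINE_VERSION`, and how the tool versions are obtained (for instance from `samtools --version`) are assumptions for illustration.

```python
import json
import logging
from pathlib import Path
from typing import Dict

PIPELINE_VERSION = "0.1.0"  # hypothetical; bump whenever any step changes


def setup_sample_dirs(root: Path, sample: str) -> Dict[str, Path]:
    """Create a fixed per-sample directory layout (names are hypothetical)."""
    dirs = {name: root / sample / name for name in ("align", "calls", "logs", "tmp")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


def get_logger(log_dir: Path, sample: str) -> logging.Logger:
    """One logger per sample, writing to <log_dir>/pipeline.log."""
    logger = logging.getLogger("pipeline." + sample)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep per-sample messages out of the root logger
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_dir / "pipeline.log")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def log_step(logger: logging.Logger, step: str, command: str,
             tool_versions: Dict[str, str]) -> None:
    """Record what ran, with which tool versions and which pipeline version."""
    record = {
        "pipeline_version": PIPELINE_VERSION,
        "step": step,
        "command": command,
        "tool_versions": tool_versions,
    }
    logger.info(json.dumps(record, sort_keys=True))
```

Logging the exact command string alongside the versions means a failed sample can be re-run by copying the command from its log, and results produced by different pipeline versions can be told apart later.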

        In case you're not new to the whole thing: perhaps it helps others.

        • #5
          Has anyone looked at using generic databases for this kind of question? It seems a lot of people are doing large-scale exome or whole-genome analysis these days.

          • #6
            In reply to Bruins (#4):
            Thanks, the suggestion is very useful. I've had an unforgettable experience debugging a pipeline; it really cost me plenty of time.

            • #7
              If those fq's are from mammalian samples, the alignment alone is going to take forever.

              It would be worth spending some time asking around, and looking around yourself, to see whether someone has already done the alignments. If it takes you a week to find them, you will probably still save yourself a lot of time.

              And yes, you can give a pile of .bams to samtools' mpileup command, and it will give you a combined .vcf file. The lines look something like this (with 11 samples, all on one line, of course):

              chr3 23987415 . A C 999 . DP=6832;AF1=0.475;CI95=0.2727,0.6364;DP4=315,231,3814,1978;MQ=37;FQ=28.2;PV4=0.00017,0,1.5e-147,1
              GT:PL:GQ
              0/0:0,107,0:3
              1/1:76,255,0:75
              1/1:112,255,0:99
              1/1:71,255,0:70
              0/0:0,172,29:30
              0/0:0,181,18:19
              0/0:0,158,43:44
              0/0:0,188,12:13
              1/1:16,255,0:15
              1/1:5,236,0:6
              0/0:0,224,16:17

              Learning what all that means is a whole other project.
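Turning a line like this into the per-SNP frequency from the original question can be sketched as below. It assumes a tab-separated VCF data line whose FORMAT column contains GT, counts alleles from the hard genotype calls, skips missing calls, and lumps all non-reference alleles together at multi-allelic sites. The function name is hypothetical.

```python
from typing import Optional


def alt_allele_frequency(vcf_line: str) -> Optional[float]:
    """Fraction of called alleles that are non-reference at one VCF record.

    Reads the GT field of every sample column; missing alleles ('.') are
    skipped. Returns None if no sample has a called allele.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # FORMAT is column 9
    called = alt = 0
    for sample in fields[9:]:
        parts = sample.split(":")
        if gt_index >= len(parts):
            continue  # sample column lacks a GT value
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        for allele in parts[gt_index].replace("|", "/").split("/"):
            if allele == ".":
                continue
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else None
```

For the 11-sample record above, counting the hard calls gives 10 alternate alleles out of 22 (about 0.45). The AF1 value in the INFO field (0.475) is a likelihood-based estimate rather than a simple count, so the two need not agree.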

              • #8
                In reply to swbarnes2 (#7):
                I'm afraid I have to do the alignment myself. This information is really wonderful; we need the frequency data to conduct follow-up genotyping in a larger sample. Is the order of the genotype columns the same as the order of the input BAMs?

                Thanks
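One way to avoid depending on column order at all is to read the sample IDs from the VCF's `#CHROM` header line: per the VCF format, the sample columns after FORMAT appear there in the same order as in every data line. With samtools/bcftools those IDs normally come from the SM field of each BAM's @RG header, which is worth checking for your version. A minimal sketch with hypothetical function names:

```python
from typing import Dict, List


def sample_names(header_line: str) -> List[str]:
    """Sample IDs from the '#CHROM ...' header line, in column order."""
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("expected the #CHROM header line")
    return fields[9:]  # the first 9 columns are fixed (CHROM ... FORMAT)


def genotypes_by_sample(header_line: str, record_line: str) -> Dict[str, str]:
    """Map each sample ID to its GT string for one VCF data line."""
    names = sample_names(header_line)
    fields = record_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    result = {}
    for name, sample in zip(names, fields[9:]):
        parts = sample.split(":")
        result[name] = parts[gt_index] if gt_index < len(parts) else "."
    return result
```

Keying genotypes by sample ID means the follow-up genotyping list can be matched to samples even if the BAMs were passed in a different order.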
