  • How to identify SNPs that differ between two groups?

    Dear seq members

    I am fairly new to SNP analysis. In my study, I have two groups, treatment and control, each with three biological replicates. My goal is to find SNPs that differ between the two groups, and I'm using samtools mpileup | bcftools for the analysis.

    My commands are shown below:

    samtools mpileup -C 50 -uDS -q 10 -f MYPATH/human_ref_genome.fa output/tr1.bam output/tr2.bam output/tr3.bam output/ctr1.bam output/ctr2.bam output/ctr3.bam | bcftools view -bvcg - > ./snp_output/snp_cmp.bcf

    bcftools view -c -1 -3 ./snp_output/snp_cmp.bcf | vcfutils.pl varFilter -D 100 > ./snp_output/snp_cmp.vcf


    Am I using the correct commands? Why does the output file snp_cmp.vcf contain millions of SNPs when I'm expecting hundreds of SNPs of interest? How can I limit (filter) the output to only the SNPs that differ between the two groups, and which program should I use to do this?

    Thanks in advance.
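One simple way to narrow a multi-sample call set down to group-specific sites is to compare the genotype (GT) calls across the sample columns of the VCF. The Python sketch below is only an illustration under stated assumptions: the sample columns appear in the order the BAMs were passed to mpileup (tr1, tr2, tr3, ctr1, ctr2, ctr3), and a site is kept only if every treatment sample shares one genotype, every control sample shares a different one, and no call is missing. The function names and this strict "fixed difference" rule are hypothetical choices, not a feature of samtools or bcftools.

```python
from typing import Iterable, Iterator

# Hypothetical layout: first 3 sample columns are treatment (tr1..tr3),
# the remaining ones are control (ctr1..ctr3), as in the mpileup BAM order.
N_TREATMENT = 3


def normalize_gt(sample_field: str, gt_index: int) -> str:
    """Return the GT value with phasing dropped and alleles sorted ('1|0' -> '0/1')."""
    gt = sample_field.split(":")[gt_index]
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def group_specific_sites(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield data lines where each group is internally uniform but the groups differ.

    Assumed (hypothetical) rule: all treatment genotypes identical, all control
    genotypes identical, the two genotypes different, and no missing calls.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header and meta-information lines
        cols = line.rstrip("\n").split("\t")
        fmt = cols[8].split(":")  # FORMAT column, e.g. "GT:PL"
        if "GT" not in fmt:
            continue  # no genotype calls at this site
        gt_index = fmt.index("GT")
        gts = [normalize_gt(field, gt_index) for field in cols[9:]]
        if any("." in g for g in gts):
            continue  # at least one sample has a missing call
        treatment = set(gts[:N_TREATMENT])
        control = set(gts[N_TREATMENT:])
        if len(treatment) == 1 and len(control) == 1 and treatment != control:
            yield line
```

For example, iterating over `group_specific_sites(open("snp_output/snp_cmp.vcf"))` yields only the matching VCF lines. A rule this strict will miss sites where the groups differ only in allele frequency; those need a statistical test rather than an exact genotype match.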

  • #2
    I am also interested in this question. I hope some experts can help.
    Xi Wang



    • #3
      Why did the output snp_cmp.vcf file contain millions of snps while I'm expecting hundreds of snps of interest?
      See here (an earlier thread in the Bioinformatics forum: "Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc").


      Hundreds of thousands of SNPs are expected and perfectly normal. However, I think millions is a lot when comparing only a few individuals.

      In my study, I have 2 groups, treatment and control, each containing 3 biological replicates
      Three biological replicates is quite a low number for SNP frequency studies; I'm more used to comparing hundreds or thousands of individuals (e.g. WTCCC). I'd expect you'll have a lot of trouble distinguishing true positives from false positives with only 6 individuals.
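To put a rough number on the sample-size concern: with 3 diploid individuals per group there are only 6 alleles per group, so even a perfect split between groups cannot produce a very small p-value. The sketch below assumes a two-sided Fisher's exact test on allele counts (rows are treatment/control, columns are alt/ref alleles), implemented directly with `math.comb`; this is one reasonable choice of test used for illustration, not something proposed in the thread.

```python
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Here rows are groups (treatment, control) and columns are allele counts
    (alt, ref). The p-value sums the hypergeometric probabilities of all tables
    with the same margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # A tiny relative tolerance keeps tables that tie with the observed one.
    return sum(p for p in map(prob, range(x_min, x_max + 1)) if p <= p_obs * (1 + 1e-9))


# 3 diploid individuals per group -> 6 alleles per group.
# Most extreme split: all 6 treatment alleles alt, all 6 control alleles ref.
best_p = fisher_two_sided(6, 0, 0, 6)
print(f"smallest achievable p-value: {best_p:.5f}")  # 2/924, prints 0.00216

# Bonferroni threshold if about one million sites are tested.
print(f"Bonferroni threshold: {0.05 / 1_000_000:.1e}")  # prints 5.0e-08
```

The most extreme possible table gives p = 2/924, roughly 0.002, which is several orders of magnitude above a Bonferroni-corrected threshold of 5e-8 for a million tested sites, so under these assumptions no single site could reach significance on its own.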

      Edit: I've just noticed that the thread I linked to was also started by you, discussing the same issue. Oh well, I guess it needed some clarification.
      Last edited by gringer; 01-03-2012, 11:35 PM.



      • #4
        I'm a newbie in bioinformatics and may be missing something, but as far as I understand, you are interested in the SNPs that differ between the two groups. If so, you could use one of the tools designed for finding somatic mutations.

        FYI, I am currently trying VarScan. It takes group1_pileup and group2_pileup files as input and outputs some useful statistics about the 'somatic status' of each variant.

        hope it helps,

        cheers
        david
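VarScan works from pileup text rather than BCF, so its input would come from `samtools mpileup` run without `-u` (which makes mpileup emit uncompressed BCF instead of text). As a rough illustration of the per-group read counts such a two-group comparison starts from, here is a minimal Python sketch that parses one multi-sample mpileup text line into (reference, alternate) read counts per sample and then pools replicates within a group. It assumes the standard pileup encoding of the read-base column; the function names are hypothetical, pooling replicates is an assumption that ignores variation between individuals, and this is not VarScan's own code.

```python
from typing import List, Tuple


def count_bases(bases: str) -> Tuple[int, int]:
    """Return (ref_reads, alt_reads) for one sample's read-base string.

    '.' and ',' match the reference; A/C/G/T (either case) are mismatches.
    Read starts ('^' plus a mapping-quality char), read ends ('$'), indel
    annotations ('+2AG', '-1T'), deletions ('*', '#') and skips ('>', '<')
    are not counted.
    """
    ref = alt = 0
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            i += 2  # skip '^' and the mapping-quality character after it
            continue
        if ch in "+-":
            # Indel: sign, then a length, then that many inserted/deleted bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        if ch in ".,":
            ref += 1
        elif ch in "ACGTacgt":
            alt += 1
        i += 1
    return ref, alt


def per_sample_counts(pileup_line: str) -> List[Tuple[int, int]]:
    """(ref, alt) read counts for every sample on one mpileup text line."""
    cols = pileup_line.rstrip("\n").split("\t")
    # Columns 0-2: chrom, pos, ref base; then (depth, bases, quals) per sample.
    sample_cols = cols[3:]
    return [count_bases(sample_cols[k + 1]) for k in range(0, len(sample_cols), 3)]


def pooled(counts: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum (ref, alt) counts over replicates, e.g. the three treatment samples."""
    return sum(r for r, _ in counts), sum(a for _, a in counts)
```

With the BAMs in the order tr1..tr3, ctr1..ctr3, `pooled(counts[:3])` and `pooled(counts[3:])` would give the treatment and control totals for a site.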
