Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fluctuating Genotype Quality scores

    Hi everyone,

    I have DNA reads generated on the SOLiD 4 sequencer and Illumina MiSeq platform. The reads are from multiple strains of 3 species. Only one platform has been used for each strain. I've aligned the reads to their respective reference sequences using BWA and then called SNPs using samtools (commands below). So far, so good.

    I've come to plot divergence from the reference sequences as a function of genotype quality (basically, the number of times each GQ score is seen relative to genome size). Indels have been removed from the SAMtools output and I've taken the values from SAMtools' resulting vcf file and extracted the 'GQ' field's scores for each SNP. The vast majority of GQ scores are an encouraging '99'. However, for the relative few that have been given scores below this, I see very strange patterns in their frequency. Every 7th GQ score (3, 10, 17 etc) the frequency shoots up massively, then drops down again.

    I can't find any reason for this or anyone that's found this before. Perhaps I'm missing something very obvious but any help would be greatly appreciated! Anybody know why this might happen?

    Thanks very much,

    Ian


    ###########
    ###########
    ###########

    Commands used:

    bwa aln -n 4 -R 10 -t 10 -c ./REF_SEQ_SOLID/BWA_HM1 ~/Hist_Comparisons/2_SRA_FILES/HM1_A.fastq > ./READS/HM1_A.sai;

    bwa samse -r '@RG\tID:HM1\tPL:SOLiD\tLB:S\tSM:HM1_A' -n 2 ./REF_SEQ_SOLID/BWA_HM1 ./READS/HM1_A.sai ~/Hist_Comparisons/2_SRA_FILES/HM1_A.fastq > ./READS/HM1_A.sam

    awk '$2 != 4 {print $0}' HM1_A.sam > HM1_Afiltered.sam;

    cat HM1_Afiltered.sam | perl -ne 'if ($_ =~ /^@/) {print $_} elsif ($_ =~ /^\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+XT\:A\:U/) {print $_}' > HM1_Afiltered_UNIQ.sam;

    samtools view -uT ~/Hist_Comparisons/3_BWA/REF_SEQ_SOLID/BWA_HM1.fa HM1_Afiltered_UNIQ.sam | \
    samtools sort - ~/Hist_Comparisons/4_GATK/STRAINS/HM1_A/HM1_Afiltered_UNIQ_sort;
    samtools index ~/Hist_Comparisons/4_GATK/STRAINS/HM1_A/HM1_Afiltered_UNIQ_sort.bam;


    ## call bases and snps
    ## Make a vcf and a bcf file containing SNPs, indels and non variants (i.e. sites without polymorphisms):

    samtools mpileup -uD -f ~/Hist_Comparisons/3_BWA/REF_SEQ_SOLID/BWA_HM1.fa HM1_Afiltered_UNIQ_sort.bam | bcftools view -bcg - > HM1_samtools.raw.bcf
    samtools mpileup -uD -f ~/Hist_Comparisons/3_BWA/REF_SEQ_SOLID/BWA_HM1.fa HM1_Afiltered_UNIQ_sort.bam | bcftools view -cg - > HM1_samtools.raw.vcf

    ## calculate q50 and q95 of coverage
    cat HM1_samtools.raw.vcf | grep -v 'INDEL' | perl -ne 'if ($_ =~ /^\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+([0-9]*\.?[0-9]+)\s+\S+\s+DP=(\d+)/) {if ($1 >= 20) {print $2."\n"}}' | sort -n | perl -e '$q50=.5; $q95=.95; @l=<>; print "$l[int($q50*$#l)]"; print "$l[int($q95*$#l)]";'

    ## 34
    ## 141

    ## Make a vcf file containing only SNPs with a minimum quality score of 20, a minimum depth of 5, a maximum depth of 141,
    ## SNPs farther than -w bp from a gap, using a window size of 30.
    ## These are the closest values it’s possible to get now:

    bcftools view HM1_samtools.raw.bcf | vcfutils.pl varFilter -Q 20 -d 5 -D 141 -w 5 -W 30 > HM1_filtered.vcf;
    Last edited by IanWilson; 09-17-2014, 12:59 AM.

Latest Articles

Collapse

  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM
  • seqadmin
    Techniques and Challenges in Conservation Genomics
    by seqadmin



    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

    Avian Conservation
    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
    03-08-2024, 10:41 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 06:37 PM
0 responses
8 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 06:07 PM
0 responses
8 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-22-2024, 10:03 AM
0 responses
49 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-21-2024, 07:32 AM
0 responses
66 views
0 likes
Last Post seqadmin  
Working...
X