Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fluctuating Genotype Quality scores

    Hi everyone,

    I have DNA reads generated on the SOLiD 4 sequencer and Illumina MiSeq platform. The reads are from multiple strains of 3 species. Only one platform has been used for each strain. I've aligned the reads to their respective reference sequences using BWA and then called SNPs using samtools (commands below). So far, so good.

    I've come to plot divergence from the reference sequences as a function of genotype quality (basically, the number of times each GQ score is seen relative to genome size). Indels have been removed from the SAMtools output and I've taken the values from SAMtools' resulting vcf file and extracted the 'GQ' field's scores for each SNP. The vast majority of GQ scores are an encouraging '99'. However, for the relative few that have been given scores below this, I see very strange patterns in their frequency. Every 7th GQ score (3, 10, 17 etc) the frequency shoots up massively, then drops down again.

    I can't find any reason for this or anyone that's found this before. Perhaps I'm missing something very obvious but any help would be greatly appreciated! Anybody know why this might happen?

    Thanks very much,

    Ian


    ###########
    ###########
    ###########

    Commands used:

    bwa aln -n 4 -R 10 -t 10 -c ./REF_SEQ_SOLID/BWA_HM1 ~/Hist_Comparisons/2_SRA_FILES/HM1_A.fastq > ./READS/HM1_A.sai;

    bwa samse -r '@RG\tID:HM1\tPL:SOLiD\tLB:S\tSM:HM1_A' -n 2 ./REF_SEQ_SOLID/BWA_HM1 ./READS/HM1_A.sai ~/Hist_Comparisons/2_SRA_FILES/HM1_A.fastq > ./READS/HM1_A.sam

    awk '$2 != 4 {print $0}' HM1_A.sam > HM1_Afiltered.sam;

    cat HM1_Afiltered.sam | perl -ne 'if ($_ =~ /^@/) {print $_} elsif ($_ =~ /^\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+XT\:A\:U/) {print $_}' > HM1_Afiltered_UNIQ.sam;

    samtools view -uT ~/Hist_Comparisons/3_BWA/REF_SEQ_SOLID/BWA_HM1.fa HM1_Afiltered_UNIQ.sam | \
    samtools sort - ~/Hist_Comparisons/4_GATK/STRAINS/HM1_A/HM1_Afiltered_UNIQ_sort;
    samtools index ~/Hist_Comparisons/4_GATK/STRAINS/HM1_A/HM1_Afiltered_UNIQ_sort.bam;


    ## call bases and snps
    ## Make a vcf and a bcf file containing SNPs, indels and non variants (i.e. sites without polymorphisms):

    samtools mpileup -uD -f ~/Hist_Comparisons/3_BWA/REF_SEQ_SOLID/BWA_HM1.fa HM1_Afiltered_UNIQ_sort.bam | bcftools view -bcg - > HM1_samtools.raw.bcf
    samtools mpileup -uD -f ~/Hist_Comparisons/3_BWA/REF_SEQ_SOLID/BWA_HM1.fa HM1_Afiltered_UNIQ_sort.bam | bcftools view -cg - > HM1_samtools.raw.vcf

    ## calculate q50 and q95 of coverage
    cat HM1_samtools.raw.vcf | grep -v 'INDEL' | perl -ne 'if ($_ =~ /^\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+([0-9]*\.?[0-9]+)\s+\S+\s+DP=(\d+)/) {if ($1 >= 20) {print $2."\n"}}' | sort -n | perl -e '$q50=.5; $q95=.95; @l=<>; print "$l[int($q50*$#l)]"; print "$l[int($q95*$#l)]";'

    ## 34
    ## 141

    ## Make a vcf file containing only SNPs with a minimum quality score of 20, a minimum depth of 5, a maximum depth of 141,
    ## SNPs farther than -w bp from a gap, using a window size of 30.
    ## These are the closest values it’s possible to get now:

    bcftools view HM1_samtools.raw.bcf | vcfutils.pl varFilter -Q 20 -d 5 -D 141 -w 5 -W 30 > HM1_filtered.vcf;
    Last edited by IanWilson; 09-17-2014, 12:59 AM.

Latest Articles

Collapse

  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM
  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
27 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
31 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 09:21 AM
0 responses
27 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-04-2024, 09:00 AM
0 responses
52 views
0 likes
Last Post seqadmin  
Working...
X