
  • Generating vcf files with samtools

    I am new to next-generation sequencing analysis and was wondering whether anyone could help me with some questions about using SAMtools to generate VCF files:

    I have used the following command lines to generate a vcf file:
    samtools mpileup -gf GRCh37.fa sample.bam > mpileup_sample
    bcftools view -bvcg mpileup_sample > bcf_sample
    bcftools view bcf_sample > vcf_sample
    vcfutils.pl varFilter -D 500 vcf_sample > vcf_filtered_sample

    In the resulting VCF file, the "ID" and "FILTER" columns are empty. Is there any way that I can obtain values for these columns?

    Also, I would like to filter out all variants that have a read depth of less than 10 and quality scores of less than 20. Is there a way that I can do this in SAMtools?

    Lastly, I am unclear as to what the last command line (vcfutils.pl varFilter -D 500 vcf_sample > vcf_filtered_sample) does.

    Any help or suggested reading would be much appreciated!

  • #2
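    The ID column holds the dbSNP identifier, i.e. if the variant is reported in dbSNP, its ID is given there. samtools/bcftools does not look variants up in dbSNP, so the column stays "." until the file is annotated against dbSNP. To require a minimum read depth, use the -d option of vcfutils.pl varFilter (for example -d 10); the -D option in your command sets the maximum read depth.
    The vcfutils.pl varFilter command filters the called variants, and its various filter options can be used to refine the variant list. The conversion from the binary BCF file to the more readable VCF format is done by the preceding bcftools view step.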
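As a rough illustration of the depth and quality filtering asked about above, here is a minimal sketch that works on the VCF text directly with awk rather than through vcfutils.pl. It assumes the depth is stored as a DP= entry in the INFO column (column 8) and that "quality" means the QUAL column (column 6); filter_vcf is a hypothetical helper name, not part of SAMtools.

```shell
#!/bin/sh
# Hypothetical helper: keep header lines plus records whose INFO DP and QUAL
# both reach the given thresholds.
# Usage: filter_vcf <in.vcf> <min_depth> <min_qual>
filter_vcf() {
    awk -F'\t' -v min_dp="$2" -v min_qual="$3" '
        /^#/ { print; next }                  # pass header lines through unchanged
        {
            dp = -1                           # stays -1 if the record has no DP= entry
            n = split($8, info, ";")          # column 8 is INFO, entries separated by ";"
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^DP=/) dp = substr(info[i], 4) + 0
            # column 6 is QUAL; a missing value (".") counts as 0 and is dropped
            if (dp >= min_dp && $6 + 0 >= min_qual) print
        }
    ' "$1"
}
```

With the file names from the question, this could be run as `filter_vcf vcf_sample 10 20 > vcf_filtered_sample`. Records without a DP= entry are dropped, which may or may not be what you want.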
