  • Best tools for SNP calling with resequenced mutant?

    Hi

    I am a total newbie when it comes to working with next-gen data and have only limited bioinformatics skills, so please bear with me...

    About the Project:

    I have various bacterial strains with increased thermal tolerance. These were generated through several rounds of random mutagenesis (chemical mutagens and UV irradiation) and selective growth. The first of these (monoclonal) strains has now been sequenced on an Ion Torrent PGM (314 chip), with the rest to follow shortly. The run itself had some technical issues and will be repeated; this (and the sequencing of the other strains) is on hold until the issues are resolved.
    I do, however, have about 6x coverage (20x was expected), which I want to use to generate some preliminary results and to establish the workflow for the data analysis.


    What I have tried so far:

    For the first approach, reads were mapped to the wt reference with GS Mapper, and SNPs were then called with samtools.
    After some tests to remove false positives caused by homopolymers, I got good results with the following commands:

    Code:
    samtools mpileup -d 10000 -L 1000 -Q 7 -h 50 -o 10 -e 17 -m 4 -uf Reference.fna IonTorrentContigs.bam | bcftools view -bvcg - > var-woh.raw.bcf
    bcftools view var-woh.raw.bcf | vcfutils.pl varFilter -D100 > var-woh.flt.vcf
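    Homopolymer-associated false positives like those mentioned above are typical of Ion Torrent data. As an independent cross-check on the filtered calls, one could flag any variant whose reference context contains or directly borders a long single-base run. The following is a minimal Python sketch, not part of the original workflow: the function names and the run-length threshold `min_run=4` are hypothetical choices, and it assumes the reference is available as a plain string and positions are 1-based, as in a VCF.

```python
from typing import List


def run_length(seq: str, i: int) -> int:
    """Length of the run of identical bases that contains 0-based index i."""
    base = seq[i]
    left = i
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = i
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1


def max_flanking_run(seq: str, i: int) -> int:
    """Longest homopolymer that contains index i or touches it from either side."""
    neighbours = [j for j in (i - 1, i, i + 1) if 0 <= j < len(seq)]
    return max(run_length(seq, j) for j in neighbours)


def flag_homopolymer_variants(reference: str, positions: List[int],
                              min_run: int = 4) -> List[int]:
    """Return the 1-based positions whose reference context has a run >= min_run.

    min_run=4 is an illustrative threshold, not a recommended value.
    """
    ref = reference.upper()
    return [p for p in positions if max_flanking_run(ref, p - 1) >= min_run]


if __name__ == "__main__":
    reference = "ACGTTTTTACGA"
    print(flag_homopolymer_variants(reference, [2, 5, 9, 11]))  # [5, 9]
```

    Flagged positions are candidates for manual inspection rather than automatic removal; a real mutation can of course also sit next to a homopolymer.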

    The output looks good and does not appear to contain false SNPs caused by homopolymers. However, I am interested not merely in which positions carry mutations but in which genes are affected. So I wrote a quick Perl script to parse the output file and look up the affected gene for each mutation position (and also to classify the genes by function).
    The final list contained some 280 mutations in various genes.
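    The position-to-gene lookup described above (done in the original with a Perl script that is not shown) can be sketched in Python as follows. This is a minimal illustration, assuming a single-replicon bacterial genome, gene coordinates that are 1-based and inclusive (as in GenBank or GFF annotations), and hypothetical gene names; the CHROM column is carried through but not matched against the annotation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def parse_vcf(lines: Iterable[str]) -> List[Variant]:
    """Collect (CHROM, POS, REF, ALT) from the tab-separated data lines of a VCF."""
    variants = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and ## meta / #CHROM header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        variants.append((chrom, int(pos), ref, alt))
    return variants


def genes_at(pos: int, genes: List[Gene]) -> List[str]:
    """Names of all genes whose span covers pos (overlapping genes are all reported)."""
    return [g.name for g in genes if g.start <= pos <= g.end]


def annotate(variants: List[Variant], genes: List[Gene]):
    """Attach affected gene names to each variant; 'intergenic' if none overlap."""
    return [(chrom, pos, ref, alt, genes_at(pos, genes) or ["intergenic"])
            for chrom, pos, ref, alt in variants]


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr\t150\t.\tC\tT\t50\t.\tDP=6\n",
        "chr\t900\t.\tG\tA\t40\t.\tDP=5\n",
    ]
    genes = [Gene("geneA", 100, 399, "+"), Gene("geneB", 1000, 1500, "-")]  # hypothetical
    for row in annotate(parse_vcf(vcf), genes):
        print(row)
```

    A linear scan over all genes is simple and fast enough for a few hundred variants against a few thousand bacterial genes; for much larger inputs an interval tree or a sorted-interval search would be the usual choice.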

    However, some relevant information is still missing, such as which amino acids were exchanged (if any!). I could add this functionality to my script, but I do not wish to reinvent the wheel, especially if good tools with the needed functionality already exist.
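    Once a variant's gene coordinates and strand are known, the amino acid exchange is a small calculation: find the codon containing the variant, substitute the alternate base, and translate both codons. The following minimal Python sketch assumes single-base substitutions only (no indels), an intron-free CDS spanning exactly gene_start..gene_end (usual for bacteria, but worth checking against the annotation), ALT given on the forward strand as in a VCF, and the standard codon assignments (NCBI translation tables 1 and 11 assign amino acids identically and differ only in alternative start codons). The function name `aa_change` is hypothetical.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def aa_change(genome: str, gene_start: int, gene_end: int, strand: str,
              pos: int, alt: str) -> Tuple[str, str, str, str, int]:
    """Codon and amino acid change for a single-base substitution inside a CDS.

    Coordinates are 1-based inclusive. Returns
    (ref_codon, alt_codon, ref_aa, alt_aa, codon_number).
    """
    if not gene_start <= pos <= gene_end:
        raise ValueError("variant lies outside the gene")
    cds = genome[gene_start - 1:gene_end].upper()
    alt = alt.upper()
    if strand == "+":
        offset = pos - gene_start
    else:
        # Minus-strand gene: read the CDS from the reverse complement and
        # complement the forward-strand ALT base accordingly.
        cds = reverse_complement(cds)
        offset = gene_end - pos
        alt = alt.translate(COMPLEMENT)
    codon_start = (offset // 3) * 3
    frame = offset % 3
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    return (ref_codon, alt_codon, CODON_TABLE[ref_codon],
            CODON_TABLE[alt_codon], offset // 3 + 1)


if __name__ == "__main__":
    # Toy gene ATG GCT TGA (Met-Ala-stop); C->A at position 5 turns Ala into Asp.
    print(aa_change("ATGGCTTGA", 1, 9, "+", 5, "A"))  # ('GCT', 'GAT', 'A', 'D', 2)
```

    A result where ref_aa equals alt_aa is a synonymous change; an alt_aa of '*' indicates a newly introduced stop codon.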


    A friend with some experience with next-gen sequencing data suggested using MIRA, as it assigns genes to mutations, checks for amino acid exchanges, and has nice output options, such as an HTML file.

    I then ran MIRA (in mapping mode) and generated the output files:

    Code:
    mira --project=c5k --job=mapping,genome,accurate,iontor -AS:nop=1 -SB:bsn=DH10B_wt:bft=gbf:bbq=30 IONTOR_SETTINGS -ASSEMBLY:mrpc=100 -SB:ads=yes:dsn=DH10B_mut COMMON_SETTINGS  -GENERAL:not=4 |tee log_assembly.txt
    
    convert_project -f caf -t asnp c5k_out.caf output
    convert_project -f caf -t hsnp c5k_out.caf output_html

    The output format is nice (it includes the amino acid exchanges, for example), but it contains a ton of false positives. Is there an easy way to filter this data with a parameter when generating the output? I didn't find anything in the MIRA documentation...

    Or was my first approach better? Maybe I should be using different tools altogether?




    Any help or even a nudge in the right direction would be much appreciated.

    Cheers,
    Uli

  • #2
    Dear All,
    I am a learner in bioinformatics. I am running SNP calling with the GATK toolkit and now have a set of SNPs, which I want to locate in my genome (P. trichocarpa). Could anybody please tell me which tool is best for this?
