SEQanswers

06-30-2017, 09:50 AM   #1
Eurioste
Junior Member
 
Location: Brazil

Join Date: Jun 2017
Posts: 4
My first variant calling workflow

Hello, I'm currently learning how to process NGS data using the Galaxy platform. This is the first time I have worked with NGS data, and I am overwhelmed by the number of different variant calling workflows and tools available. I have a molecular biology background and I'm learning this on my own through online courses, so I would like some feedback in case I'm making mistakes. Although I can code in Python, I want to build this workflow in Galaxy as part of a course.

For learning purposes, I was given raw FASTQ reads from an Illumina MiSeq, sequenced as paired ends, 125 bp in length. The data are targeted re-sequencing data for a father, mother and child trio. I need to create a workflow to identify polymorphic sites in all three individuals.

I started a workflow based on the references below:

http://folk.uio.no/jonkl/StuffForMBV...s/AAltmann.pdf
https://www.biomedcentral.com/conten...0-7-314-S1.pdf

My current, incomplete attempt is available at the link below. Some steps from the references were skipped for the sake of simplicity. I'm doing my best to understand what each step actually does and why it is used. You can import the workflow into Galaxy for a better view:

https://usegalaxy.org/u/eurioste/w/v...alling-on-trio

Briefly, the paired-end reads had 10 bp trimmed from the 3' end (based on the FASTQ report; this step is not in the workflow), resulting in high-quality reads of about 140 bp. The paired reads for each individual were aligned to the reference human_g1k_v37 with BWA-MEM, with different read group information assigned to each individual. The resulting BAM for each individual was pre-processed with Picard: sorting, removal of ambiguous reads and duplicates, and updating of mate-pair information. I'm omitting indel realignment and base quality recalibration on purpose. The resulting three BAMs could be used for variant calling, but now I have some questions.
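
To make the read group step concrete, here is a minimal Python sketch of how I understand the per-sample alignment command outside Galaxy. It only builds the command line and does not run anything. The sample labels, the file names and the `PL:ILLUMINA` value are placeholders I made up for illustration, not the settings in my Galaxy workflow; the only tool option used is `-R` of `bwa mem`, which sets the read group header line.

```python
def read_group(sample, platform="ILLUMINA"):
    """Build an @RG header line for one sample.

    bwa mem expects the tabs written as a literal backslash followed by 't',
    so the backslashes are escaped here on purpose.
    """
    return "@RG\\tID:{0}\\tSM:{0}\\tPL:{1}".format(sample, platform)


def bwa_mem_command(sample, reference, fastq_r1, fastq_r2):
    """Return the argument list for a paired-end bwa mem run for one sample."""
    return ["bwa", "mem", "-R", read_group(sample), reference, fastq_r1, fastq_r2]


if __name__ == "__main__":
    # Hypothetical sample labels and file names, one alignment per individual.
    for sample in ("father", "mother", "child"):
        cmd = bwa_mem_command(
            sample,
            "human_g1k_v37.fasta",
            sample + "_R1.fastq",
            sample + "_R2.fastq",
        )
        print(" ".join(cmd))
```

As far as I understand, giving each individual its own `SM` value is what would let a variant caller tell the samples apart if the BAMs were merged later, but that is exactly the part I would like confirmed.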

I'm expected to count the number of variants of different types above a certain quality threshold.
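
As a rough idea of what I mean by counting, this is a small Python sketch that tallies variant types from VCF text lines whose QUAL is at or above a threshold. It assumes the standard tab-separated VCF columns (REF, ALT and QUAL in columns 4 to 6). The threshold of 30, the type names and the function names are my own choices for illustration, and the classification by allele length is a simplification, not what any particular Galaxy tool does.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT pair by comparing allele lengths."""
    # Symbolic alleles, breakends and the '*' allele are not simple variants.
    if alt.startswith("<") or "[" in alt or "]" in alt or alt == "*":
        return "other"
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def count_variants(vcf_lines, min_qual=30.0):
    """Count variant types among records with QUAL >= min_qual."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts, qual = fields[3], fields[4], fields[5]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low quality
        for alt in alts.split(","):
            if alt != ".":
                counts[classify(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    # Made-up records, only to show the input format.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tG\t50\tPASS\t.",
        "1\t200\t.\tAT\tA\t40\tPASS\t.",
        "1\t300\t.\tC\tCTT,G\t35\tPASS\t.",
        "1\t400\t.\tG\tT\t10\tPASS\t.",
    ]
    print(count_variants(example))
```

Each alternate allele at a multi-allelic site is counted separately here; counting per site instead would give different totals.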

I'm unsure whether aligning the data for each individual separately was a good choice. Is it correct to do variant calling on each individual separately? Can I still merge these BAM files with Picard and then do variant calling, and will they retain the correct alignment information? Or should I merge the reads before the alignment? Can these choices alter the results of the workflow? I've read about converting FASTQ to SAM/BAM and merging them into an unmapped BAM before alignment and the subsequent pre-processing. Do I really need to do that?

Is my workflow actually producing useful data? Please let me know if I'm making a mistake; I'm not sure whether what I did is right. Please explain things in detail, because I'm still unfamiliar with NGS data processing.

Thanks in advance


Eduardo