  • My first variant calling workflow

    Hello, I'm currently learning how to process NGS data using the Galaxy platform. This is my first time working with NGS data, and I'm overwhelmed by the abundance of different variant calling workflows and available tools. I have a molecular biology background and I'm learning this on my own through online courses, so I would like some feedback in case I'm making mistakes. While I can code in Python, I want to build this workflow in Galaxy as part of a course.

    For the purpose of learning, I was given raw FASTQ reads from an Illumina MiSeq, sequenced as paired ends to 125 bp in length. The data are targeted re-sequencing data for a father, mother and child trio. I need to create a workflow to identify polymorphic sites in all three individuals.

    I started a workflow based on the references below:




    My current, incomplete attempt is available at the link below. Some steps from the references were skipped for the sake of simplicity. I'm making my best effort to understand what each step really does and why it is used. You can import the workflow into Galaxy for a better view:



    Briefly, the paired-end reads had 10 bp trimmed from the 3' end (based on the FASTQ report; this step is not in the workflow), resulting in high-quality reads of about 140 bp. The paired reads for each individual were aligned to the reference human_g1k_v37 with BWA-MEM, each with its own read group information. The resulting alignment BAM for each individual was pre-processed with Picard: sorting, removal of ambiguous reads and duplicates, and update of mate-pair information. I'm omitting indel realignment and base quality recalibration on purpose. The resulting 3 BAMs could be used for variant calling, but now I have some questions.
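To make the read group step above concrete, here is a minimal Python sketch that builds one `@RG` header string per individual, in the form accepted by the `-R` option of `bwa mem`. The sample names, library name and FASTQ/reference file names are hypothetical placeholders, not taken from the actual workflow; in Galaxy the same fields are filled in through the tool form rather than on a command line.

```python
def read_group(sample, library="lib1", platform="ILLUMINA"):
    """Build an @RG header string for `bwa mem -R`.

    bwa expects the two-character sequence backslash + t between
    fields (not a real tab), so the fields are joined with "\\t".
    ID must be unique per read group; SM names the individual.
    """
    fields = [
        "@RG",
        f"ID:{sample}",
        f"SM:{sample}",
        f"LB:{library}",
        f"PL:{platform}",
    ]
    return "\\t".join(fields)


# Hypothetical sample names for the trio.
trio = ["father", "mother", "child"]
read_groups = {sample: read_group(sample) for sample in trio}

# Print one illustrative command per individual (file names are assumptions).
for sample, rg in read_groups.items():
    print(
        f"bwa mem -R '{rg}' human_g1k_v37.fasta "
        f"{sample}_R1.fastq {sample}_R2.fastq > {sample}.sam"
    )
```

Because each individual gets a distinct `ID` and `SM`, every read in the resulting BAM is tagged with the read group it came from.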

    I'm expected to count the number of variants of different types above a certain quality threshold.
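A minimal sketch of that counting step, assuming the variant caller produces a standard tab-separated VCF. The threshold of 30, the function names and the classification by allele length are my own assumptions for illustration, not part of the course material; the example records are made up.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT allele pair by comparing allele lengths."""
    if alt.startswith("<") or alt == "*" or "[" in alt or "]" in alt:
        return "other"  # symbolic allele, spanning deletion, or breakend
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def count_variants(vcf_lines, min_qual=30.0):
    """Count alternate alleles by type for records with QUAL >= min_qual."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt, qual = fields[3], fields[4], fields[5]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low quality
        for allele in alt.split(","):  # multi-allelic sites list several ALTs
            counts[classify(ref, allele)] += 1
    return counts


# Made-up records, only to show the input format.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t200\t.\tAT\tA\t45\tPASS\t.",
    "1\t300\t.\tC\tCGG\t10\tPASS\t.",
    "1\t400\t.\tG\tT,GA\t99\tPASS\t.",
]
counts = count_variants(example, min_qual=30.0)
print(dict(counts))
```

For a real file, the lines of the VCF can be passed directly, for example `count_variants(open("trio.vcf"))`, where `trio.vcf` is a hypothetical file name.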

    I'm not sure whether aligning the data for each individual separately was a good choice. Is it correct to do variant calling on each individual separately? Can I still merge these BAM files with Picard and then do variant calling, and will they retain the correct alignment information? Or should I merge the reads before the alignment? Can these choices alter the results of the workflow? I've read about converting FASTQ to SAM/BAM and merging them into an unmapped BAM before the alignment and subsequent pre-processing. Do I really need to do that?

    Is my workflow actually producing useful data? Please let me know if I'm making a mistake; I'm a little unsure whether what I did is right. Please explain things in detail, because I'm still unfamiliar with NGS data processing.

    Thanks in advance


    Eduardo
