02-05-2019, 11:07 AM   #1
Common / Best Practice Workflow Questions

Over the past 8 years, I’ve been assisting with genomic alignment and VCF analysis in cancer and endogenous retrovirus research. Almost everything I’ve learned has been pieced together from web searches, man pages, scripts, etc., without any experienced direction. I’m hoping to get a much better understanding of the common or best practices for WGS / DNA-Seq / RNA-Seq workflow processing, from raw FASTQ/FASTA reads to VCF. The closest I’ve found is GATK’s Best Practices and a related published article, but they don’t really answer any of my questions; they actually added two. I understand that this is a big question, but hopefully the answers can be useful to many. Can anyone direct me to a guide that gives more detailed advice on best practices for genomic analysis for scientific study and publication, and that answers the following questions? Or perhaps even answer them directly?

Do any aligners use the quality score in FASTQ during alignment or will FASTA align exactly the same? Or is the score only used beforehand for filtering and possibly downstream for calling?
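For reference, here is a minimal sketch of what the FASTQ quality line encodes, assuming the common Phred+33 encoding (the read below is made up). It only shows the information that is present in FASTQ and absent from FASTA; it says nothing about whether a given aligner actually uses it.

```python
def phred33_to_error_probs(quality_string):
    """Convert a Phred+33 quality string to per-base error probabilities."""
    probs = []
    for ch in quality_string:
        q = ord(ch) - 33               # Phred score Q (assumes Phred+33 offset)
        probs.append(10 ** (-q / 10))  # P(error) = 10^(-Q/10)
    return probs

# Hypothetical four-line FASTQ record: name, bases, separator, qualities.
record = ["@read1", "ACGT", "+", "II5!"]
probs = phred33_to_error_probs(record[3])
for base, p in zip(record[1], probs):
    print(base, p)
```

A FASTA record would carry only the first two of these lines, so any per-base confidence would have to be assumed rather than read from the file.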

Should I be aligning “local” or “end-to-end”? The sequencing lab aligns “local”. When aligning “local”, allowing for soft clipping on the ends, many more reads align, but the clipping raises questions as to the validity of the alignment.
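To put a number on how much of a read a local alignment has clipped, one option is to read it off the CIGAR string. This is a rough sketch assuming standard SAM CIGAR operations; the function name and the example CIGARs are hypothetical.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_fraction(cigar):
    """Return the fraction of read bases that are soft-clipped (S)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Operations that consume read bases in the SAM format: M, I, S, =, X.
    read_len = sum(n for n, op in ops if op in "MIS=X")
    clipped = sum(n for n, op in ops if op == "S")
    return clipped / read_len if read_len else 0.0

for cigar in ["100M", "20S80M", "5S90M5S"]:
    print(cigar, soft_clipped_fraction(cigar))
```

Summarizing this fraction across a BAM file might help show whether the extra reads gained by aligning “local” are mostly lightly clipped or heavily clipped.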

What about the many other aligner options and restrictions? The developers describe what they do, but I’ve found no one who addresses whether they should be used, when, or why.

When aligning to human references like hg19 or hg38, is there a preference for aligning with or without all of the alternates? I suppose the larger reference would result in more alignments, but might it also cause mapping quality scores to drop in some instances, because reads map to multiple locations when an alternate region is too similar?

After aligning a sample “local” and “end-to-end”, samtools depth shows some positions with depths as high as 300,000 and 50,000, respectively! The local alignments are always deeper, but this is from a 30x sample. I understand that bias during sample prep and sequencing can give some regions higher coverage, but this seems ridiculous. Is this common, or does it suggest some other issue? Or is it just representative of common, repeated sequences in the reference genome? Is there a way to force aligners to reconsider the alignments in these ultra-high-depth regions?
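As a rough way to locate these positions, the three-column, tab-separated output of samtools depth (chromosome, position, depth) can be scanned for depths far above the median. The sketch below assumes that format; the `fold` cutoff of 10x the median is an arbitrary, hypothetical choice, and the input lines are made up.

```python
import statistics

def high_depth_positions(depth_lines, fold=10):
    """Return (chrom, pos, depth) rows whose depth exceeds fold * median depth."""
    rows = []
    for line in depth_lines:
        chrom, pos, depth = line.rstrip("\n").split("\t")
        rows.append((chrom, int(pos), int(depth)))
    median = statistics.median(d for _, _, d in rows)
    cutoff = fold * median  # hypothetical threshold, not a recommended value
    return [(c, p, d) for c, p, d in rows if d > cutoff]

# Made-up lines in the format printed by `samtools depth`.
lines = [
    "chr1\t100\t30",
    "chr1\t101\t31",
    "chr1\t102\t29",
    "chr1\t103\t300000",
]
flagged = high_depth_positions(lines)
print(flagged)
```

Comparing the flagged coordinates with annotated repeat regions could help distinguish repeat pile-ups from a sample-prep problem.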

Some suggest removing duplicates. It’s not entirely clear to me what this means or why it is done. Where did these duplicates “come from”? Are we simply removing read pairs that are identical? Wouldn’t this corrupt any coverage depth or copy number analysis?
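One way to picture what duplicate marking might look like is sketched below. It rests on an assumption not stated in this post: that duplicates are defined by shared alignment coordinates and strand rather than by identical sequence, which is how I understand tools such as Picard MarkDuplicates to approach it. The record fields (`name`, `chrom`, `pos`, `mate_pos`, `strand`, `qual_sum`) are hypothetical, not a real library's API.

```python
from collections import defaultdict

def mark_duplicates(pairs):
    """Return names of read pairs flagged as duplicates.

    Pairs sharing chromosome, start, mate start, and strand are grouped;
    the pair with the highest summed base quality is kept as the
    representative and the rest are flagged.
    """
    groups = defaultdict(list)
    for p in pairs:
        key = (p["chrom"], p["pos"], p["mate_pos"], p["strand"])
        groups[key].append(p)
    duplicates = set()
    for members in groups.values():
        best = max(members, key=lambda p: p["qual_sum"])
        for p in members:
            if p is not best:
                duplicates.add(p["name"])
    return duplicates

# Made-up read pairs: "a" and "b" share coordinates, "c" does not.
pairs = [
    {"name": "a", "chrom": "chr1", "pos": 100, "mate_pos": 350, "strand": "+", "qual_sum": 3000},
    {"name": "b", "chrom": "chr1", "pos": 100, "mate_pos": 350, "strand": "+", "qual_sum": 2500},
    {"name": "c", "chrom": "chr1", "pos": 120, "mate_pos": 370, "strand": "+", "qual_sum": 2800},
]
dups = mark_duplicates(pairs)
print(dups)
```

Under this definition, one pair per coordinate group is always retained, so depth at that position would be reduced rather than removed entirely.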

What about “re-calibrating base quality scores”? I’ve read that this is meant to resolve a number of potential issues created during sequencing. Obviously, this requires that FASTQ, not FASTA, was used in the initial alignment. It would also suggest that something downstream uses these quality scores. Does this have a noticeable impact? Is it worth doing?

Prior to creating VCF files from these final BAM files, is it proper to filter the reads, perhaps keeping only those flagged as PROPER PAIRs? Only those with high mapping quality? Only those in regions with the expected depth?
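To make the filtering question concrete, here is a sketch of the kind of filter meant, applied to SAM text lines. The flag bit values are the standard ones from the SAM format; the `min_mapq` cutoff of 30 and the example lines are hypothetical, and whether such a filter is appropriate before calling is exactly the open question.

```python
# Standard SAM flag bits.
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

def passes_filter(sam_line, min_mapq=30):
    """Keep properly paired, primary, non-duplicate reads with MAPQ >= min_mapq."""
    fields = sam_line.split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality
    if not flag & PROPER_PAIR:
        return False
    if flag & (UNMAPPED | SECONDARY | DUPLICATE | SUPPLEMENTARY):
        return False
    return mapq >= min_mapq

# Made-up SAM records that differ only in FLAG and MAPQ.
template = "r{0}\t{1}\tchr1\t100\t{2}\t4M\t=\t300\t204\tACGT\tIIII"
examples = [
    template.format(1, 99, 60),    # proper pair, high MAPQ
    template.format(2, 99, 5),     # proper pair, low MAPQ
    template.format(3, 97, 60),    # paired but not a proper pair
    template.format(4, 1123, 60),  # proper pair, marked duplicate
]
results = [passes_filter(line) for line in examples]
print(results)
```

On a real BAM file, the same flag and MAPQ tests are usually expressed through samtools view options rather than by parsing text.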

Creating VCFs with different software packages produces VCFs with some minor differences. These packages offer a number of options, just as read aligners do, and again they give loose definitions of those options with little guidance on which to use, when, and why.

Thank you for your help
JakeW
