Requesting Review for my Pipeline

Nikvailo

Member

Join Date: Apr 2013
Posts: 15


01-29-2015, 02:37 AM

Hello everyone,

I created an alignment and variant-calling pipeline for exomes sequenced on a NextSeq 500. Could you review it and share your thoughts?
Are there any missing parts, flaws, or unnecessary steps?

Code:

cat ./*.fastq.gz > ./merged.fastq.gz  

bwa aln -t 12 ./refgenome.fa ./merged.fastq.gz > ./raw.sai  

bwa samse ./refgenome.fa ./raw.sai ./merged.fastq.gz > ./raw.sam  

samtools view -b -S ./raw.sam > ./raw.bam  

samtools view -bF 4 ./raw.bam > ./filtered.bam  

# legacy samtools sort syntax takes an output prefix, so this writes ./sorted.bam
samtools sort ./filtered.bam ./sorted

rm ./*.sai  

rm ./*.sam

java -Xmx1024m -jar Picard/AddOrReplaceReadGroups.jar I=./sorted.bam O=./sorted_all.bam SORT_ORDER=coordinate RGID=ID RGLB=${PWD##*/} RGPL=Illumina RGSM=${PWD##*/} RGPU=NXT001 RGCN=Done CREATE_INDEX=True

java -Xmx1024m -jar GATK/gatk.jar -T UnifiedGenotyper -nct 12 -R ./refgenome.fa -I ./sorted_all.bam --dbsnp /dbsnp/dbsnp_138.hg19.vcf -o ./variant.vcf -stand_call_conf 50.0 -stand_emit_conf 10.0 -glm BOTH  

rm ./variant.vcf.idx 

java -Xmx1024m -jar GATK/gatk.jar -R ./refgenome.fa -T SelectVariants --variant ./variant.vcf -select "DP >= 5.0" -o ./variant_filtered.vcf --intervals exome.bed

Thank you.


IonTom

Member

Join Date: Apr 2014

Posts: 32

01-29-2015, 05:02 AM

Better to drop bwa aln and use bwa mem instead.
That way you have just one step rather than aln and samse.

You can pipe from bwa mem directly into samtools view to transform the SAM output into BAM.

The read groups can be set directly in bwa mem (with its -R option), so you get rid of the AddOrReplaceReadGroups step.
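A rough sketch of what the three suggestions above could look like together, assuming the same file names as in the first post. The read-group values are simply copied from the AddOrReplaceReadGroups call there, and the `align` function name and the `run` argument are only for illustration. Since nothing checks the exit status of bwa inside the pipe, treat this as a starting point rather than a finished script.

```shell
# Sample name taken from the current directory, as in the original post.
sample="${PWD##*/}"

# Read-group header line for bwa mem -R. The \t sequences stay literal here;
# bwa mem itself turns them into tabs when it writes the SAM header.
read_group="@RG\tID:ID\tSM:${sample}\tLB:${sample}\tPL:Illumina\tPU:NXT001\tCN:Done"

align() {
    # bwa mem replaces the aln + samse pair and writes SAM to stdout.
    # samtools view converts it to BAM and drops unmapped reads (-F 4);
    # -S is only needed by old samtools versions and is ignored by newer ones.
    bwa mem -t 12 -R "$read_group" ./refgenome.fa ./merged.fastq.gz \
        | samtools view -bS -F 4 - > ./filtered.bam
}

# Only run the alignment when the script is called with "run".
if [ "${1:-}" = "run" ]; then
    align
fi
```

The BAM still has to be sorted and indexed afterwards before variant calling.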

For variant calling you can use Platypus, which is faster and also includes a complete set of QC steps. If your reads are longer than normal Illumina reads, you have to change the maximum read length in the newest Platypus version.
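A possible Platypus call, as a sketch only. The install path `./Platypus/Platypus.py`, the output file name and the read-length value of 151 are assumptions for illustration, and the option names are the ones I recall from the `callVariants` help, so check them against `python Platypus.py callVariants --help` for your version. The input is the sorted, indexed BAM with read groups from the first post.

```shell
# Hypothetical install location of Platypus; adjust to your setup.
platypus="./Platypus/Platypus.py"

# Assumed upper bound on read length; raise it if your reads are longer.
max_read_length=151

call_variants() {
    # Calls variants from the coordinate-sorted, indexed BAM.
    python "$platypus" callVariants \
        --bamFiles=./sorted_all.bam \
        --refFile=./refgenome.fa \
        --output=./variant_platypus.vcf \
        --nCPU=12 \
        --maxReadLength="$max_read_length"
}

# Only run the caller when the script is called with "run".
if [ "${1:-}" = "run" ]; then
    call_variants
fi
```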