SEQanswers

06-21-2011, 03:13 PM   #1
CNVboy
Member
 
Location: boston

Join Date: Jun 2011
Posts: 27
About script to combine steps of BWA

For BWA, you first run the alignment (which generates .sai files), then run sampe on the paired ends (which generates a .sam file), then convert the .sam to .bam. I'm going to write a script that combines all of these steps, i.e., the input is FASTQ and the output is a .bam file.

The big question is this: I need to guarantee that, say, the first step (generating .sai) is COMPLETE (rather than aborted) before going on to the second step, and likewise that the second step (generating .sam) is complete before the third. Is there any way to check that for BWA? What is the sign that each step has completed?

Btw, to combine the BWA steps, do I just need to put those three commands together, or is there some other trick?

thanks!
06-21-2011, 08:22 PM   #2
dp05yk
Member
 
Location: Brock University

Join Date: Dec 2010
Posts: 66

Just write a shell script that calls your three commands in order... unless you are executing BWA on a server with a scheduler, in which case you can just submit all three commands independently with wait conditions.
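
For reference, a minimal sketch of what "three commands in order" could look like for one paired-end sample. The function name, argument order and output file names are made up for illustration; it assumes bwa and samtools are on the PATH and that the reference has already been indexed with bwa index.

```shell
# Sketch: run the three BWA steps in order for one paired-end sample.
# fastq_to_bam and its argument order are illustrative, not from the thread.
fastq_to_bam() {
    ref=$1    # reference prefix, already indexed with `bwa index`
    fq1=$2    # first-mate FASTQ
    fq2=$3    # second-mate FASTQ
    out=$4    # prefix for all output files

    # Step 1: align each mate separately, producing .sai files.
    bwa aln "$ref" "$fq1" > "$out.1.sai"
    bwa aln "$ref" "$fq2" > "$out.2.sai"

    # Step 2: pair the alignments and write SAM.
    bwa sampe "$ref" "$out.1.sai" "$out.2.sai" "$fq1" "$fq2" > "$out.sam"

    # Step 3: convert SAM to BAM.
    samtools view -bS "$out.sam" > "$out.bam"
}

# Example call (hypothetical file names):
# fastq_to_bam ref.fa reads_1.fq reads_2.fq sample1
```

Each line starts only after the previous one has exited, so the order is guaranteed. Note that a plain sequence like this keeps going even when an earlier command fails, so it guarantees ordering but not success.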
06-21-2011, 09:24 PM   #3
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

If you are looking for one to cut and paste, there is one on my introduction thread that will take samples from FASTQ files straight to sorted BAMs with duplicates marked. It doesn't check for errors after each step, but others could likely suggest a simple method to capture the exit status of the previous step. Generally, I've not had many issues with errors except when I messed up...
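
One simple way to capture the exit status of a step is the shell's $? variable, which holds the status of the command that just finished. The helper below is a hypothetical sketch (run_step is not part of BWA or of the script mentioned above): it runs one command, reports which step failed, and passes the status back to the caller.

```shell
# Hypothetical helper: run one command and inspect its exit status ($?).
run_step() {
    label=$1
    shift
    "$@"          # run the remaining arguments as a command
    rc=$?         # exit status of that command; 0 means success
    if [ "$rc" -ne 0 ]; then
        echo "step '$label' failed with exit status $rc" >&2
        return "$rc"
    fi
    echo "step '$label' finished" >&2
}

# Example use (hypothetical file names). The -f option of bwa aln writes the
# output to a file instead of stdout, which avoids a redirect here:
# run_step aln1 bwa aln -f sample1.1.sai ref.fa reads_1.fq || exit 1
```

This only tells you what the program itself reported. A step that exits with status 0 but writes a damaged file would still pass.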
06-21-2011, 11:17 PM   #4
gprakhar
Member
 
Location: India

Join Date: Aug 2010
Posts: 78

Hello CNVboy,

If you are running your combined script on a scheduler, the only problem with giving all the commands together is that you will find out about errors only at the end, unless you are keeping an eye on the log file (use "tail -f <filename>" for that, and look at it every 10-15 minutes).
I use SGE and just tail -f the err and out files.
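
As a rough sketch of how that could look on SGE, the steps can be submitted as separate jobs that wait on each other by job name, each with its own out and err file to tail. The function name and file naming are assumptions for illustration; it also assumes bwa and samtools are on the PATH of the execution hosts and that binary submission (-b y) is allowed on your cluster.

```shell
# Sketch for SGE: submit each step as its own job, chained with -hold_jid.
# `sample` doubles as job-name suffix and file prefix, so it should contain
# no slashes or spaces. submit_bwa_jobs is a made-up name.
submit_bwa_jobs() {
    ref=$1; fq1=$2; fq2=$3; sample=$4

    # The two alignment jobs do not depend on each other.
    qsub -cwd -b y -N "aln1_$sample" \
        -o "$sample.aln1.out" -e "$sample.aln1.err" \
        bwa aln -f "$sample.1.sai" "$ref" "$fq1"
    qsub -cwd -b y -N "aln2_$sample" \
        -o "$sample.aln2.out" -e "$sample.aln2.err" \
        bwa aln -f "$sample.2.sai" "$ref" "$fq2"

    # sampe is held until both alignment jobs have left the queue.
    qsub -cwd -b y -N "sampe_$sample" -hold_jid "aln1_$sample,aln2_$sample" \
        -o "$sample.sampe.out" -e "$sample.sampe.err" \
        bwa sampe -f "$sample.sam" "$ref" \
        "$sample.1.sai" "$sample.2.sai" "$fq1" "$fq2"

    # The BAM conversion is held until sampe has left the queue.
    qsub -cwd -b y -N "bam_$sample" -hold_jid "sampe_$sample" \
        -o "$sample.bam.out" -e "$sample.bam.err" \
        samtools view -bS -o "$sample.bam" "$sample.sam"
}

# Example (hypothetical file names), then watch one of the err files:
# submit_bwa_jobs ref.fa reads_1.fq reads_2.fq sample1
# tail -f sample1.aln1.err
```

As far as I understand it, -hold_jid releases a job once the jobs it waits on have finished, whether or not they succeeded, so the err files are still worth a look.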

The same goes for running the script directly; the only difference is that the output goes to the terminal rather than to a file.

Hope this helps,
prakhar
06-22-2011, 05:13 AM   #5
CNVboy
Member
 
Location: boston

Join Date: Jun 2011
Posts: 27

Thanks everyone.

I think the core question is: how can we guarantee the quality of the generated .sai and .sam files? How do you know the program has completed a given step? Do you just look at the file sizes, or, say, judge the quality of a .sai file by checking whether it can produce a .sam file?

For example, some of my .sai files look fine size-wise but somehow cannot produce a .sam file of the correct size (the .sam file is only 8.0 KB); when I simply regenerate the .sai file, it produces a .sam file of the correct size.

Also, it seems BWA returns exit codes; can we use these as the sign that each step completed? That is, set things up so that command 2 executes only if command 1 ran successfully?
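
A sketch of the "command 2 only if command 1 succeeded" idea: the shell's && operator runs the next command only when the previous one returned exit status 0. The version below also checks with [ -s file ] that each output exists and is non-empty. The function name and file names are made up for illustration.

```shell
# Sketch: each step runs only if everything before it exited with status 0
# and left a non-empty output file. bwa_checked is an illustrative name.
bwa_checked() {
    ref=$1; fq1=$2; fq2=$3; out=$4

    bwa aln "$ref" "$fq1" > "$out.1.sai" && [ -s "$out.1.sai" ] &&
    bwa aln "$ref" "$fq2" > "$out.2.sai" && [ -s "$out.2.sai" ] &&
    bwa sampe "$ref" "$out.1.sai" "$out.2.sai" "$fq1" "$fq2" > "$out.sam" &&
    [ -s "$out.sam" ] &&
    samtools view -bS "$out.sam" > "$out.bam" && [ -s "$out.bam" ]
}

# Example (hypothetical file names):
# bwa_checked ref.fa reads_1.fq reads_2.fq sample1 || echo "pipeline failed" >&2
```

The -s test only catches missing or empty files. A truncated .sai, or a small .sam like the 8.0 KB one described above, would still pass it, so it complements the exit status rather than replacing a check of the file contents.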

Last edited by CNVboy; 06-22-2011 at 05:19 AM.
06-22-2011, 06:16 AM   #6
dp05yk
Member
 
Location: Brock University

Join Date: Dec 2010
Posts: 66

In my experience BWA won't really ever error out unless there's a problem with your data or your command (or a lack of memory). As long as your data and commands are correct you can just use a shell script: command n in a shell script waits until command (n - 1) is complete before running, so the steps are guaranteed to run in the correct order.

As for knowing yourself whether the .sai stage is complete, just check your stderr output: if it says "n sequences have been processed", where n is the number of sequences in your FASTQ file, then it's complete. In a script you don't have to wait for this by hand, though, since the shell won't start the next command until the previous one has exited.
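
That stderr check can be scripted. The sketch below compares the number of reads in the FASTQ file with the last count reported in a saved stderr log. It assumes an uncompressed FASTQ with exactly four lines per record, and it takes the message wording from the description above rather than from BWA's documentation. All function names are made up for illustration.

```shell
# Number of reads in an uncompressed FASTQ, assuming 4 lines per record.
fastq_read_count() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Last "N sequences have been processed" count found in a saved stderr log.
# Prints nothing if the log contains no such line.
processed_count() {
    grep -o '[0-9][0-9]* sequences have been processed' "$1" |
        tail -n 1 | cut -d ' ' -f 1
}

# Succeeds only when the FASTQ read count equals the last reported count.
aln_looks_complete() {
    [ "$(fastq_read_count "$1")" = "$(processed_count "$2")" ]
}

# Example (hypothetical file names): save stderr while aligning, then compare.
# bwa aln ref.fa reads_1.fq > sample1.1.sai 2> sample1.1.aln.err
# aln_looks_complete reads_1.fq sample1.1.aln.err || echo "aln incomplete" >&2
```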

All times are GMT -8.