SEQanswers

Old 04-27-2012, 12:31 PM   #1
epi
Member
 
Location: USA

Join Date: Jan 2012
Posts: 38
Aligning multiple fastq for the same sample

Hi everyone, I am trying to align fastq files against a reference with bowtie. The input data contains the same sample run in multiple lanes (2, 3 or 4), as is commonly the case when the expected depth cannot be reached with a single lane.

What would be the best strategy to align these: align one by one, or merge the fastq files first? The context is ChIP-Seq.

Thanks for response.
Old 04-27-2012, 04:57 PM   #2
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Since alignments are alignments, you could align them separately, output SAM files, and then use Samtools to merge and sort the alignments.
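
A minimal sketch of this align-then-merge workflow, written as a Python helper that only builds the command lines and does not run them. The helper name `build_commands`, the index name `hg19`, and the file names are hypothetical. It assumes bowtie (version 1) with its `-S` SAM output option and a samtools 1.x install where `samtools sort -o` and `samtools merge` behave as shown; each lane is sorted before merging because `samtools merge` expects sorted inputs. Check the options against your own tool versions.

```python
from typing import List


def build_commands(index: str, lane_fastqs: List[str], merged_bam: str) -> List[List[str]]:
    """Return command lines for per-lane alignment, sorting, and a final merge."""
    commands = []
    sorted_bams = []
    for fastq in lane_fastqs:
        stem = fastq.rsplit(".", 1)[0]
        sam, bam = stem + ".sam", stem + ".sorted.bam"
        # Align one lane on its own; -S asks bowtie (version 1) for SAM output.
        commands.append(["bowtie", "-S", index, fastq, sam])
        # Coordinate-sort the lane so it can be merged afterwards.
        commands.append(["samtools", "sort", "-o", bam, sam])
        sorted_bams.append(bam)
    # Merge the sorted per-lane files into one file for downstream analysis.
    commands.append(["samtools", "merge", merged_bam] + sorted_bams)
    return commands


# Hypothetical index and lane file names, for illustration only.
cmds = build_commands("hg19", ["lane1.fastq", "lane2.fastq"], "sample.bam")
for cmd in cmds:
    print(" ".join(cmd))
```

Each inner list could be passed to `subprocess.run` once the paths are real; keeping the commands as data makes it easy to inspect them first.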
Old 04-30-2012, 05:53 AM   #3
epi
Member
 
Location: USA

Join Date: Jan 2012
Posts: 38

Thanks for the response. But won't the alignments differ depending on whether the lanes are aligned together or in isolation, e.g. for unique matches?
Old 04-30-2012, 06:14 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

The individual alignments will be the same regardless. Depending on how you made your library, it might make sense to align the lanes separately (for accurate PCR duplicate calling, which is presumably what you meant by "unique match"). Aside from that, there's no difference aside from the number of keystrokes required.
Old 04-30-2012, 06:33 AM   #5
epi
Member
 
Location: USA

Join Date: Jan 2012
Posts: 38

Your point is well taken. But there is an additional situation: when you want only reads that match the genome at one place, not several. If you align in batches, you cannot do this accurately.
Old 04-30-2012, 07:17 AM   #6
Alex Renwick
Member
 
Location: Houston, Texas

Join Date: Jul 2011
Posts: 44

Quote:
Originally Posted by epi View Post
Your point is well taken. But there is an additional situation: when you want only reads that match the genome at one place, not several. If you align in batches, you cannot do this accurately.
Could you explain more what you mean by this? Typically, each read is aligned independently of others, then the results are merged for subsequent analysis.
Old 04-30-2012, 07:47 AM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Quote:
Originally Posted by epi View Post
Your point is well taken. But there is an additional situation: when you want only reads that match the genome at one place, not several. If you align in batches, you cannot do this accurately.
This is a bit ambiguous in English. "reads matching to genome at one place" can either mean "uniquely mapped reads" (most likely you mean this) or "reads mapping only to a specific region of the genome" (presumably you don't mean that). In neither case will the results differ depending on whether you invoke bowtie once or multiple times. I recall there being auxiliary flags that indicate multiple alignments of which only one was returned and/or a flag to just not return those (something like -m in bowtie1, haven't used it in a while though).

As you quoted Simon Andrews as saying in another thread, "For straight forward alignments (Bowtie, BWA etc) then the two operations would be the same".
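
To make the independence argument concrete, here is a toy sketch in Python. It is not bowtie's algorithm: `find_hits` and `align` are made-up helpers that do exact string matching against a made-up reference. The only point is that when each read is scored on its own, aligning lanes separately and combining the results gives the same per-read answer as aligning the merged input.

```python
from typing import Dict, List


def find_hits(read: str, genome: str) -> List[int]:
    """Return every 0-based position where the read matches the genome exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


def align(reads: Dict[str, str], genome: str) -> Dict[str, List[int]]:
    """Align each read on its own; no read's result depends on any other read."""
    return {name: find_hits(seq, genome) for name, seq in reads.items()}


genome = "ACGTACGTTTGACCA"  # toy reference
lane1 = {"r1": "ACGT", "r2": "TTGA"}
lane2 = {"r3": "GACC", "r4": "ACGT"}

# Align each lane separately, then combine the results.
separate = {**align(lane1, genome), **align(lane2, genome)}
# Merge the lanes first, then align once.
merged = align({**lane1, **lane2}, genome)

print(separate == merged)
```

Whether a read counts as uniquely mapped depends only on its own list of hits (here `r1` has two, `r2` has one), so that classification is also unaffected by how the input was batched.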
Old 04-30-2012, 09:04 AM   #8
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Quote:
Originally Posted by epi View Post
Your point is well taken. But there is an additional situation: when you want only reads that match the genome at one place, not several. If you align in batches, you cannot do this accurately.
You might be thinking of this backwards. Each read, of which you have millions, is unique, but many of them could in fact align to the same genomic region. What is meant by unique alignments in RNA-Seq is that each read can only align in one spot. What you WANT is for reads to align on top of one another... that's how we are able to measure gene expression and do anything, really.

Just align with bowtie using the -m 1 -k 1 options. That will produce unique alignments per read.
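
As a rough illustration of the distinction (a toy exact-match model, not what bowtie does internally), the sketch below keeps only reads that hit the reference in exactly one place, while still allowing several reads to pile up on the same position. The helper names and sequences are invented for the example.

```python
from typing import Dict


def count_hits(read: str, genome: str) -> int:
    """Count exact, possibly overlapping, occurrences of the read in the genome."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def unique_alignments(reads: Dict[str, str], genome: str) -> Dict[str, int]:
    """Keep reads with exactly one hit and report that position."""
    return {
        name: genome.find(seq)
        for name, seq in reads.items()
        if count_hits(seq, genome) == 1
    }


genome = "ACGTACGTTTGACCA"  # toy reference
reads = {
    "r1": "ACGT",  # matches twice, so it is dropped
    "r2": "TTGA",  # matches once, kept
    "r3": "GGGG",  # no match, dropped
    "r4": "TTGA",  # a different read landing on the same spot as r2, kept
}

unique = unique_alignments(reads, genome)
print(unique)
```

Here `r2` and `r4` both survive and both sit at the same position: uniqueness is a property of each read's own set of hits, not a requirement that no other read maps there.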
Old 05-02-2012, 05:08 AM   #9
epi
Member
 
Location: USA

Join Date: Jan 2012
Posts: 38

Nice to see the discussion. I guess it depends on the individual experiment how much of an issue PCR duplicates might be. Wouldn't it be good practice to always merge the fastq files before aligning, to remove any possible bias?
Old 05-02-2012, 05:54 AM   #10
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

Quote:
Originally Posted by epi View Post
Nice to see the discussion. I guess it depends on the individual experiment how much of an issue PCR duplicates might be. Wouldn't it be good practice to always merge the fastq files before aligning, to remove any possible bias?
If you want to remove PCR duplicates, then you should merge all data before removing PCR duplicates if all of the data comes from the same prepped library. If the data comes from different prepped libraries, you should merge after removing the duplicates.
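
A small Python sketch of why the order matters. It assumes a simplified, position-based definition of a duplicate (same chromosome, start and strand), which is an assumption of this example rather than the behavior of any particular tool; `remove_duplicates` and the coordinates are made up.

```python
from typing import Iterable, List, Set, Tuple

Alignment = Tuple[str, int, str]  # (chromosome, start, strand)


def remove_duplicates(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep the first alignment seen at each (chromosome, start, strand)."""
    seen: Set[Alignment] = set()
    kept = []
    for aln in alignments:
        if aln not in seen:
            seen.add(aln)
            kept.append(aln)
    return kept


# Two sets of aligned reads that share one position, chr1:100(+).
set_a = [("chr1", 100, "+"), ("chr1", 250, "-")]
set_b = [("chr1", 100, "+"), ("chr2", 40, "+")]

# Same prepped library on two lanes: merge first, then deduplicate,
# so the copy that shows up on both lanes is counted once.
same_library = remove_duplicates(set_a + set_b)

# Different prepped libraries: deduplicate each, then merge,
# so independent fragments at the same position are both kept.
different_libraries = remove_duplicates(set_a) + remove_duplicates(set_b)

print(len(same_library), len(different_libraries))
```

Note that the merge in this sketch happens on alignments, after mapping. Nothing here requires the fastq files themselves to be merged before alignment.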
Old 05-02-2012, 08:36 AM   #11
epi
Member
 
Location: USA

Join Date: Jan 2012
Posts: 38

Quote:
Originally Posted by Heisman View Post
If you want to remove PCR duplicates, then you should merge all data before removing PCR duplicates if all of the data comes from the same prepped library. If the data comes from different prepped libraries, you should merge after removing the duplicates.
In other words, if library corresponds to sample, which I believe is the case with the data I have, the same sample run in multiple lanes should be merged and then aligned.
This clarifies a lot. I have heard some opinions from bioinformaticians that this is immaterial. In fact, even breaking the fastq down further into smaller fragments (for whatever reason) should not matter for alignment.
Old 05-02-2012, 08:56 AM   #12
Alex Renwick
Member
 
Location: Houston, Texas

Join Date: Jul 2011
Posts: 44

Quote:
Originally Posted by epi View Post
In other words, if library corresponds to sample, which I believe is the case with the data I have, the same sample run in multiple lanes should be merged and then aligned.
This clarifies a lot. I have heard some opinions from bioinformaticians that this is immaterial. In fact, even breaking the fastq down further into smaller fragments (for whatever reason) should not matter for alignment.
Heisman points out that if you have different samples you should align first, remove duplicates, then merge. You conclude that since you have just one sample, you need to merge first and then align. That conclusion does not logically follow. The fallacy is common enough to have its own name: Denial of the Antecedent.

It really sounds like you had your mind made up before coming here with your question. Everyone who responded has told you that it doesn't matter whether you align then merge or vice versa. You don't have to believe them, but if someone takes the time to offer guidance you should at least do them the courtesy of plainly stating the basis of your disagreement.
Old 05-07-2012, 10:53 AM   #13
rnaseek
Member
 
Location: USA

Join Date: Nov 2011
Posts: 22

I think it is better to do the alignment individually. This will help check for lane-specific biases, if there are any. In addition, aligning individually lets you run the alignments in parallel.
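
A toy Python sketch of both points: lanes are processed concurrently, and a per-lane summary (here the fraction of reads with an exact match) is kept so that an outlier lane stands out. The reference, the reads and `mapped_fraction` are invented stand-ins; a real pipeline would launch the aligner per lane instead of doing substring checks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

GENOME = "ACGTACGTTTGACCA"  # toy reference


def mapped_fraction(reads: List[str]) -> float:
    """Fraction of reads with at least one exact match in the toy reference."""
    if not reads:
        return 0.0
    return sum(1 for read in reads if read in GENOME) / len(reads)


# Hypothetical reads per lane.
lanes = {
    "lane1": ["ACGT", "TTGA", "GACC", "CCAA"],
    "lane2": ["ACGT", "GGGG", "AAAA", "TTTT"],
}

# Process each lane independently and in parallel; map() preserves input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    fractions = dict(zip(lanes, pool.map(mapped_fraction, lanes.values())))

for lane, fraction in fractions.items():
    print(lane, fraction)
```

A large gap between lanes, as in this contrived input, is the kind of lane-specific signal that is easier to spot before the data are merged.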
Old 05-08-2012, 08:08 AM   #14
analyst
Member
 
Location: US

Join Date: Jan 2011
Posts: 18

When using splice aligners for RNA-Seq, you must merge and then align, for obvious reasons. For regular aligners (bowtie etc.) I still merge first, remove PCR duplicates, and then align. As for speed, it does not bother me, as it takes only a few minutes to align anyway. Also, with a parallelized tool such as bowtie, I would rather dedicate all available nodes to the merged lanes than split them between 2 individual lanes running simultaneously. After all, you have to merge them at some stage anyway for the actual analysis, and file management can be cleaner if you do it right from the beginning. I see from the comments that people do it the other way as well; I guess it's just my preference for the analysis. I also do not understand Alex's comments; epi's interpretation of Heisman's response seems fine.
Logically, it should not matter as long as you can take care of PCR duplicates at some stage in your pipeline. But practically, I have had some strange experiences using combinations of publicly available tools and their behavior. I will have to do a complete analysis myself to believe whether splitting causes any real issue or not. If anyone has gone on to do the same, please share here. With all due respect, I am sticking to my approach till then.

Last edited by analyst; 05-08-2012 at 08:13 AM.
Old 05-10-2012, 05:15 AM   #15
epi
Member
 
Location: USA

Join Date: Jan 2012
Posts: 38

Thanks for commenting, analyst. I just don't care about responses like Alex's. Unfortunately he is not the only person in public forums and in the scientific world who likes to get personal in scientific discussion. Basically, it seems they try to push their own agenda and preferences onto others without even understanding what is being discussed, as in this case. Maybe he is a big advocate of one particular strategy and feels insecure if someone even mentions any other. Or maybe he is just looking for places to use the phrase of the day he learnt; this tendency is even more common and has its own name: talking through the hat. Unlike his example, this even fits.
But overall this is an excellent forum with a good collection of people and experts. Actually, I am not familiar with the steps upstream of NGS data generation, like sample and library prep, so I feel I am more educated after these discussions. Some people state their opinion and some even the reasons behind it; both are useful.
Old 05-10-2012, 07:04 AM   #16
Alex Renwick
Member
 
Location: Houston, Texas

Join Date: Jul 2011
Posts: 44

Quote:
Originally Posted by epi View Post
Thanks for commenting, analyst. I just don't care about responses like Alex's. Unfortunately he is not the only person in public forums and in the scientific world who likes to get personal in scientific discussion. ... But overall this is an excellent forum ...
I apologize if my tone sounded personal. My intent was to invite discussion that would clarify the points of disagreement. You seemed not to accept the unanimous opinion of the other commenters without quite saying why. I agree that this is an excellent forum, and it is so because people make an effort to understand each other and offer discussion. In this thread I saw that effort coming from one side only.

Edit: After rereading it, I see I misunderstood your response to Heisman. I still think your conclusion is mistaken, but it's not the mistake I indicated previously.

Last edited by Alex Renwick; 05-10-2012 at 07:18 AM.
Old 05-11-2012, 12:10 PM   #17
epi
Member
 
Location: USA

Join Date: Jan 2012
Posts: 38

Quote:
Originally Posted by Alex Renwick View Post
My intent was to invite discussion that would clarify the points of disagreement.
Nobody can imagine that after reading your message.

Quote:
Originally Posted by Alex Renwick View Post
You seemed not to accept the unanimous opinion of the other commenters without quite saying why.
There is no unanimous opinion, and I did not disagree with anything. It is obvious I do not have an opinion; that is why I am asking this question. Please read the title post of the thread.

Quote:
Originally Posted by Alex Renwick View Post
In this thread I saw that effort coming from one side only.

I still think your conclusion is mistaken, but it's not the mistake I indicated previously.
This sums up your attitude here. You are certain that others are mistaken, no matter what. If it is not one thing, it must be another. You are convinced that others are arguing and challenging the very foundation of your thoughts and understanding. Basically you suffer from the same mentality that you are accusing others of, and this seems difficult for you to fathom.

Anyways, that's all I have to say, I am not going to continue this discussion as it is not a productive activity (for me).
All times are GMT -8. The time now is 09:51 PM.

