#1
Senior Member
Location: MO Join Date: May 2010
Posts: 108
Hi,
I have multiple fastq files coming from different sequencing lanes but of the same tissue origin, and I want to align them using BWA. I have tried two ways of doing this:
a: Align each fastq file with BWA to get one bam file per lane, then use samtools to merge the individual bam files into one final bam file.
b: Merge the fastq files first, then align with BWA and end up with one bam file.
The results differ only slightly (7 reads out of 100+ million) when I count the number of mapped reads, but I thought they should be the same. Does anyone know why methods a and b differ? Thanks!
#2
Senior Member
Location: St. Louis Join Date: Dec 2010
Posts: 535
I haven't used BWA, but perhaps, depending on your settings, there is inherent randomness in how it places reads that don't map uniquely. Regardless, if only 7 out of 100+ million reads are different, it probably won't be significant downstream. How did you determine that only 7 reads were different?
Also, on a different topic: I prefer using Picard to merge .bam files if you have proper @RG headers in each one, as it will maintain those.
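One way to answer the "which reads differ" question is to compare the sets of mapped reads directly rather than only their counts. The following is a minimal sketch, assuming both results have been dumped to SAM text (for example with `samtools view -h`) and that read names are unique per mate. The helper names `mapped_read_keys` and `diff_mapped` and the toy records are made up for illustration; for 100+ million reads, holding both sets in memory may not be practical.

```python
from typing import Iterable, Set, Tuple

ReadKey = Tuple[str, int]  # (read name, mate number: 0 = unpaired, 1 or 2 = mate)


def mapped_read_keys(sam_lines: Iterable[str]) -> Set[ReadKey]:
    """Collect one key per mapped primary alignment found in SAM text lines."""
    keys: Set[ReadKey] = set()
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # 0x4 = read unmapped; 0x100 / 0x800 = secondary / supplementary record.
        if flag & 0x4 or flag & 0x900:
            continue
        # 0x40 = first mate, 0x80 = second mate.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        keys.add((name, mate))
    return keys


def diff_mapped(sam_a: Iterable[str], sam_b: Iterable[str]) -> Tuple[Set[ReadKey], Set[ReadKey]]:
    """Return (mapped only in A, mapped only in B)."""
    a, b = mapped_read_keys(sam_a), mapped_read_keys(sam_b)
    return a - b, b - a


# Toy records standing in for "merge after aligning" (A) and "merge before aligning" (B).
sam_a = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
sam_b = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t250\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
only_a, only_b = diff_mapped(sam_a, sam_b)
print("mapped only in A:", only_a)
print("mapped only in B:", only_b)
```

Looking at the mapping quality of the reads that come out of such a comparison would show whether they are the non-uniquely mapped reads suspected above.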
#3
Senior Member
Location: MO Join Date: May 2010
Posts: 108
Quote:
Do you mean that samtools merge doesn't maintain those header lines?
Last edited by gene_x; 07-12-2013 at 12:35 PM.
#4
Senior Member
Location: St. Louis Join Date: Dec 2010
Posts: 535
In the header section I don't believe it does, although it may keep them in a field on each aligned read. This also only matters if you have the aligner include header information when you align. If you do, and you now have two separate .bam files (with different header information), you can try samtools merge and Picard MergeSamFiles separately and see exactly what each one keeps with samtools view -h <merged.bam>. I don't remember exactly what they do differently, so I don't want to guess incorrectly.
#5
Senior Member
Location: MO Join Date: May 2010
Posts: 108
I didn't realize I could look inside the bam file...
I just checked, since I had already run both methods: the header is there, and it is the same in the bam file merged with samtools merge and in the bam file produced by aligning the merged fastq files.
#6
Senior Member
Location: St. Louis Join Date: Dec 2010
Posts: 535
Right, I think the difference would arise if you have different "@RG\tID:" tags for each file that you merge together. I just checked, and it doesn't affect the aligned reads themselves, just the header section. For example, if one .bam file has "RG:A1_ACAGT_L003" and another has "RG:A2_AGTGAC_L005", those values show up in the RG tag of each aligned read, and that will be maintained in the merged file with either program. With samtools merge, however, the header will have only one of the @RG lines, while with Picard it will have both of them.
If you just type "samtools merge", there's a note at the bottom of the usage message that actually says this.
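To make "the header has both @RG lines" concrete, here is a minimal sketch of building a combined header by hand from the headers of the input files. It is not how samtools or Picard work internally; the function name `merge_read_groups` is hypothetical, the two read group IDs are the ones quoted above, and the `SM` and `@SQ` values are invented for the example.

```python
from typing import Dict, List


def merge_read_groups(headers: List[List[str]]) -> List[str]:
    """Keep the non-@RG lines of the first header and append every distinct @RG line."""
    read_groups: Dict[str, str] = {}  # ID -> full @RG line, in first-seen order
    for header in headers:
        for raw in header:
            line = raw.rstrip("\n")
            if not line.startswith("@RG"):
                continue
            # @RG fields are tab-separated TAG:VALUE pairs; ID is the read group identifier.
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
            read_groups.setdefault(fields["ID"], line)
    base = [h.rstrip("\n") for h in headers[0] if not h.startswith("@RG")]
    return base + list(read_groups.values())


header_a = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@RG\tID:A1_ACAGT_L003\tSM:sample1",
]
header_b = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@RG\tID:A2_AGTGAC_L005\tSM:sample1",
]
merged_header = merge_read_groups([header_a, header_b])
print("\n".join(merged_header))
```

Comparing a header like this against the output of samtools view -h on each merged file shows quickly whether any read group line was dropped.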
#7
Senior Member
Location: MO Join Date: May 2010
Posts: 108
I asked you in another post what RG, LB, and ID are.
Do you mean that with Picard, the merged file will have the headers of both files (whether they are the same or different) at the beginning of the alignment file? I ran samtools merge on a bunch of bam files without specifying any options. All the bam files are alignments of reads coming from the same sequencer (with reads from multiple lanes), and the merged bam file has the same header as the individual bam files. I guess it just retains the header when the to-be-merged bam files have the same header.
#8
Senior Member
Location: St. Louis Join Date: Dec 2010
Posts: 535
Yeah, that sounds right. If these are all from the same library, just sequenced many times, it doesn't really matter. If these are all from the same sample but from different libraries, and each library was sequenced once, it still probably doesn't matter too much. If there are different libraries and some were sequenced more than once, then this becomes more important for properly removing duplicate reads.
Try to read through what I linked in the other thread, and if things are still unclear, feel free to provide a more thorough description or flowchart of the overall experiment (detailing the number of samples, how many libraries were made from each sample, and how many times each library was sequenced) and we can try to figure out the best way to analyze the data.
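The samples / libraries / sequencing runs breakdown asked for above can be read back out of a header when the @RG lines carry SM (sample) and LB (library) fields. Below is a minimal sketch under that assumption; the function name `read_groups_by_library` and all IDs, sample names, and library names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_groups_by_library(header_lines: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Map (sample, library) to the read group IDs that were sequenced from it."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for raw in header_lines:
        line = raw.rstrip("\n")
        if not line.startswith("@RG"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        # SM and LB are optional in an @RG line, so fall back to a placeholder.
        key = (fields.get("SM", "unknown"), fields.get("LB", "unknown"))
        groups[key].append(fields["ID"])
    return dict(groups)


header = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@RG\tID:lane3\tSM:tissueA\tLB:lib1",
    "@RG\tID:lane5\tSM:tissueA\tLB:lib1",
    "@RG\tID:lane7\tSM:tissueA\tLB:lib2",
]
layout = read_groups_by_library(header)
for (sample, library), ids in layout.items():
    print(sample, library, "sequenced", len(ids), "time(s):", ", ".join(ids))
```

A library that lists more than one read group was sequenced more than once, which is the case described above where the read group information matters most for duplicate removal.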
#9
Senior Member
Location: MO Join Date: May 2010
Posts: 108
Quote: