![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Separating a .sam or .bam file that is based on alignment to multiple sequences | sequence_hard | Bioinformatics | 7 | 02-09-2016 07:05 AM |
Write the subset of reads from BAM file into new SAM/BAM file, using R tools. | Old Pioneer | Bioinformatics | 0 | 01-27-2016 05:41 AM |
Alignment produces BAM file with sorted reads, but cannot see alignment | alch | Bioinformatics | 11 | 04-14-2015 06:59 PM |
Fastest way to extract differing positions from each alignment in a BAM file | CHRYSES | Bioinformatics | 5 | 12-14-2011 12:28 PM |
BWA: specifying SAM/BAM file header fields before read alignment? | nora | Bioinformatics | 3 | 12-04-2010 10:11 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: Sri Lanka Join Date: Aug 2016
Posts: 6
|
![]()
Hi,
I have a PE dataset 300bp inserts by illumina MiSeq. I aligned the raw data using BWA-mem. Mapping statistics generated using Samtools flagstat are below. 5541008 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 76008 + 0 supplementary 0 + 0 duplicates 5413610 + 0 mapped (97.70% : N/A) 5465000 + 0 paired in sequencing 2732500 + 0 read1 2732500 + 0 read2 5266140 + 0 properly paired (96.36% : N/A) 5319406 + 0 with itself and mate mapped 18196 + 0 singletons (0.33% : N/A) 32368 + 0 with mate mapped to a different chr 8821 + 0 with mate mapped to a different chr (mapQ>=5) I also used Trimmomatic on the same dataset, ILLUMINACLIP to remove any adapter sequences, trimmed reads sliding window 4:10, leading & trailing bases <3, length <39bp. Aligned this set using BWA-mem and got the results as below. 5529752 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 65642 + 0 supplementary 0 + 0 duplicates 5396698 + 0 mapped (97.59% : N/A) 5464110 + 0 paired in sequencing 2732055 + 0 read1 2732055 + 0 read2 5263982 + 0 properly paired (96.34% : N/A) 5308488 + 0 with itself and mate mapped 22568 + 0 singletons (0.41% : N/A) 23856 + 0 with mate mapped to a different chr 4865 + 0 with mate mapped to a different chr (mapQ>=5) 1) Can I use this information to select a best alignment based on mapped %. Raw data gave 97.7% mapping which is higher than trimmed data. So can I select BAM I got from raw data as the best? 2) I used "samtools view -c -f 3 data.bam" to find the properly paired reads. But the value I got is different to the value for that parameter by flagstat for both datasets. I checked some other parameters like itself & mate mapped they too gave different results. What could be the reason. Appreciate your answers. Thanks in advance. Regds Rangika |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Vienna Join Date: Oct 2011
Posts: 123
|
![]()
Hi Rangika,
2nd first: You need to be aware of the fact that samtools flagstat produces statistics on alignments. Meaning, a read can align multiple time and will occur multiple times in the flagstat output. You may check your alignment file with e.g. bam_stat.py from the RSeQC tools. Furthermore, I'd check the read files with FastQC before and after trimming. So: 1) I'd check a set of different data sets to choose which way to go. Also, I would not rely on the %mapped from samtools flagstat. Cheers, Michael |
![]() |
![]() |
![]() |
#3 |
Junior Member
Location: Sri Lanka Join Date: Aug 2016
Posts: 6
|
![]()
Thank you Michael. My dataset is DNA-seq. Can I use RSeQC tools to check alignment for DNA data as well. Do you suggest RSeQC statistics would lead in to better BAM selection?
Appreciate if you would clarify this a bit more since I'm new to this. Regards Rangika |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: Vienna Join Date: Oct 2011
Posts: 123
|
![]()
The bam_stat.py was a suggestion since it also works for DNA-seq alignments (you'll hopefully don't see spliced reads).
You can also have a look at the QC-metrics from Picard tools, or have a look at GATK. Or you can extract the aligned reads (samtools view) and count e.g. how often each read is aligned. Without trimming you might have a high %mapping rate given by samtools flagstat, but you don't know how many reads were aligned with a high confidence to a single or few positions. Most of the library preps have also a small section of how to deal with the analysis. Additionally, there are a plethora of publications describing their approach to DNA-Seq analysis. Cheers, Michael |
![]() |
![]() |
![]() |
#5 |
Junior Member
Location: Sri Lanka Join Date: Aug 2016
Posts: 6
|
![]()
Thank you Michael for your answer.
Regards Sumudu |
![]() |
![]() |
![]() |
Tags |
bwa-mem, mapping statistics |
Thread Tools | |
|
|