I used the latest version of BWA. I ran it four different ways on the same paired-end reads to see which gives me the best-quality alignment.
The first way used mem. I ran it on the paired-end reads with the adaptor sequences trimmed off, then also trimmed the poor-quality bases from those same reads and ran BWA again.
The second way used aln and sampe, again on both versions of the reads (adaptor-trimmed only, and adaptor- plus quality-trimmed).
After aligning, I ran samtools on each SAM file produced: I converted it to BAM, sorted the BAM file, indexed the sorted file with the index command, and finally ran idxstats to get the stats.
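For reference, here is a rough sketch of the commands those steps correspond to. It only builds and prints the command lines, it does not run anything. The file names are placeholders and not the ones actually used, the reference is assumed to be already indexed with bwa index, and the samtools syntax assumes a 1.x release (older 0.1.x releases take different arguments for sort).

```python
# Placeholder inputs (hypothetical names, not taken from the post).
REF = "serratia.fa"
R1, R2 = "reads_1.fq", "reads_2.fq"


def bwa_mem_commands(ref, r1, r2, sam):
    """Paired-end alignment in one step with bwa mem."""
    return [f"bwa mem {ref} {r1} {r2} > {sam}"]


def bwa_aln_sampe_commands(ref, r1, r2, sam):
    """Paired-end alignment with bwa aln (once per mate) followed by sampe."""
    return [
        f"bwa aln {ref} {r1} > {r1}.sai",
        f"bwa aln {ref} {r2} > {r2}.sai",
        f"bwa sampe {ref} {r1}.sai {r2}.sai {r1} {r2} > {sam}",
    ]


def samtools_commands(sam):
    """SAM -> BAM -> sorted BAM -> index -> idxstats (samtools 1.x syntax)."""
    stem = sam.rsplit(".", 1)[0]
    bam = f"{stem}.bam"
    sorted_bam = f"{stem}.sorted.bam"
    return [
        f"samtools view -b -o {bam} {sam}",
        f"samtools sort -o {sorted_bam} {bam}",
        f"samtools index {sorted_bam}",
        f"samtools idxstats {sorted_bam}",
    ]


if __name__ == "__main__":
    steps = (
        bwa_mem_commands(REF, R1, R2, "mem.sam")
        + samtools_commands("mem.sam")
        + bwa_aln_sampe_commands(REF, R1, R2, "sampe.sam")
        + samtools_commands("sampe.sam")
    )
    for cmd in steps:
        print(cmd)
```

The same two command sets would be repeated for the quality-trimmed reads, which gives the four runs in total.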
My questions:
1. After using bwa to align/map and samtools to sort and index, I checked each final BAM file by converting it back to a SAM file and viewing it in the terminal.
I couldn't find the chromosome, which I think should be in the third column. Why?
Example from SAM file:
Code:
M00532:8:000000000-A17VF:1:1101:16380:1451 83 Serratia 3298780 29 229M1S = 3298620 -389 TGTCGTTCGCCAACTTCAGCGTGCTCTGGACCTCAATGGCCTTTNTGCTCGCCGCGCCGCCGTTCAACTATTCCGAGGGAGTGATCGGGCTGTTCGGCCTGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCANCTGGCGGACAAAGGCAAGGCCGGNCTGACNACCACCGTCGGCCTGGTGTTNCTGCTGCTGTCCTGGATCCCTATCGCGTTCGCCAAN D>ED>4'8?1*1*?*AEED>FEA?A1*A???:??A?8A8)8800#;DDDDDDDD?D8D;ECECA?E?C?CC;EDFEEEFFFEDDDDEE?:DDDDDA8)0)0.#####################################?44#EEEEEEFFFFFFFFFFHHFF@?4#HFD?5#HHHEHHHHHHHIHIHHFEA5#IIHHIHIIIHHIIIFFFFFBDDDDDDDD@@???<5# XT:A:M NM:i:49 SM:i:29 AM:i:29 XM:i:7 XO:i:0 XG:i:0 MD:Z:44C34G22T0G0G0G0C0G0C0C0G0C0C0G0G0G0G0C0G0C0T0G0G0C0C0G0C0T0T0C0G0C0G0C0G0C0C0G0G3T14T2A5G0T4C20G0T1A26A5
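To make the columns easier to see, here is a minimal sketch that splits a SAM alignment line into its eleven mandatory fields, assuming the fields are tab-separated as the SAM specification describes. In that layout the third field is RNAME, the name of the reference sequence the read was aligned to. The line used here is a shortened stand-in for the record above: SEQ and QUAL are cut to four characters and only two of the optional tags are kept.

```python
# The eleven mandatory SAM fields, in order, as named in the SAM specification.
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]


def parse_sam_line(line):
    """Split one tab-separated SAM alignment line into named fields."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    record["TAGS"] = parts[11:]  # optional TAG:TYPE:VALUE fields
    return record


# Shortened stand-in for the record above (SEQ and QUAL truncated).
example = "\t".join([
    "M00532:8:000000000-A17VF:1:1101:16380:1451", "83", "Serratia",
    "3298780", "29", "229M1S", "=", "3298620", "-389",
    "TGTC", "D>ED", "XT:A:M", "NM:i:49",
])

if __name__ == "__main__":
    rec = parse_sam_line(example)
    for name in ("RNAME", "POS", "MAPQ", "CIGAR"):
        print(name, rec[name])
```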
2. What does the last line of the idxstats output mean?
Serratia 5113802 307778 2900
* 0 0 155004
And just for clarification, does the first line read: reference sequence name, sequence length, # of mapped reads, and # of unmapped reads?
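A small sketch of reading that output, going by the samtools idxstats documentation as I understand it: each line is reference name, sequence length, number of mapped reads, number of unmapped reads. The line whose name is * collects reads that have no reference assigned at all. An unmapped count on a named reference is usually reads that did not align themselves but were placed at their mapped mate's position. The dictionary keys below are my own naming, not anything samtools defines.

```python
def parse_idxstats(text):
    """Parse samtools idxstats output into a list of per-reference dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Real output is tab-delimited; split() also copes with plain spaces.
        name, length, mapped, unmapped = line.split()
        rows.append({
            "ref": name,
            "length": int(length),
            "mapped": int(mapped),
            "unmapped": int(unmapped),
        })
    return rows


# The two lines quoted above.
output = "Serratia 5113802 307778 2900\n* 0 0 155004\n"

if __name__ == "__main__":
    rows = parse_idxstats(output)
    total = sum(r["mapped"] + r["unmapped"] for r in rows)
    for r in rows:
        label = "no reference assigned" if r["ref"] == "*" else r["ref"]
        print(label, r["mapped"], r["unmapped"])
    print("total reads:", total)
```

Summing the mapped and unmapped columns over all lines gives the total number of reads in the BAM file.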