Hi,
I have this output in BAM format.
(This is data from the 1000 Genomes Project.)
NA06984-SRR006041.1145152 1040 1 113040605 57 325M * 0 0 TTGATCACTTCACACACATCTTCATCGATGAGGCTGGCCACTGCATGGAGCCTGAGAGTCTGGTAGCTATAGCAGGTGAGGGACTCAGGTGGGGCTGCAGGTATACACCCTGTGTGGGTCAGAGAGGTTGCACCACTTACCTTTCTTCCCACACCTCTTCTGCTTCCCAGGGCTGATGGAAGTAAAGGAAACAGGTGATCCAGGAGGGCAGCTGGTGCTGGCAGGAGACCCTCGGCAGCTGGGGCCTGTGCTGCGTTCCCCACTGACCCAGAAGCATGGACTGGGATACTCACTGCTGGAGCGGCTGCTCACCTACAACTCCCTG 799::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::88:::::::::::;;;;;;;;;;;;;::888:;;;;;;;;;;;;;;;;;;;;;;;;;;;;888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:9::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;::::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:::::::::::::::::::::::::: RG:Z:SRR006041 NM:i:0
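For orientation: the record above is the SAM text form of one alignment (BAM stores the same fields in binary, and `samtools view` prints them like this). Per the SAM specification, the 11 mandatory columns are QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL, followed by optional TAG:TYPE:VALUE fields such as RG (read group) and NM (edit distance to the reference). The Python sketch below only illustrates that layout and is not a substitute for a real library such as pysam; the helper names (`parse_sam_line`, `decode_flag`, `reference_end`, etc.) are hypothetical, and it assumes plain SAM text with tab-separated columns, falling back to whitespace for text pasted with spaces as here.

```python
import re
from dataclasses import dataclass

# Bit meanings of the FLAG column, as listed in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

# CIGAR operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict


def parse_sam_line(line: str) -> SamRecord:
    # Real SAM is tab-separated; fall back to whitespace for pasted text.
    cols = line.rstrip("\n").split("\t") if "\t" in line else line.split()
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    tags = {}
    for tag in cols[11:]:
        name, typ, value = tag.split(":", 2)
        # Integer tags become int; other types are kept as strings in this sketch.
        tags[name] = int(value) if typ == "i" else value
    return SamRecord(cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]),
                     cols[5], cols[6], int(cols[7]), int(cols[8]),
                     cols[9], cols[10], tags)


def decode_flag(flag: int) -> list[str]:
    """Names of the FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def reference_end(rec: SamRecord) -> int:
    """Last reference position covered (1-based, inclusive)."""
    span = sum(n for n, op in parse_cigar(rec.cigar) if op in REF_CONSUMING)
    return rec.pos + span - 1


def phred_scores(qual: str) -> list[int]:
    """QUAL is Phred+33 ASCII; '*' means no qualities were stored."""
    return [] if qual == "*" else [ord(c) - 33 for c in qual]


def mapq_error_prob(mapq: int) -> float:
    """MAPQ = -10 log10 P(mapping position is wrong)."""
    return 10 ** (-mapq / 10)
```

Applied to the record above: FLAG 1040 = 16 + 1024, i.e. the read aligned to the reverse strand (so SEQ is stored reverse-complemented and QUAL reversed) and it is marked as a PCR or optical duplicate; RNAME `1` is chromosome 1; `325M` means all 325 bases are aligned (match or mismatch) with no indels or clipping, so the read spans positions 113040605 to 113040929; `* 0 0` means there is no mate; NM:i:0 means no differences from the reference; and MAPQ 57 corresponds to an estimated mapping-error probability of about 2e-6. Quality characters `7`, `9`, `:` and `;` decode to Phred scores 22, 24, 25 and 26.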
I'm constructing a pipeline to study variation: I get a FASTQ sequence, index it, align it to the hg18 reference sequence, do a couple of format conversions to get BAM, call indels and SNPs, add them to a database, call larger variants, check whether they've been reported before, produce graphs and charts, display the alignment, and submit a report.
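The align-to-BAM part of such a pipeline could be scripted roughly as follows. This is a minimal Python sketch that only assembles and runs command lines; it assumes bwa 0.7.x and a recent samtools 1.x are installed, uses `bwa mem` (BWA's recommended algorithm for longer reads) in place of whichever BWA mode you actually use, and the file names (`hg18.fa`, `reads.fq`) and helper names are hypothetical. Note that with BWA the index is built on the reference FASTA, not on the reads. The later steps (variant calling, database, reports) are not covered.

```python
import subprocess
from typing import List, Optional, Tuple

# (argv, file to capture stdout into, or None)
Command = Tuple[List[str], Optional[str]]


def alignment_commands(ref_fasta: str, fastq: str, prefix: str) -> List[Command]:
    """Build the FASTQ -> sorted, indexed BAM steps (hypothetical helper)."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        (["bwa", "index", ref_fasta], None),           # index the reference once
        (["bwa", "mem", ref_fasta, fastq], sam),       # align; bwa writes SAM to stdout
        (["samtools", "sort", "-o", bam, sam], None),  # coordinate-sort, write BAM
        (["samtools", "index", bam], None),            # write the .bai index next to the BAM
    ]


def run_commands(commands: List[Command]) -> None:
    """Run each step in order; check=True raises on the first non-zero exit."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Usage would be `run_commands(alignment_commands("hg18.fa", "reads.fq", "sample"))`. Keeping each step as an argv list avoids shell quoting issues, and separating "build" from "run" makes it easy to print or test the plan before executing anything.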
I'm learning about the BWA aligner and the BAM format right now. I'm using pilot data on unaligned sequences from the 1000 Genomes Project, because I will have similar BAM outputs.
I have to study and make sense of this BAM format. I've read this tutorial on understanding the SAM/BAM format, but it didn't help much. Could someone give me further pointers?
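On the binary side, per the SAM/BAM specification a BAM file is BGZF-compressed, which is a series of concatenated gzip members, so Python's standard `gzip` module can decompress it. The decompressed stream starts with the magic bytes `BAM\1`, then the header text length and text, then the reference sequence dictionary (name and length of each reference), all little-endian. The sketch below reads only that header part; `read_bam_header` is a hypothetical helper, and for real work pysam or `samtools view -H` is the usual route.

```python
import gzip
import struct


def read_bam_header(path: str) -> tuple[str, list[tuple[str, int]]]:
    """Return (header_text, [(ref_name, ref_length), ...]) from a BAM file."""
    # BGZF is gzip-compatible, so gzip can read the concatenated blocks.
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (bad magic)")
        (l_text,) = struct.unpack("<i", fh.read(4))
        # Header text may be NUL-padded; strip that before decoding.
        text = fh.read(l_text).rstrip(b"\0").decode("utf-8")
        (n_ref,) = struct.unpack("<i", fh.read(4))
        refs = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", fh.read(4))
            name = fh.read(l_name).rstrip(b"\0").decode("utf-8")  # NUL-terminated
            (l_ref,) = struct.unpack("<i", fh.read(4))
            refs.append((name, l_ref))
    return text, refs
```

The alignment records that follow the header are binary as well (for example, SEQ is packed at 4 bits per base), which is why `samtools view` is normally used to turn them back into SAM text.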
Thanks a lot!
Joker!sAce