SEQanswers

Old 02-28-2011, 06:05 AM   #1
Joker!sAce
Member
 
Location: Denmark

Join Date: Feb 2011
Posts: 21
Understanding BAM format

Hi,

I have this output in BAM format.

Quote:
NA06984-SRR006041.1145152 1040 1 113040605 57 325M * 0 0 TTGATCACTTCACACACATCTTCATCGATGAGGCTGGCCA
CTGCATGGAGCCTGAGAGTCTGGTAGCTATAGCAGGTGAGGGACTCAGGTGGGGCTGCAGGTATACACCCTGTGTGGGTCAGAGAGGTTGCACCACTTACCTTTCTTCCCACACCTCTTCTGCTTCCCAGGGCTGATGGAAGTA
AAGGAAACAGGTGATCCAGGAGGGCAGCTGGTGCTGGCAGGAGACCCTCGGCAGCTGGGGCCTGTGCTGCGTTCCCCACTGACCCAGAAGCATGGACTGGGATACTCACTGCTGGAGCGGCTGCTCACCTACAACTCCCTG 7
99::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::88:::::::::::;;;;;;;;;;;;;::888:;;;;;;;;;;;;;;;;;;;;;;;;;;
;;888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:9::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;::::;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;::::::::::::::::::::::::: RG:Z:SRR006041 NM:i:0
(This is data from the 1000 Genomes Project.)

I'm building a pipeline to study variation: I get FASTQ sequence, index it, align it to the hg18 reference sequence, do a couple of format conversions to get BAM, call indels and SNPs, add them to a database, call larger variations, check whether they have been reported before, produce graphs and charts, display the alignment, and submit a report.

I'm learning about the BWA aligner and the BAM format right now. I'm using pilot data on unaligned sequences from the 1000 Genomes Project, because my own BAM output will look similar.

I have to study and make sense of this BAM format. I've read this tutorial on understanding the SAM/BAM format, but it did not help much. Could someone give me further pointers?

Thanks a lot!
Joker!sAce

Last edited by Joker!sAce; 02-28-2011 at 06:15 AM.
Old 02-28-2011, 10:37 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

What specific questions about the format do you have?
Old 02-28-2011, 11:36 AM   #3
Joker!sAce
Member
 
Location: Denmark

Join Date: Feb 2011
Posts: 21

I understand that there are a lot of columns in this record.

NA06984-SRR006041.1145152
1040
1
113040605
57
325M
*
0
0
TTGATCACTTCACACACATCTTCATCGATGAGGCTGGCCACTGCATGGAGCCTGAGAGTCTGGTAGCTATAGCAGGTGAGGGACTCAGGTGGGGCTGCAGGTATACACCCTGTGTGGGTCAGAGAGGTTGCACCACTTACCTTTCTTCCCACACCTCTTCTGCTTCCCAGGGCTGATGGAAGTAAAGGAAACAGGTGATCCAGGAGGGCAGCTGGTGCTGGCAGGAGACCCTCGGCAGCTGGGGCCTGTGCTGCGTTCCCCACTGACCCAGAAGCATGGACTGGGATACTCACTGCTGGAGCGGCTGCTCACCTACAACTCCCTG
799::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::88:::::::::::;;;;;;;;;;;;;::888:;;;;;;;;;;;;;;;;;;;;;;;;;;;;888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:9::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;::::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:::::::::::::::::::::::::
RG:Z:SRR006041
NM:i:0

I'd like to know what each of them means. I have a rough idea, but I'd like to be sure.
Old 02-28-2011, 12:20 PM   #4
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

You'll get much better answers if you post specific questions which can't be easily answered from the SAM format documentation.
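As a starting point, here is a minimal Python sketch of how one text (SAM) record like the one posted above breaks into columns. The eleven column names and the flag bits follow the SAM format documentation; the helper names `parse_sam_line` and `decode_flag` are made up for this illustration, and SEQ and QUAL are truncated to 8 characters to keep the example short, so they no longer match the 325M CIGAR.

```python
# The 11 mandatory columns, in order, as named in the SAM format documentation.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Meaning of each bit in the FLAG column.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def parse_sam_line(line):
    """Split one tab-separated SAM record into named fields (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    # Optional fields come after column 11 as TAG:TYPE:VALUE.
    record["tags"] = {}
    for field in cols[11:]:
        name, value_type, value = field.split(":", 2)
        record["tags"][name] = int(value) if value_type == "i" else value
    return record


def decode_flag(flag):
    """Return the labels of all bits set in FLAG (hypothetical helper)."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


# The record from the first post, with SEQ and QUAL shortened for illustration.
example = "\t".join([
    "NA06984-SRR006041.1145152", "1040", "1", "113040605", "57", "325M",
    "*", "0", "0", "TTGATCAC", "799:::::", "RG:Z:SRR006041", "NM:i:0",
])

rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], rec["CIGAR"], rec["tags"])
print(decode_flag(rec["FLAG"]))
```

Read this way, the record is a read aligned to reference sequence `1` at position 113040605 with mapping quality 57; flag 1040 is 1024 + 16, i.e. a reverse-strand read marked as a duplicate; `325M` means 325 aligned bases; `*`, `0`, `0` mean there is no mate information; and `NM:i:0` is an edit distance of zero to the reference.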
Old 03-10-2011, 03:32 AM   #5
Joker!sAce
Member
 
Location: Denmark

Join Date: Feb 2011
Posts: 21

My study is a divergence study of the p53 gene on the short arm of chromosome 17. I need to extract this part of the sequence.

I understand that I can do this in two ways:
1. Get raw fasta reads.
2. Extract from the data aligned to hg18 (in BAM format).

How do I do the second option?
Old 03-10-2011, 08:31 AM   #6
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

If you know the chromosomal coordinates for your gene (which you can find in the UCSC files or via the browser), then SAMtools can extract this region efficiently.
Old 03-16-2011, 03:30 PM   #7
Joker!sAce
Member
 
Location: Denmark

Join Date: Feb 2011
Posts: 21

This sequence has been aligned to hg18. I know the chromosomal coordinates for hg18 (chr17:7,520,037-7,531,588; that's the TP53 tumor suppressor gene).

How do I proceed from here?
Old 03-16-2011, 06:55 PM   #8
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

samtools view aligned.bam chr17:7520037-7531588 > tp53.sam
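Two things to check before running this: a region query needs a coordinate-sorted BAM with an index (`samtools index aligned.bam`), and the chromosome name must match the RNAME column in your file. The record in the first post uses `1` rather than `chr1`, so a file named that way would need `17:7520037-7531588`. To make clear what "extract a region" means, here is a rough Python sketch of the same overlap test applied to SAM text lines. The function names and the three example reads are made up for illustration, and it skips records without a CIGAR, which is a simplification rather than a description of what samtools does internally.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) a read covers on the reference."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + max(length, 1) - 1


def parse_region(region):
    """Turn 'chr17:7,520,037-7,531,588' into ('chr17', 7520037, 7531588)."""
    chrom, coords = region.rsplit(":", 1)
    start, end = coords.replace(",", "").split("-")
    return chrom, int(start), int(end)


def reads_in_region(sam_lines, region):
    """Yield SAM records whose alignment overlaps the region (hypothetical helper)."""
    chrom, start, end = parse_region(region)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        cols = line.rstrip("\n").split("\t")
        rname, pos, cigar = cols[2], int(cols[3]), cols[5]
        if rname != chrom or cigar == "*":
            continue
        read_start, read_end = reference_span(pos, cigar)
        if read_start <= end and read_end >= start:
            yield line


# Made-up records, not real data.
lines = [
    "@HD\tVN:1.0\tSO:coordinate",
    "readA\t0\tchr17\t7520000\t60\t50M\t*\t0\t0\t*\t*",  # overlaps the region start
    "readB\t0\tchr17\t7531589\t60\t50M\t*\t0\t0\t*\t*",  # starts just past the end
    "readC\t0\t17\t7525000\t60\t50M\t*\t0\t0\t*\t*",     # chromosome name mismatch
]

hits = list(reads_in_region(lines, "chr17:7,520,037-7,531,588"))
print([h.split("\t")[0] for h in hits])
```

Only `readA` is kept: `readB` lies outside the interval, and `readC` is dropped purely because `17` and `chr17` are different names.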
Tags
1000 genomes project, bam, bam files, understanding bam
All times are GMT -8.