SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Reverse engineering BAM files: BAM -> FASTQ gene coder Bioinformatics 3 01-03-2012 02:42 PM
Comparing and combining vcf files gavin.oliver Bioinformatics 3 11-30-2011 02:16 AM
How to get mapped reads ID from bam files using samtools? fabrice Bioinformatics 15 08-20-2011 06:03 PM
NEw to Chip-seq and have .bam/.sam/.bam.bai files... then what? NGS newbie Bioinformatics 11 05-25-2011 07:48 AM
Counting entries in bam-files: difficulties with samtools and bamToBed Azazel Bioinformatics 4 01-17-2011 07:31 PM

Reply
 
Thread Tools
Old 02-13-2012, 06:22 PM   #1
khoops66
Junior Member
 
Location: Grand Forks, ND

Join Date: Jan 2012
Posts: 5
Default Comparing two BAM files using SAMtools

Hello, SEQanswers.

This is my first time posting on this forum, I've been lurking for sometime now. I am currently attempting to compare specific chromosomes from different BAM files using the following command.
Code:
samtools view <file.bam>
I'm curious as to what the .bai files are for also, and I'm unsure if I need these for accuracy purposes. In the end, all I really want to accomplish is comparing and view two different files simultaneously. Any help with this would be greatly appreciated.
khoops66 is offline   Reply With Quote
Old 02-13-2012, 11:59 PM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

The BAI files are optional indexes for BAM files used for efficient random access of coordinate sorted BAM files (e.g. Show me all the reads mapped to region 100 to 200 of chr7 is done via the BAI file to avoid scanning the entire BAM file).

Be careful about the sorting when comparing two SAM or BAM files.
maubp is offline   Reply With Quote
Old 02-14-2012, 02:33 AM   #3
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

You need to tell us what you mean by compare.
pbluescript is offline   Reply With Quote
Old 02-14-2012, 02:56 AM   #4
Crypticfortune
Member
 
Location: Tokyo, Japan

Join Date: Jul 2011
Posts: 14
Smile Recommending Samscope

I think I might have a toy for you! Go check out Samscope. You can then open both bam files with a command like:

Code:
samscope foo.bam bar.bam
First you'll be looking at coverage data from foo.bam. Then hit w to open up a second window, which will be showing coverage data from bar.bam by default. Scroll and zoom and both windows will stay synchronized.

Depending on what features you want to view and compare, you may or may not need the .bai index files. Samtools tview requires .bai indexes, and if you want to see individual reads, so does samscope. If you just want to see aggregate features like coverage, samscope will work fine without indexes.

The .bai files are simply indexes of the .bam files that say basically "reads at chr1:4096 start at byte 18010 of the .bam file!" kind of thing. This allows finding all reads that map to a given position without plowing through the whole bam file searching. Note that the .bam file has to be sorted first for indexes to be created ("samtools sort foo.bam foo.sorted" will do that. Also, samscope has scripts to handle all that too).

If you're hooked on samtools, you may want to consider the "samtools pileup" or "samtools tview" commands, but this can be hard to read/view for large regions. Tools like IGV or Tablet might also help.
Crypticfortune is offline   Reply With Quote
Old 02-14-2012, 07:04 AM   #5
khoops66
Junior Member
 
Location: Grand Forks, ND

Join Date: Jan 2012
Posts: 5
Default

Quote:
Originally Posted by pbluescript View Post
You need to tell us what you mean by compare.
I apologize, like I said I'm new here. Okay, my boss (I'm a freshman at a University) asked me to learn how to use samtools to 'compare' two .bam files and analyze the indels (inserstions/deletions) of those files. I have the .bam files downloaded with their corresponding .bai files. All I'd really like to get accomplished is use something like the ./samtools view file.bam chr1:100-200 file2.bam chr1:100-200. Is there a command line to accomplish something similar to this?
khoops66 is offline   Reply With Quote
Old 02-14-2012, 08:02 AM   #6
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Quote:
Originally Posted by khoops66 View Post
I apologize, like I said I'm new here. Okay, my boss (I'm a freshman at a University) asked me to learn how to use samtools to 'compare' two .bam files and analyze the indels (inserstions/deletions) of those files. I have the .bam files downloaded with their corresponding .bai files. All I'd really like to get accomplished is use something like the ./samtools view file.bam chr1:100-200 file2.bam chr1:100-200. Is there a command line to accomplish something similar to this?
Use samtools mpileup with the -r option to specify your region. This will give you the indels and snps in that region of interest.

From there it depends on exactly how you want to compare the two bam files. If you want to see which variants are in one bam and not the other, then you will have to run mpileup individually on each bam, then compare the output using something like bedtools.
chadn737 is offline   Reply With Quote
Old 02-14-2012, 08:44 AM   #7
khoops66
Junior Member
 
Location: Grand Forks, ND

Join Date: Jan 2012
Posts: 5
Default

Quote:
Originally Posted by chadn737 View Post
Use samtools mpileup with the -r option to specify your region. This will give you the indels and snps in that region of interest.

From there it depends on exactly how you want to compare the two bam files. If you want to see which variants are in one bam and not the other, then you will have to run mpileup individually on each bam, then compare the output using something like bedtools.
This will also allow me to compare the indels of two files at the same time? I'd like to look at chr1:554200-554700 of both of the following bam files for indels.
khoops66 is offline   Reply With Quote
Old 02-14-2012, 08:48 AM   #8
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Quote:
Originally Posted by khoops66 View Post
This will also allow me to compare the indels of two files at the same time? I'd like to look at chr1:554200-554700 of both of the following bam files for indels.
Specify what you mean by compare? Do you want to find indels in common between the two samples? Do you want to find indels that differ between the two samples? Do you want to simply find all indels in both files?

Specify very clearly what you want to accomplish, exactly what sort of output you want, and its much easier.
chadn737 is offline   Reply With Quote
Old 02-14-2012, 08:53 AM   #9
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Yeah, like another poster said, you can do this with samtools mpileup. Though you can analyze both samples together, along with the reference fasta. SAMTools mpileup will take multiple .bams as input, and the output vcf you eventually make will have one column for each sample. You can also tell samtools mpileup to only make its files for that region of that chromosome.

All this assumes that the .bam was made with software that will actually call indels.

Though with such a small project, you could also eyeball. You can use the Broad's IGV to just look at the .bams in this region. You'll see discrepancies between the read and the reference, and you can load multiple .bams at once. (espeically if you use samtools view to trim them down to just the region you want).
swbarnes2 is offline   Reply With Quote
Old 02-14-2012, 08:55 AM   #10
khoops66
Junior Member
 
Location: Grand Forks, ND

Join Date: Jan 2012
Posts: 5
Default

Quote:
Originally Posted by chadn737 View Post
Specify what you mean by compare? Do you want to find indels in common between the two samples? Do you want to find indels that differ between the two samples? Do you want to simply find all indels in both files?

Specify very clearly what you want to accomplish, exactly what sort of output you want, and its much easier.
I'd like to find the similarities and differences between the two files. And since I only need to use chr1 of the two files I'm attempting to cut these down in size as well.
khoops66 is offline   Reply With Quote
Old 02-14-2012, 09:46 AM   #11
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Quote:
Originally Posted by khoops66 View Post
I'd like to find the similarities and differences between the two files. And since I only need to use chr1 of the two files I'm attempting to cut these down in size as well.
Do you want to get coverage, and error rate beween the two .bams too?

If you want to call SNPs and indels, you'll also need the reference sequence that was used to align the files. The exact file is best, if possible, because the names of the chromosomes need to be a match to the names in your .bam

You can use 'samtools view' to winnow down your .bams to just the region you want. Or, you can tell 'samtools mpileup' what region to look at.

Have you even looked at the command line options for those programs?

You will get much better responses here if you demonstrate that you've spent 5 minutes trying to do this on your own. You do that by saying "This is the command line I tried, this is what the file I generated looks like, these are the things I looked at to try and figure out why it's not working the way I want it to". Basically, you need to demonstrate that you have spent at least as much time really trying to work it out as you expect some stranger to spend helping you.

Even if people wanted to hold your hand and tell you exactly what to do, they can't do that if you don't say exactly what you are trying to do. "I want to find all the similarities and differences" is too vague. For all we know, you are looking for CNVs, or trying to calculate capture efficiency. "I'm looking for SNPs and small indels" is a lot more precise.
swbarnes2 is offline   Reply With Quote
Old 02-14-2012, 09:55 AM   #12
khoops66
Junior Member
 
Location: Grand Forks, ND

Join Date: Jan 2012
Posts: 5
Default

Quote:
Originally Posted by swbarnes2 View Post
You will get much better responses here if you demonstrate that you've spent 5 minutes trying to do this on your own. You do that by saying "This is the command line I tried, this is what the file I generated looks like, these are the things I looked at to try and figure out why it's not working the way I want it to". Basically, you need to demonstrate that you have spent at least as much time really trying to work it out as you expect some stranger to spend helping you.

Even if people wanted to hold your hand and tell you exactly what to do, they can't do that if you don't say exactly what you are trying to do. "I want to find all the similarities and differences" is too vague. For all we know, you are looking for CNVs, or trying to calculate capture efficiency. "I'm looking for SNPs and small indels" is a lot more precise.

Well, I'm a freshman CS major and I'm doing a work study. I don't mean to plead ignorance, and I apologize for being vague earlier. I am just trying to get this all figured out. I'm reading research papers and documentation for different programs that I've never used. I truly appreciate your help, though. Thank you.

I have a work meeting shortly and hope to get a better understanding of what it is we're looking for in specific from these files. Again, thank you for your patience, I don't mean to be a bother.
khoops66 is offline   Reply With Quote
Old 02-14-2012, 07:06 PM   #13
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Parallel discussion (didn't realize reddit was turning into a place for this, hmmn):
http://www.reddit.com/r/bioinformati...sing_samtools/
nilshomer is offline   Reply With Quote
Reply

Tags
bai, bam, compare, samtools, view

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:14 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO