SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
changing chromosome notation in .BAM file ddaneels Bioinformatics 20 01-06-2017 05:54 AM
Convert 1000-Genomes-proje BAM to FASTA (aligned to reference, grouped by chromosome) ce.log Bioinformatics 17 01-13-2014 11:35 PM
Scripture genome/chromosome.fa file EBER RNA Sequencing 1 07-24-2012 09:57 AM
MiSeq pipeline, per-genome bam vs per chromosome bam & vcf swNGS Bioinformatics 1 03-29-2012 05:00 AM
chromosome match position in eland files litd General 0 03-02-2010 12:54 PM

Reply
 
Thread Tools
Old 10-26-2012, 01:47 PM   #1
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default How to get a summary for a special chromosome position in BAM file

I want to get a summary of a special chromosome position in a BAM file, for example:
I have chr1:100,000, how many times I got T, and how many times I got G...
Is there any code for this question? Or any software I can use? Thanks!
twotwo is offline   Reply With Quote
Old 10-26-2012, 03:06 PM   #2
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

Caveat: I hope somebody comes up with something better!

Parsing the output of samtools mpileup may be the answer.
Here's a script to do it, just edit for samtools path ($SAMT) and pass arguments 1) chromsome 2) location 3) bamfile.
Problems may be : "one off" error. Didn't check (sorry) and may not really need UPSTREAM.

Here's the bash script ...
#####----------------------- START CODE

#set SAMT to your samtools on next line !!!
export SAMT=/h1/finneyr/samtools-0.1.18/samtools

# this file is called job.mptest and is in current directory ...
# example call ./job.mptest chr1 1011282 /tcga_next_gen/NG_buckets/bucket22/bam/C484.TCGA-06-0119-10A-01D-1490-08.5.bam

export CHR=$1
export PLACE=$2
export UPSTREAM=$(($PLACE-1))
export BAMFILE=$3

RANGE=${CHR}:${UPSTREAM}-${PLACE}

$SAMT mpileup -r $RANGE $BAMFILE 2> /dev/null | tail -1 | cut -f5 | sed -e "s/.\{1\}/&\n/g" | tr "[:lower:]" "[:upper:]" | grep "[A,C,G,T]" | sort | uniq -c
# sed command adds newline every 1 char, tr command converts lower to upper

####------------------- end code

example usage and output :
./job.mptest chr1 1011278 /tcga_next_gen/NG_buckets/bucket22/bam/C484.TCGA-06-0119-10A-01D-1490-08.5.bam
15 A
9 G

Last edited by Richard Finney; 10-26-2012 at 03:11 PM.
Richard Finney is offline   Reply With Quote
Old 10-26-2012, 11:23 PM   #3
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

samtools pileup
Chipper is offline   Reply With Quote
Old 10-29-2012, 07:16 AM   #4
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by Richard Finney View Post
Caveat: I hope somebody comes up with something better!

Parsing the output of samtools mpileup may be the answer.
Here's a script to do it, just edit for samtools path ($SAMT) and pass arguments 1) chromsome 2) location 3) bamfile.
Problems may be : "one off" error. Didn't check (sorry) and may not really need UPSTREAM.

Here's the bash script ...
#####----------------------- START CODE

#set SAMT to your samtools on next line !!!
export SAMT=/h1/finneyr/samtools-0.1.18/samtools

# this file is called job.mptest and is in current directory ...
# example call ./job.mptest chr1 1011282 /tcga_next_gen/NG_buckets/bucket22/bam/C484.TCGA-06-0119-10A-01D-1490-08.5.bam

export CHR=$1
export PLACE=$2
export UPSTREAM=$(($PLACE-1))
export BAMFILE=$3

RANGE=${CHR}:${UPSTREAM}-${PLACE}

$SAMT mpileup -r $RANGE $BAMFILE 2> /dev/null | tail -1 | cut -f5 | sed -e "s/.\{1\}/&\n/g" | tr "[:lower:]" "[:upper:]" | grep "[A,C,G,T]" | sort | uniq -c
# sed command adds newline every 1 char, tr command converts lower to upper

####------------------- end code

example usage and output :
./job.mptest chr1 1011278 /tcga_next_gen/NG_buckets/bucket22/bam/C484.TCGA-06-0119-10A-01D-1490-08.5.bam
15 A
9 G
Thanks Richard! However, when I ran the code, I did not get the output.
What I did is: I created a job.mptest file, and copy-paste your code in that. And change the samtools path. Then copy-paste this file into the bam file folder, and run the code you listed there. But there is no output. Is there anything wrong with my operation? Thanks so much!
twotwo is offline   Reply With Quote
Old 10-31-2012, 04:01 PM   #5
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

remove the " 2> /dev/null " to see error messages from samtools.

get rid of the RED colored text as is done here ...
[FONT="Courier New"]
$SAMT mpileup -r $RANGE $BAMFILE 2> /dev/null | tail -1 | cut -f5 | sed -e "s/.\{1\}/&\n/g" | tr "[:lower:]" "[:upper:]" | grep "[A,C,G,T]" | sort | uniq -c
Richard Finney is offline   Reply With Quote
Old 11-01-2012, 05:29 AM   #6
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Richard, Thank you very much!!!
I finally found that my index file is not appropriate. If a bam file is aaa.bam, my index file is aaa.bai. However, after I changed to aaa.bam.bai, it works very well. Thanks!
twotwo is offline   Reply With Quote
Old 01-25-2013, 11:01 AM   #7
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Hi, Richard,
I then found another question with your code. Here is my code on one chromosome position:

samtools mpileup -r chr1:205131282-205131283 /home/twotwo/1234.bam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
file idx 34926928 chr1 205131282 N 42 TTTTTTTTtTtTTTTTTTTTTTTttTTtTTttTtTTTtTTTt @BDB<BDAG?I@B@HCIGJIJJIEG>JAIG?DJBHF?5FFFB
chr1 205131283 N 43 TTTTTTTT$tTtTTTTTTTTTTTTttTTtTTttTtTTTtTTTt^]t CADDA>DCICICECEDIGJJIGJ?GDJEJGBBJ?GHF<HFFAB

The following is the result. For the position of chr1:205131283, after I ran all of your command(with the shell script after the bam file), I got 43 T. However, if I used the igvtools to view it, I got 43T and 3A. It seemed the bottom line should also be counted because it has 3 A. However, if the bottom line was counted, there are some C and G, which is not showing in the IGVtools. How can I modify it to get 3A??? Thank you very much for your help!





Quote:
Originally Posted by Richard Finney View Post
remove the " 2> /dev/null " to see error messages from samtools.

get rid of the RED colored text as is done here ...
[FONT="Courier New"]
$SAMT mpileup -r $RANGE $BAMFILE 2> /dev/null | tail -1 | cut -f5 | sed -e "s/.\{1\}/&\n/g" | tr "[:lower:]" "[:upper:]" | grep "[A,C,G,T]" | sort | uniq -c
twotwo is offline   Reply With Quote
Old 01-25-2013, 01:18 PM   #8
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

I'm looking at this. The parsing of the 5th field is naive and does not properly count the $,^,<,and> characters. It appears to be inadequate for all possible data for this field. "mpileup" is discussed at http://www.biostars.org/show/tag/mpileup/

Others have problems using this approach : http://blog.nextgenetics.net/?e=56

Dan Koboldt at Massgenomics blog talks about mpileup : ( from http://massgenomics.org/2012/03/5-th...s-mpileup.html )

The 6th field is quality, not sequence.

I have no idea how three As are showing up on IGV. Do you think IGV is right in this case?
Richard Finney is offline   Reply With Quote
Old 01-25-2013, 01:18 PM   #9
BAMseek
Senior Member
 
Location: St. Louis, MO, USA

Join Date: Apr 2011
Posts: 124
Default

Try adding the -A flag to mpileup (so "samtools mpileup -A . . . "). I've found that if you want to see exactly what you see in IGV, then you should include anomalous reads. Also, mpileup (as well as pileup and depth) have a ceiling cutoff for maximum depth at a position, so it is also helpful to specify a large ceiling if you think you'll hit the maximum (which I believe is 8000/number of samples).

Edit: Another thing worth checking - make sure you aren't off by 1 due to 0-offset or 1-offset. I believe mpileup and IGV are both 1-offset, so that should be fine but just something to keep in mind.

Justin

Last edited by BAMseek; 01-25-2013 at 01:26 PM.
BAMseek is offline   Reply With Quote
Old 01-26-2013, 02:39 PM   #10
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by Richard Finney View Post
I'm looking at this. The parsing of the 5th field is naive and does not properly count the $,^,<,and> characters. It appears to be inadequate for all possible data for this field. "mpileup" is discussed at http://www.biostars.org/show/tag/mpileup/

Others have problems using this approach : http://blog.nextgenetics.net/?e=56

Dan Koboldt at Massgenomics blog talks about mpileup : ( from http://massgenomics.org/2012/03/5-th...s-mpileup.html )

The 6th field is quality, not sequence.

I have no idea how three As are showing up on IGV. Do you think IGV is right in this case?
Hi, Richard,
These 3A are also considered mutation with other software, like the varscan... So I think the IGV is right.... However, I really cannot see it in the samtools. very confusing.
twotwo is offline   Reply With Quote
Old 01-26-2013, 02:49 PM   #11
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by BAMseek View Post
Try adding the -A flag to mpileup (so "samtools mpileup -A . . . "). I've found that if you want to see exactly what you see in IGV, then you should include anomalous reads. Also, mpileup (as well as pileup and depth) have a ceiling cutoff for maximum depth at a position, so it is also helpful to specify a large ceiling if you think you'll hit the maximum (which I believe is 8000/number of samples).

Edit: Another thing worth checking - make sure you aren't off by 1 due to 0-offset or 1-offset. I believe mpileup and IGV are both 1-offset, so that should be fine but just something to keep in mind.

Justin
Thanks Justin. I put -A in the code. However, it still gave me 43T (no 3A). How can I check the 0-offset or 1-offset?
twotwo is offline   Reply With Quote
Old 01-27-2013, 05:29 PM   #12
BAMseek
Senior Member
 
Location: St. Louis, MO, USA

Join Date: Apr 2011
Posts: 124
Default

Quote:
Originally Posted by twotwo View Post
Thanks Justin. I put -A in the code. However, it still gave me 43T (no 3A). How can I check the 0-offset or 1-offset?
Looks like from your mpileup output that the calls are

chr1 205131282 T
chr1 205131283 T

This lines up with IGV, so looks like that is fine. The only other thing I can think to check would be if those As are at the end of the read and are actually soft-clipped. If you have "show soft clipping" enabled in IGV, then it would show those bases, but they aren't counted in the total base counts. Maybe you could post a picture of what it looks like in IGV.
BAMseek is offline   Reply With Quote
Old 01-28-2013, 05:55 AM   #13
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by BAMseek View Post
Looks like from your mpileup output that the calls are

chr1 205131282 T
chr1 205131283 T

This lines up with IGV, so looks like that is fine. The only other thing I can think to check would be if those As are at the end of the read and are actually soft-clipped. If you have "show soft clipping" enabled in IGV, then it would show those bases, but they aren't counted in the total base counts. Maybe you could post a picture of what it looks like in IGV.
Please see the attached. All of the As are in some orange background. However, the summary window showed it is 43T and 3 A.
Attached Images
File Type: jpg seqanswers.jpg (87.8 KB, 7 views)
twotwo is offline   Reply With Quote
Old 01-28-2013, 07:31 AM   #14
BAMseek
Senior Member
 
Location: St. Louis, MO, USA

Join Date: Apr 2011
Posts: 124
Default

I think that the coloring means that the paired read was aligned to a different chromosome. If you hadn't already put in the -A flag for anomalous reads, I would have guessed that to be the cause of the difference.
BAMseek is offline   Reply With Quote
Old 01-28-2013, 07:50 AM   #15
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by BAMseek View Post
I think that the coloring means that the paired read was aligned to a different chromosome. If you hadn't already put in the -A flag for anomalous reads, I would have guessed that to be the cause of the difference.
If I put the -A, the command is:
samtools mpileup -A -r chr1:205131282-205131283 /home/twotwo/1234.cleaned.bam 2> /dev/null | tail -1 | cut -f5 | sed -e "s/.\{1\}/&\n/g" | tr "[:lower:]" "[:upper:]" | grep "T" | uniq -c

However, I still get 43T only, no 3A. Is there any problem with my command? Thanks!
twotwo is offline   Reply With Quote
Old 01-28-2013, 09:41 AM   #16
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

Get rid of the grep "T" |

samtools mpileup -A -r chr1:205131282-205131283 /home/twotwo/1234.cleaned.bam 2> /dev/null | tail -1 | cut -f5 | sed -e "s/.\{1\}/&\n/g" | tr "[:lower:]" "[:upper:]" | grep "T" | uniq -c
Richard Finney is offline   Reply With Quote
Old 01-28-2013, 03:13 PM   #17
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by Richard Finney View Post
Get rid of the grep "T" |

samtools mpileup -A -r chr1:205131282-205131283 /home/twotwo/1234.cleaned.bam 2> /dev/null | tail -1 | cut -f5 | sed -e "s/.\{1\}/&\n/g" | tr "[:lower:]" "[:upper:]" | grep "T" | uniq -c
got it. yes, it's my fault. And it works. Thanks all for your help!
twotwo is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:19 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO