Seqanswers Leaderboard Ad

**BAMseek** · 12-18-2011, 07:52 PM

For visualization, you might try the IGV browser. You can zoom to your region of interest and hover over a read to get information like read id.

To programmatically extract reads that overlap a position, I would suggest using "samtools view", such as

Code:

samtools view filename.bam chr8:128747553-128747553

From there, you could use the sequence, position, and CIGAR to determine if the read contains the mutation of interest.

hope that helps answer your question,
Justin

**Alex Renwick** · 12-19-2011, 09:10 AM

I'm not sure I understand your question. It sounds like you have reads from a several samples mixed together in the same bam file. You visualize the alignment and see that there are mutations at some sites. You want to know which samples carry those mutations.

If that's your problem, then the solution involves setting the Read Group attribute in the bam file. The Read Group is how you keep track of different samples when analyzing them together. With that attribute set properly, you can, for example, select reads from a particular sample in samtools view.

I have found that bowtie doesn't handle Read Groups properly. If you have trouble you might have an easier time with bwa.

**BioSlayer** · 12-19-2011, 10:50 AM

Originally posted by Alex Renwick View Post

I'm not sure I understand your question. It sounds like you have reads from a several samples mixed together in the same bam file. You visualize the alignment and see that there are mutations at some sites. You want to know which samples carry those mutations.

If that's your problem, then the solution involves setting the Read Group attribute in the bam file. The Read Group is how you keep track of different samples when analyzing them together. With that attribute set properly, you can, for example, select reads from a particular sample in samtools view.

I have found that bowtie doesn't handle Read Groups properly. If you have trouble you might have an easier time with bwa.

You got me right !, I basically visualized the alignment and then wanted get the sequences that have the mutations (at that certain position), primarily; their IDs and positions where the mutations happened on the sequence as well as the chromosomal coordinate, I am getting somewhere but there are bumps on the way, In summary, report the mutations by positions and retrieve the relevant data to identify the seqs with.

So, reading BWA docs, I was looking for the Read Group Attrib you suggested but I have not encountered it. It will be great if you can elaborate a bit more ..

Appreciate it...

**Alex Renwick** · 12-19-2011, 01:11 PM

Originally posted by BioSlayer View Post

...
So, reading BWA docs, I was looking for the Read Group Attrib you suggested but I have not encountered it. It will be great if you can elaborate a bit more ..

Appreciate it...

I can elaborate a little.

The Read Group is part of the SAM file specification, which you can find on the samtools webpage. Aligners like bowtie and bwa don't need it, but downstream analysis often does.

The read group has to be added before the samples are merged into one bam file. Typically, you'd produce a bam file for each sample (with read group assigned), then merge the bam files into one for subsequent analysis.

With bwa, the -r option to samse or sampe allows you to specify the Read Group. For example,

Code:

bwa sampe -r "@RG\tID:RUN_427\tSM:smp0\tPL:ILLUMINA" ref.fa smp0.1.sai smp0.2.sai smp0.1.fq smp0.2.fq > smp0.sam

will give a Read Group ID of "RUN_427", sample name "smp0", and specify that the platform was "ILLUMINA".

Some other ways of adding a Read Group specification are 1) using samtools merge -r; 2) using some picard tool; 3) writing your own small shell script.

**BioSlayer** · 01-01-2012, 06:14 AM

@Bamseek and @Alex Renwick.. Thank you folks, the command suggested by Bamseek works great at filtering for me all the seqs mapping at that position. The RG option in samtools merge is so handy, I don't typically produce separate bam files and then merge them but I marked it for the day I have to.

By chance, I found a way in samtools tview that works in conjunction with the samtools view option, now I can toggle between the Read ID and its Sequence by hitting the 'r' key on the keyboard while on the viewer

... Apparently samtools has some hidden cookies to munch on ...

**gringer** · 01-01-2012, 04:40 PM

on a similar vein, does anyone know an easy way to get these read groups to show up as different colours in a GBrowse BAM track for reads? I'll have a go at doing this myself in a couple of days, but it would be nice to know if there's something I can work on as a base.

**BioSlayer** · 01-01-2012, 11:23 PM

Interesting ...

Originally posted by gringer View Post

on a similar vein, does anyone know an easy way to get these read groups to show up as different colours in a GBrowse BAM track for reads? I'll have a go at doing this myself in a couple of days, but it would be nice to know if there's something I can work on as a base.

Should you get a headstart at doing this with GBrowse BAM spill the beans

...

**gringer** · 01-04-2012, 05:05 AM

Okay, so here's my attempt*:

Code:

[illumina_default_UP_tagged_reads]
feature        = match
glyph          = segments
bgcolor        = sub {
# colors from http://gmod.org/wiki/Glyphs_and_Glyph_Options
     my @data = @_;
     my %sam = %{$data[0]};
     my $read_gp = $sam{"align"}->get_tag_values("RG");
     my %read_map;
     @read_map{('0hpost1', '0hpost2', '0hpre', '18hpost', '18hpre', '18Ra', '18Rb',
                '18W', '30Ra', '30Rb', 'bcatpost1', 'bcatpost2', 'bcatpre1',
                'bcatpre2', 'cont1a', 'cont1b', 'cont2', 'evipost1', 'evipost2',
                'evipre1', 'evipre2', 'gfppost1', 'gfppost2', 'gfppre1', 'gfppre2',
               )} =
               ('chartreuse', 'chocolate', 'coral', 'cornflowerblue', 'cornsilk',
                'crimson', 'cyan', 'darkblue', 'darkcyan', 'darkgoldenrod',
                'darkgray', 'darkgreen', 'darkkhaki', 'darkmagenta', 'darkolivegreen',
                'darkorange', 'darkorchid', 'darkred', 'darksalmon', 'darkseagreen',
                'darkslategray', 'darkturquoise', 'darkviolet', 'deeppink', 'deepskyblue',
               );
     return($read_map{$read_gp});
  }
height         = 3
feature_limit  = 200
stranded       = 0
draw_target    = 1
show_mismatch  = 1
mismatch_color = red
bump density   = 20
bump           = fast
database       = Dec_2011_IDUA_Tagged
key            = Dec 2011 Illumina Unpaired Reads (Tagged)
citation       = Mapped with bowtie2 v. 2.0.0-beta5 using default options
category       = RNAseq coverage
label          = 0

Others will obviously need to change some strings around to get this to work for them.

* after many hours of frustration at the lack of useful gbrowse error messages when a 'sub' doesn't work

**gringer** · 01-16-2012, 04:10 AM

This one uses selectable subtracks and is a bit less fiddly to work with, but seems to ignore the feature_limit variable:

Code:

[illumina_default_UP_tagged_reads]
feature        = match
glyph          = segments
bgcolor        = darkgreen
height         = 3
feature_limit  = 50
stranded       = 0
draw_target    = 1
show_mismatch  = 1
mismatch_color = red
bump density   = 20
bump           = fast
database       = Dec_2011_IDUA_Tagged
key            = Dec 2011 Illumina Unpaired Reads (Tagged)
citation       = Mapped with bowtie2 v. 2.0.0-beta5 using default options
category       = RNAseq coverage
label          = 0
hide empty subtracks = 1
subtrack select = Experiment tag_value RG;
subtrack table  = 0hpost1;
                  0hpost2;
                  0hpre;
                  18hpost;
                  18hpre;
                  18Ra;
                  18Rb;
                  18W;
                  30Ra;
                  30Rb;
                  bcatpost1;
                  bcatpost2;
                  bcatpre1;
                  bcatpre2;
                  cont1a;
                  cont1b;
                  cont2;
                  evipost1;
                  evipost2;
                  evipre1;
                  evipre2;
                  gfppost1;
                  gfppost2;
                  gfppre1;
                  gfppre2;

Also, I can't seem to get this to work with coverage maps (for which this function would be ideal).

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

samtools pileup

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News