
**maubp** · 11-06-2012, 06:26 AM

Which computer languages can you program in?

For example, if you said Python, I would suggest looking at the pysam library for working with SAM/BAM files from Python using the C samtools library.

**vincentdemolombe** · 04-15-2013, 12:42 AM

Extracting nucleotides at given positions

Hi,

I am facing the same issue. I have SAM files resulting from an alignment with CASAVA, and I have a list of positions, e.g. chr1:63229714.
I want to extract the bases in the aligned reads at these positions.

This is easy to do when CIGAR=50M (provided reads are 50 bp long).
But it gets tricky when indels or skipped regions are present, e.g. CIGAR=34M1D16M or CIGAR=10M1467N40M.

I wrote a Perl script to do the job, but it's too slow.

I hope there exists a tool which performs this job, and I would appreciate any help.
Regards.

**swbarnes2** · 04-15-2013, 11:21 AM

Something like

Code:

samtools view file.bam | cut -f 10 | cut -c 3,6,8 > output.txt

Will work, but not around indels. You could cut out the sequence and the cigar together, and maybe use awk to use the data from the one to know which letters to cut from the other.
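The idea of using the CIGAR to decide which letter to cut from the sequence can be sketched as below. This is Python rather than awk, and it is only a minimal illustration: the function names `base_at` and `read_base_from_sam_line` are made up for this example, and the choice to report a deleted position as "-" and a skipped (N) position as None is an assumption, not anything samtools does.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(ref_start, cigar, seq, target):
    """Return the read base aligned to 1-based reference position `target`.

    ref_start is the 1-based POS field of the SAM record.
    Returns "-" if the position is deleted in the read (D), and None if the
    read does not cover the position or the position lies in a skipped
    region (N). These return conventions are choices made for this sketch.
    """
    ref_pos = ref_start  # next reference position to be consumed
    read_idx = 0         # 0-based index of the next base in seq
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":      # consumes reference and read
            if ref_pos <= target < ref_pos + length:
                return seq[read_idx + (target - ref_pos)]
            ref_pos += length
            read_idx += length
        elif op in "DN":     # consumes reference only
            if ref_pos <= target < ref_pos + length:
                return "-" if op == "D" else None
            ref_pos += length
        elif op in "IS":     # consumes read only
            read_idx += length
        # H and P consume neither reference nor read
    return None


def read_base_from_sam_line(line, chrom, target):
    """Return (read name, base) for one SAM record, or None if no base is found."""
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname = fields[0], int(fields[1]), fields[2]
    pos, cigar, seq = int(fields[3]), fields[5], fields[9]
    if flag & 4 or rname != chrom or cigar == "*" or seq == "*":
        return None
    base = base_at(pos, cigar, seq, target)
    return None if base is None else (qname, base)
```

Header lines (starting with "@") would need to be skipped before calling `read_base_from_sam_line`, for example when reading the output of `samtools view file.bam` line by line. Because the read name is returned together with the base, the link between read and base is kept.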

**obk** · 04-17-2013, 08:34 AM

I also had a very similar (exact same?) problem: http://seqanswers.com/forums/showthread.php?t=27648

I ended up doing something similar to vincentdemolombe, by writing a custom perl script.

I used Bio::Perl and Bio::DB::Sam, and used the CIGAR string and the padded_alignment method to get the read bases from particular positions/regions on the reference.

Bio::DB::Bam::Alignment (the SAM/BAM alignment object): http://search.cpan.org/~lds/Bio-SamTools/lib/Bio/DB/Bam/Alignment.pm

**swbarnes2** · 04-17-2013, 09:37 AM

I think you might be better off generating a pileup, and parsing the desired line of that. Let the pileup program deal with shifting the reads around properly.
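Parsing the desired pileup line mostly comes down to decoding the read-bases column (the fifth column of samtools pileup output). A minimal sketch in Python follows; the function name `parse_pileup_bases` is hypothetical, and the sketch assumes the usual encoding: "." and "," for reference matches, letters for mismatches, "^" plus one mapping-quality character at a read start, "$" at a read end, and "+" or "-" followed by a length and sequence for an indel that starts after the column.

```python
def parse_pileup_bases(bases, ref_base):
    """Decode the read-bases column of one pileup line.

    Returns one call per read covering the column, in column order.
    Matches are replaced by the reference base; mismatches are upper-cased;
    placeholder symbols such as "*" (deleted base) are passed through as-is.
    Strand information (case, "." versus ",") is discarded in this sketch.
    """
    calls = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start marker: the next character is a mapping quality.
            i += 2
        elif c == "$":
            # Read end marker: carries no base of its own.
            i += 1
        elif c in "+-":
            # Indel starting after this column: sign, length, then sequence.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        elif c in ".,":
            calls.append(ref_base.upper())
            i += 1
        else:
            calls.append(c.upper())
            i += 1
    return calls


def parse_pileup_line(line):
    """Split one pileup line into (chrom, 1-based position, list of calls)."""
    chrom, pos, ref, _depth, bases = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), parse_pileup_bases(bases, ref)
```

The calls come back in the same order as the reads appear in the column, but the default pileup output carries no read names, so this only gives the bases at a position, not which read each base came from.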

**obk** · 04-17-2013, 11:18 AM

Great point, though in the 5 minutes I played around with samtools mpileup, it didn't seem to keep the cluster ID in the output. In my case, I needed to know the cluster ID AND its bases at particular positions on the reference after alignment.

This is as far as I got:

Code:

samtools mpileup -f ref.fasta read1.sorted.bam | less -S -N

extract positions from alignment
