![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
extract the positions from genome where reads from all the samples have aligned | Maulik23 | Bioinformatics | 1 | 09-11-2012 11:45 AM |
How to extract the "overall alignment rate" from bowtie | andaleef | Illumina/Solexa | 0 | 07-26-2012 03:38 PM |
Fastest way to extract differing positions from each alignment in a BAM file | CHRYSES | Bioinformatics | 5 | 12-14-2011 12:28 PM |
extract alignment from SAM with a GFF file | NicoBxl | Bioinformatics | 4 | 08-02-2011 02:45 PM |
Extract Gap and Mismatch From MAF alignment Output | peveralldubois | Bioinformatics | 0 | 01-14-2011 08:23 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Cambridge Join Date: Oct 2011
Posts: 12
|
![]()
Hi all,
I got some short (~220bp) pair-ended DNA sequences targeted in one region. The raw data are in fastq and I aligned them using bwa, which give me sam file. Now I want to extract some positions from each read. For example, The original aligned read sequence is: TTGAATGGGGGATGTTTTTGGGATATAGATTATGTTTTTATATC I want to extract position 3, 6, 8, so I want: GTG on each line. The reason that I can't simply pick up the position from sam file is that sometimes there are indels which will shift my positions. I googled this problem but most answers only give me the count summary of each position like using pileup, bamtobed... But in my case, the sequence of GTG is important and can not be split. Namely, I'm interested in how many GTGs rather than how many G, how many T and how many G. Any help is very appreciated! |
![]() |
![]() |
![]() |
#2 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
|
![]()
Which computer languages can you program in?
For example, if you said Python, I would suggest looking at the pysam library for working with SAM/BAM files from Python using the C samtools library. |
![]() |
![]() |
![]() |
#3 |
Junior Member
Location: south france Join Date: Apr 2013
Posts: 1
|
![]()
Hi,
i am facing the same issue. I have sam-files resulting from an alignment with CASAVA and I have a list of positons, i.e chr1:63229714. I want to extract the bases in the aligned reads at these positions. Easy to do when CIGAR=50M (provided reads are 50bp long). But tricky when indels are present, i.e. CIGAR=34M1D16M, or CIGAR=10M1467N40M. I wrote a perl-script to do the job, but it's too slow. I hope there exist a tool which perform this job and would apreciate any help. Regards. |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
Something like
Code:
samtools view file.bam | cut -f 10 | cut -c 3,6,8 > output.txt |
![]() |
![]() |
![]() |
#5 |
Member
Location: West Coast Join Date: Dec 2012
Posts: 12
|
![]()
I also had a very similar (exact same?) problem: http://seqanswers.com/forums/showthread.php?t=27648
I ended up doing something similar to vincentdemolombe, by writing a custom perl script. I used Bio::Perl and Bio::DB::Sam, and parsed the CIGAR string and padded_alignment method to get the read bases from particular positions/regions on the reference. http://search.cpan.org/~lds/Bio-SamT...m/Alignment.pm |
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
I think you might be better off generating a pileup, and parsing the desired line of that. Let the pileup program deal with shifting the reads around properly.
|
![]() |
![]() |
![]() |
#7 |
Member
Location: West Coast Join Date: Dec 2012
Posts: 12
|
![]()
Great point.. though in the 5 minutes I played around with samtools mpileup, it doesn't seem to keep the cluster ID in the output. In my case, I needed to know the cluster ID AND its bases at particular positions on the reference after alignment.
This is as far as I got: Code:
samtools mpileup -f ref.fasta read1.sorted.bam | less -S -N |
![]() |
![]() |
![]() |
Thread Tools | |
|
|