Hi,
how can I search a SAM file to find the alignments with N or D above a certain value in the CIGAR?
how can I search a SAM file to find the alignments with N or D above a certain value in the CIGAR?
You are currently viewing the SEQanswers forums as a guest, which limits your access. Click here to register now, and join the discussion
function accept(rec) { if( rec.getReadUnmappedFlag()) return false; var cigar = rec.cigar; if(cigar==null) return false; for(var i = 0;i< cigar.numCigarElements();++i) { var ce = cigar.getCigarElement(i); if( ce.getLength()< 8 ) continue; var op = ce.getOperator().name(); if( op=="N" || op=="D") return true; } return false; } accept(record);
curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00154/alignment/HG00154.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam" | java -jar dist-1.133/samjs.jar -f filter.js (...) ERR018420.635615 147 1 18295328 70 32M10D44M = 18294913 -500 TTTGTTTTGTTTTATTATATTATATTATATTGTATTATATTATATTATATTATATTATATTATATTAATTATATTA =@B9>>?B;>AAAA@AAAABAAAABA@A@BA>A@AA@A@AA@@@@A@A@AA@A@AA@A@BA@AABAB@BA@A@AA< X0:i:1 X1:i:0 OC:Z:67M1D9M MD:Z:32^TATTATATTA44 RG:Z:ERR018420 AM:i:37 NM:i:10 SM:i:37 BQ:Z:BED@@@@@@@@@@@@@@@@@@@@@@@@@@@@@`_``_`_``____`_`_``_`_``_`_a`_``a`a_a`_`_``[ MQ:i:60 XT:A:U ERR018419.19671936 163 1 18335432 35 50M14D20M6S = 18335873 499 TCTAAAAAAAAAAAAAAAGAGAGAGAGAAATAAAGAGAGAAAGAGAAAGAAGAAAGAAGGAAAGAAAGAAAACAAA ;76>=AB<<>?;>=8>972753.&9:64;>87==7;9667B=8=39A>8=82525)%67<5;9)8A>.=969,A<@ OC:Z:62M2D8M6S MD:Z:50^AGAAAGAAAGAAAG20 RG:Z:ERR018419 AM:i:25 NM:i:14 SM:i:25 BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@WQTQTHDUV[TZXHW`]M\X@@@@@@ MQ:i:25 XT:A:M ERR018418.3162191 163 1 18360887 70 43M16D33M = 18361308 496 CATCTATCTATTATCTATCTATCTCATTTATCCATCTATCTATTATCTATCTATCTATCTATCTATCTATCTGCCT <A@@BAA@B@ABAA?BAA@AA?@B@AABBA@??AA?BAA@BAABAA@BA@@BAA@BA@@BAA@BBB@BB@AC??BD X0:i:1 X1:i:0 OC:Z:76M MD:Z:43^TATCTATCTATCTATC33 RG:Z:ERR018418 AM:i:37 NM:i:16 SM:i:37 BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@a``_a`__a``_a`__a``_aaa_aa_`b^^ac MQ:i:60 XT:A:U ERR018418.5325337 163 1 18360893 70 37M16D39M = 18361325 507 TCTATTATCTATCTATCTCATTTATCCATCTATCTATTATCTATCTATCTATCTATCTATCTATCTGCCTATCCAT <?AAABAA@AAA@BAA@B@AAABAA@@AA@AA@?BAABAA?AAA@BAA?BAA@AAA@BAA@BA@@B@AACBBAAC@ X0:i:1 X1:i:0 OC:Z:76M MD:Z:37^TATCTATCTATCTATC39 RG:Z:ERR018418 AM:i:37 NM:i:16 SM:i:37 BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@a``^```_a``^a``_```_a``_a`__a_``baa``b_ MQ:i:60 XT:A:U
#!/usr/bin/env python import sys import pysam INBAM=sys.argv[1] MIN_GAP= int(sys.argv[2]) insam= pysam.AlignmentFile(INBAM) outsam= pysam.AlignmentFile('-', 'wh', template= insam) for aln in insam: for x in aln.cigartuples: # See cigartuples in http://pysam.readthedocs.org/en/latest/api.html if x[0] in [2, 3] and x[1] > MIN_GAP: outsam.write(aln) break insam.close() outsam.close() sys.exit()
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
24 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
25 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
||
Started by seqadmin, 04-10-2024, 09:21 AM
|
0 responses
21 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 09:21 AM
|
||
Started by seqadmin, 04-04-2024, 09:00 AM
|
0 responses
52 views
0 likes
|
Last Post
by seqadmin
04-04-2024, 09:00 AM
|
Comment