Hi everyone,
Is there a smart way to assemble reads in the deep sequencing data?
After aligning reads to the genome using bowtie and samtools, I got the bam file. I want to extend the reads to get contigs if these reads have the same ref chromosome and are on the DNA strand.
Can I get the start position, end position, ref strand and chromosome number of all these contigs?
I wrote the following code but it ignore the strand (+/-) the reads come from. Reads from - strand might connected to a + strand one. That's not good.
Is that possible to extract ref strand info from samtools mpileup?
My code:
samtools mpileup input.sorted.bam |\
cut -d ' ' -f 1,2 |\
awk -F ' ' 'BEGIN {chr="";start=-1;end=-1} {if(chr!=$1 || int($2)!=end+1) { if(chr!="") {printf("%s:%d-%d\n",chr,start,end);} chr=$1;start=int($2);end=int($2);} else { end=end+1;}} END {if(chr!="") {printf("%s:%d-%d\n",chr,start,end); } }'>output.txt
And I really want to achieve this in the shell. Any suggestion?
Thanks a lot.
Dadi
Is there a smart way to assemble reads in the deep sequencing data?
After aligning reads to the genome using bowtie and samtools, I got the bam file. I want to extend the reads to get contigs if these reads have the same ref chromosome and are on the DNA strand.
Can I get the start position, end position, ref strand and chromosome number of all these contigs?
I wrote the following code but it ignore the strand (+/-) the reads come from. Reads from - strand might connected to a + strand one. That's not good.
Is that possible to extract ref strand info from samtools mpileup?
My code:
samtools mpileup input.sorted.bam |\
cut -d ' ' -f 1,2 |\
awk -F ' ' 'BEGIN {chr="";start=-1;end=-1} {if(chr!=$1 || int($2)!=end+1) { if(chr!="") {printf("%s:%d-%d\n",chr,start,end);} chr=$1;start=int($2);end=int($2);} else { end=end+1;}} END {if(chr!="") {printf("%s:%d-%d\n",chr,start,end); } }'>output.txt
And I really want to achieve this in the shell. Any suggestion?
Thanks a lot.
Dadi
Comment