Hi,
I have run bowtie, followed by tophat on the unmapped reads.
I then want to merge the two resulting SAM files.
I can't do it because the BAM file from the bowtie run is not sorted, and somehow I can't get this file sorted.
I have tried everything I know. This is how I run my script:
Code:
# bowtie on the paired-end reads, writing SAM output (-S)
bowtie -a --phred64-quals -m 1 -n 2 -l 22 -q --un wt_trimmed.unmapped -t -p 6 --chunkmbs 256 --max wt_trimmed.maxHits -S /export/ref_genome -1 /export/wt_s1_R1 -2 /export/wt_s1_R2 wt_bowtie.sam
# convert SAM to BAM, then sort and index
# (note: this step reads wttotal_bowtie.sam, while the bowtie call above writes wt_bowtie.sam)
samtools view -h -o wttotal_bowtie.bam -Sb wttotal_bowtie.sam
samtools sort wttotal_bowtie.bam wttotal_bowtie.sorted
samtools index wttotal_bowtie.sorted.bam
#topHat for unmapped reads from bowtie run
tophat -o tophat_out -p 6 -r 100 --solexa1.3-quals /export/ref_genome wt_trimmed1.unmapped wt_trimmed2.unmapped
cp tophat_out/accepted_hits.bam wt_tophat.bam
samtools sort wt_tophat.bam wt_tophat.sorted
samtools index wt_tophat.sorted.bam
As you can see, I am sorting both files in the same way.
BUT somehow the tophat file is sorted and the bowtie file is not, and I just can't figure out what I am doing wrong.
tophat:
@HD VN:1.0 SO:coordinate
@SQ SN:2L LN:23011544
@SQ SN:2LHet LN:368872
@SQ SN:2R LN:21146708
@SQ SN:2RHet LN:3288761
@SQ SN:3L LN:24543557
@SQ SN:3LHet LN:2555491
@SQ SN:3R LN:27905053
@SQ SN:3RHet LN:2517507
@SQ SN:4 LN:1351857
@SQ SN:U LN:10049037
@SQ SN:Uextra LN:29004656
@SQ SN:X LN:22422827
@SQ SN:XHet LN:204112
@SQ SN:YHet LN:347038
@SQ SN:dmel_mitochondrion_genome LN:19517
@PG ID:TopHat VN:1.3.0 CL:/usr/bin/tophat -o tophat_out -p 6 -r 100 --solexa1.3-quals /export/ref_genome wt_trimmed1.unmapped wt_trimmed2.unmapped
HWI-EAS225_0031_FC:2:102:14716:18758#0 147 2L 9069 255 54M = 9077 62 CAAGCCCAAAAAGCAGTTTGATACCAGCGATTTTGTTATTGAGAGCGTGCAGAA GD<BGGDD<>DDG>DCACAA?B=?D<<BDDIDIGBC>?A9BED<BBABA?<D@G NM:i:0 NH:i:1
HWI-EAS225_0031_FC:2:102:14716:18758#0 99 2L 9077 255 54M = 9069 62 AAAAGCAGTTTGATACCAGCGATTTTGTTATTGAGAGCGTGCAGAATATACCAC EDCFBDDGGEEGCCBGEGGEGGDGCCB9@>EF<ECAE8E:CC><:76?=DDBGB NM:i:0 NH:i:1
HWI-EAS225_0031_FC:2:56:6678:11287#0 137 2L 9855 255 54M * 0 0 GTTTTATTAATAATCCTAAGCTAAATACTCAATTATATACTTTATATGGTCGGA IHIHIIIIIGIIHIIGIGHFIIHHHIIIIHIIIIIIIIIIIEIHIIIHHIBHII NM:i:0 NH:i:1
...
bowtie:
@HD VN:1.0 SO:unsorted
@SQ SN:YHet LN:347038
@SQ SN:dmel_mitochondrion_genome LN:19517
@SQ SN:2L LN:23011544
@SQ SN:X LN:22422827
@SQ SN:3L LN:24543557
@SQ SN:4 LN:1351857
@SQ SN:2R LN:21146708
@SQ SN:3R LN:27905053
@SQ SN:Uextra LN:29004656
@SQ SN:2RHet LN:3288761
@SQ SN:2LHet LN:368872
@SQ SN:3LHet LN:2555491
@SQ SN:3RHet LN:2517507
@SQ SN:U LN:10049037
@SQ SN:XHet LN:204112
@PG ID:Bowtie VN:0.12.7 CL:"bowtie -a --phred64-quals -m 1 -n 2 -l 22 -q --un wt_trimmed.unmapped -t -p 6 --chunkmbs 256 --max wt_trimmed.maxHits -S /export/ref_genome -1 /export/wt_s1_R1 -2 /export/wt_s1_R2 wt_bowtie.sam"
HWI-EAS225_0031_FC:2:1:10172:6768#0 163 dmel_mitochondrion_genome 1832 255 54M = 1850 72 TTGGAACAGGATGAACTGTTTATCCACCTTTATCCGCTGGAATTGCTCATGGTG BIIGIIIIIIIIIIIIIEIIIDIIGIEII<GGGGHHIIII>IIIGIIIGGIIEH XA:i:1 MD:Z:0C53 NM:i:1
HWI-EAS225_0031_FC:2:1:10172:6768#0 83 dmel_mitochondrion_genome 1850 255 54M = 1832 -72 TTTATCCACCTTTATCCGCTGGAATTGCTCATGGTGGAGCTTCAGTTGATTTAG HIIFIGGCDGI8IGHHFFIIIIIIIHIIIIIHIIIIIDIIIIHIIIIIIIIIII XA:i:0 MD:Z:54 NM:i:0
HWI-EAS225_0031_FC:2:4:13364:12037#0 163 2L 6805 255 54M = 6819 68 AGAGGTGAAAATATATTAAAATTGCCGCTCATTTTCTTCGCGCTAGAATTAGGA HIGIIDIIIIIIIFIIIIIIIIEIIIIHIIFIIIIIIIIIHH+HGGFFGIIIII XA:i:0 MD:Z:42G11 NM:i:1
...
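The SO: tag in the @HD line is only a header label, so it can be worth testing the records themselves. Coordinate order is judged against the file's own @SQ list, and the two headers shown here list the references in different orders; in the bowtie excerpt above, dmel_mitochondrion_genome is listed before 2L, so the three records shown are consistent with that order. As far as I know, some older samtools releases copied the header through sort without updating SO:, but treat that as an assumption to verify for your version. Below is a minimal sketch of such a check, assuming a POSIX shell with awk and tab-separated SAM text on stdin; the function name sam_coord_sorted is hypothetical and not part of samtools.

```shell
#!/bin/sh
# Sketch: report whether SAM records are coordinate-sorted with respect to
# the @SQ order of their own header. Hypothetical helper, not part of samtools.
# Assumes tab-separated SAM text (header plus records) on stdin.
sam_coord_sorted() {
    awk -F '\t' '
        # Header: remember the rank of each reference from its @SQ line.
        /^@SQ/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) rank[substr($i, 4)] = ++nref
            next
        }
        /^@/ { next }    # skip the remaining header lines
        {
            # Records without a reference (RNAME "*") are expected last.
            r = ($3 == "*") ? nref + 1 : rank[$3]
            p = $4 + 0
            if (seen && (r < prev_r || (r == prev_r && p < prev_p))) {
                printf "unsorted at record %d: %s:%d after %s:%d\n", n + 1, $3, p, prev_name, prev_p
                bad = 1
                exit
            }
            seen = 1; prev_r = r; prev_p = p; prev_name = $3; n++
        }
        # Print "sorted" and return 0 only if no record went backwards.
        END { if (!bad) print "sorted"; exit bad }
    '
}
```

With the file names from the script above, it could be used as `samtools view -h wttotal_bowtie.sorted.bam | sam_coord_sorted`. It stops at the first record that goes backwards, so it says nothing about how many records are out of order.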
Does samtools sort have some kind of threshold for file size, number of lines, or anything else?
I would appreciate any help or ideas on how to fix this.
Thanks
Assa