Hi,
I am trying to run DEXSeq with bame files made by the hisat2 mapper. I have created the gff file according to the manual. I have seen that the chromosome names are different from the bam files, so I have awk-modified them. now they both look similar.
I have attached here the first 1000 rows of the gff file as well as 400 of the bam file from hisat2.
here are the first few rows of both files:
When running the command:
I keep getting the following error message:
the first line of the bam file looks like that:
Has anyone ever tried to run DEXSeq with the hisat2 output?
Is there a difference to the "standard" sam/bam format?
I would appreciate any help solving this error message.
thanks
Assa
I am trying to run DEXSeq with bame files made by the hisat2 mapper. I have created the gff file according to the manual. I have seen that the chromosome names are different from the bam files, so I have awk-modified them. now they both look similar.
I have attached here the first 1000 rows of the gff file as well as 400 of the bam file from hisat2.
here are the first few rows of both files:
Code:
[B]gff[/B]: 1 dexseq_prepare_annotation.py aggregate_gene 11869 14409 . + . gene_id "ENSG00000223972.5" 1 dexseq_prepare_annotation.py exonic_part 11869 12009 . + . transcripts "ENST00000456328.2"; exonic_part_number "001"; gene_id "ENSG00000223972.5" 1 dexseq_prepare_annotation.py exonic_part 12010 12057 . + . transcripts "ENST00000450305.2+ENST00000456328.2"; exonic_part_number "002"; gene_id "ENSG00000223972.5" 1 dexseq_prepare_annotation.py exonic_part 12058 12178 . + . transcripts "ENST00000456328.2"; exonic_part_number "003"; gene_id "ENSG00000223972.5" 1 dexseq_prepare_annotation.py exonic_part 12179 12227 . + . transcripts "ENST00000450305.2+ENST00000456328.2"; exonic_part_number "004"; gene_id "ENSG00000223972.5" 1 dexseq_prepare_annotation.py exonic_part 12613 12697 . + . transcripts "ENST00000450305.2+ENST00000456328.2"; exonic_part_number "005"; gene_id "ENSG00000223972.5" [B]bam[/B]: SOLEXA3_1:8:54:9864:6930/2 433 1 10536 1 50M 16 90128665 0 TACCCCCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCT ################################################## AS:i:-4 ZS:i:-4 XN:i:0 XM:i:2 XO:i:0 XG:i:0NM:i:2 MD:Z:4A19C25 YT:Z:UP NH:i:8 SOLEXA3_1:8:96:12216:10104/2 401 1 10536 1 50M = 532371 0 TACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCT BBCB<>=@:=)BCCCBCCDC@@@@:C>C@CCCCCC@CCCBACCBCCCCCC AS:i:-4 ZS:i:-4 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1MD:Z:24C25 YT:Z:UP NH:i:6 SOLEXA3_1:8:38:7825:12181/2 177 1 10537 1 50M 2 61785065 0 ACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTC #########B-A6=0;6<:6:66:*;34:.ABBA?7>AA=ABC?C@DBCC AS:i:-4 ZS:i:-4 XN:i:0 XM:i:1 XO:i:0 XG:i:0NM:i:1 MD:Z:23C26 YT:Z:UP NH:i:6 SOLEXA3_1:8:102:12459:18067/1 337 1 10540 1 50M 19 238009 0 ACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCC BCCCCCC@CCCCCCCCCCCCCCCCCCCBCCCCCCCCCCCCCCCCCCCCCC AS:i:-5 ZS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1MD:Z:20C29 YT:Z:UP NH:i:4 SOLEXA3_1:8:63:4265:15899/2 153 1 10565 255 50M = 10565 0 CAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAAC A@<BB=B@BB2/5..(*0*/>CB>@BBC@B??;?9?C=C=<<78:;<;;< AS:i:0 ZS:i:-2 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0MD:Z:50 YT:Z:UP NH:i:1
Code:
python ~/R/x86_64-pc-linux-gnu-library/3.2/DEXSeq/python_scripts/dexseq_count.py -p yes gencode.v23.annotation.mod.gff $file DEXSeq_completeSample/$NEW_FILE.txt
Code:
Traceback (most recent call last): File "/home/yeroslaviz/R/x86_64-pc-linux-gnu-library/3.2/DEXSeq/python_scripts/dexseq_count.py", line 215, in <module> for af, ar in HTSeq.pair_SAM_alignments( reader( sam_file ) ): File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.6.1p1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 601, in pair_SAM_alignments for almnt in alignments: File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.6.1p1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 536, in __iter__ algnt = SAM_Alignment.from_SAM_line( line ) File "_HTSeq.pyx", line 1276, in HTSeq._HTSeq.SAM_Alignment.from_SAM_line (src/_HTSeq.c:24611) ValueError: ('SAM line does not contain at least 11 tab-delimited fields.', 'line 1 of file hisat2.bamFiles/610W1AAXX_8.sorted.bam')
Code:
SOLEXA3_1:8:54:9864:6930/2 433 1 10536 1 50M 16 90128665 0 TACCCCCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCT ################################################## AS:i:-4 ZS:i:-4 XN:i:0 XM:i:2 XO:i:0 XG:i:0NM:i:2 MD:Z:4A19C25 YT:Z:UP NH:i:8
Is there a difference to the "standard" sam/bam format?
I would appreciate any help solving this error message.
thanks
Assa