Hi,
I am relatively new to using Linux and to RNA-seq data analysis. I have been given some BAM files and been told to analyse the data in R. However, before I do that, I need to generate a count table for each sample. I am using HTSeq to do this but am hitting an error every time. Below is what I'm typing and the resulting error:
$ htseq-count -s no 01_v2.sam "gencode.v21.annotation.gtf" > 01_v2.counts
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
2000000 GFF lines processed.
2100000 GFF lines processed.
2200000 GFF lines processed.
2300000 GFF lines processed.
2400000 GFF lines processed.
2500000 GFF lines processed.
2546594 GFF lines processed.
Error occured when reading first line of sam file.
Error: ("Malformed SAM line: MRNM == '*' although flag bit &0x0008 cleared", 'line 1 of file 01_v2.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:1321]
$ head 01_v2.sam
FCC64Y1ACXX:1:2311:16630:3201#GCCAATAT 81 chr10 60021 0 49M * 0 0 GCATCGGGGTGCTCTGGTTTTGTTGTTGTTATTTCTGAATGACATTTAC hiihhiiiiiiiiiiiiihdhiiiihhfiiihihhheggggeeeeebb_ NM:i:0 MD:Z:49
FCC64Y1ACXX:1:2206:15375:73069#GCCAATAT 65 chr10 60124 0 49M * 0 0 GACAGGTCTTAATTGACGCGCTGTTCAGCCCTTTGAGTTCGGTTGAGTT bbbeeecegggggifhhfhiiiihhiiiiihhihfefcffhiefhcg_c NM:i:0 MD:Z:49
FCC64Y1ACXX:1:2214:12808:62966#NCCAATAT 65 chr10 60154 0 49M * 0 0 CTTTGAGTTCGGTTGAGTTTTGGGTTGGAGAATTTTCTTCCACAAGGGA bbbeeeeeggggghihiggiiiiighiihhihhiiihiiihihiiiiig NM:i:0 MD:Z:49
FCC64Y1ACXX:1:1102:13233:83253#GCCAATAT 65 chr10 60853 0 49M * 0 0 CGCAGATGGATAGATTACTGTTATTAGTTCTCATTTCATTGTTAATTTT bbbeeeeeggfgghiiiiiihhdhidgghfhihhiiihhhifhihhhhi NM:i:1 MD:Z:0T48
FCC64Y1ACXX:1:1308:1800:34170#GCCAATAT 65 chr10 60930 30 49M * 0 0 TGCCTTTCAATATACCTTAGTGGAATTTATTAAATTTTCCTGGATGTCC bbbeeeeegggggiiiihiighheiiihghhiiiiiiiiiiiihiigii NM:i:0 MD:Z:49
FCC64Y1ACXX:1:1307:19223:71708#GCCAATAT 65 chr10 61145 0 49M * 0 0 AATTCCACTTGGTTATATTGTCTAACTTTTTTCTAATTTTCTTTCATTT ^[\cccSaaccccd[bKQ[`Y^`[Y^`ecaddccd[accdccccdcbcd NM:i:0 MD:Z:49
FCC64Y1ACXX:1:2206:7623:63302#GCCAATAT 81 chr10 62142 0 49M * 0 0 CTATTTGCACATATAGTTTTAATACCAATGACGTTAAAATGTATAACAC ghfcf^fhhiiiiiiiiiiiihggiiiiiihiiiiigggggeeeee_ab NM:i:0 MD:Z:49
FCC64Y1ACXX:1:1216:3028:53281#NCCAATAT 65 chr10 62384 0 49M * 0 0 GTCCAGAGACAAATATTTTAAATATTGAAGTTGAAGACCTAAAAATGTG ___`ccdegeegehhhgfffhhhhXeeg_gghhhf_ddfghhfghaaa_ NM:i:1 MD:Z:0T48
FCC64Y1ACXX:1:1207:12316:62577#GCCAATAT 81 chr10 66812 0 49M * 0 0 TGAAAGCATTCCCTTTGAGAATTGGAACAAGACGAGGAGACTACTCTCA iiiiiiiiihiihihhhifiiiiiiihhiiiiiiiigggggeceeebb_ NM:i:0 MD:Z:49
I have also posted the first few lines of the file above, but I don't know what is wrong with line 1.
If anyone can help, that would be greatly appreciated,
Thanks
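For anyone reading along, the flag values in the SAM lines above can be decoded with a short script. This is only a minimal sketch, assuming the standard SAM flag bit definitions; the names `decode_flag` and `mate_fields_inconsistent` are made up for illustration and are not part of HTSeq. The second function mirrors the condition described in the error message (read flagged as paired, bit 0x8 "mate unmapped" not set, yet the mate reference column is `*`); it is not HTSeq's actual code.

```python
# Standard SAM flag bits (labels here are informal descriptions).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def mate_fields_inconsistent(flag, rnext):
    """Hypothetical check: the read is flagged as paired and the
    'mate unmapped' bit (0x8) is clear, yet the mate reference
    column (RNEXT, called MRNM in the error) is '*'."""
    return bool(flag & 0x1) and not (flag & 0x8) and rnext == "*"


# Flags seen in the first lines of 01_v2.sam, all with '*' as mate reference.
for flag in (81, 65):
    print(flag, decode_flag(flag), mate_fields_inconsistent(flag, "*"))
```

Under these assumptions, both 81 and 65 have the paired bit set and the 0x8 bit clear while the mate reference is `*`, which matches the combination the error message complains about.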