Error using HTSeq

ea11

Member

Join Date: Jun 2015

Posts: 36
#1

07-09-2015, 04:39 AM

Hi,

I am relatively new to Linux and to RNA-seq data analysis. I have been given some BAM files and told to analyse the data in R. Before I can do that, I need to generate a count table for each sample. I am using HTSeq (htseq-count) to do this, but I hit the same error every time. Below is the command I'm running and the resulting error:

$ htseq-count -s no 01_v2.sam "gencode.v21.annotation.gtf" > 01_v2.counts
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
2000000 GFF lines processed.
2100000 GFF lines processed.
2200000 GFF lines processed.
2300000 GFF lines processed.
2400000 GFF lines processed.
2500000 GFF lines processed.
2546594 GFF lines processed.
Error occured when reading first line of sam file.
Error: ("Malformed SAM line: MRNM == '*' although flag bit &0x0008 cleared", 'line 1 of file 01_v2.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:1321]
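The error message states a specific consistency rule: a record whose FLAG has the "paired" bit (0x1) set and whose mate reference (MRNM/RNEXT, column 7) is `*` must also have the "mate unmapped" bit (0x8) set. The sketch below mimics that rule as the message describes it and applies it to line 1 of the file. It is an illustration only, not HTSeq's own code, and the function name `violates_mate_rule` is hypothetical.

```python
PAIRED = 0x1         # FLAG bit: read is part of a pair
MATE_UNMAPPED = 0x8  # FLAG bit: the mate is unmapped

def violates_mate_rule(sam_line: str) -> bool:
    """True if a record claims to be paired (0x1 set) and has MRNM/RNEXT '*'
    (column 7) but does not set the mate-unmapped bit (0x8)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rnext = fields[6]
    return bool(flag & PAIRED) and rnext == "*" and not (flag & MATE_UNMAPPED)

# Line 1 of 01_v2.sam as printed by head (real SAM columns are tab-separated).
FIRST_RECORD = "\t".join([
    "FCC64Y1ACXX:1:2311:16630:3201#GCCAATAT", "81", "chr10", "60021", "0",
    "49M", "*", "0", "0",
    "GCATCGGGGTGCTCTGGTTTTGTTGTTGTTATTTCTGAATGACATTTAC",
    "hiihhiiiiiiiiiiiiihdhiiiihhfiiihihhheggggeeeeebb_",
    "NM:i:0", "MD:Z:49",
])

print(violates_mate_rule(FIRST_RECORD))  # True
```

Under this rule, line 1 is rejected because flag 81 has bit 0x1 set and bit 0x8 cleared while column 7 is `*`.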

$ head 01_v2.sam
FCC64Y1ACXX:1:2311:16630:3201#GCCAATAT 81 chr10 60021 0 49M * 0 0 GCATCGGGGTGCTCTGGTTTTGTTGTTGTTATTTCTGAATGACATTTAC hiihhiiiiiiiiiiiiihdhiiiihhfiiihihhheggggeeeeebb_ NM:i:0 MD:Z:49
FCC64Y1ACXX:1:2206:15375:73069#GCCAATAT 65 chr10 60124 0 49M * 0 0 GACAGGTCTTAATTGACGCGCTGTTCAGCCCTTTGAGTTCGGTTGAGTT bbbeeecegggggifhhfhiiiihhiiiiihhihfefcffhiefhcg_c NM:i:0 MD:Z:49
FCC64Y1ACXX:1:2214:12808:62966#NCCAATAT 65 chr10 60154 0 49M * 0 0 CTTTGAGTTCGGTTGAGTTTTGGGTTGGAGAATTTTCTTCCACAAGGGA bbbeeeeeggggghihiggiiiiighiihhihhiiihiiihihiiiiig NM:i:0 MD:Z:49
FCC64Y1ACXX:1:1102:13233:83253#GCCAATAT 65 chr10 60853 0 49M * 0 0 CGCAGATGGATAGATTACTGTTATTAGTTCTCATTTCATTGTTAATTTT bbbeeeeeggfgghiiiiiihhdhidgghfhihhiiihhhifhihhhhi NM:i:1 MD:Z:0T48
FCC64Y1ACXX:1:1308:1800:34170#GCCAATAT 65 chr10 60930 30 49M * 0 0 TGCCTTTCAATATACCTTAGTGGAATTTATTAAATTTTCCTGGATGTCC bbbeeeeegggggiiiihiighheiiihghhiiiiiiiiiiiihiigii NM:i:0 MD:Z:49
FCC64Y1ACXX:1:1307:19223:71708#GCCAATAT 65 chr10 61145 0 49M * 0 0 AATTCCACTTGGTTATATTGTCTAACTTTTTTCTAATTTTCTTTCATTT ^[\cccSaaccccd[bKQ[`Y^`[Y^`ecaddccd[accdccccdcbcd NM:i:0 MD:Z:49
FCC64Y1ACXX:1:2206:7623:63302#GCCAATAT 81 chr10 62142 0 49M * 0 0 CTATTTGCACATATAGTTTTAATACCAATGACGTTAAAATGTATAACAC ghfcf^fhhiiiiiiiiiiiihggiiiiiihiiiiigggggeeeee_ab NM:i:0 MD:Z:49
FCC64Y1ACXX:1:1216:3028:53281#NCCAATAT 65 chr10 62384 0 49M * 0 0 GTCCAGAGACAAATATTTTAAATATTGAAGTTGAAGACCTAAAAATGTG ___`ccdegeegehhhgfffhhhhXeeg_gghhhf_ddfghhfghaaa_ NM:i:1 MD:Z:0T48
FCC64Y1ACXX:1:1207:12316:62577#GCCAATAT 81 chr10 66812 0 49M * 0 0 TGAAAGCATTCCCTTTGAGAATTGGAACAAGACGAGGAGACTACTCTCA iiiiiiiiihiihihhhifiiiiiiihhiiiiiiiigggggeceeebb_ NM:i:0 MD:Z:49

I have also posted the first few lines of the file above, but I don't know what is wrong with line 1.
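Decoding the FLAG column shows what each record claims. The sketch below uses the bit meanings from the SAM format specification. Flag 81 is 0x1 + 0x10 + 0x40 (paired, reverse strand, first in pair) and flag 65 is 0x1 + 0x40 (paired, first in pair). Both say "paired", yet column 7 is `*` and the mate-unmapped bit is not set, so the mate information is simply absent. One possible cause (an assumption, not something confirmed here) is that the second mates were filtered out of the file after alignment. The helper name `decode_flag` is hypothetical.

```python
from typing import List

# Bit meanings for the SAM FLAG field, as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag: int) -> List[str]:
    """Return the names of the bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]

for flag in (81, 65):
    print(flag, decode_flag(flag))
# 81 ['paired', 'reverse strand', 'first in pair']
# 65 ['paired', 'first in pair']
```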

If anyone can help, that would be greatly appreciated.

Thanks
Tags: error, htseq, rna-seq
Michael.Ante

Senior Member

Join Date: Oct 2011

Posts: 127
#2

07-09-2015, 05:19 AM

You may want to have a look at this Biostars thread.
ea11

Member

Join Date: Jun 2015

Posts: 36
#3

07-09-2015, 05:25 AM

Thanks for the reply. I had already looked at that post. Does that mean the only way to resolve the problem is to realign the raw data? Is there no way around this?

It's just that I was given the aligned BAM files, as the alignment was done by an external company.
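If realignment is not an option, one workaround to consider is rewriting the FLAG column so that records which claim to be paired but carry no mate information are treated as single-end reads. This is a sketch under stated assumptions, not an established HTSeq recommendation: it assumes only one mate of each pair is present in the file and that counting each remaining read once, as single-end, is acceptable for the analysis. It clears the pair-related bits 0x1, 0x2, 0x8, 0x20, 0x40 and 0x80 only on records that break the rule from the error message (so 81 becomes 16 and 65 becomes 0), leaving the strand bit 0x10, header lines and well-formed records unchanged. Because this library is counted with `-s no`, losing the first/second-in-pair information should not affect strand handling; for stranded data that would need checking. Setting bit 0x8 on those records instead is another option. The script name `to_single_end.py` is hypothetical.

```python
import sys

# Pair-related FLAG bits from the SAM specification:
# paired, proper pair, mate unmapped, mate reverse, first in pair, second in pair.
PAIR_BITS = 0x1 | 0x2 | 0x8 | 0x20 | 0x40 | 0x80

def to_single_end(line: str) -> str:
    """Rewrite one SAM line. If it claims to be paired (0x1) but has RNEXT '*'
    with the mate-unmapped bit (0x8) cleared, clear all pair-related bits so
    it reads as a single-end record. Other lines pass through unchanged."""
    if line.startswith("@"):  # header line
        return line
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if (flag & 0x1) and fields[6] == "*" and not (flag & 0x8):
        fields[1] = str(flag & ~PAIR_BITS)
        return "\t".join(fields) + "\n"
    return line

def main(in_path: str, out_path: str) -> None:
    """Stream a SAM file through to_single_end and write the result."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for raw in src:
            dst.write(to_single_end(raw))

# Run only when given an input and an output path.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

For example, `python to_single_end.py 01_v2.sam 01_v2.single.sam`, then run htseq-count on `01_v2.single.sam` with the same options as before.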

Thanks