SEQanswers

04-10-2012, 10:12 AM   #1
flashton
Member
 
Location: london, uk

Join Date: Feb 2011
Posts: 10
Problem using HTSeq count with SAM file without quality score

Hi everyone,

I have a problem using HTSeq count to analyse my SAM file. I think I know the source of the error, and I'm wondering whether there is a workaround.

The SAM files I'm dealing with don't have any quality scores, as they were made by aligning FASTA files to the reference using segemehl. They look like this:

@HD VN:1.0
@SQ SN:gi|153930785|ref|NC_009697.1| LN:3863450
@PG ID:segemehl VN:0.1.3-$Rev: 335 $ ($Date: 2012-03-13 18:55:51 +0100 (Tue, 13 Mar 2012) $) CL:/usr/local/segemehl/segemehl.x -i cbot19397.idx -d Cbot19397.fasta -q /Volumes/DataRAID/Projects/Phil_RNA_SEQ/170112/fasta/14_trimmed.fa -t 6 -E 0.0001
Noname 0 gi|153930785|ref|NC_009697.1| 1577456 255 25M1I11M * 0 0 TATAAAATATAATAATATATATATATATATATATATA * NM:i:3 MD:Z:19A7T8 NH:i:1 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 3243930 255 49M1D1M * 0 0 ATAGTAACAACAATAATAACAATAACGACAATAACAACAATACTAAACCC * NM:i:1 MD:Z:49^T1 NH:i:1 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 59594 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 458627 255 17M1I4M1D28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:18A2^A28 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 52967 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 67539 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 73821 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q

As you can see, there is a * in place of the quality string. When I try to run HTSeq count with the command

python -m HTSeq.scripts.count -t CDS -m intersection-nonempty -i Name -a 0 14_trimmed_fasta_v2.sam CP000726.gff

It returns

7465 GFF lines processed.
Error occured when reading first line of sam file.
Error: ("'seq' and 'qualstr' do not have the same length.", 'line 4 of file 14_trimmed_fasta_v2.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:765]

I have tried setting the -a parameter to both 0 and *, with no effect. I think HTSeq is expecting a quality string, which my SAM file doesn't have.

Does anyone know a way around this problem?

Thanks,
04-11-2012, 03:00 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Duplicate thread - see http://seqanswers.com/forums/showthread.php?t=18952 where Simon Anders confirmed this was a known issue.
04-11-2012, 03:29 AM   #3
flashton
Member
 
Location: london, uk

Join Date: Feb 2011
Posts: 10

Thanks Peter, apologies for the duplicate.
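
For anyone who lands here with the same error, here is a minimal sketch of one possible workaround, which is not confirmed anywhere in this thread: rewrite the SAM file so that every * in the QUAL column becomes a placeholder quality string of the same length as SEQ. The function names, the placeholder character I (Phred 40 in Phred+33 encoding) and the output file name used below are my own assumptions. The sketch also assumes a tab-delimited SAM file, as the SAM format requires; the excerpt in post #1 probably shows spaces only because of how the page renders it. The qualities are made up, so don't use the rewritten file for anything that relies on base quality.

```python
def fill_missing_qualities(line, placeholder="I"):
    """Return a SAM line with a '*' QUAL field replaced by a placeholder string.

    Assumption: 'I' (Phred 40 in Phred+33) is an arbitrary stand-in value,
    not a real quality score.
    """
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    # SAM columns are tab-separated: index 9 is SEQ, index 10 is QUAL.
    if len(fields) >= 11 and fields[10] == "*" and fields[9] != "*":
        fields[10] = placeholder * len(fields[9])
    return "\t".join(fields) + "\n"


def rewrite_sam(in_path, out_path, placeholder="I"):
    """Copy a SAM file, filling in placeholder qualities where QUAL is '*'."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fill_missing_qualities(line, placeholder))
```

Something like rewrite_sam("14_trimmed_fasta_v2.sam", "14_trimmed_fasta_v2.fixed.sam") would then give a file to pass to HTSeq count in place of the original. As far as I know the -a option filters on the mapping quality column (MAPQ), not on base qualities, so the placeholder values should not change the counts, but check that against the HTSeq documentation for your version.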
Tags
htseq, htseq-count, rna-seq, rna-seq data analysis

All times are GMT -8.