SEQanswers

01-17-2014, 12:46 AM   #1
PhD1990
Junior Member
Location: Belgium
Join Date: Jan 2014
Posts: 3
problem with HTSeq

Hi everyone,

I'm starting to use Python/HTSeq to analyse RNA-seq data.
I'm following the tour through HTSeq, but I'm having a weird problem.

I can import HTSeq
and read in a file with HTSeq.FastqReader.
I can get the name of a read with read.name,
but when I type read.qual, Python just restarts automatically and I have to start over.

Does anyone know why this happens and how I can solve it?

Thank you

01-19-2014, 11:59 AM   #2
Wolfgang Huber
Senior Member
Location: Heidelberg, Germany
Join Date: Aug 2009
Posts: 109

Dear PhD1990,

it's good that you reported the problem, but you will probably need to be more specific before someone can help you. Can you provide

- a reproducible example (i.e. a self-contained piece of code and, if needed, data for others to reproduce your problem),
- a statement of the problem you are experiencing (any error messages, warnings etc.),
- an overview of your system (OS, Python version)?
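
The system overview requested in the list above can be collected from within Python itself. A minimal sketch using only the standard library; the helper name `system_report` is made up for illustration, and the HTSeq version is looked up only if the package imports cleanly:

```python
import platform
import sys


def system_report():
    """Return a dict describing the OS and Python build (hypothetical helper)."""
    report = {
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "python_build": platform.architecture()[0],  # e.g. '32bit' or '64bit'
        "executable": sys.executable,
    }
    try:
        import HTSeq  # only available if HTSeq is installed
        report["htseq_version"] = getattr(HTSeq, "__version__", "unknown")
    except Exception as exc:  # ImportError, or a compiled extension failing to load
        report["htseq_version"] = "not importable: %s" % exc
    return report


if __name__ == "__main__":
    for key, value in system_report().items():
        print("%s: %s" % (key, value))
```

Pasting this output into a post, together with the exact lines typed before the interpreter restarted, would cover the second and third points. Note that a hard interpreter crash cannot be caught by the `try` block; the sketch only reports what is visible before the failing call.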

Kind regards
Wolfgang
__________________
Wolfgang Huber
EMBL

01-19-2014, 06:16 PM   #3
sindrle
Senior Member
Location: Norway
Join Date: Aug 2013
Posts: 266
HTSeq: Very few counts recognised

Hi!
I've seen a lot of threads on this, but I can't figure it out. I have 16-60 million single-end reads in each library. I used TopHat 2 with the UCSC GTF file for hg19.

This is my code:

samtools view accepted_hits.bam | \
htseq-count -m intersection-nonempty -s no -a 10 \
- UCSC/hg19/genes.gtf \
> Out.txt

Here is a typical result; it's proportional to the library size:

no_feature 7013689
ambiguous 269370
too_low_aQual 0
not_aligned 0
alignment_not_unique 6645341

How come, on average, 25-50% of my reads end up as "no_feature",
"ambiguous" or "alignment_not_unique"?

This is RNA-seq; if I have to inspect the alignments visually, how should I proceed?
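
To put numbers like these in context, the special counters can be expressed as fractions of everything htseq-count reported. A rough sketch, assuming the output is a two-column table of name and count; the function name `summarize_counts` and the gene rows in the demo are made up for illustration. Newer htseq-count versions prefix the special counters with `__`, which the sketch also accepts. The fractions are only approximate per-read figures, since multiply-aligned reads may be counted once per reported alignment.

```python
SPECIAL = ("no_feature", "ambiguous", "too_low_aQual",
           "not_aligned", "alignment_not_unique")


def summarize_counts(lines):
    """Split htseq-count output rows into assigned counts and special counters."""
    assigned = 0
    special = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace: "<name> <count>".
        name, count = line.rsplit(None, 1)
        key = name[2:] if name.startswith("__") else name
        if key in SPECIAL:
            special[key] = special.get(key, 0) + int(count)
        else:
            assigned += int(count)
    total = assigned + sum(special.values())
    fractions = {k: v / total for k, v in special.items()} if total else {}
    return assigned, special, fractions


if __name__ == "__main__":
    # The gene rows are invented; only the counter names follow the post.
    demo = ["geneA\t60", "geneB\t40", "no_feature\t50",
            "alignment_not_unique\t50"]
    assigned, special, fractions = summarize_counts(demo)
    print("assigned to genes:", assigned)
    for key in sorted(fractions):
        print("%s: %.1f%%" % (key, 100 * fractions[key]))
```

In practice the lines would come from the output file, for example `summarize_counts(open("Out.txt"))`.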

01-19-2014, 09:55 PM   #4
PhD1990
Junior Member
Location: Belgium
Join Date: Jan 2014
Posts: 3
thanks + second question

Hi everyone,

Thank you so much for helping me.
I have found the problem, by the way: the tutorial says you should download the 2010 version of vcredist x86, but I have now downloaded the 2012 version and it works perfectly.

I have a second question, though.

Now that the tutorial is working for me, I still have one really weird problem. To count reads, you have to download exon information from the internet (Ensembl or something)? The tutorial provides a GTF file and that works perfectly, but online I can only find GFF3 files for, for example, E. coli strains. How do you use these, given that their content differs from the GTF file?

Is there a standard format, or a place where I can find exon information in GTF format?

Thanks,
grtz

Sara

01-20-2014, 01:39 AM   #5
bruce01
Senior Member
Location: .
Join Date: Mar 2011
Posts: 157

Hi Sara,

You can use GFF3 format with HTSeq; you just need to specify the feature type (3rd column) with the -t flag, as it may differ from the default, which is 'exon'. For example, '-t gene'. You will probably also need the -i flag to name the attribute that holds the ID, since its default, 'gene_id', is a GTF attribute that GFF3 files usually lack. Otherwise, you can use a conversion script to make a GTF from the GFF3; there are a few around in various scripting languages, or I can PM you one I use if you want.

Bruce.
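
For readers without such a script, the conversion idea can be sketched in a few lines of Python. This is not Bruce's script, just an illustration under stated assumptions: each GFF3 `gene` feature is treated as a single exon (reasonable for unspliced bacterial genes, wrong for spliced ones), the `ID` attribute is used as the gene identifier, and the function names and defaults are hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Turn 'ID=gene0;Name=abc' into {'ID': 'gene0', 'Name': 'abc'}."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff3_line_to_gtf(line, feature_type="gene", id_key="ID"):
    """Convert one GFF3 feature line to a GTF line, or return None to skip it."""
    if line.startswith("#") or not line.strip():
        return None  # directives, comments and blank lines
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != feature_type:
        return None  # malformed line or a feature type we do not convert
    attrs = parse_gff3_attributes(cols[8])
    if id_key not in attrs:
        return None
    gene_id = attrs[id_key]
    cols[2] = "exon"  # matches htseq-count's default feature type
    cols[8] = 'gene_id "%s"; transcript_id "%s";' % (gene_id, gene_id)
    return "\t".join(cols)


def convert(gff3_lines):
    """Yield GTF lines for every convertible GFF3 line."""
    for line in gff3_lines:
        gtf = gff3_line_to_gtf(line)
        if gtf is not None:
            yield gtf
```

Which attribute to use for `id_key` depends on the annotation source, so it is worth looking at a few lines of the GFF3 file before choosing.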

01-20-2014, 05:14 AM   #6
PhD1990
Junior Member
Location: Belgium
Join Date: Jan 2014
Posts: 3

hi Bruce

that would be really nice if you could send me such a script

thank you so much

Sara

01-20-2014, 11:12 PM   #7
Simon Anders
Senior Member
Location: Heidelberg, Germany
Join Date: Feb 2010
Posts: 994

Quote:
Originally Posted by sindrle
Hi!
I've seen a lot of threads on this, but I can't figure it out. I have 16-60 million single-end reads in each library. I used TopHat 2 with the UCSC GTF file for hg19.

[...]

How come, on average, 25-50% of my reads end up as "no_feature",
"ambiguous" or "alignment_not_unique"?

Is this a GTF file created with UCSC's Table Browser? If so: these do not work. There is a bug in the Table Browser server that causes the gene ID field to contain the transcript ID instead of the gene ID.

Please use a GTF file from another source.

Simon
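
One way to check a GTF file for the symptom described here is to count how often `gene_id` and `transcript_id` are identical. A minimal sketch using only the standard library; the function name is made up for illustration, and the attribute parsing assumes the usual `key "value";` layout of GTF column 9.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def gene_equals_transcript_fraction(lines):
    """Fraction of GTF records whose gene_id is identical to transcript_id."""
    checked = 0
    same = 0
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed or blank lines
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            checked += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return same / checked if checked else 0.0
```

Calling `gene_equals_transcript_fraction(open("UCSC/hg19/genes.gtf"))` and getting a value close to 1.0 would be consistent with the problem described above, although it is not proof on its own.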
Tags
htseq, read.qual, rnaseq
