**Simon Anders** · 10-20-2010, 03:48 AM

Hi

Originally posted by agali

2_512_865_F3 16 Esi0595_0002 conserved unknown protein [1335] f:2354-3688 613 255 3H47M * 0 0 * AS:i:347

Your SAM file is incorrect. According to the specs, a SAM file has the following fields:

Code:

<QNAME> <FLAG> <RNAME> <POS> <MAPQ> <CIGAR> <MRNM> <MPOS> <ISIZE> <SEQ> <QUAL> [<TAG>:<VTYPE>:<VALUE> [...]]

Let me line up your fields with the field names:

QNAME: 2_512_865_F3
FLAG: 16
RNAME: Esi0595_0002 conserved unknown protein [1335] f:2354-3688 (assuming these are all spaces and no tags in here)
POS: 613
MAPQ: 255
CIGAR: 3H47M
MRNM: *
MPOS: 0
ISIZE: 0
SEQ: *
QUAL: AS:i:347
TAG:VTYPE:VALUE:

Obviously, "AS:i:347" is a tag and should hence be in the 12th column. It is, however, in the 11th column, and is hence read as the quality string.
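
As a minimal sketch of this column check in Python: the helper names `label_sam_fields` and `misplaced_tag` are hypothetical and not part of HTSeq, and the tag pattern is a heuristic (a real quality string could in principle look like a tag).

```python
import re

# The 11 mandatory SAM columns, in the order listed above.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "MRNM", "MPOS", "ISIZE", "SEQ", "QUAL"]

# Optional fields have the form TAG:VTYPE:VALUE, e.g. "AS:i:347" (heuristic).
TAG_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]:[A-Za-z]:.+$")


def label_sam_fields(line):
    """Split a tab-separated SAM line into (mandatory fields dict, list of tags)."""
    columns = line.rstrip("\n").split("\t")
    mandatory = dict(zip(SAM_FIELDS, columns[:11]))
    tags = columns[11:]
    return mandatory, tags


def misplaced_tag(mandatory):
    """Return the name of a mandatory field that looks like a tag, or None."""
    for name in ("SEQ", "QUAL"):
        if TAG_PATTERN.match(mandatory.get(name, "")):
            return name
    return None


# The line from the question, assuming the columns are tab-separated.
example = "\t".join([
    "2_512_865_F3", "16",
    "Esi0595_0002 conserved unknown protein [1335] f:2354-3688",
    "613", "255", "3H47M", "*", "0", "0", "*", "AS:i:347"])

mandatory, tags = label_sam_fields(example)
print(misplaced_tag(mandatory))  # name of the column holding the stray tag
```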

Where did you get this SAM file from?

Simon

**agali** · 10-20-2010, 04:59 AM

Hi Simon,

The SAM file is from SHRiMP. I looked up the file format specification, and I think there should be a '*' in the QUAL field when there is a '*' in the SEQ field.
I will try to put an extra column in my SAM file and then run it on HTSeq.
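
A rough Python sketch of that repair, assuming tab-separated columns and that the only defect is a missing QUAL column whenever SEQ is '*'. The function name `insert_missing_qual` is hypothetical, and the tag pattern is a heuristic.

```python
import re

# Optional fields look like TAG:VTYPE:VALUE, e.g. "AS:i:347" (heuristic).
TAG_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]:[A-Za-z]:.+$")


def insert_missing_qual(line):
    """Insert '*' as QUAL when SEQ is '*' and a tag sits in the QUAL column."""
    if line.startswith("@"):
        return line.rstrip("\n")  # header lines are passed through unchanged
    columns = line.rstrip("\n").split("\t")
    # Column 10 (index 9) is SEQ, column 11 (index 10) should be QUAL.
    if len(columns) >= 11 and columns[9] == "*" and TAG_PATTERN.match(columns[10]):
        columns.insert(10, "*")
    return "\t".join(columns)


broken = "\t".join(["2_512_865_F3", "16", "Esi0595_0002", "613", "255",
                    "3H47M", "*", "0", "0", "*", "AS:i:347"])
fixed = insert_missing_qual(broken)
print(fixed.split("\t")[9:])  # SEQ, QUAL, then the tag
```

To process a whole file, each line would be passed through the function and written to a new SAM file before running htseq-count on it.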

Thanks!
Aga

**mmpillai** · 11-02-2010, 10:56 AM

Hi,
I am trying to install HTSeq 0.4.5p5 on Windows. I get this error when I run setup.py in a shell:

Traceback (most recent call last):
  File "C:\Python26\Lib\site-packages\HTSeq-0.4.5p5\setup.py", line 62, in <module>
    'scripts/htseq-count',
  File "C:\Python26\lib\distutils\core.py", line 140, in setup
    raise SystemExit, gen_usage(dist.script_name) + "\nerror: %s" % msg
SystemExit: usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: no commands supplied
Please help, all help is very much appreciated.
Thanks
Manoj

**alvin** · 12-21-2010, 10:38 AM

No Feature

Hi!
I wonder if it is possible to retrieve the IDs of the reads that have "no feature" in htseq-count.
I'm interested in those reads that do not overlap with any annotated gene.
I would really appreciate any suggestion.
Thanks
Best regards.

Alvaro Pena

**Simon Anders** · 12-22-2010, 03:05 AM

Hi

Originally posted by mmpillai

Hi,
I am trying to install HTSeq 0.4.5p5 on Windows. I get this error when I run setup.py in a shell:

Traceback (most recent call last):
  File "C:\Python26\Lib\site-packages\HTSeq-0.4.5p5\setup.py", line 62, in <module>
    'scripts/htseq-count',
  File "C:\Python26\lib\distutils\core.py", line 140, in setup
    raise SystemExit, gen_usage(dist.script_name) + "\nerror: %s" % msg
SystemExit: usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: no commands supplied
Please help, all help is very much appreciated.
Thanks
Manoj

Please read the installation instructions:

http://www-huber.embl.de/users/anders/HTSeq/doc/install.html

I haven't made a Windows binary package for a while, though.

(I still have trouble understanding why anybody would want to do HTS bioinformatics on Windows. Nearly all bioinformatics developers work on Unix-like systems (Linux or Mac OS). Ensuring that a tool developed on Linux works on a Mac, or vice versa, is trivial, but supporting Windows is always extra work and hence has low priority for us developers, which makes Windows a bad choice for users, too.)

Simon

**Simon Anders** · 12-22-2010, 05:45 AM

Hi Alvaro

Originally posted by alvin

I wonder if it is possible to retrieve the IDs of the reads that have "no feature" in htseq-count.
I'm interested in those reads that do not overlap with any annotated gene.
I would really appreciate any suggestion.

As you are by now the fourth person requesting this feature, I thought I'd no longer only make promises to add it in the future but rather get it done. :-)

The new version, 0.4.7, offers a new option, "-o", for htseq-count. If you add '-o' followed by a filename, a SAM file of this name will be written that contains the same lines as the input SAM file, but with an optional field appended to each line, with tag 'XF', that indicates how the read was counted: it is either a gene name or a special counter name like "no_feature". With grep and cut, you can then get what you want.
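
For those who prefer Python over grep and cut, here is a minimal sketch of the same filtering. It assumes the optional field is written as `XF:Z:<value>` (the post only names the tag 'XF') and that columns are tab-separated; the function name `reads_with_assignment` is hypothetical.

```python
def reads_with_assignment(sam_lines, wanted="no_feature"):
    """Return the read names whose XF tag value equals `wanted`."""
    read_ids = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        columns = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 (index 11).
        for field in columns[11:]:
            if field.startswith("XF:"):
                parts = field.split(":", 2)  # TAG, VTYPE, VALUE
                if len(parts) == 3 and parts[2] == wanted:
                    read_ids.append(columns[0])
                break
    return read_ids


# Made-up records for illustration only.
base = ["0", "chr1", "100", "255", "4M", "*", "0", "0", "ACGT", "IIII"]
demo = [
    "@HD\tVN:1.0",
    "\t".join(["read1"] + base + ["XF:Z:geneA"]),
    "\t".join(["read2"] + base + ["NM:i:0", "XF:Z:no_feature"]),
    "\t".join(["read3"] + base + ["XF:Z:ambiguous"]),
]
print(reads_with_assignment(demo))
```

With a real file, the lines would come from the SAM file written via '-o', for example by iterating over an open file handle. For paired-end data, each read name may appear once per mate.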

Simon

**alvin** · 12-22-2010, 06:31 AM

Originally posted by Simon Anders

Hi Alvaro

As you are by now the fourth person requesting this feature, I thought I'd no longer only make promises to add it in the future but rather get it done. :-)

The new version, 0.4.7, offers a new option, "-o", for htseq-count. If you add '-o' followed by a filename, a SAM file of this name will be written that contains the same lines as the input SAM file, but with an optional field appended to each line, with tag 'XF', that indicates how the read was counted: it is either a gene name or a special counter name like "no_feature". With grep and cut, you can then get what you want.

Simon

Great! I found the -o option very useful.
Thank you very much for your help.
Best Regards

Álvaro Pena

**marcora** · 01-10-2011, 03:39 AM

Originally posted by Simon Anders

Hi Keith
At the moment, HTSeq can natively only work with SAM files. Adding BAM support is on my to-do list, and of course, I would do it by simply wrapping the samtools.

Cheers
Simon

Hi Simon,

Is BAM support in HTSeq coming soon?

Keep up the good work!

**naluru** · 01-11-2011, 11:11 AM

htseq-count for miRNA

I am using htseq-count to count miRNAs by their genomic coordinates, and it works very well. However, I am also interested in a more detailed output: every aligned read together with its count. The reason is that there are a lot of miRNA length variants, as well as mature, star, and precursor sequences, and it would be nice to see the proportions of the different reads. Right now, I can only see the counts for whole precursor miRNAs.

I would like to know whether there is any way to get that information. Any hints would be highly appreciated.

Thank you in advance.

**mmpillai** · 01-13-2011, 08:25 PM

Hi Simon,
I heeded your advice re: the OS and have successfully installed HTSeq on my Linux system. I wanted to install it from a binary on my Mac, but the binary package is not available for download on PyPI. (I don't want to download XCode; it seems to be >3.5 GB in size.)
Thanks again. With bioinformatics clearly being the bottleneck for high-throughput applications, packages such as yours are very helpful.
Manoj

**Simon Anders** · 01-14-2011, 12:44 AM

Hi Manoj,

I don't provide binary packages for MacOS -- it's too complicated as I don't have a Mac myself. Please install XCode and use it. (Actually, you only need GCC, but installing all of XCode is easiest.)

XCode comes with MacOS and can be found on the second of the two MacOS installation CDs. So, there is no need to download it.

(If a Mac user reading this wants to help out Manoj: Run the command 'python setup.py bdist' in the unpacked tarball, and a binary package will be built automatically and packed into a single file.)

Simon

**marcora** · 01-14-2011, 03:11 AM

Originally posted by Simon Anders

Hi Manoj,

I don't provide binary packages for MacOS -- it's too complicated as I don't have a Mac myself. Please install XCode and use it. (Actually, you only need GCC, but installing all of XCode is easiest.)

XCode comes with MacOS and can be found on the second of the two MacOS installation CDs. So, there is no need to download it.

(If a Mac user reading this wants to help out Manoj: Run the command 'python setup.py bdist' in the unpacked tarball, and a binary package will be built automatically and packed into a single file.)

Simon

Here we go!

Attached Files

HTSeq-0.4.7.macosx-10.5-i386.tar.gz (204.6 KB, 37 views)

**mmpillai** · 01-14-2011, 09:23 AM

Simon and Marcora: thanks much !

**fennan** · 02-09-2011, 10:53 AM

Hi Simon,

In one of my datasets, I'm getting a lot of these warnings:

Read ILLUMINA-GA_0000:8:36:18294:7129#0 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

If I grep for these reads in the SAM file I do find the two mates:

Code:

ILLUMINA-GA_0000:8:36:18294:7129#0    163     chrY    59342791        255     38M     =       59342801        0 CAGAGGGCAGCAGGAGCAGCAGCAGCAGCAGCAGCAGC hdhhehhhhhhgghhghghgahhff[fhacfdaahhgh  NM:i:0  NH:i:1  XS:A:+
ILLUMINA-GA_0000:8:36:18294:7129#0    83      chrY    59342801        255     38M     =       59342791        0 CAGGAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAACA abaQWdffRbWWffWfd]aa_ggfggcgfgfgggfggg  NM:i:1  NH:i:1  XS:A:+

Questions:

1) Why is this warning coming up?
2) When this warning appears, is the read discarded? I'm getting results that are not making a lot of sense to me:

Code:

The command:
htseq-count -s yes -i gene_id -m intersection-nonempty accepted_hits.sam /scratch/fdgarcia/data/gtfs/Homo_sapiens.GRCh37.60.gtf > counts.txt


Results for ~210000000 reads:

no_feature      130841007
ambiguous       51826
too_low_aQual   0
not_aligned     0
alignment_not_unique    66886614

Thanks!

**Simon Anders** · 02-09-2011, 11:23 AM

Hi Fennan

Originally posted by fennan

In one of my datasets, I'm getting a lot of these warnings:

Read ILLUMINA-GA_0000:8:36:18294:7129#0 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

If I grep for these reads in the SAM file I do find the two mates:
...

Well, is the SAM file properly sorted?

If you use htseq-count on paired-end data, you need to make sure that all SAM lines referring to the same read pair are in adjacent lines. To this end, you need to sort the SAM file by read name. (Just run it through the standard Unix 'sort' command.)
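
For small files, the same name sorting can be sketched in Python. This is an in-memory illustration only, with a hypothetical function name `sort_sam_by_read_name`; for a file with hundreds of millions of reads, an on-disk sort such as the Unix 'sort' command is the practical choice. The sketch keeps '@' header lines at the top, which a plain line sort would not guarantee.

```python
def sort_sam_by_read_name(lines):
    """Return SAM lines with headers first and records sorted by read name."""
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if not line.startswith("@")]
    # list.sort is stable, so mates keep their original relative order.
    records.sort(key=lambda line: line.split("\t", 1)[0])
    return header + records


# Made-up, truncated records for illustration only.
unsorted_lines = [
    "@HD\tVN:1.0\n",
    "readB\t163\tchrY\t59342791\n",
    "readA\t99\tchrY\t100\n",
    "readB\t83\tchrY\t59342801\n",
    "readA\t147\tchrY\t200\n",
]
for line in sort_sam_by_read_name(unsorted_lines):
    print(line, end="")
```

After sorting, both lines of each read pair sit next to each other, which is what htseq-count expects for paired-end data.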

Simon
