Seqanswers Leaderboard Ad

**dpryan** · 03-07-2013, 10:43 AM

I wonder if the segfault is related to the malformed SAM file. The warnings are referring to unmapped reads (or their mates) still having chromosomes assigned to them. I've usually only seen that with bwa. You might try taking a small chunk of the your SAM file and see if you can find and remove the offending line. If removing just that read prevents the segfault, then you will have at least gotten around that.

**Simon Anders** · 03-07-2013, 11:27 AM

You are the first to report a segfault on HTSeq. Not sure how to debug this. You could send me your files and I could try to reproduce it on my machine. You could also try to use the "--samout" option to get htseq-count to write out each SAM line it reads in order to see which line is at fault and then isolate that one.

**[email protected]** · 03-08-2013, 09:03 AM

Thanks for your quick response.

Seeing the run crashes right after the GFF file is read in I presume the line it crashes on is the first one. Also when I try the --samout argument as you suggested the resulting file is empty (0 bytes). I think it has something to do with the format bowtie2 uses as out put. Maybe it could be something to do with spaces or tabs? I had a look at the python-count script but the parser is in a different file, which I did not look at. Maybe something the parser is expecting is having troubles with the bowtie2 format. However I would assume there has been at least a few other people that have used HTseq in conjunction with bowtie2 mappings...

The sam file and gff file are about 3.5 gb together and I am not sure what is the best way of transferring them. I tried dropbox, but it is a bit slow (and it gave me an error). I could ask our server administrator to give you direct access to our server, but if you have an easier solution I would be happy to hear.

Thanks, Sipko

**Simon Anders** · 03-08-2013, 09:15 AM

Actually, as it's the first line: Can you just post the first couple of lines?

In a way, Python should not crash jsut because some spaces or tabs are mixed up. One notorious issue, however, is line endings. If they get messed up (can happen, for example, if you move files between Windows and Unxi environments, or through FTP), suddenly the whole huge file appears as a single line, which Python then tries to read in one go.

**[email protected]** · 03-08-2013, 11:56 AM

I send you a truncated version of the file by mail as the forums seem to change the tabs into spaces.

**Rainbird** · 04-22-2013, 06:53 AM

Any updates with this? I'm having the exact same issue. It seems that HTseq doesn't work with sam file generated by bowtie2.

The reason I switched to bowtie2 is that, in my case, the bwa aligner always generates malformed sam file that can not be correctly converted into bam file using samtools. (with core dump error message "CIGAR and sequence length are inconsistent").

which aligner to choose? please help!

**[email protected]** · 04-22-2013, 07:12 AM

I didnt get any update concerning this topic. I am using STAR to align now and HTseq works with the resulting SAM file. I am very satisfied with STAR and I have seen more people with similar reactions. It was easy to install and I find it very intuitive to use. It is much faster and I like the fact that in their paper (http://bioinformatics.oxfordjournals...ts635.abstract) they show that STAR is the mapper that is the most likely to map reads in agreement with at least two other mappers.
I specifically like this last part as one of my concerns is that different mappers can give different results leading to different conclusions, even though the data is the same...

**Simon Anders** · 04-22-2013, 07:17 AM

I was a bit busy with other stuff, so very sorry I didn't reply earlier. Could one of you two try to make a test case for me, i.e., truncate your SAM and GFF file to just a few lines, and if the segfault still happens, send me the file. Or put the full files somewhere for me to download.

(Sipko: Yes, I got your file from two weeks ago, but I think you said it did not cause a segfault, right? Also, you email client changed the tabs to spaces, too.)

**pecanton** · 07-08-2013, 10:18 AM

I would like to report that I'm also getting a segmentation fault when using a sorted SAM produced from a bowtie2 alignment and running htseq-count. I could send the files, but they are quite big. I could trim the SAM file, though I'm too sure how.

**DrS** · 08-19-2013, 12:49 PM

Add me to the list now. I've also used samtools to convert sam to bam, sort the bam file, and convert back to sam - and still get the same error. This means it's not the first line causing the problem.

Is this only happening for paired-end alignments? Mine is paired-end - anyone else?

**dpryan** · 08-19-2013, 01:03 PM

Originally posted by DrS View Post

Add me to the list now. I've also used samtools to convert sam to bam, sort the bam file, and convert back to sam - and still get the same error. This means it's not the first line causing the problem.

Is this only happening for paired-end alignments? Mine is paired-end - anyone else?

If you could post a small version of your SAM file (such that it still causes a crash) somewhere then it'll be easier for this bug (if that's what it turns out to be) to get fixed.

**DrS** · 08-19-2013, 01:13 PM

I'm trying to narrow down which line is giving me problems right now..

**DrS** · 08-19-2013, 01:23 PM

It doesn't like the last line from these few lines (the lines before it are fine... this is actually the 74th line of my sam file)

M02019:4:000000000-A4P4A:1:1101:6364:9627 99 chr16 24051992 17 160M90S = 24051992 -160 AAAATGTCAATCTGCCCTGTAATAATGCTCATTCTGGCTGCACATCAACTACATGGAACTATAACAGACATTCAGAGACGGTTGAACTGGTTGCTGGAGGGTCACTGAAGACAAACATAAAGAGACGTGAGAGACTGAGTCTGGGCTCTGACTGCTCTCTCTGTCTCTTTAACCCTTTCCCGACCCCCCGAGCCCGACCAAGATCCGGTATGCCGTTTTTTGCTTGAAAAAAAAAAAAAAAAAAAATAAT ABCCAFFFFFFFGGGGGGGGGGHGHHHHHHHHHHHHHHHGGHHHHHHHHHHHHHHHHHHHIIHIIIHHHHHHHHHHHHHFEFGGHHHHHHHHGHHHHHHGGGHHHHHHHHHHHHHHHHHHHHHHHHGG/GGHHGHHHHHHFHHHHHHHHHHHHHHHHHHHHH2@HHHH222221/0011?1//->C.---<<-=.-----...000.---;0;09..;..--0;00;0BFFF;B=--:;-;-;:--.000 AS:i:31XS:i:217 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:8C151 YS:i:313 YT:Z:CP
M02019:4:000000000-A4P4A:1:1101:6364:9627 147 chr16 24051992 17 160M = 24051992 -160 AAAATGTCAATCTGCCCTGTAATAATGCTCATTCTGGCTGCACATCAACTACATGGAACTATAACAGACATTCAGAGACGGTTGAACTGGTTGCTGGAGGGTCACTGAAGACAAACATAAAGAGACGTGAGAGACTGAGTCTGGGCTCTGACTGCTCTCT HHHHHHHHHHGGGDGHHHHHHHHHHHHHHHHHHHGHHHHHGHHHHHHHHHHGHHHHHG4HHHHHHHHHHHHHGHHHGGGHHHHHHHHGHHHHHHHGHHHHHHHHHHHHHHHHHHHFHHHHHHHHHHHHHHHHHHHHHHGGGGGGGGGGFFFFFCCCBBBA AS:i:313 XS:i:218 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:8C151 YS:i:313 YT:Z:CP
M02019:4:000000000-A4P4A:1:1101:6397:9032 89 chr16 19255696 21 104S146M = 19255696 0 TTTATATTCTGAAGACGGATTACGTGTTTTTGTTTTTTTTTTTCAAGCAGAAGACGGCATAAGAGATCTAGTACGGTCTCGTGGGCTCGGAGATGTGTATAAGAAGCTCAGGAATTACTTTGCCTACAGCCTTGGCAGCCCCAGTGGAGGCTGGGATGATGTTCTGACTGGCACCACGGCCATCCCTCCACAGCTTCCCAGAGGGCCCATCAACGGTCTTCTGTGTTGCTGTGATGGCATGAACAGTGCT 99;///;://9..;.://..:......9;...;FFFDFF;;0GGGGGGGGGGGGGHHHHHG:HGFCHHHHGEDGGGGGGFGHF>>>>/HHHHHHHHHHGHHHHHHHHHHBHHHHHHHHHHHHGHHHHHHGHHHHGGGGGHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHGHGFGGHHGGHGGGHHGHHHHHGHHHGHHHHGHHHHHHHGGGGHHHHHHHHGHHHHHGGGGGGGGGGFFFFFFFCCCCC AS:i:292 XS:i:72 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:146 YT:Z:UP
M02019:4:000000000-A4P4A:1:1101:6397:9032 165 chr16 19255696 0 * = 19255696 0 CCTTGATGTTCTTCCACTCTTCAGGTGTAACTGTAAAAGTTCTGAACTTCTATAATTTTCCGTCCTCATTGCACCACCGGTCTTCGAGCTCCAGATCCTCATCTGGCTATTGTCCCTCTTTCTGCCCTTCTCTCTTTATTCCACTACGAGCAACTCCTGTCGCCTGACGCAACACTCGCGGTTGCTCCTATCCTAATCATCCGCTACCGCCGCTGATCACAGGTAAGTCCTAGTTAGTAGGAGGTCACTT >111111B3BD333A1111E1A311111133D1DD3331133212B1DD2222222AA2D1/BB//1112211B00//A////B2//0//1@@11@1@1@1B1211101121BB11100B1B@21FCB001121B2122BB2FG1B0/////0<0B111B1///</00///</0/1</---.--.<00<<=0<00=0D00:0;.:-;/::----9;0;0/00;0900000000;///9///:/---//// YT:Z:UP

**DrS** · 08-19-2013, 01:40 PM

I've now tried this on 14 files I generated alignments for today, and every single one of them fails.

I just ran a quick test on a file I had aligned using tophat2, also paired-end reads, and it worked just fine, so I know it's not my install.

Topics	Statistics	Last Post
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Yesterday, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin Yesterday, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 54 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM

Seqanswers Leaderboard Ad

Announcement

HTseq Segmentation fault

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News