Thread | Thread Starter | Forum | Replies | Last Post |
Cufflinks 2.0.2 segmentation fault | bmicro_mit1 | RNA Sequencing | 13 | 01-21-2015 11:45 AM |
segmentation fault with cisGenome | aquleaf | Bioinformatics | 0 | 10-03-2012 08:31 AM |
Newbler segmentation fault | flobpf | Bioinformatics | 4 | 04-18-2011 12:45 PM |
Maq: Segmentation Fault | mrxcm3 | Bioinformatics | 2 | 03-21-2011 03:43 PM |
Segmentation fault in consed | michaelbarton | Bioinformatics | 0 | 06-11-2010 01:50 PM |
#1
Member
Location: England Join Date: Mar 2013
Posts: 13
I have been trying to get gene counts with HTSeq, but it keeps crashing with a segmentation fault and I have no idea what is causing it.
The data I use was created with bowtie2 from ENCODE human data:

bowtie2 -p 16 -x human_index/hg19 -1 wgEncodeCshlLongRnaSeqHelas3CellPapFastqRd1Rep1 -2 wgEncodeCshlLongRnaSeqHelas3CellPapFastqRd2Rep1 -S wgEncodeCshlLongRnaSeqHelas3CellPapFastqRep1.sam

The command line I use to run HTSeq:

htseq-count -s no wgEncodeCshlLongRnaSeqHelas3CellPapFastqRep1.sam Homo_sapiens.GRCh37.70.gtf

I get two warnings and an error when I run this command:

2200000 GFF lines processed.
2280612 GFF lines processed.
Warning: Malformed SAM line: MRNM != '*' although flag bit &0x0008 set
Warning: Malformed SAM line: RNAME != '*' although flag bit &0x0004 set
Segmentation fault (core dumped)

The system log shows this entry for the segmentation fault:

kernel: [5076890.033992] python[27097]: segfault at 0 ip 00007f38cec6336c sp 00007fff4b6b7190 error 6 in _HTSeq.so[7f38cec24000+50000]

I downloaded the GTF file from Ensembl and, as far as I understand, I had to rename the chromosomes, e.g. "12" to "chr12" (before I did this, the chromosomes could not be found). I did the renaming with sed. My GTF file has the following format:

chr11 processed_pseudogene exon 75780 76143 . + . gene_id "ENSG00000253826"; transcript_id "ENST00000519787"; exon_number "1"; gene_name "RP11-304M2.1"; gene_biotype "pseudogene"; transcript_name "RP11-304M2.1-001"; exon_id "ENSE00002139035";
chr11 processed_transcript exon 86612 87605 . - . gene_id "ENSG00000224777"; transcript_id "ENST00000521196"; exon_number "1"; gene_name "OR4F2P"; gene_biotype "pseudogene"; transcript_name "OR4F2P-002"; exon_id "ENSE00002124594";

My SAM file has the following format:

BILLIEHOLIDAY_0004:1:1:2921:975#0 77 0 0 0 * * 0 0 NCAAAAGTGACAATCCAGCAATTCCAAATAAGGTATGAAAAGGATCCACCATATCTCCTGGCCTGTCTGCAAATCC BIKIHLJMNMb___b______________bQQ__bb____b_____b____bbb_____________bb______b YT:Z:UP
BILLIEHOLIDAY_0004:1:1:2921:975#0 141 0 0 0 * * 0 0 NNTAGTTATNCTACTCATGTTGNTTCCNNGNNTCCCTAAAGATAATTNGAAGACTTCATTGGATTTATAGAGAGAA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB YT:Z:UP
BILLIEHOLIDAY_0004:1:1:3277:975#0 83 chr10 64566791 23 76M = 64566652 -215 CAGAAGATCACAGCTAGAGAATTGAGAATTAACTATACTACTAGCCATTTTAGGGCACCAAAACTTGGGATTAAAN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:75C0 YS:i:-39 YT:Z:CP

What am I doing wrong?

Thank you,
Sipko
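The chromosome renaming described above (Ensembl "12" to UCSC-style "chr12") can also be sketched in Python. This is a minimal sketch, assuming a tab-delimited Ensembl GTF and hg19 UCSC names in the SAM file; the function names are hypothetical and not part of HTSeq.

```python
from typing import Iterable, TextIO


def add_chr_prefix(gtf_line: str) -> str:
    """Rename the seqname (first column) of one GTF line to UCSC style.

    Assumption: Ensembl names such as "12" or "X" become "chr12"/"chrX",
    and the mitochondrial "MT" becomes UCSC's "chrM". Comment lines
    starting with '#' and blank lines are returned unchanged.
    """
    if gtf_line.startswith("#") or not gtf_line.strip():
        return gtf_line
    seqname, sep, rest = gtf_line.partition("\t")
    if seqname.startswith("chr"):
        return gtf_line  # already UCSC-style, leave it alone
    new_name = "chrM" if seqname == "MT" else "chr" + seqname
    return new_name + sep + rest


def rename_gtf(src: Iterable[str], dst: TextIO) -> None:
    """Copy GTF lines from src to dst, rewriting only the first column."""
    for line in src:
        dst.write(add_chr_prefix(line))
```

Only the first column is touched, so gene names and IDs in the attribute column stay intact, which a blanket text substitution does not guarantee. Unplaced scaffolds (Ensembl names like GL000192.1) will still not match the UCSC hg19 names and would need their own mapping.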
#2
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
I wonder if the segfault is related to the malformed SAM file. The warnings refer to unmapped reads (or reads with unmapped mates) that still have a chromosome assigned to them. I've usually only seen that with bwa. You might try taking a small chunk of your SAM file and seeing whether you can find and remove the offending line. If removing just that read prevents the segfault, then you will at least have gotten around the problem.
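Before deleting lines by hand, a small scanner can list which records match the two warnings. This is a sketch, assuming a plain-text, tab-delimited SAM file; it re-implements only the two conditions quoted in the warnings (plus a field-count check), not HTSeq's actual parser, and the function name is hypothetical. Note that the SAM specification allows an unmapped read to carry its mapped mate's RNAME and POS, so flagged records are not necessarily invalid; they are simply the ones HTSeq complains about.

```python
from typing import Iterable, Iterator, Tuple

FLAG_UNMAPPED = 0x4       # bit set when this read is unmapped
FLAG_MATE_UNMAPPED = 0x8  # bit set when the mate is unmapped


def find_suspicious_records(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, reason) for records matching the HTSeq warnings."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            # SAM has 11 mandatory tab-separated columns; lines whose tabs
            # were turned into spaces end up here.
            yield lineno, "fewer than 11 tab-separated fields"
            continue
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]  # RNAME and RNEXT (formerly MRNM)
        if flag & FLAG_UNMAPPED and rname != "*":
            yield lineno, "RNAME != '*' although flag bit 0x4 set"
        if flag & FLAG_MATE_UNMAPPED and rnext != "*":
            yield lineno, "RNEXT != '*' although flag bit 0x8 set"
```

Usage would be along the lines of iterating over `find_suspicious_records(open(path))` and printing each line number and reason, where `path` is your SAM file.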
#3
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
You are the first to report a segfault on HTSeq. Not sure how to debug this. You could send me your files and I could try to reproduce it on my machine. You could also try to use the "--samout" option to get htseq-count to write out each SAM line it reads in order to see which line is at fault and then isolate that one.
#4
Member
Location: England Join Date: Mar 2013
Posts: 13
Thanks for your quick response.
Since the run crashes right after the GFF file is read in, I presume it crashes on the first line. Also, when I try the --samout argument as you suggested, the resulting file is empty (0 bytes). I think it has something to do with the format bowtie2 uses as output. Could it be something to do with spaces or tabs? I had a look at the htseq-count script, but the parser is in a different file, which I did not look at. Maybe the parser expects something that the bowtie2 format does not provide. However, I would assume that at least a few other people have used HTSeq in conjunction with bowtie2 mappings... The SAM and GFF files are about 3.5 GB together and I am not sure what the best way of transferring them is. I tried Dropbox, but it is a bit slow (and it gave me an error). I could ask our server administrator to give you direct access to our server, but if you have an easier solution I would be happy to hear it. Thanks, Sipko
#5
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
Actually, as it's the first line: Can you just post the first couple of lines?
In a way, Python should not crash just because some spaces or tabs are mixed up. One notorious issue, however, is line endings. If they get messed up (which can happen, for example, if you move files between Windows and Unix environments, or through FTP), suddenly the whole huge file appears as a single line, which Python then tries to read in one go.
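One way to check for the line-ending problem described above is to count line terminators in binary mode. A minimal sketch, assuming you pass in a file opened with "rb"; the function name and the keys of the returned dictionary are hypothetical choices, not an HTSeq feature.

```python
from typing import Dict, Iterable


def describe_line_endings(lines: Iterable[bytes]) -> Dict[str, int]:
    """Count LF, CRLF and stray CR characters in binary lines.

    Iterating over a file opened in "rb" mode splits on LF bytes only, so
    a file with bare-CR line endings shows up as one very long line with
    a large "bare_cr" count.
    """
    counts = {"lf": 0, "crlf": 0, "bare_cr": 0, "longest_line": 0}
    for line in lines:
        if line.endswith(b"\r\n"):
            counts["crlf"] += 1
            body = line[:-2]
        elif line.endswith(b"\n"):
            counts["lf"] += 1
            body = line[:-1]
        else:
            body = line  # final line without a terminator
        counts["bare_cr"] += body.count(b"\r")
        counts["longest_line"] = max(counts["longest_line"], len(body))
    return counts
```

A clean Unix-style SAM file should report only "lf" terminators, no "bare_cr", and a "longest_line" of a few hundred bytes. On a file whose endings really are broken, this reads one enormous line into memory, just as described above.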
#6
Member
Location: England Join Date: Mar 2013
Posts: 13
I have sent you a truncated version of the file by email, as the forum seems to change the tabs into spaces.
#7
Member
Location: US Join Date: Dec 2012
Posts: 16
Any updates on this? I'm having exactly the same issue. It seems that HTSeq doesn't work with SAM files generated by bowtie2.
The reason I switched to bowtie2 is that, in my case, the bwa aligner always generates malformed SAM files that cannot be correctly converted into BAM files with samtools (it dumps core with the error message "CIGAR and sequence length are inconsistent"). Which aligner should I choose? Please help!
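The samtools error quoted above concerns a consistency rule of the SAM format: the number of read bases implied by the CIGAR string must equal the length of SEQ. A sketch of that check, for finding the offending records yourself; it assumes tab-separated text and treats M, I, S, = and X as the operations that consume read bases, as the SAM specification defines them. The function names are hypothetical.

```python
import re
from typing import List

# CIGAR operations that consume bases of the read (query), per the SAM spec.
QUERY_CONSUMING_OPS = frozenset("MIS=X")
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar: str) -> int:
    """Number of read bases implied by a CIGAR string ('*' counts as 0)."""
    if cigar == "*":
        return 0
    ops = CIGAR_OP_RE.findall(cigar)
    # Reject strings containing characters the pattern did not consume.
    if "".join(count + op for count, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(count) for count, op in ops if op in QUERY_CONSUMING_OPS)


def cigar_matches_seq(fields: List[str]) -> bool:
    """True if CIGAR (column 6) and SEQ (column 10) agree, or either is '*'."""
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True  # nothing to compare
    return cigar_query_length(cigar) == len(seq)
```

Hard clips (H), deletions (D) and skipped regions (N) do not count toward the read length, which is why a spliced CIGAR such as 30M1000N46M still describes a 76-base read.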
#8
Member
Location: England Join Date: Mar 2013
Posts: 13
I didn't get any update on this topic. I am now using STAR to align, and HTSeq works with the resulting SAM file. I am very satisfied with STAR, and I have seen other people react similarly. It was easy to install and I find it very intuitive to use. It is much faster, and I like that their paper (http://bioinformatics.oxfordjournals...ts635.abstract) shows that STAR is the mapper most likely to map reads in agreement with at least two other mappers.
I particularly like this last part, as one of my concerns is that different mappers can give different results, leading to different conclusions even though the data are the same...
#9
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
I was a bit busy with other stuff, so sorry I didn't reply earlier. Could one of you two try to make a test case for me, i.e., truncate your SAM and GFF files to just a few lines and, if the segfault still happens, send me the files? Or put the full files somewhere for me to download.
(Sipko: Yes, I got your file from two weeks ago, but I think you said it did not cause a segfault, right? Also, your email client changed the tabs to spaces, too.)
#10
Member
Location: Florida, USA Join Date: Jun 2011
Posts: 13
I would like to report that I'm also getting a segmentation fault when running htseq-count on a sorted SAM file produced from a bowtie2 alignment. I could send the files, but they are quite big. I could trim the SAM file, though I'm not sure how.
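To make the small test case asked for in this thread, one approach is to keep the SAM header and only the first N alignment lines. A minimal sketch; the function names and file paths are hypothetical. For paired-end data you may want an even N so that mates stay together, assuming mates sit on adjacent lines as in the bowtie2 output posted earlier (this no longer holds after coordinate sorting).

```python
from typing import Iterable, Iterator


def head_sam(lines: Iterable[str], n_records: int) -> Iterator[str]:
    """Yield every header line ('@...') plus the first n_records alignments.

    SAM headers always precede the alignments, so reading can stop as
    soon as n_records alignment lines have been emitted.
    """
    emitted = 0
    for line in lines:
        if line.startswith("@"):
            yield line
        elif emitted < n_records:
            yield line
            emitted += 1
        else:
            break


def write_head_sam(src_path: str, dst_path: str, n_records: int) -> None:
    """Write a truncated copy of src_path to dst_path (paths are placeholders)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(head_sam(src, n_records))
```

Because whole lines are copied unchanged, the tabs survive in the output file; they can still be lost if the result is pasted into an email body or a forum post rather than sent as a file.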
#11
Member
Location: North Carolina Join Date: Mar 2013
Posts: 17
Add me to the list. I've also used samtools to convert the SAM to BAM, sort the BAM file, and convert it back to SAM, and I still get the same error. Since sorting changes which read comes first, this means it's not the first line causing the problem.
Is this only happening for paired-end alignments? Mine are paired-end; anyone else?
#12
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
Quote:
#13
Member
Location: North Carolina Join Date: Mar 2013
Posts: 17
I'm trying to narrow down which line is causing the problem right now.
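Narrowing down the line can be automated with a binary search over prefixes of the SAM file, which needs roughly log2(N) htseq-count runs instead of N. A sketch under two assumptions: the crash is monotonic (once the first k records crash htseq-count, any longer prefix does too), and a segfault shows up in Python's subprocess module as a negative return code (-11 for SIGSEGV on Linux). It finds the last record needed to trigger the crash, which need not be the only record involved. The function names are hypothetical; the htseq-count options are the ones used at the start of the thread.

```python
import signal
import subprocess
from typing import Callable


def smallest_failing_prefix(n_records: int, crashes: Callable[[int], bool]) -> int:
    """Binary-search the smallest k in 1..n_records with crashes(k) True.

    Assumes crashes() is monotonic in k and that crashes(n_records) is True.
    """
    if not crashes(n_records):
        raise ValueError("the full input does not reproduce the crash")
    lo, hi = 1, n_records  # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid      # the first mid records already crash: look left
        else:
            lo = mid + 1  # the first mid records are fine: culprit is later
    return lo


def htseq_segfaults(sam_path: str, gtf_path: str) -> bool:
    """Run htseq-count once; True if the process was killed by SIGSEGV."""
    result = subprocess.run(
        ["htseq-count", "-s", "no", sam_path, gtf_path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # subprocess reports death by signal N as return code -N (POSIX only).
    return result.returncode == -signal.SIGSEGV
```

In practice, `crashes(k)` would write the SAM header plus the first k records to a temporary file and call `htseq_segfaults` on it. Each run re-reads the whole GTF, so a reduced GTF makes the search considerably faster.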
#14
Member
Location: North Carolina Join Date: Mar 2013
Posts: 17
It doesn't like the last of these few lines (the lines before it are fine; this is actually the 74th line of my SAM file):
Quote:
#15
Member
Location: North Carolina Join Date: Mar 2013
Posts: 17
I've now tried this on 14 files I generated alignments for today, and every single one of them fails.
I just ran a quick test on a file I had aligned using tophat2, also paired-end reads, and it worked just fine, so I know it's not my install.
#16
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
Are those two reads sufficient to cause the crash, or are the earlier lines also required? Also, could you post the exact command you used that caused the crash (with options, like strandedness)?
#17
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
FYI, I was able to reproduce this crash in HTSeq-0.5.3 but it ran correctly in HTSeq-0.5.4. If you're not using the most up-to-date version, try upgrading and see if that fixes things.
#18
Member
Location: North Carolina Join Date: Mar 2013
Posts: 17
Thank you - I'll try to do that. And yes, those were sufficient to make it crash.
#19
Member
Location: North Carolina Join Date: Mar 2013
Posts: 17
Public License v3. Part of the 'HTSeq' framework, version 0.5.4p1.
Already running 0.5.4
#20
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
The most recent version is 0.5.4p3
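Since the crash seems to depend on the exact patch release, it can help to compare version strings numerically: plain string comparison happens to order 0.5.4p1 and 0.5.4p3 correctly, but would put 0.5.10 before 0.5.4. A sketch, assuming HTSeq versions follow the pattern major.minor.micro with an optional "pN" patch suffix (as in 0.5.4p1 and 0.5.4p3) and that "pN" marks a later patch of the same release. The installed version is printed at the end of htseq-count's help text, as quoted above. The function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed format: major.minor.micro with an optional "pN" patch suffix.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:p(\d+))?")


def parse_htseq_version(version: str) -> Tuple[int, int, int, int]:
    """Turn '0.5.4p1' into (0, 5, 4, 1); a missing 'pN' counts as patch 0."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, micro, patch = match.groups()
    return int(major), int(minor), int(micro), int(patch or 0)


def is_at_least(installed: str, required: str) -> bool:
    """Compare versions as integer tuples, not as plain strings."""
    return parse_htseq_version(installed) >= parse_htseq_version(required)
```

For the versions in this thread, `is_at_least("0.5.4p1", "0.5.4p3")` returns False, so an upgrade would be indicated.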