SEQanswers

komalsrathi 03-17-2017 08:11 AM

tophat-fusion-post: ValueError: invalid literal for int() with base 10: 'exonCount'
 
Hi everyone,

I am running tophat-fusion-post like this:

Code:

tophat-fusion-post -o ./fusion_results -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 /mnt/isilon/cbmi/variome/reference/bowtie_indexes/hg38_no_alt/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome

My root folder is tophat-fusion. It contains four TopHat output folders, CHP212, SHSY5Y, SKNAS and SKNSH, each of which contains a fusions.out file. In the same folder I have created symbolic links to refGene.txt, ensGene.txt and the BLAST database (blast).
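
Because the run depends on symlinks in the working directory, a quick sanity check can help before launching. The sketch below is a hypothetical helper, not part of TopHat: the three names are the ones listed in this post, and it assumes they must resolve relative to the directory tophat-fusion-post is started from (the traceback further down shows the script opening "ensGene.txt" by a relative path). It also prints the first line of each annotation file so a header row is easy to spot.

```python
import os
from typing import List

# Names taken from the post. Assumption: these are the inputs worth checking;
# extend the tuple if your run needs more.
REQUIRED_INPUTS = ("refGene.txt", "ensGene.txt", "blast")


def missing_inputs(run_dir: str = ".") -> List[str]:
    """Return the required names that do not resolve under run_dir.

    os.path.exists follows symlinks, so a dangling link counts as missing.
    """
    return [name for name in REQUIRED_INPUTS
            if not os.path.exists(os.path.join(run_dir, name))]


def first_line(path: str) -> str:
    """Return the first line of a text file, e.g. to spot a '#' header row."""
    with open(path) as fh:
        return fh.readline().rstrip("\n")


if __name__ == "__main__":
    print("missing:", missing_inputs())
    for name in ("refGene.txt", "ensGene.txt"):
        if os.path.isfile(name):  # isfile also follows symlinks
            print(name, "->", first_line(name)[:60])
```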

This is the directory structure in which I ran TopHat:

Code:

$ tree -L 2 ./tophat-fusion

./
|-- CHP212
|  |-- accepted_hits.bam
|  |-- align_summary.txt
|  |-- deletions.bed
|  |-- fusions.out
|  |-- insertions.bed
|  |-- junctions.bed
|  |-- logs
|  |-- prep_reads.info
|  `-- unmapped.bam
|-- CHP212.sh
|-- CHP212_R1.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/CHP212_R1.fastq.gz
|-- CHP212_R2.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/CHP212_R2.fastq.gz
|-- IMR32.sh
|-- IMR32_R1.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/IMR32_R1.fastq.gz
|-- IMR32_R2.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/IMR32_R2.fastq.gz
|-- SHSY5Y
|  |-- accepted_hits.bam
|  |-- align_summary.txt
|  |-- deletions.bed
|  |-- fusions.out
|  |-- insertions.bed
|  |-- junctions.bed
|  |-- logs
|  |-- prep_reads.info
|  `-- unmapped.bam
|-- SHSY5Y.sh
|-- SHSY5Y_R1.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/SHSY5Y_R1.fastq.gz
|-- SHSY5Y_R2.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/SHSY5Y_R2.fastq.gz
|-- SKNAS
|  |-- accepted_hits.bam
|  |-- align_summary.txt
|  |-- deletions.bed
|  |-- fusions.out
|  |-- insertions.bed
|  |-- junctions.bed
|  |-- logs
|  |-- prep_reads.info
|  `-- unmapped.bam
|-- SKNAS.sh
|-- SKNAS_R1.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/SKNAS_R1.fastq.gz
|-- SKNAS_R2.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/SKNAS_R2.fastq.gz
|-- SKNSH
|  |-- accepted_hits.bam
|  |-- align_summary.txt
|  |-- deletions.bed
|  |-- fusions.out
|  |-- insertions.bed
|  |-- junctions.bed
|  |-- logs
|  |-- prep_reads.info
|  `-- unmapped.bam
|-- SKNSH.sh
|-- SKNSH_R1.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/SKNSH_R1.fastq.gz
|-- SKNSH_R2.fastq.gz -> /mnt/isilon/maris_lab/target_nbl_ngs/CellLineRNASeq/rawfiles/cat_fastq/SKNSH_R2.fastq.gz
|-- blast -> /mnt/isilon/cbmi/variome/reference/blast_db/hg38
|-- ensGene.txt -> /mnt/isilon/cbmi/variome/reference/blast_db/hg38/ensGene.txt
|-- fusion_results
|  |-- fusion_seq.bwtout
|  |-- fusion_seq.fa
|  |-- fusion_seq.map
|  |-- logs
|  `-- tmp
|-- refGene.txt -> /mnt/isilon/cbmi/variome/reference/blast_db/hg38/refGene.txt
`-- tophat-fusion.sh

When I run tophat-fusion-post from this directory, I get the following error:

Code:

[Fri Mar 17 15:09:18 2017] Beginning TopHat-Fusion post-processing run (v2.1.0)
-----------------------------------------------
[Fri Mar 17 15:09:18 2017] Extracting 23-mer around fusions and mapping them using Bowtie
[Fri Mar 17 15:09:30 2017] Filtering fusions
Traceback (most recent call last):
  File "/home/rathik/tools/miniconda3/envs/fusion-env/bin/tophat-fusion-post", line 2924, in <module>
    sys.exit(main())
  File "/home/rathik/tools/miniconda3/envs/fusion-env/bin/tophat-fusion-post", line 2895, in main
    filter_fusion(bwt_idx_prefix, params)
  File "/home/rathik/tools/miniconda3/envs/fusion-env/bin/tophat-fusion-post", line 965, in filter_fusion
    ensGene_list = read_genes("ensGene.txt")
  File "/home/rathik/tools/miniconda3/envs/fusion-env/bin/tophat-fusion-post", line 917, in read_genes
    num_exons = int(line[7])
ValueError: invalid literal for int() with base 10: 'exonCount'
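
The failing step can be reproduced in a few lines. This is a minimal sketch, not the actual read_genes code from tophat-fusion-post: it assumes a tab-separated file whose column at index 7 holds exonCount (matching the value in the error message), and both rows are hypothetical and shortened.

```python
# Hypothetical, shortened rows. Assumption: tab-separated columns laid out so
# that index 7 is exonCount, matching the value reported in the traceback.
HEADER = "#name\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\texonCount"
RECORD = "tx_example\tchr1\t+\t100\t900\t150\t850\t3"


def exon_count(raw_line: str) -> int:
    """Mimic the failing step: split on tabs and convert column 7 to an int."""
    fields = raw_line.rstrip("\n").split("\t")
    return int(fields[7])


print(exon_count(RECORD))  # 3
try:
    exon_count(HEADER)
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'exonCount'
```

A data row converts cleanly, while the header row fails with the same message as the traceback, so any line that is a column header rather than a gene record will stop the run at this point.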


Enraico 06-14-2019 01:35 AM

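tophat-fusion-post treats every line of ensGene.txt as a gene record, so the column label exonCount in the header line ends up in int(). Remove the header line (it starts with #) from ensGene.txt: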
Code:

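# Keep the original (here a symlink) as a backup
mv ensGene.txt ensGene.txt.bk
# Drop lines starting with "#"; grep reads through the link and writes a new regular file
grep -v "^#" ensGene.txt.bk > ensGene.txt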
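
For anyone who prefers Python, here is a sketch that mirrors the two commands above and then checks that the column at index 7 (the index from the traceback) parses as an integer on every remaining line. The function names are hypothetical, and the column index is an assumption about how your file is laid out. The same check may be worth running on refGene.txt, since it sits next to ensGene.txt and may have been exported the same way.

```python
import os
from typing import Optional


def strip_header(path: str, backup_suffix: str = ".bk") -> int:
    """Move `path` aside, then rewrite it without lines that start with '#'.

    Mirrors: mv ensGene.txt ensGene.txt.bk; grep -v "^#" ensGene.txt.bk > ensGene.txt
    Returns the number of lines dropped.
    """
    backup = path + backup_suffix
    # Like mv: if `path` is a symlink, the link itself is renamed.
    os.rename(path, backup)
    dropped = 0
    # Reading through the link and writing to `path` yields a regular file.
    with open(backup) as src, open(path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dropped += 1
            else:
                dst.write(line)
    return dropped


def first_bad_row(path: str, column: int = 7) -> Optional[int]:
    """Return the 1-based number of the first line whose `column` is not an int."""
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.rstrip("\n").split("\t")
            try:
                int(fields[column])
            except (IndexError, ValueError):
                return lineno
    return None
```

Run it from the directory where tophat-fusion-post is launched, since the traceback shows the script opening "ensGene.txt" by a relative path: call strip_header("ensGene.txt"), then first_bad_row("ensGene.txt"), which should return None once no header lines remain.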



All times are GMT -8.