**GenoMax** · 08-22-2014, 10:31 AM

Make a backup copy of the file before trying the following:

Code:

$ sed 's/PGOMOU/ENSMUSG/g' your_file > new_file
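
A slightly more defensive variant of the substitution above, sketched as a shell function. The function name `rename_prefix` and the `.bak` suffix are illustrative assumptions, not part of any tool; the function copies the input first and then reports how many lines contain each prefix afterwards.

```shell
# Hypothetical helper: replace every occurrence of OLD with NEW in a file,
# keeping an untouched backup next to the original.
# OLD and NEW are assumed to be plain alphanumeric prefixes (no sed metacharacters).
rename_prefix() {
    local old new infile outfile
    old=$1
    new=$2
    infile=$3
    outfile=$4
    cp -- "$infile" "$infile.bak" || return 1        # backup before doing anything else
    sed "s/$old/$new/g" "$infile" > "$outfile" || return 1
    # Report how many lines still contain OLD (should be 0) and how many contain NEW.
    printf 'lines still containing %s: %s\n' "$old" "$(grep -c -- "$old" "$outfile")"
    printf 'lines now containing %s: %s\n' "$new" "$(grep -c -- "$new" "$outfile")"
}
# Usage:
#   rename_prefix PGOMOU ENSMUSG your_file new_file
```

This only rewrites text; it does not make the renamed IDs correspond to real Ensembl gene records.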

**dpryan** · 08-22-2014, 10:42 AM

It's unlikely that changing this will solve whatever problem you're having. Post the actual problem and we'll try to solve it.

**diegobonatto** · 08-22-2014, 10:48 AM

Thanks all!

I'm trying to use the GTF file from Gencode that contains all pseudogenes predicted by the Yale & UCSC pipelines (but not by Havana on reference chromosomes) (ftp://ftp.sanger.ac.uk/pub/gencode/G...pseudos.gtf.gz) with the latest GRCm38.p3 assembly, also from Gencode. The fastq files are OK. However, when I use Bowtie2, I always get Bowtie error = 1, which could be related to the PGOMOU gene nomenclature (my hypothesis). My first idea was to change every PGOMOU to ENSMUSG so that Bowtie would recognize the same IDs in the genome... or am I wrong?

**dpryan** · 08-22-2014, 10:58 AM

ENSMUSG* doesn't exist in the mouse genome (it's just used in the annotation). Please provide the exact command you used that produced the error and entire error message including the entire output that's printed to the screen.

**diegobonatto** · 08-22-2014, 11:07 AM

Here it goes:

Code:

tophat2 -p4 -G gencode.v20.2wayconspseudos.gtf -o MSCd0Adip-12 GRCh38 SRR490218_output2.fastq

[2014-08-22 15:59:44] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2014-08-22 15:59:44] Checking for Bowtie
                  Bowtie version:        2.1.0.0
[2014-08-22 15:59:44] Checking for Samtools
                Samtools version:        0.1.19.0
[2014-08-22 15:59:44] Checking for Bowtie index files (genome)..
[2014-08-22 15:59:44] Checking for reference FASTA file
        Warning: Could not find FASTA file GRCh38.fa
[2014-08-22 15:59:44] Reconstituting reference FASTA file from Bowtie index
  Executing: /usr/bin/bowtie2-inspect GRCh38 > MSCd0Adip-12/tmp/GRCh38.fa
[2014-08-22 16:02:05] Generating SAM header for GRCh38
        format:          fastq
        quality scale:   phred33 (default)
[2014-08-22 16:03:07] Reading known junctions from GTF file
        Warning: TopHat did not find any junctions in GTF file
[2014-08-22 16:03:07] Preparing reads
         left reads: min. length=60, max. length=66, 53983 kept reads (176 discarded)
[2014-08-22 16:03:09] Building transcriptome data files..
[2014-08-22 16:04:02] Building Bowtie index from gencode.v20.2wayconspseudos.fa
        [FAILED]
Error: Couldn't build bowtie index with err = 1

The output above is from the human GTF file and, of course, the latest human genome assembly from Gencode; the mouse files produce an identical error. I'm running Bowtie through TopHat (I know that is not necessary; Bowtie alone is sufficient for alignment).

thanks again!

**dpryan** · 08-22-2014, 11:18 AM

You can't expect a mouse annotation and a human reference sequence to be compatible (no amount of changing ID names will change that).

**diegobonatto** · 08-22-2014, 11:25 AM

Yes, of course, but the example that I posted was for human (GTF AND genome): the human Gencode pseudogene GTF with the human genome assembly. When I tested the murine Gencode pseudogene GTF AND the murine genome (also from Gencode), I got the same Bowtie error... If you look at the example that I posted, the genome is human and the GTF is human. No murine genome OR murine GTF was used in that example.
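
One way to test whether an annotation and a reference actually refer to the same sequences is to compare the names in column 1 of the GTF with the sequence names in the index. The sketch below assumes the reference names have already been written to a text file, one per line; the function name `gtf_names_missing_from_ref` is made up for illustration.

```shell
# Hypothetical helper: print sequence names used in a GTF (column 1) that do
# not appear in a list of reference sequence names (one name per line).
gtf_names_missing_from_ref() {
    local tmp
    tmp=$(mktemp) || return 1
    # Unique sequence names from the GTF, ignoring comment lines.
    grep -v '^#' "$1" | cut -f 1 | sort -u > "$tmp"
    # comm -23 keeps lines that occur only in the first (GTF) list.
    sort -u "$2" | comm -23 "$tmp" -
    rm -f "$tmp"
}
# Usage (assumes bowtie2-inspect is installed; -n prints sequence names only,
# and cut drops any description after the first space):
#   bowtie2-inspect -n GRCh38 | cut -d ' ' -f 1 > ref_names.txt
#   gtf_names_missing_from_ref gencode.v20.2wayconspseudos.gtf ref_names.txt
```

Empty output means every sequence name in the GTF exists in the reference; any names printed (for example `chr1` versus `1`) would be worth a closer look.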

**dpryan** · 08-22-2014, 11:29 AM

Then look in the run log for the last command that tophat issued and run that yourself. You'll then get the actual underlying error message.
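
A minimal sketch of that step, assuming the run log is a plain-text file in which commands are separated by blank lines and stage markers start with `#`. The function name `last_logged_command` is hypothetical, and the log path in the usage comment is an assumption about where TopHat put it.

```shell
# Hypothetical helper: print the last command recorded in a run log,
# skipping blank lines and "#>stage:" marker lines.
last_logged_command() {
    grep -v -e '^[[:space:]]*$' -e '^#' "$1" | tail -n 1
}
# Usage (path is an assumption; adjust to your own output directory):
#   last_logged_command MSCd0Adip-12/logs/run.log
```

Pasting the printed command back into the terminal runs it outside TopHat, so whatever it writes to the screen is the underlying tool's own message.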

**diegobonatto** · 08-22-2014, 11:29 AM

And to reinforce that I'm not mixing murine AND human data, each fastq is specific to its organism...

Again, any help is welcome!

**diegobonatto** · 08-22-2014, 11:36 AM

OK... TopHat reported that "TopHat did not find any junctions in GTF file", and the run log shows that the following commands were used:

Code:

/usr/bin/tophat -p4 -G gencode.v20.2wayconspseudos.gtf -o MSCd0Adip-12 GRCh38 SRR490218_output2.fastq

/usr/bin/gtf_juncs gencode.v20.2wayconspseudos.gtf  > MSCd0Adip-12/tmp/gencode.juncs

#>prep_reads:

/usr/bin/prep_reads --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir MSCd0Adip-12/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p4 --gtf-annotations gencode.v20.2wayconspseudos.gtf --gtf-juncs MSCd0Adip-12/tmp/gencode.juncs --no-closure-search --no-coverage-search --no-microexon-search --fastq --aux-outfile=MSCd0Adip-12/prep_reads.info --index-outfile=MSCd0Adip-12/tmp/left_kept_reads.bam.index --sam-header=MSCd0Adip-12/tmp/GRCh38_genome.bwt.samheader.sam --outfile=MSCd0Adip-12/tmp/left_kept_reads.bam SRR490218_output2.fastq

#>map_start:

/usr/bin/gtf_to_fasta --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir MSCd0Adip-12/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p4 --gtf-annotations gencode.v20.2wayconspseudos.gtf --gtf-juncs MSCd0Adip-12/tmp/gencode.juncs --no-closure-search --no-coverage-search --no-microexon-search gencode.v20.2wayconspseudos.gtf MSCd0Adip-12/tmp/GRCh38.fa MSCd0Adip-12/tmp/gencode.v20.2wayconspseudos.fa > MSCd0Adip-12/logs/g2f.out

/usr/bin/bowtie2-build MSCd0Adip-12/tmp/gencode.v20.2wayconspseudos.fa MSCd0Adip-12/tmp/gencode.v20.2wayconspseudos

Could it be that TopHat's search for junctions is causing the Bowtie error? (Again, my hypothesis. Excuse me if it is too naive, but I've been struggling with this error for some days.)

**dpryan** · 08-22-2014, 11:40 AM

Possible, is there anything in "MSCd0Adip-12/tmp/gencode.v20.2wayconspseudos.fa"?
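
The check asked for here can be sketched as a small function that reports whether a FASTA file has any content and, if so, how many records it holds. The name `fasta_summary` is an illustrative assumption.

```shell
# Hypothetical helper: report whether a FASTA file is usable.
# [ -s FILE ] is true only if FILE exists and is larger than 0 bytes.
fasta_summary() {
    if [ ! -s "$1" ]; then
        echo "missing or empty"
    else
        # Each FASTA record starts with a ">" header line.
        printf '%s records\n' "$(grep -c '^>' "$1")"
    fi
}
# Usage:
#   fasta_summary MSCd0Adip-12/tmp/gencode.v20.2wayconspseudos.fa
```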

**diegobonatto** · 08-22-2014, 11:42 AM

No, it's empty... (0 bytes)

**dpryan** · 08-22-2014, 11:48 AM

OK, now we're getting somewhere. That file is made by gtf_to_fasta, so something is going wrong with it. This could be the lack of junctions or it could be something else. Try running that command without the "--gtf-juncs MSCd0Adip-12/tmp/gencode.juncs" option and see what happens (I haven't a clue if it'll even run). If it runs, check whether the resulting fasta file is empty or not.

Can you look through the GTF file and see whether it contains any spliced transcripts? I wonder if tophat ignores pseudogenes.
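
Looking through the GTF by eye is slow, so here is a sketch of two counts that answer the question, assuming a standard tab-separated GTF with the feature type in column 3 and a `transcript_id "..."` attribute in column 9. Both function names are made up for illustration.

```shell
# Hypothetical helper: count how often each feature type (column 3) occurs.
gtf_feature_counts() {
    grep -v '^#' "$1" | cut -f 3 | sort | uniq -c
}

# Hypothetical helper: count transcripts that have at least two "exon" lines,
# i.e. transcripts that are spliced according to the annotation.
gtf_spliced_transcripts() {
    awk -F '\t' '$3 == "exon" {
        # Pull the transcript_id attribute out of column 9 and tally it.
        if (match($9, /transcript_id "[^"]+"/)) n[substr($9, RSTART, RLENGTH)]++
    }
    END { c = 0; for (t in n) if (n[t] > 1) c++; print c }' "$1"
}
# Usage:
#   gtf_feature_counts gencode.v20.2wayconspseudos.gtf
#   gtf_spliced_transcripts gencode.v20.2wayconspseudos.gtf
```

If the first command lists no `exon` features at all, a tool that derives junctions from exon coordinates would have nothing to work with; that is a possibility to check, not a confirmed diagnosis.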

**diegobonatto** · 08-22-2014, 03:26 PM

The command did not work either... in fact, the GTF file contains only spliced transcripts. Maybe running Bowtie alone would work. That's a weird problem...

Changing text in a GTF
