**danwiththeplan** · 01-24-2016, 08:07 PM

XLOC numbers are assigned to all genes, not just the ones missing from the GTF file you supply. So only some genes end up with a gene_name, but every gene ends up with a gene_id (= XLOC number). I think the issue is that the downstream programs read the gene_id field rather than the gene_name field.

Incidentally, as I understand it, the -G switch (as opposed to the -g switch) means that you'll only ever detect, quantify and analyse genes that are in your GTF file, while -g means that cufflinks will also assemble a new gene from scratch if there is sufficient read support, even if it's not in the GTF file you supply.

So, with the code you used, all output genes should have both a gene_id (XLOC number) and a gene_name (from the GTF).
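
To make the gene_id / gene_name distinction concrete, here is a minimal Python sketch that splits the attribute column of a GTF record into a dictionary. The helper name `parse_gtf_attributes` is hypothetical, the sample record is a merged.gtf line of the kind cuffmerge writes, and the regular expression assumes the usual `key "value";` attribute layout.

```python
import re

# Matches one `key "value";` pair in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_attributes(line):
    """Return the attribute column (9th tab-separated field) as a dict."""
    fields = line.rstrip("\n").split("\t")
    return dict(ATTR_RE.findall(fields[8]))


# Sample record: a cuffmerge-style exon line (tab-separated, 9 fields).
line = "\t".join([
    "1", "Cufflinks", "exon", "3631", "3913", ".", "+", ".",
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; '
    'exon_number "1"; gene_name "NAC001"; '
    'oId "transcript:AT1G01010.1"; '
    'nearest_ref "transcript:AT1G01010.1"; '
    'class_code "="; tss_id "TSS1"; p_id "P1";',
])

attrs = parse_gtf_attributes(line)
# gene_id carries the XLOC number, gene_name the name from the reference GTF.
print(attrs["gene_id"], attrs["gene_name"])
```

A tool that keys on `gene_id` would report `XLOC_000001` for this record, while one that keys on `gene_name` would report `NAC001`.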

**gtduarte** · 01-25-2016, 12:32 PM

Hello dan, thanks for replying. I tried what you suggested, running cufflinks with the -g option instead of -G, but unfortunately it didn't work:

$ cufflinks -o cuff_g -g ~/path_to/A_thaliana.TAIR10.30.gtf -b ~/path_to/A_thaliana.TAIR10.30.fa myfile_sorted.bam

Indeed, the resulting transcripts.gtf was a bit different from the previous one. For instance:

-> with -G switch:

1 Cufflinks transcript 11649 13714 1000 - . gene_id "gene:AT1G01030"; transcript_id "transcript:AT1G01030.1"; FPKM "0.4354775887"; frac "1.000000"; conf_lo "0.217739"; conf_hi "0.653216"; cov "1.257353";

-> with -g switch:

1 Cufflinks transcript 11649 13714 1000 - . gene_id "AT1G01030"; transcript_id "AT1G01030.1"; FPKM "0.4330886861"; frac "1.000000"; conf_lo "0.216544"; conf_hi "0.649633"; cov "1.250649"; full_read_support "yes";

However, those bam_errors still appear when I run cuffmerge, as before, and the XLOC values are still used as my gene IDs:

Example of the merged.gtf:

1 Cufflinks exon 3631 3913 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1"; gene_name "NAC001"; oId "transcript:AT1G01010.1"; nearest_ref "transcript:AT1G01010.1"; class_code "="; tss_id "TSS1"; p_id "P1";

Nevertheless, I ran cuffdiff, and the XLOC IDs were there too:

From gene_exp.diff:

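| test_id | gene_id | gene | locus | sample_1 | sample_2 | status | value_1 | value_2 | log2(fold_change) | test_stat | p_value | q_value | significant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XLOC_000001 | XLOC_000001 | NAC001 | 1:3630-5899 | wtmock1 | wtaba1 | OK | 2.82389 | 5.42847 | 0.94286 | 0.938326 | 0.3329 | 0.999039 | no |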
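
One possible way to get the reference identifiers back next to the XLOC IDs is to read them from the `nearest_ref` attribute in merged.gtf and attach them to each gene_exp.diff row. The sketch below is only an illustration: the function names and the extra `ref_gene_id` column are hypothetical, and it assumes that stripping the `transcript:` prefix and the trailing `.N` isoform suffix yields the gene identifier, which holds for the TAIR-style IDs shown here but may not hold for other annotations.

```python
import csv
import io
import re

ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def xloc_to_ref(gtf_lines):
    """Map each XLOC gene_id to the set of reference gene IDs it covers."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        attrs = dict(ATTR_RE.findall(line.rstrip("\n").split("\t")[8]))
        ref = attrs.get("nearest_ref")
        if ref is None:
            continue
        # Assumption: "transcript:AT1G01010.1" -> "AT1G01010"
        gene = ref.split(":", 1)[-1].rsplit(".", 1)[0]
        mapping.setdefault(attrs["gene_id"], set()).add(gene)
    return mapping


def annotate_diff(diff_text, mapping):
    """Add a hypothetical ref_gene_id column to gene_exp.diff rows."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    rows = []
    for row in reader:
        refs = sorted(mapping.get(row["gene_id"], []))
        row["ref_gene_id"] = ",".join(refs) or "-"
        rows.append(row)
    return rows


# Sample inputs copied from the records quoted in this post.
gtf_lines = ["\t".join([
    "1", "Cufflinks", "exon", "3631", "3913", ".", "+", ".",
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; '
    'gene_name "NAC001"; nearest_ref "transcript:AT1G01010.1";',
])]
diff_text = (
    "test_id\tgene_id\tgene\tlocus\n"
    "XLOC_000001\tXLOC_000001\tNAC001\t1:3630-5899\n"
)

mapping = xloc_to_ref(gtf_lines)
rows = annotate_diff(diff_text, mapping)
for row in rows:
    print(row["gene_id"], row["gene"], row["ref_gene_id"])
```

A set is used per XLOC because cuffmerge can group transcripts from more than one reference gene under a single locus; such cases show up as comma-separated IDs.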

Just in case, I checked the headers of the TopHat accepted_hits.bam, and they seem fine:

$ samtools view -H wtaba1_sorted.bam

@HD VN:1.0 SO:coordinate
@SQ SN:1 LN:30427671
@SQ SN:2 LN:19698289
@SQ SN:3 LN:23459830
@SQ SN:4 LN:18585056
@SQ SN:5 LN:26975502
@SQ SN:Mt LN:366924
@SQ SN:Pt LN:154478
@PG ID:TopHat VN:2.1.0 CL:/usr/bin/tophat -N 3 --read-edit-dist 4 --read-realign-edit-dist 0 -a 6 --microexon-search -r 150 --mate-std-dev 200 -i 8 -I 10000 --min-segment-intron 8 --max-segment-intron 10000 --b2-very-sensitive /path_to/bowtie_index/A_thaliana.TAIR10.30 myfile1_1_paired.fastq.trim myfile1_2_paired.fastq.trim
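
A header check like this can also be done programmatically. The sketch below compares the `@SQ` sequence names from `samtools view -H` output with the sequence names in the first column of a GTF. The function names are hypothetical, the header text is assumed to be tab-delimited as in a real SAM header, and nothing here establishes that a name mismatch is behind the errors in this thread.

```python
def sq_names(header_text):
    """Collect sequence names from the @SQ lines of a SAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the distinct values of the first GTF column."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }


# Header text as it would come from `samtools view -H` (tab-delimited).
header = "\n".join([
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:1\tLN:30427671",
    "@SQ\tSN:2\tLN:19698289",
    "@SQ\tSN:3\tLN:23459830",
    "@SQ\tSN:4\tLN:18585056",
    "@SQ\tSN:5\tLN:26975502",
    "@SQ\tSN:Mt\tLN:366924",
    "@SQ\tSN:Pt\tLN:154478",
])
# One GTF record on chromosome "1", shortened from the lines in this post.
gtf_lines = [
    "1\tCufflinks\ttranscript\t11649\t13714\t1000\t-\t.\t"
    'gene_id "AT1G01030"; transcript_id "AT1G01030.1";'
]

names = sq_names(header)
# Sequence names used in the GTF but absent from the BAM header.
missing = gtf_seqnames(gtf_lines) - set(names)
print(names, missing)
```

An empty `missing` set means every sequence name in the GTF also appears in the BAM header.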

Do you have any other clue?

Many thanks again!

Gene id changed to XLOC... cuffmerge issue?

Comment

Comment

Latest Articles

ad_right_rmr

News