SEQanswers

10-29-2014, 07:52 AM   #1
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367
MATS missed a few obviously differentially spliced exons.

Hi,

This is a difficult question, and specific to MATS, but if anyone here is familiar with MATS, I would be grateful for your help.
I've already emailed the authors, but I haven't received an answer yet.

I've been running MATS to identify differentially spliced exons.
Overall, I'm satisfied with the results, and I've confirmed some of the differentially spliced exons in IGV.
However, MATS missed a few differentially spliced exons in the Macf1 gene that are quite evident to me in IGV.

The annotation file was the most recent Ensembl GTF file (version 77) for M. musculus.
I've attached an IGV screenshot and a Sashimi plot, also generated with IGV, of the differentially spliced exons that MATS did not identify.

I also grepped for Macf1 in the ASEvents folder, which is supposed to contain all possible alternative splicing events, and none of the exons returned corresponds to the differentially spliced exons identified in IGV.
So it appears that MATS did not even test these exons.
Could this be a problem with how MATS parses the Ensembl GTF file?
The differentially spliced exons do appear in the GTF file, as shown both in the IGV screenshots and in the attached text file containing the grep results for Macf1 in the Ensembl GTF file.

Code:
[username@lg-1r17-n04 ASEvents]$ basename `pwd`
ASEvents
[username@lg-1r17-n04 ASEvents]$ grep Macf1 *
fromGTF.A3SS.txt:1289	"ENSMUSG00000028649"	"Macf1"	4	-	123368820	123368835	123368820	123368832	123369804	123369968
fromGTF.A5SS.txt:799	"ENSMUSG00000028649"	"Macf1"	4	-	123438466	123438666	123438472	123438666	123434699	123435195
fromGTF.AFE.txt:5765	"ENSMUSG00000028649"	"Macf1"	4	-	123683969	123684360	123564462	123564694	123554303	123554365
fromGTF.AFE.txt:5766	"ENSMUSG00000028649"	"Macf1"	4	-	123664465	123664752	123564462	123564694	123554303	123554365
fromGTF.AFE.txt:5767	"ENSMUSG00000028649"	"Macf1"	4	-	123683969	123684360	123664465	123664752	123554303	123554365
fromGTF.SE.txt:5013	"ENSMUSG00000028649"	"Macf1"	4	-	123364056	123364074	123360909	123361027	123365272	123365347
fromGTF.SE.txt:5014	"ENSMUSG00000028649"	"Macf1"	4	-	123397093	123397420	123395912	123395997	123397747	123397907
fromGTF.SE.txt:5015	"ENSMUSG00000028649"	"Macf1"	4	-	123354478	123354580	123351821	123351951	123355055	123355267
fromGTF.SE.txt:5016	"ENSMUSG00000028649"	"Macf1"	4	-	123441602	123441665	123440549	123440791	123444834	123444952
fromGTF.SE.txt:5017	"ENSMUSG00000028649"	"Macf1"	4	-	123368820	123368832	123367874	123368037	123369804	123369968
fromGTF.SE.txt:5018	"ENSMUSG00000028649"	"Macf1"	4	-	123368820	123368835	123367874	123368037	123369804	123369968
fromGTF.SE.txt:5019	"ENSMUSG00000028649"	"Macf1"	4	-	123386548	123386557	123385454	123385686	123387227	123387431
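Grepping for the gene name only shows which events exist; to check whether one particular exon was ever turned into an event, it can help to compare coordinates directly. Below is a minimal Python sketch, not part of MATS. It assumes that columns 6-11 of fromGTF.SE.txt hold the target, upstream and downstream exons as (start, end) pairs in that order, and that MATS writes exon starts 0-based while the GTF is 1-based. The grep output above is consistent with that assumption: the upstream exon 123367874-123368037 of events 5017 and 5018 corresponds to the Macf1-003 exon 123367875-123368037 in the Ensembl GTF. The names `gtf_to_mats` and `find_exon_in_se` are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple

# Assumed meaning of columns 6-11 (three start/end pairs), in file order.
SE_EXON_ROLES = ("target", "upstream", "downstream")


def gtf_to_mats(start_1based: int, end: int) -> Tuple[int, int]:
    """Convert a GTF exon (1-based start) to the assumed MATS convention
    (0-based start, end unchanged)."""
    return start_1based - 1, end


def find_exon_in_se(lines: Iterable[str], gtf_start: int,
                    gtf_end: int) -> List[Tuple[str, str]]:
    """Return (event ID, role) for every SE row in which the exon appears.

    `lines` are raw tab-separated rows; csv strips the double quotes
    around the gene ID and symbol.
    """
    start0, end = gtf_to_mats(gtf_start, gtf_end)
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or not row[0].isdigit():
            continue  # skip the header line and blank lines
        coords = [int(x) for x in row[5:11]]
        pairs = zip(coords[0::2], coords[1::2])
        for role, (s, e) in zip(SE_EXON_ROLES, pairs):
            if (s, e) == (start0, end):
                hits.append((row[0], role))
    return hits
```

For example, passing rows 5017 and 5018 from the grep output above (without the `fromGTF.SE.txt:` prefix that grep adds) with the GTF exon 123367875-123368037 returns `[("5017", "upstream"), ("5018", "upstream")]`, and an exon that appears in no row returns an empty list. The other fromGTF files (A3SS, A5SS, AFE, ...) have different column meanings, so they would need their own role labels.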
Thank you for your help.
Attached Images
IGV screenshot Macf1.png
Sashim_plot_Macf1_2.png
Attached Files
log.RNASeq-MATS.2014-10-28 22:47:47.174081.txt

Last edited by blancha; 10-29-2014 at 08:04 AM.
10-29-2014, 08:03 AM   #2
blancha

Apparently, the file resulting from grepping for Macf1 in the Ensembl (version 77) GTF file for M. musculus was too big to attach.

Here are just the first few lines, the lines containing one of the exons that was not identified as differentially spliced, and the last few lines.

Code:
4	ensembl_havana	gene	123349633	123684360	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding";
4	ensembl_havana	transcript	123349633	123369931	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000123765"; transcript_version "2"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-003"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "cds_start_NF"; tag "mRNA_start_NF"; tss_id "TSS44415"; p_id "P29600";
4	ensembl_havana	exon	123369805	123369931	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000123765"; transcript_version "2"; exon_number "1"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-003"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; exon_id "ENSMUSE00001259025"; exon_version "1"; tag "cds_start_NF"; tag "mRNA_start_NF"; tss_id "TSS44415"; p_id "P29600";
4	ensembl_havana	CDS	123369805	123369931	.	-	1	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000123765"; transcript_version "2"; exon_number "1"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-003"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; protein_id "ENSMUSP00000119600"; protein_version "1"; tag "cds_start_NF"; tag "mRNA_start_NF"; tss_id "TSS44415"; p_id "P29600";
4	ensembl_havana	exon	123367875	123368037	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000123765"; transcript_version "2"; exon_number "2"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-003"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; exon_id "ENSMUSE00001262976"; exon_version "1"; tag "cds_start_NF"; tag "mRNA_start_NF"; tss_id "TSS44415"; p_id "P29600";
4	ensembl_havana	CDS	123367875	123368037	.	-	0	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000123765"; transcript_version "2"; exon_number "2"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-003"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; protein_id "ENSMUSP00000119600"; protein_version "1"; tag "cds_start_NF"; tag "mRNA_start_NF"; tss_id "TSS44415"; p_id "P29600";
...
4	ensembl	exon	123480187	123480322	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000097897"; transcript_version "5"; exon_number "35"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-202"; transcript_source "ensembl"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS57295"; exon_id "ENSMUSE00001046125"; exon_version "1"; tss_id "TSS44416"; p_id "P29601";
4	ensembl	CDS	123480187	123480322	.	-	1	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000097897"; transcript_version "5"; exon_number "35"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-202"; transcript_source "ensembl"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS57295"; protein_id "ENSMUSP00000095507"; protein_version "4"; tss_id "TSS44416"; p_id "P29601";
4	ensembl	exon	123471007	123476337	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000097897"; transcript_version "5"; exon_number "36"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-202"; transcript_source "ensembl"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS57295"; exon_id "ENSMUSE00000599570"; exon_version "2"; tss_id "TSS44416"; p_id "P29601";
4	ensembl	CDS	123471007	123476337	.	-	0	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000097897"; transcript_version "5"; exon_number "36"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-202"; transcript_source "ensembl"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS57295"; protein_id "ENSMUSP00000095507"; protein_version "4"; tss_id "TSS44416"; p_id "P29601";


4	havana	exon	123457787	123458011	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000147030"; transcript_version "1"; exon_number "38"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-005"; transcript_source "havana"; transcript_biotype "protein_coding"; exon_id "ENSMUSE00001013689"; exon_version "1"; tag "cds_end_NF"; tag "mRNA_end_NF"; tss_id "TSS44426"; p_id "P29599";
4	havana	CDS	123457787	123458011	.	-	0	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000147030"; transcript_version "1"; exon_number "38"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-005"; transcript_source "havana"; transcript_biotype "protein_coding"; protein_id "ENSMUSP00000123246"; protein_version "1"; tag "cds_end_NF"; tag "mRNA_end_NF"; tss_id "TSS44426"; p_id "P29599";
4	havana	exon	123456349	123456749	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000147030"; transcript_version "1"; exon_number "39"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-005"; transcript_source "havana"; transcript_biotype "protein_coding"; exon_id "ENSMUSE00000795608"; exon_version "1"; tag "cds_end_NF"; tag "mRNA_end_NF"; tss_id "TSS44426"; p_id "P29599";
4	havana	CDS	123456349	123456749	.	-	0	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000147030"; transcript_version "1"; exon_number "39"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-005"; transcript_source "havana"; transcript_biotype "protein_coding"; protein_id "ENSMUSP00000123246"; protein_version "1"; tag "cds_end_NF"; tag "mRNA_end_NF"; tss_id "TSS44426"; p_id "P29599";
4	havana	UTR	123664560	123664752	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000147030"; transcript_version "1"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-005"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "cds_end_NF"; tag "mRNA_end_NF"; tss_id "TSS44426"; p_id "P29599";
4	ensembl_havana	transcript	123538880	123564694	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000146000"; transcript_version "1"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-006"; transcript_source "ensembl_havana"; transcript_biotype "processed_transcript"; tss_id "TSS44427";
4	ensembl_havana	exon	123564463	123564694	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000146000"; transcript_version "1"; exon_number "1"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-006"; transcript_source "ensembl_havana"; transcript_biotype "processed_transcript"; exon_id "ENSMUSE00000778819"; exon_version "1"; tss_id "TSS44427";
4	ensembl_havana	exon	123554304	123554365	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000146000"; transcript_version "1"; exon_number "2"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-006"; transcript_source "ensembl_havana"; transcript_biotype "processed_transcript"; exon_id "ENSMUSE00001304267"; exon_version "1"; tss_id "TSS44427";
4	ensembl_havana	exon	123544742	123544831	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000146000"; transcript_version "1"; exon_number "3"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-006"; transcript_source "ensembl_havana"; transcript_biotype "processed_transcript"; exon_id "ENSMUSE00001212477"; exon_version "1"; tss_id "TSS44427";
4	ensembl_havana	exon	123538880	123539873	.	-	.	gene_id "ENSMUSG00000028649"; gene_version "12"; transcript_id "ENSMUST00000146000"; transcript_version "1"; exon_number "4"; gene_name "Macf1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "Macf1-006"; transcript_source "ensembl_havana"; transcript_biotype "processed_transcript"; exon_id "ENSMUSE00000800532"; exon_version "1"; tss_id "TSS44427";
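To see how the Macf1 exons are shared among transcripts without reading through the full grep output, the exon records can be collapsed by coordinate. This is a minimal Python sketch, assuming a standard tab-separated Ensembl GTF with `key "value";` attributes; `parse_attributes` and `exons_by_coordinate` are hypothetical helpers, not part of MATS or any library. Only `exon` features are kept, so the CDS, UTR and transcript lines are ignored.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Matches one `key "value"` pair in the GTF attribute column.
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column; for repeated keys such as `tag`,
    the last value wins."""
    return dict(ATTR_RE.findall(field))


def exons_by_coordinate(gtf_lines: Iterable[str],
                        gene_name: str) -> Dict[Tuple[int, int], Set[str]]:
    """Map each distinct (start, end) exon of `gene_name`, in GTF 1-based
    coordinates, to the set of transcript IDs that contain it."""
    exons: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # keep exon features only
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_name") == gene_name:
            exons[(int(cols[3]), int(cols[4]))].add(attrs["transcript_id"])
    return dict(exons)
```

An open file handle can be passed directly as `gtf_lines`. Subtracting 1 from each start gives coordinates that can be compared with the fromGTF.*.txt files in ASEvents, assuming MATS's 0-based exon starts; an annotated exon that never appears there was presumably not turned into any event.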
The question is probably far too specific for anyone on this forum to answer, but this is a Hail Mary.

I normally use DEXSeq, but there are no replicates in this experiment.
I tried MISO, but I had a very hard time producing a correctly formatted GFF file from the latest (version 77) Ensembl GTF file, so I gave up.

MATS is simple to run, and the results I verified are good, except for this annoying hiccup.