I have done a easy test with cufflinks and the result is not coherent with the input files.
I use strand specific data (input.sam).
ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA YccacSccc\ccccXcYScc]a]a\caScc^acccK NM:i:0 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA YccacSccc\ccccXcYScc]a]a\caScc^acccK NM:i:0 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA YccacSccc\ccccXcYScc]a]a\caScc^acccK NM:i:0 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 83 chr3 9861 255 36M = 9828 -69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:+
ILLUMINA-GA_0000:1:1:1080:14412#0 163 chr3 9828 255 36M = 9861 69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:+
ILLUMINA-GA_0000:1:1:1080:14412#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:-
I sort the file with samtools (input.sort.bam) and use this command for Cufflinks:
cufflinks --library-type fr-firststrand --min-frags-per-transfrag 3 input.sort.bam
The result is the file transcript3.gtf, where transcripts have no sense.
chr3 Cufflinks transcript 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
chr3 Cufflinks exon 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
chr3 Cufflinks transcript 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "1157774097.4185056686"; frac "1.000000"; conf_lo "0.000000"; conf_hi "2315548194.837011"; cov "583.518145";
chr3 Cufflinks exon 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "1157774097.4185056686"; frac "1.000000"; conf_lo "0.000000"; conf_hi "2315548194.837011"; cov "583.518145";
If I use a different value for --min-frags-per-transfrag parameter, results change to the correct one
cufflinks --library-type fr-firststrand --min-frags-per-transfrag 1 input.sort.bam
File transcript1.gtf.
chr3 Cufflinks transcript 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
chr3 Cufflinks exon 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
chr3 Cufflinks transcript 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "289443524.3546264172"; frac "0.250000"; conf_lo "0.000000"; conf_hi "868330573.063879"; cov "145.879536";
chr3 Cufflinks exon 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "289443524.3546264172"; frac "0.250000"; conf_lo "0.000000"; conf_hi "868330573.063879"; cov "145.879536";
chr3 Cufflinks transcript 9828 9896 1000 - . gene_id "CUFF.3"; transcript_id "CUFF.3.1"; FPKM "868330573.0638793707"; frac "0.750000"; conf_lo "0.000000"; conf_hi "1870992353.271905"; cov "437.638609";
chr3 Cufflinks exon 9828 9896 1000 - . gene_id "CUFF.3"; transcript_id "CUFF.3.1"; exon_number "1"; FPKM "868330573.0638793707"; frac "0.750000"; conf_lo "0.000000"; conf_hi "1870992353.271905"; cov "437.638609";
There is something I have missed?
I use strand specific data (input.sam).
ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA YccacSccc\ccccXcYScc]a]a\caScc^acccK NM:i:0 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA YccacSccc\ccccXcYScc]a]a\caScc^acccK NM:i:0 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA YccacSccc\ccccXcYScc]a]a\caScc^acccK NM:i:0 NH:i:1 XS:A:-
ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 83 chr3 9861 255 36M = 9828 -69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:+
ILLUMINA-GA_0000:1:1:1080:14412#0 163 chr3 9828 255 36M = 9861 69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:+
ILLUMINA-GA_0000:1:1:1080:14412#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:-
ILLUMINA-GA_0000:1:1:1080:14412#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:-
I sort the file with samtools (input.sort.bam) and use this command for Cufflinks:
cufflinks --library-type fr-firststrand --min-frags-per-transfrag 3 input.sort.bam
The result is the file transcript3.gtf, where transcripts have no sense.
chr3 Cufflinks transcript 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
chr3 Cufflinks exon 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
chr3 Cufflinks transcript 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "1157774097.4185056686"; frac "1.000000"; conf_lo "0.000000"; conf_hi "2315548194.837011"; cov "583.518145";
chr3 Cufflinks exon 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "1157774097.4185056686"; frac "1.000000"; conf_lo "0.000000"; conf_hi "2315548194.837011"; cov "583.518145";
If I use a different value for --min-frags-per-transfrag parameter, results change to the correct one
cufflinks --library-type fr-firststrand --min-frags-per-transfrag 1 input.sort.bam
File transcript1.gtf.
chr3 Cufflinks transcript 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
chr3 Cufflinks exon 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
chr3 Cufflinks transcript 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "289443524.3546264172"; frac "0.250000"; conf_lo "0.000000"; conf_hi "868330573.063879"; cov "145.879536";
chr3 Cufflinks exon 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "289443524.3546264172"; frac "0.250000"; conf_lo "0.000000"; conf_hi "868330573.063879"; cov "145.879536";
chr3 Cufflinks transcript 9828 9896 1000 - . gene_id "CUFF.3"; transcript_id "CUFF.3.1"; FPKM "868330573.0638793707"; frac "0.750000"; conf_lo "0.000000"; conf_hi "1870992353.271905"; cov "437.638609";
chr3 Cufflinks exon 9828 9896 1000 - . gene_id "CUFF.3"; transcript_id "CUFF.3.1"; exon_number "1"; FPKM "868330573.0638793707"; frac "0.750000"; conf_lo "0.000000"; conf_hi "1870992353.271905"; cov "437.638609";
There is something I have missed?
Comment