  • incoherent result cufflinks

    I ran a simple test with Cufflinks, and the result is not consistent with the input file.

    The data are strand-specific (input.sam):

    ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA YccacSccc\ccccXcYScc]a]a\caScc^acccK NM:i:0 NH:i:1 XS:A:-
    ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 NH:i:1 XS:A:-
    ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA YccacSccc\ccccXcYScc]a]a\caScc^acccK NM:i:0 NH:i:1 XS:A:-
    ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 NH:i:1 XS:A:-
    ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA YccacSccc\ccccXcYScc]a]a\caScc^acccK NM:i:0 NH:i:1 XS:A:-
    ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 NH:i:1 XS:A:-
    ILLUMINA-GA_0000:1:1:1080:14412#0 83 chr3 9861 255 36M = 9828 -69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:+
    ILLUMINA-GA_0000:1:1:1080:14412#0 163 chr3 9828 255 36M = 9861 69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:+
    ILLUMINA-GA_0000:1:1:1080:14412#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:-
    ILLUMINA-GA_0000:1:1:1080:14412#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:-
    ILLUMINA-GA_0000:1:1:1080:14412#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:-
    ILLUMINA-GA_0000:1:1:1080:14412#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:-
    ILLUMINA-GA_0000:1:1:1080:14412#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA c]\b]JccYcccccPcaRb]]Jb]aZYYaccYaccc NM:i:0 NH:i:1 XS:A:-
    ILLUMINA-GA_0000:1:1:1080:14412#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:3 NH:i:1 XS:A:-
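For reference, the FLAG values in these records (99, 147, 83, 163) can be unpacked with a few lines of Python. This is only a sketch: the bit meanings come from the SAM format, the function names are made up for illustration, and the strand rule assumes the usual fr-firststrand convention in which read 1 aligns antisense to the transcript and read 2 aligns sense.

```python
# SAM FLAG bits as defined in the SAM format specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
}


def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def firststrand_transcript_strand(flag):
    """Transcript strand implied by one alignment, assuming fr-firststrand.

    Hypothetical helper: read 1 forward implies '-', read 1 reverse
    implies '+'; read 2 is the opposite.
    """
    reverse = bool(flag & 0x10)
    is_read1 = bool(flag & 0x40)
    if is_read1:
        return "+" if reverse else "-"
    return "-" if reverse else "+"


for flag in (99, 147, 83, 163):
    print(flag, decode_flag(flag), firststrand_transcript_strand(flag))
```

Under that assumption, 99/147 pairs imply a '-' transcript and 83/163 pairs imply a '+' transcript, which agrees with the XS:A tags on the first and seventh records listed above.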


    I sorted the file with samtools (input.sort.bam) and ran Cufflinks with this command:

    cufflinks --library-type fr-firststrand --min-frags-per-transfrag 3 input.sort.bam

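    The result is the file transcript3.gtf, in which the transcripts make no sense: the transcript at 9828-9896 is reported on the + strand, although most of the reads there are tagged XS:A:-.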

    chr3 Cufflinks transcript 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
    chr3 Cufflinks exon 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
    chr3 Cufflinks transcript 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "1157774097.4185056686"; frac "1.000000"; conf_lo "0.000000"; conf_hi "2315548194.837011"; cov "583.518145";
    chr3 Cufflinks exon 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "1157774097.4185056686"; frac "1.000000"; conf_lo "0.000000"; conf_hi "2315548194.837011"; cov "583.518145";


    If I use a different value for the --min-frags-per-transfrag parameter, the result changes to the correct one:

    cufflinks --library-type fr-firststrand --min-frags-per-transfrag 1 input.sort.bam

    This gives the file transcript1.gtf:


    chr3 Cufflinks transcript 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
    chr3 Cufflinks exon 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
    chr3 Cufflinks transcript 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "289443524.3546264172"; frac "0.250000"; conf_lo "0.000000"; conf_hi "868330573.063879"; cov "145.879536";
    chr3 Cufflinks exon 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "289443524.3546264172"; frac "0.250000"; conf_lo "0.000000"; conf_hi "868330573.063879"; cov "145.879536";
    chr3 Cufflinks transcript 9828 9896 1000 - . gene_id "CUFF.3"; transcript_id "CUFF.3.1"; FPKM "868330573.0638793707"; frac "0.750000"; conf_lo "0.000000"; conf_hi "1870992353.271905"; cov "437.638609";
    chr3 Cufflinks exon 9828 9896 1000 - . gene_id "CUFF.3"; transcript_id "CUFF.3.1"; exon_number "1"; FPKM "868330573.0638793707"; frac "0.750000"; conf_lo "0.000000"; conf_hi "1870992353.271905"; cov "437.638609";


    Is there something I have missed?
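One way to pin down what the expected result is, independently of Cufflinks, is to count fragments per locus and per XS strand directly from the SAM records. The sketch below is a rough check, not anything Cufflinks does internally: tally_fragments is a hypothetical helper, it counts each pair once through its read 1 record, and the example records are abbreviated (made-up read names r1 to r3, '*' for SEQ and QUAL).

```python
from collections import Counter


def tally_fragments(sam_lines):
    """Count fragments per (chrom, leftmost mate position, XS strand).

    Each pair is counted once, through its read 1 record (FLAG bit 0x40).
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split()
        flag, chrom = int(fields[1]), fields[2]
        pos, mate_pos = int(fields[3]), int(fields[7])
        if not flag & 0x40:
            continue  # read 2: already represented by its mate
        # Optional tags start at column 12; '.' means no XS tag was found.
        strand = next((f[5:] for f in fields[11:] if f.startswith("XS:A:")), ".")
        counts[(chrom, min(pos, mate_pos), strand)] += 1
    return counts


# Abbreviated records modelled on the input above.
example = [
    "r1 99 chr3 1355 255 36M = 1366 47 * * NH:i:1 XS:A:-",
    "r1 147 chr3 1366 255 36M = 1355 -47 * * NH:i:1 XS:A:-",
    "r2 83 chr3 9861 255 36M = 9828 -69 * * NH:i:1 XS:A:+",
    "r2 163 chr3 9828 255 36M = 9861 69 * * NH:i:1 XS:A:+",
    "r3 99 chr3 9828 255 36M = 9861 69 * * NH:i:1 XS:A:-",
    "r3 147 chr3 9861 255 36M = 9828 -69 * * NH:i:1 XS:A:-",
]
for key, n in sorted(tally_fragments(example).items()):
    print(key, n)
```

Applied to the full input.sam listed above, this counting scheme would give three '-' fragments at 1355, one '+' fragment at 9828, and three '-' fragments at 9828, which is in line with the 0.25/0.75 split reported in transcript1.gtf.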

  • #2
    The first thing anyone thinks on seeing a quality string made up entirely of Bs is that you are using an oldish dataset and that those reads are of the worst possible quality. You appear to be using Phred+64 quality encoding, so a 'B' is 66 in ASCII, which is a quality of 2. An 'h', on the other hand, is 104 in ASCII, which is a quality of 40.

    So try your test on higher-quality reads for starters. A quality string with a lot of h's indicates a good-quality read.

    Newer runs use a Phred offset of 33, so that '#' (quality 2) is the lowest quality and 'I' (quality 40) is the highest.
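The arithmetic above is just the ASCII code minus the offset. A minimal sketch, with phred_scores as an illustrative name and the offset passed explicitly because it cannot be read reliably from a single quality string:

```python
def phred_scores(quality, offset=64):
    """Convert a quality string to Phred scores for the given ASCII offset."""
    return [ord(ch) - offset for ch in quality]


print(phred_scores("Bh"))             # Phred+64: [2, 40]
print(phred_scores("#I", offset=33))  # Phred+33: [2, 40]
```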



    • #3
      Thanks for the quick reply.

      I have changed the quality of every base to 'h', but the result is the same.

      INPUT FILE 1
      ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18162#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18162#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18163#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18163#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14414#0 83 chr3 9861 255 36M = 9828 -69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:+
      ILLUMINA+GA_0000:1:1:1080:14414#0 163 chr3 9828 255 36M = 9861 69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:+
      ILLUMINA+GA_0000:1:1:1080:14415#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14415#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14416#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14416#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14417#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14417#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-


      RESULT 1
      chr3 Cufflinks transcript 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
      chr3 Cufflinks exon 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
      chr3 Cufflinks transcript 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "1157774097.4185056686"; frac "1.000000"; conf_lo "0.000000"; conf_hi "2315548194.837011"; cov "583.518145";
      chr3 Cufflinks exon 9828 9896 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "1157774097.4185056686"; frac "1.000000"; conf_lo "0.000000"; conf_hi "2315548194.837011"; cov "583.518145";




      If I change the position of the less abundant fragment, the result is correct.

      INPUT FILE 2
      ILLUMINA+GA_0000:1:1:1079:18161#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18161#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18162#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18162#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18163#0 99 chr3 1355 255 36M = 1366 47 TTCAGGAGGCCACCCGACATAGTGCAGAGAAGCGGA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1079:18163#0 147 chr3 1366 255 36M = 1355 +47 ACCCGACATANTNCAGAGAAGCGGCCCACAATGCGN hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14414#0 83 chr3 2861 255 36M = 9828 -69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:+
      ILLUMINA+GA_0000:1:1:1080:14414#0 163 chr3 2828 255 36M = 9861 69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:+
      ILLUMINA+GA_0000:1:1:1080:14415#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14415#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14416#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14416#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14417#0 99 chr3 9828 255 36M = 9861 69 CTTCATCCGCCAACTAATATTTCACTTTACATCCAA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-
      ILLUMINA+GA_0000:1:1:1080:14417#0 147 chr3 9861 255 36M = 9828 +69 NGTCATTATTGGCTCAACTTTCCNCNCTATCTGCTT hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:1 XS:A:-

      RESULT 2
      chr3 Cufflinks transcript 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
      chr3 Cufflinks exon 1355 1401 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "8019117703.6926527023"; frac "1.000000"; conf_lo "0.000000"; conf_hi "17278797233.473148"; cov "4041.635323";
      chr3 Cufflinks transcript 9828 9896 1000 - . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "868330573.0638793707"; frac "1.000000"; conf_lo "0.000000"; conf_hi "1870992353.271905"; cov "437.638609";
      chr3 Cufflinks exon 9828 9896 1000 - . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "868330573.0638793707"; frac "1.000000"; conf_lo "0.000000"; conf_hi "1870992353.271905"; cov "437.638609";
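To compare two result files without reading the attribute column by eye, the transcript records can be reduced to locus, strand, and FPKM. A minimal sketch: transcript_summary is a hypothetical helper, and it assumes the nine-column GTF layout shown above, with attributes written as key "value"; pairs.

```python
import re

# Matches one GTF attribute of the form: key "value";
ATTR = re.compile(r'(\S+) "([^"]*)";')


def transcript_summary(gtf_lines):
    """Return (chrom, start, end, strand, FPKM) for each transcript record."""
    rows = []
    for line in gtf_lines:
        fields = line.split(None, 8)  # 8 fixed columns, then the attributes
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR.findall(fields[8]))
        rows.append((fields[0], int(fields[3]), int(fields[4]),
                     fields[6], float(attrs["FPKM"])))
    return rows


# The 9828-9896 transcript records from RESULT 1 and RESULT 2 (attributes trimmed).
result1 = ['chr3 Cufflinks transcript 9828 9896 1000 + . gene_id "CUFF.2"; '
           'transcript_id "CUFF.2.1"; FPKM "1157774097.4185056686"; frac "1.000000";']
result2 = ['chr3 Cufflinks transcript 9828 9896 1000 - . gene_id "CUFF.2"; '
           'transcript_id "CUFF.2.1"; FPKM "868330573.0638793707"; frac "1.000000";']

for label, lines in (("RESULT 1", result1), ("RESULT 2", result2)):
    for row in transcript_summary(lines):
        print(label, row)
```

Printed side by side, the two rows differ only in strand and FPKM for the same coordinates, which is the discrepancy described in this post.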

      In both runs I used the same Cufflinks parameters:

      cufflinks --library-type fr-firststrand -F 0.01 --min-frags-per-transfrag 2 input.sort.bam
