  • michaelleonard
    Junior Member
    • Oct 2015
    • 3

    multi-read-correct not working in Cuffdiff 2.2.1

    Hi all,

    I've posted this to the Cufflinks mailing list as well:


    I'm running cuffdiff v2.2.1 from the precompiled binaries on both Linux and OS X. When I use the --multi-read-correct flag, the only output file that changes is "run.info", which records that the flag was used. A second progress bar appears in the log when --multi-read-correct is set, so cuffdiff does seem to recognize the flag. Every other output file is identical when I diff runs with and without multi-read correction. I've tried all the binaries since 2.0.0, and this behavior seems to have started in 2.1.0.
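
For anyone who wants to repeat the comparison, here is a minimal sketch of the check described above. It assumes two cuffdiff output directories, one per run; the function name and the directory names in the usage note are placeholders, not anything cuffdiff produces by itself.

```shell
# Compare two cuffdiff output directories recursively, skipping run.info,
# which is expected to differ because it records the options that were used.
# $1 = output directory of the run without --multi-read-correct (placeholder)
# $2 = output directory of the run with --multi-read-correct (placeholder)
compare_cuffdiff_runs() {
  # diff exits 0 only when no file differs and no file is missing on either side
  if diff -r -q -x run.info "$1" "$2"; then
    echo "identical apart from run.info"
  else
    echo "outputs differ"
  fi
}
```

Called as `compare_cuffdiff_runs out_no_mrc out_mrc`, it lists any differing files and then prints a one-line summary.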

    I've also noticed that multi-reads that map to duplicated genes in the genome lead to an FPKM and count of 0 for each duplicate. I'll post after this with concrete examples.

    Does anyone else notice no change in their output when enabling or disabling "--multi-read-correct" with cuffdiff 2.2.1?

    Michael Leonard
  • michaelleonard
    Junior Member
    • Oct 2015
    • 3

    #2
    I've created a toy genome from Chlamydomonas reinhardtii to demonstrate what I'm seeing in the full genome. I extracted seven genes of interest with 1 kb flanking regions and made each one its own contig. I also extracted the reads mapping to these regions and remapped them. The last "contig" was duplicated exactly to simulate an exact duplication event. I'm using STAR to map the reads, but I see similar behavior with TopHat. The entire example and all files can be accessed here:


    The following pairs of genes demonstrate my process of debugging. I report the counts and FPKM for the first sample. I also report the raw pileup of reads from bamtools. I see no difference with and without multi-read-correct:


    Biologically distinct genes, should both have FPKMs
    chromosome_test1 Cre01.g004300 count:6239 FPKM:122170 coverage:6443
    chromosome_test2 Cre01.g004500 count:5275 FPKM:118732 coverage:5452



    multi-read-correct seems to work correctly for a small duplicate region
    however, it appears reads mapping to duplicated region aren't counted at all
    97% of the first gene is contained in the second gene (30% global coverage)
    chromosome_test3 Cre17.g707450 count:0 FPKM:0 coverage:1657
    chromosome_test4 Cre07.g333746 count:4629 FPKM:57610 coverage:6373



    Very low count and FPKM for genes with large duplicate region
    blast reports 100% sequence identity with 70% coverage between genes
    chromosome_test5 Cre17.g738650 count:6 FPKM:123.825 coverage:1758
    chromosome_test6 Cre17.g698299 count:225 FPKM:2909 coverage:1965



    Duplicated full gene, expect 0 FPKM for each based on the above
    chromosome_test7 Cre01.g000900.1 count:0 FPKM:0 coverage:1959
    chromosome_test8 Cre01.g000900.2 count:0 FPKM:0 coverage:1959
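
A quick way to pull the zero-FPKM genes out of a run is sketched below. It assumes a tab-separated genes.fpkm_tracking file with a header row in which the per-sample FPKM columns end in `_FPKM`, and it only looks at the first such column (the first sample); the function name is made up for illustration.

```shell
# Print tracking_id and FPKM for every gene whose first-sample FPKM is 0.
# $1 = path to a genes.fpkm_tracking file (tab-separated, with a header row)
list_zero_fpkm() {
  awk -F'\t' '
    NR == 1 {
      # Find the first column whose header ends in _FPKM (assumed naming)
      for (i = 1; i <= NF; i++)
        if ($i ~ /_FPKM$/) { col = i; break }
      if (!col) { print "no _FPKM column found" > "/dev/stderr"; exit 1 }
      next
    }
    # Adding 0 forces a numeric comparison, so "0" and "0.0" both match
    ($col + 0) == 0 { print $1 "\t" $col }
  ' "$1"
}
```

Running `list_zero_fpkm genes.fpkm_tracking` on runs with and without --multi-read-correct makes it easy to see whether the set of zero-FPKM genes changes.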




    It appears that cuffdiff 2.1.0 - 2.2.1 is ignoring duplicated regions completely. I've tried this test on every binary I could run, with and without the multi-read-correct flag:


    The parameters I use for everything are in the "scripts" folder of the zip file above. These are my cuffdiff parameters:

    cuffdiff \
    --labels ${SAMPLE_LABELS} \
    --output-dir . \
    --num-threads 4 \
    --multi-read-correct \
    --max-bundle-frags 1000000000 \
    ${GTF} \
    ${SAMPLE_LIST}

    I'm assuming that multi-read-correct is supposed to place multi-reads in whichever transcript fits the expression model better. With multi-read-correct disabled, I'm also assuming that each duplicate should get half of the reads. Am I correct in these assumptions and is this the desired behavior? I can imagine instances where genes are expressed, but since they are duplicated somewhere else they report 0 expression.
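
To make the two assumptions above concrete, here is a toy calculation. It is not cuffdiff's algorithm: the weighting by uniquely mapped reads is an assumption chosen for illustration, and the function name is made up. It shows what an even split and an evidence-weighted split would give for reads shared between two loci.

```shell
# Toy arithmetic only, NOT cuffdiff's actual model.
# $1 = number of multi-reads shared by two loci
# $2 = uniquely mapped reads at locus A, $3 = uniquely mapped reads at locus B
# Prints: uniform_A uniform_B weighted_A weighted_B
split_multireads() {
  awk -v m="$1" -v a="$2" -v b="$3" 'BEGIN {
    # Assumption 1 (flag off): each locus gets half of the shared reads
    ua = m / 2; ub = m / 2
    if (a + b > 0) {
      # Assumption 2 (flag on): weight by unique-read evidence (illustrative)
      wa = m * a / (a + b); wb = m * b / (a + b)
    } else {
      # Exact duplicates have no unique reads, so fall back to the even split
      wa = ua; wb = ub
    }
    printf "%.1f %.1f %.1f %.1f\n", ua, ub, wa, wb
  }'
}
```

Under either assumption, an exact duplicate pair sharing 1000 reads (`split_multireads 1000 0 0`) would end up with about 500 reads per copy rather than 0, which is why the zero counts above look wrong to me.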

    Michael Leonard
