**michaelleonard** · 10-26-2015, 12:36 PM

I've created a toy genome from Chlamydomonas reinhardtii to demonstrate what I'm seeing in the full genome. I extracted seven genes of interest with 1 kbp flanking regions and made each its own contig. I also extracted the reads mapping to these regions and remapped them. The last contig was duplicated exactly to simulate an exact duplication event. I'm using STAR to map the reads, but I see similar behavior with TopHat. The entire example and all files can be accessed here:

https://www.dropbox.com/s/vw3ydk6kk6esp14/cuffdiff_multireads.zip?dl=0

The following pairs of genes illustrate my debugging process. For each gene I report the count and FPKM for the first sample, along with the raw read pileup (coverage) from bamtools. I see no difference with or without --multi-read-correct:

Biologically distinct genes, should both have FPKMs
chromosome_test1 Cre01.g004300 count:6239 FPKM:122170 coverage:6443
chromosome_test2 Cre01.g004500 count:5275 FPKM:118732 coverage:5452

http://i.imgur.com/WS7wzn2.png

http://i.imgur.com/Yjq3Txp.png

multi-read-correct seems to work correctly for a small duplicate region;
however, reads mapping to the duplicated region appear not to be counted at all
97% of the first gene is contained in the second gene (30% global coverage)
chromosome_test3 Cre17.g707450 count:0 FPKM:0 coverage:1657
chromosome_test4 Cre07.g333746 count:4629 FPKM:57610 coverage:6373

http://i.imgur.com/2anZg5k.png

http://i.imgur.com/Dk2wKEQ.png

Very low count and FPKM for genes with a large duplicate region
BLAST reports 100% sequence identity with 70% coverage between the genes
chromosome_test5 Cre17.g738650 count:6 FPKM:123.825 coverage:1758
chromosome_test6 Cre17.g698299 count:225 FPKM:2909 coverage:1965

http://i.imgur.com/mGSAq06.png

http://i.imgur.com/gCGy4b1.png

Duplicated full gene, expect 0 FPKM for each based on the above
chromosome_test7 Cre01.g000900.1 count:0 FPKM:0 coverage:1959
chromosome_test8 Cre01.g000900.2 count:0 FPKM:0 coverage:1959

http://i.imgur.com/wEQd4GC.png

http://i.imgur.com/C27nBne.png
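
To cross-check the pileup numbers above, it can help to see how many alignments on each contig are multi-reads. The following is a minimal sketch, assuming the aligner writes the standard NH:i: tag (number of reported alignments for the read) on each SAM record; the function name count_by_contig is hypothetical and records without an NH tag are treated as unique.

```python
from collections import Counter

def count_by_contig(sam_lines):
    """Tally mapped alignments per contig, split into unique and multi-mapped.

    Assumes each record carries an NH:i:<n> tag; records without one
    are counted as unique (an assumption, not aligner-specific behavior).
    """
    unique = Counter()
    multi = Counter()
    for line in sam_lines:
        if line.startswith("@"):        # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # not a complete SAM record
            continue
        flag = int(fields[1])
        if flag & 0x4:                  # read is unmapped
            continue
        contig = fields[2]
        nh = 1
        for tag in fields[11:]:         # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
        if nh > 1:
            multi[contig] += 1
        else:
            unique[contig] += 1
    return unique, multi
```

The input can be any iterable of SAM text lines, for example the output of `samtools view -h` read from standard input and passed in as `count_by_contig(sys.stdin)`. If the fully duplicated contigs show only multi-mapped alignments here but zero counts in cuffdiff, that would be consistent with the multi-reads being dropped rather than divided.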

It appears that cuffdiff versions 2.1.0 through 2.2.1 ignore duplicated regions completely. I've tried this test on every binary I could run, with and without the --multi-read-correct flag:

https://www.dropbox.com/s/u5tiosjys04iqtb/cuffdiff_versions.genes.read_group_tracking.xlsx?dl=0

The parameters I use for everything are in the "scripts" folder of the zip file above. These are my cuffdiff parameters:

cuffdiff \
--labels ${SAMPLE_LABELS} \
--output-dir . \
--num-threads 4 \
--multi-read-correct \
--max-bundle-frags 1000000000 \
${GTF} \
${SAMPLE_LIST}

I'm assuming that multi-read-correct is supposed to place multi-reads in whichever transcript fits the expression model better. With multi-read-correct disabled, I'm also assuming that each duplicate should get half of the reads. Am I correct in these assumptions and is this the desired behavior? I can imagine instances where genes are expressed, but since they are duplicated somewhere else they report 0 expression.
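
The two assumptions above can be written out as a toy calculation. This is only a sketch of the behavior I expect, not Cufflinks' actual implementation: the function split_multireads and its inputs are hypothetical, and the "weighted" mode is a single-pass simplification that divides each multi-read in proportion to the uniquely mapped counts.

```python
def split_multireads(unique_counts, multi_reads, weighted=False):
    """Distribute multi-reads among the genes they map to.

    unique_counts: dict of gene -> number of uniquely mapped reads
    multi_reads:   list of tuples, each naming the genes one read maps to
    weighted:      False = divide each read evenly among its genes;
                   True  = divide in proportion to the unique counts
                           (falls back to an even split if all are zero)
    """
    totals = {gene: float(count) for gene, count in unique_counts.items()}
    for genes in multi_reads:
        if weighted:
            weights = [unique_counts.get(gene, 0) for gene in genes]
            if sum(weights) == 0:
                weights = [1] * len(genes)
        else:
            weights = [1] * len(genes)
        total_weight = sum(weights)
        for gene, weight in zip(genes, weights):
            totals[gene] = totals.get(gene, 0.0) + weight / total_weight
    return totals

# Exact duplicate with no unique reads: each copy gets half of the reads.
dup = split_multireads({"copy1": 0, "copy2": 0}, [("copy1", "copy2")] * 100)
print(dup)
```

Under either mode an exact duplicate with no unique reads ends up with half of the reads on each copy, so a count of 0 for both copies is not what this model predicts.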

Michael Leonard

multi-read-correct not working in Cuffdiff 2.2.1
