SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Cufflinks, Cuffquant and Cuffdiff Parameters : multi-read and frag-bias correction vramnarine RNA Sequencing 2 08-05-2014 04:24 PM
Cufflinks or Cuffdiff for frag-bias-correct? Dbuch Bioinformatics 2 05-22-2014 07:20 PM
Cuffdiff multi-protein vs multi-promoter RockChalkJayhawk Bioinformatics 2 03-26-2010 10:26 AM

Reply
 
Thread Tools
Old 10-26-2015, 12:30 PM   #1
michaelleonard
Junior Member
 
Location: Los Angeles

Join Date: Oct 2015
Posts: 3
Default multi-read-correct not working in Cuffdiff 2.2.1

Hi all,

I've posted this to the Cufflinks mailing list as well:
https://groups.google.com/forum/#!to...rs/F9a7Ti-IQIA

I'm running cuffdiff v2.2.1 from the binary on both Linux and osx. When I try to use the --multi-read-correct flag, the only difference I see in any output file is the "run.info" where it lists that I used the flag. I can see a second progress bar in the log when it runs with --multi-read-correct, so it looks like cuffdiff recognizes the flag. Otherwise, I can diff any other output with/without multi-read correction and it shows no change. I've tried all the binaries since 2.0.0 and this behavior seems to have started in 2.1.0.

I've also noticed that multi-reads that map to duplicated genes in the genome lead to an FPKM and count of 0 for each duplicate. I'll post after this with concrete examples.

Does anyone else notice no change in their output when enabling or disabling "--multi-read-correct" with cuffdiff 2.2.1?

Michael Leonard
michaelleonard is offline   Reply With Quote
Old 10-26-2015, 12:36 PM   #2
michaelleonard
Junior Member
 
Location: Los Angeles

Join Date: Oct 2015
Posts: 3
Default

I've created a toy genome from Chlamydomonas reinhardtii to demonstrate what I'm seeing in the full genome. I've extracted seven genes of interest with 1kp flanking regions and made each their own contig. I've also extracted the reads mapping to these region and remapped them. The last "contig" was duplicated exactly to demonstrate an exact duplication event. I'm using STAR to map the reads, but I notice similar behavior with tophat. The entire example and all files can be accessed here:
https://www.dropbox.com/s/vw3ydk6kk6...reads.zip?dl=0

The following pairs of genes demonstrate my process of debugging. I report the counts and FPKM for the first sample. I also report the raw pileup of reads from bamtools. I see no difference with and without multi-read-correct:


Biologically distinct genes, should both have FPKMs
chromosome_test1 Cre01.g004300 count:6239 FPKM:122170 coverage:6443
chromosome_test2 Cre01.g004500 count:5275 FPKM:118732 coverage:5452
http://i.imgur.com/WS7wzn2.png
http://i.imgur.com/Yjq3Txp.png

multi-read-correct seems to work correctly for a small duplicate region
however, it appears reads mapping to duplicated region aren't counted at all
97% of the first gene is contained in the second gene (30% global coverage)
chromosome_test3 Cre17.g707450 count:0 FPKM:0 coverage:1657
chromosome_test4 Cre07.g333746 count:4629 FPKM:57610 coverage:6373
http://i.imgur.com/2anZg5k.png
http://i.imgur.com/Dk2wKEQ.png

Very low count and FPKM for genes with large duplicate region
blast reports 100% sequence identity with 70% coverage between genes
chromosome_test5 Cre17.g738650 count:6 FPKM:123.825 coverage:1758
chromosome_test6 Cre17.g698299 count:225 FPKM:2909 coverage:1965
http://i.imgur.com/mGSAq06.png
http://i.imgur.com/gCGy4b1.png

Duplicated full gene, expect 0 FPKM for each based on the above
chromosome_test7 Cre01.g000900.1 count:0 FPKM:0 coverage:1959
chromosome_test8 Cre01.g000900.2 count:0 FPKM:0 coverage:1959
http://i.imgur.com/wEQd4GC.png
http://i.imgur.com/C27nBne.png


It appears that cuffdiff 2.1.0 - 2.2.1 is ignoring duplicated regions completely. I've tried this test on every binary I could run, with and without the multi-read-correct flag:
https://www.dropbox.com/s/u5tiosjys0...king.xlsx?dl=0

The parameters I use for everything are in the "scripts" folder of the zip file above. These are my cuffdiff parameters:

cuffdiff \
--labels ${SAMPLE_LABELS} \
--output-dir . \
--num-threads 4 \
--multi-read-correct \
--max-bundle-frags 1000000000 \
${GTF} \
${SAMPLE_LIST}

I'm assuming that multi-read-correct is supposed to place multi-reads in whichever transcript fits the expression model better. With multi-read-correct disabled, I'm also assuming that each duplicate should get half of the reads. Am I correct in these assumptions and is this the desired behavior? I can imagine instances where genes are expressed, but since they are duplicated somewhere else they report 0 expression.

Michael Leonard
michaelleonard is offline   Reply With Quote
Reply

Tags
cuffdiff, cufflinks, multi-mapping, multi-read-correct, rna-seq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:25 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO