Folks,
I am trying to run a CuffDiff analysis as indicated below. There are 19 single-end read files, and I am trying an all-vs-all comparison because that is what the users want to see.
The data has been through TopHat 2.0.8 with Bowtie 2.1.0 and then Cufflinks 2.0.2. I am using a small 16-node cluster where each compute node has 24 GB of RAM, 4 GB of swap, and a fairly large local drive, running 64-bit RHEL 6.3 Linux under the IBM/Platform LSF scheduler.
I run TopHat/Bowtie with -p 8. When I get to CuffDiff I change to -p 32 (which crashed last night, as shown below) or -p 64 (currently running).
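For reference, here is a simplified sketch of the LSF submission I am using for the CuffDiff step (the queue name, job name, and shortened BAM list are placeholders, not my exact script; the full command line appears in the log below):

    # simplified sketch of the bsub submission; queue/job names are placeholders
    bsub -q normal -J cuffdiff_all -n 32 -o cuffdiff.%J.out \
        cuffdiff -o April09_CD_TH8BT210VS \
            --max-bundle-frags=125000 \
            -b $BOWTIE_INDEXES/rn4_ENSEMBL_genome.fa \
            -p 32 \
            -L C21,C22,C23,C24,C31,C32,C33,C61,C62,C63,S21,S22,S23,S31,S32,S33,S61,S62,S63 \
            -u April09_TH8BT210VS_merged_asm/merged.gtf \
            <the 19 accepted_hits.bam files>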
The program runs for a long time and then gets killed with this exit code:
Exited with exit code 137.
Resource usage summary:
CPU time : 252041.44 sec.
Max Memory : 23188 MB
Max Swap : 28951 MB
Max Processes : 4
Max Threads : 38
> Default Std Dev: 80
[14:33:33] Calculating preliminary abundance estimates
> Processed 32363 loci. [*************************] 100%
[19:13:55] Learning bias parameters.
[19:57:59] Testing for differential expression and regulation in locus.
> Processing Locus 1:110727025-110845380 [* ] 5%/home/hazards/.lsbatch/1365530240.7033.shell: line 15: 24074 Killed cuffdiff -o April09_CD_TH8BT210VS --max-bundle-frags=125000 -b $BOWTIE_INDEXES/rn4_ENSEMBL_genome.fa -p 32 -L C21,C22,C23,C24,C31,C32,C33,C61,C62,C63,S21,S22,S23,S31,S32,S33,S61,S62,S63 -u April09_TH8BT210VS_merged_asm/merged.gtf NAc_E2_COCAINE_0429L1S2-C2sw.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E2_COCAINE_0525L1S2-C2sw.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E2_COCAINE_0614L5S2-C2sw.txt.gz.TH8BT210VS/accepted_hits.bam Blumer120222HiSeqRun_Sample_Sal-2.fastq.gz.TH8BT210VS/accepted_hits.bam NAc_E3_COCAINE_0429L4C3.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E3_COCAINE_0505L5C3.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E3_COCAINE_0505L6C3.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E6_COCAINE_0429L6S6_C6sw.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E6_COCAINE_0505L7S6-C6sw.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E6_COCAINE_0614L6S6-C6sw.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E2_SALINE_0429L2C2-S2sw.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E2_SALINE_0505L1C2-S2swa.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E2_SALINE_0505L2C2-S2swb.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E3_SALINE_0429L3S3.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E3_SALINE_0505L3S3.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E3_SALINE_0505L4S3.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E6_SALINE_0429L7C6_S6sw.txt.gz.TH8BT210VS/accepted_hits.bam NAc_E6_SALINE_0505L8C6-S6sw.txt.gz.TH8BT210VS/accepted_hits.bam Blumer120222HiSeqRun_Sample_Coc-6.fastq.gz.TH8BT210VS/accepted_hits.bam
Tue Apr 9 13:57:24: Dispatched to 32 Hosts/Processors <8*compute006> <8*compute007> <8*compute008> <8*compute009>;
Tue Apr 9 23:10:02: Completed <exit>.
Accounting information about this job:
Share group charged </hazards>
CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP
252041.44 4 33162 exit 7.6003 23188M 28951M
------------------------------------------------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 0 Total number of exited jobs: 1
Total CPU time consumed: 252041.4 Average CPU time consumed: 252041.4
Maximum CPU time of a job: 252041.4 Minimum CPU time of a job: 252041.4
Total wait time in queues: 4.0
Average wait time in queue: 4.0
Maximum wait time in queue: 4.0 Minimum wait time in queue: 4.0
Average turnaround time: 33162 (seconds/job)
Maximum turnaround time: 33162 Minimum turnaround time: 33162
Average hog factor of a job: 7.60 ( cpu time / turnaround time )
Maximum hog factor of a job: 7.60 Minimum hog factor of a job: 7.60
So as I understand it, the system is killing the program because it is demanding ~29-30 GB of swap, which the hardware does not have (exit code 137 is 128 + 9, i.e. the process got SIGKILL, which is what you typically see when the kernel's OOM killer steps in).
So I figured I'd double the number of processes to 64 (8 full nodes, each with 4 GB of swap).
Right now (about 5 hours into the run) the job is resident on ONE node only, in spite of the fact that it has been dispatched to 64 hosts/processors.
So, what does "-p 64" mean? The manual says:
"-p/--num-threads <int> Use this many threads to align reads. The default is 1."
The output says (for the case where -p 32 was used):
Max Processes : 4
Max Threads : 38
How does -p 32 turn into just 4 processes and 38 threads?
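My tentative reading is that cuffdiff is a single multithreaded process (pthreads), not an MPI program, so -p only sets the number of worker threads inside that one process on one host; the slots LSF hands out on the other nodes sit idle, and all of the memory still has to fit on the single node the process lands on. If that is right, something like the following would at least keep the slot request and the memory reservation on one host (the slot count and memory number are guesses for our nodes, and I am not sure whether rusage[mem=...] counts per slot or per host in our LSF configuration, so this is untested):

    # sketch: pin the job to a single host and reserve memory there (values are guesses, untested)
    bsub -n 8 -R "span[hosts=1] rusage[mem=20000]" \
        cuffdiff -p 8 -o April09_CD_TH8BT210VS ... <same options and BAM list as above>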
I CAN change the model to C2,C3,C6,S2,S3,S6 and that will finish (roughly the pooled layout sketched below), but the users of the data want to see all the replicates C21,C22,C23,C24,C31,C32,C33,C61,C62,C63,S21,S22,S23,S31,S32,S33,S61,S62,S63 at once.
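For clarity, the pooled model I mean follows the Cufflinks manual's convention that replicate BAMs of the same condition are given as one comma-separated list under a single label, so the 19 files collapse to six conditions, roughly like this (file names shortened here for readability; the real paths are the accepted_hits.bam directories in the log above):

    # pooled six-condition model: replicates of each condition are comma-separated (paths shortened)
    cuffdiff -o April09_CD_pooled -p 8 \
        -b $BOWTIE_INDEXES/rn4_ENSEMBL_genome.fa \
        -L C2,C3,C6,S2,S3,S6 \
        -u April09_TH8BT210VS_merged_asm/merged.gtf \
        C2_rep1.bam,C2_rep2.bam,C2_rep3.bam,C2_rep4.bam \
        C3_rep1.bam,C3_rep2.bam,C3_rep3.bam \
        C6_rep1.bam,C6_rep2.bam,C6_rep3.bam \
        S2_rep1.bam,S2_rep2.bam,S2_rep3.bam \
        S3_rep1.bam,S3_rep2.bam,S3_rep3.bam \
        S6_rep1.bam,S6_rep2.bam,S6_rep3.bam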
What I really want to do is get the program to finish. Got any suggestions?
Starr