SEQanswers

Go Back   SEQanswers > Sequencing Technologies/Companies > Pacific Biosciences



Reply
 
Thread Tools
Old 02-19-2013, 06:29 AM   #1
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61
Default PBJelly

Hi,

I have SMRTanalysis and PBJelly package installed. I was able to run the PBJelly with test data provided with software. However, when I was running it for a bacterial genome which is (~10 MB in size and ~350 scaffolds with ~3000 gaps) and used pacbio long reads for correction. I got memory correction error as:

Command:
Code:
nohup Jelly.py assembly Protocol.xml -x "--nproc=8" &
Error:
Code:
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7fe21cf7cb96]
make-consensus[0x40ce91]
make-consensus[0x427e89]
make-consensus[0x409d1c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fe21cf1f76d]
make-consensus[0x406439]
======= Memory map: ========
00400000-0047c000 r-xp 00000000 08:21 113928                             /data2/smart/smrtanalysis-1.4.0/analysis/bin/make-consensus
0067b000-0067d000 r--p 0007b000 08:21 113928                             /data2/smart/smrtanalysis-1.4.0/analysis/bin/make-consensus
0067d000-0067e000 rw-p 0007d000 08:21 113928                             /data2/smart/smrtanalysis-1.4.0/analysis/bin/make-consensus
0067e000-0067f000 rw-p 00000000 00:00 0 
01b28000-01b8e000 rw-p 00000000 00:00 0                                  [heap]
7fe21cce8000-7fe21ccfd000 r-xp 00000000 08:01 933932                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe21ccfd000-7fe21cefc000 ---p 00015000 08:01 933932                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe21cefc000-7fe21cefd000 r--p 00014000 08:01 933932                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe21cefd000-7fe21cefe000 rw-p 00015000 08:01 933932                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe21cefe000-7fe21d0b3000 r-xp 00000000 08:01 933906                     /lib/x86_64-linux-gnu/libc-2.15.so
7fe21d0b3000-7fe21d2b2000 ---p 001b5000 08:01 933906                     /lib/x86_64-linux-gnu/libc-2.15.so
7fe21d2b2000-7fe21d2b6000 r--p 001b4000 08:01 933906                     /lib/x86_64-linux-gnu/libc-2.15.so
7fe21d2b6000-7fe21d2b8000 rw-p 001b8000 08:01 933906                     /lib/x86_64-linux-gnu/libc-2.15.so
7fe21d2b8000-7fe21d2bd000 rw-p 00000000 00:00 0 
7fe21d2bd000-7fe21d3b8000 r-xp 00000000 08:01 933918                     /lib/x86_64-linux-gnu/libm-2.15.so
7fe21d3b8000-7fe21d5b7000 ---p 000fb000 08:01 933918                     /lib/x86_64-linux-gnu/libm-2.15.so
7fe21d5b7000-7fe21d5b8000 r--p 000fa000 08:01 933918                     /lib/x86_64-linux-gnu/libm-2.15.so
7fe21d5b8000-7fe21d5b9000 rw-p 000fb000 08:01 933918                     /lib/x86_64-linux-gnu/libm-2.15.so
7fe21d5b9000-7fe21d69b000 r-xp 00000000 08:01 748647                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7fe21d69b000-7fe21d89a000 ---p 000e2000 08:01 748647                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7fe21d89a000-7fe21d8a2000 r--p 000e1000 08:01 748647                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7fe21d8a2000-7fe21d8a4000 rw-p 000e9000 08:01 748647                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7fe21d8a4000-7fe21d8ba000 rw-p 00000000 00:00 0 
7fe21d8c0000-7fe21d8d8000 r-xp 00000000 08:21 119775                     /data2/smart/smrtanalysis-1.4.0/common/lib/libz.so.1
7fe21d8d8000-7fe21dad7000 ---p 00018000 08:21 119775                     /data2/smart/smrtanalysis-1.4.0/common/lib/libz.so.1
7fe21dad7000-7fe21dad8000 r--p 00017000 08:21 119775                     /data2/smart/smrtanalysis-1.4.0/common/lib/libz.so.1
7fe21dad8000-7fe21dad9000 rw-p 00018000 08:21 119775                     /data2/smart/smrtanalysis-1.4.0/common/lib/libz.so.1
7fe21dad9000-7fe21dafb000 r-xp 00000000 08:01 933924                     /lib/x86_64-linux-gnu/ld-2.15.so
7fe21dcf4000-7fe21dcfb000 rw-p 00000000 00:00 0 
7fe21dcfb000-7fe21dcfc000 r--p 00022000 08:01 933924                     /lib/x86_64-linux-gnu/ld-2.15.so
7fe21dcfc000-7fe21dcfe000 rw-p 00023000 08:01 933924                     /lib/x86_64-linux-gnu/ld-2.15.so
7fffcf60a000-7fffcf62c000 rw-p 00000000 00:00 0                          [stack]
7fffcf78b000-7fffcf78c000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
*** glibc detected *** make-consensus: free(): invalid next size (fast): 0x0000000000a0e5f0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7f166a931b96]
make-consensus[0x43e287]
make-consensus[0x43a1fc]
make-consensus[0x409cb1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f166a8d476d]
make-consensus[0x406439]
======= Memory map: ========
Any suggestions to resolve this? I know PBJelly is not best describe to handle the memory but any suggestion regarding memory/parameter optimization will be helpful.

Thanks
sagarutturkar is offline   Reply With Quote
Old 02-26-2013, 07:56 AM   #2
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61
Default

Well,

Replying to my own thread. It might be useful for others to know. Seems like the PBjelly is not yet optimized to make full use of available memory. There might be some memory leaks in python code which causing the above error.

Despite of the above mentioned errors, the gap filling process was complete and got decent results. If you face same error be patient and let it proceed.

Thanks
sagarutturkar is offline   Reply With Quote
Old 02-27-2013, 12:00 AM   #3
HenrivdGeest
Member
 
Location: Arnhem

Join Date: Feb 2012
Posts: 16
Default

Hi,
I also got this error a few times.
But, its not that its not a problem. The errors you see come from an individual gap closure assembly operation. That one just failed. You can see these failed ones in the assembly.out as well.
I tried different OS systems, including Ubuntu 8, 12, CEntos6, centos5. On none of them I could either get PBjelly working at all, or it failed with these same messages.
Also, I emailed with the author of PBjelly and it seems that make-consensus crashes in 90% of these cases. So, you can try to manually restart those failed jobs. However, that's a hassle in my experience. To be clear, normally >90% of the gap filling assemblies just succeeds, but the one that fails, tend to fail almost each time you re-run pbjelly.

These errors are also causing you program to run for a very long time. Because these assemblies fail abnormally Pbjelly doesn't see them as finished. So it waits 1440 minutes and than these stalled ones are killed. I changed the timeout in the code to 2 minutes, (individual assemblies only take a few seconds in my case) and the whole PB jelly was done in 12 minutes instead of 2 hrs. However, you have to set the timeout back to normal if you are running the mapping/blasr step, these will take more than 2 minutes for sure.
code change:
/PBJelly_12.9.14/CommandRunner.py #line 10 #def exe(cmd, timeout=2): #was 1440 (in minutes)

edit: make-consensus is the one with problems I think, and that's part of the open source AMOS package, not something that belongs to pbjelly.
HenrivdGeest is offline   Reply With Quote
Old 02-27-2013, 06:17 AM   #4
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61
Default

Quote:
Originally Posted by HenrivdGeest View Post

These errors are also causing you program to run for a very long time. Because these assemblies fail abnormally Pbjelly doesn't see them as finished. So it waits 1440 minutes and than these stalled ones are killed. I changed the timeout in the code to 2 minutes, (individual assemblies only take a few seconds in my case) and the whole PB jelly was done in 12 minutes instead of 2 hrs. However, you have to set the timeout back to normal if you are running the mapping/blasr step, these will take more than 2 minutes for sure.
code change:
/PBJelly_12.9.14/CommandRunner.py #line 10 #def exe(cmd, timeout=2): #was 1440 (in minutes)
Thanks for this suggestion. But I guess PBJelly always run blasr/mapping step in order to fill the gap. So, may not be feasible to change the time.

Last edited by sagarutturkar; 02-27-2013 at 06:19 AM.
sagarutturkar is offline   Reply With Quote
Old 02-27-2013, 06:23 AM   #5
HenrivdGeest
Member
 
Location: Arnhem

Join Date: Feb 2012
Posts: 16
Default

Quote:
Originally Posted by sagarutturkar View Post
Thanks for this suggestion. But I guess PBJelly always run blasr/mapping step in order to fill the gap. So, may not be feasible to change the time.
If you run PBJelly, you run the mapping and the assembly step separately, so you can change the code in between. Or make some if statement in the code itself
HenrivdGeest is offline   Reply With Quote
Old 02-27-2013, 06:28 AM   #6
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61
Default

Oh yes! That's true. I just made a generic shell script to run PBJelly and forgot about different steps. You are right!
sagarutturkar is offline   Reply With Quote
Old 03-18-2013, 02:40 PM   #7
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88
Default Modified CommandRunner.py worked; what exactly did it do?

This change allowed me to complete a PBJelly run; thanks! I made a copy of the original CommandRunner.py script, as well as a modified copy. Copied the modified version (with 2 minute timeout) over the script before the assembly step, then copied the original back over afterwards.

However, how can I know if my assemblies were behaving like yours, and either succeeding within 2 minutes, or were destined to fail? What were you watching in which output / log file? Or were you tracking individual files? I guess I'm concerned that PBJelly seems to have shrunk a fraction of the scaffolding gaps that I had in a CLC assembly, but not others. I haven't found a gap that's been filled yet with PB sequence, where there were simply N's before. The most common thing I'm seeing is scaffolding gaps that were flanked by low complexity sequence (single base repeats) have been closed to simply consist of the low complexity sequence, without the gap (I'm guessing that PB reads were found to support this). My output.err file says this:

2013-03-18 09:49:00,594 [INFO] Number of Gaps Addressed 824
2013-03-18 09:49:00,594 [INFO] No Filling Metrics 349
2013-03-18 09:49:00,594 [INFO] Filled 473
2013-03-18 09:49:00,595 [INFO] Reduced 2
2013-03-18 09:49:00,595 [INFO] Overfilled 1
2013-03-18 09:49:00,595 [INFO] Loading Fasta/Qual Files
2013-03-18 09:49:17,344 [INFO] Finished Loading Files
2013-03-18 09:49:17,371 [INFO] Closed ref0000087_0_1
2013-03-18 09:49:17,372 [INFO] Didn't Pass Assembly ref0000086_0_1
...


Would what I'm describing above fall into the "Filled" category?? Or "Reduced" (though I saw more than 2 such conversions). So, can I be confident that all gaps have been filled or shrunk that could be? I guess I'm seeing later comments like:


Trimmed 1 from 5' end of ref0000085_5
or
Closed ref0000085_10_11


Do these relate to gap coordinates?? (seems unlikely) Or gap "numbers" assigned by PBJelly?

Any guidance would be appreciated.
~Joe
jnfass is offline   Reply With Quote
Old 03-19-2013, 01:27 AM   #8
HenrivdGeest
Member
 
Location: Arnhem

Join Date: Feb 2012
Posts: 16
Default

I think you should look in the that output log file for "Didn't pass assembly". Those are the assemblies that failed. Could be due to the crash or due to another issue of which I am not aware of At least, that's how I interpreted those messages.
The ones with closed gap have a message like " [INFO] Closed ref0001480_0_1". You could try visualizing that closed gap in some viewer.(not an easy way, map the reads with blasr first on the scaffold, map new contig with blasr and generate a SAM/BAM file out of this.)

You could also try to re-run PBjelly on the output PB jelly generated and see whats comes out. I Asked the author (Adam English) of PBjelly about that and he replied:

So, yes, you can run PBJelly iteratively. However! You should be aware that when you use the same dataset multiple times, you run the risk of a gap that closed one gap could -- somehow -- map elsewhere and close another gap. I have the script exclude.py that removes all gaps that previously supported gaps, but this won't be incredibly helpful to you because you're trying to fix gaps that failed assembly - but may succeed a second time. You see, the reads that support these gaps will be excluded by the script. I have plans of making better read accounting for multiple iterations, but this isn't implemented yet.

I still don't know exactly why this iterative approach wouldn't work, since my initial assembly could easily be equal to one of the PBjelly outputs. The risk of reads that closing multiple gaps is always an issue I would say.

About the 2 minutes time out, its easy to see if that's enough, in top or htop(F5, tree view), you can see the process takes up cpu time, and in my case, it was at zero percent after 20 seconds. If its still at 100% cpu after 2 minutes, clearly you should put the time out longer.
HenrivdGeest is offline   Reply With Quote
Old 05-16-2013, 01:17 PM   #9
ACEnglish
Junior Member
 
Location: Houston, TX

Join Date: Jun 2010
Posts: 6
Default PBJelly Fix

Hello,

The way to fix this glibc 'memory leak' problem is to simply comment change one small piece of the code. When PBJelly 12.9.14 was being developed, blasr couldn't handle IUB nucleotide codes, and PBJelly removed them from the local assembly 'seed' contigs, it didn't remove 'Ns' corresponding quality score. This created a problem -- for example:
@fasta
ATCGGACT
+
)#()@(!)#(!))(0

You can see they're of different lengths and this caused amos to break.

To fix this:
At line 178 in Extraction.py - Change the cleanSequence method so that it doesn't modify what is sent to it.
Code:
    
    def cleanSequence(self,seq):
        """
        Remove IUB characters from the reference since
        we use contigs as part of the input reference for
        the local assembly and blasr doesn't like them
        """
        seq = seq.replace('M','C')
        seq = seq.replace('R','A')
        seq = seq.replace('W','T')
        seq = seq.replace('S','G')
        seq = seq.replace('Y','C')
        seq = seq.replace('K','T')
        seq = seq.replace('V','G')
        seq = seq.replace('H','A')
        return seq  # <- Don't replace the 'N's since blasr is fixed
        seq = seq.replace('N','')
        return seq
Obviously, you could also just delete
Code:
seq = seq.replace('N','')
as well, I just left that in for a point of reference.
I won't be releasing a fix for this issue in PBJelly12.9.14 because I'm about to finish a completely refactored version of PBJelly that doesn't make the same mistake but does fill Inter-Scaffold gaps.

Last edited by ACEnglish; 05-20-2013 at 01:59 PM.
ACEnglish is offline   Reply With Quote
Old 05-29-2013, 07:48 AM   #10
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default

Quote:
Originally Posted by ACEnglish View Post
I won't be releasing a fix for this issue in PBJelly12.9.14 because I'm about to finish a completely refactored version of PBJelly that doesn't make the same mistake but does fill Inter-Scaffold gaps.
This sounds very interesting.. do I understand you correctly and PBJelly will then be able to act as a 'scaffolder' using the pacbio reads.

Do you envisage this being released in 2013?
SLB is offline   Reply With Quote
Old 05-29-2013, 08:40 AM   #11
ACEnglish
Junior Member
 
Location: Houston, TX

Join Date: Jun 2010
Posts: 6
Default

Quote:
Originally Posted by SLB View Post
...PBJelly will then be able to act as a 'scaffolder' using the pacbio reads.

Do you envisage this being released in 2013?
Correct. Inter-scaffold gap filling - pretty much a scaffolder. I hope to have it finished by August at the latest.
ACEnglish is offline   Reply With Quote
Old 06-21-2013, 10:05 AM   #12
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61
Default

What type of pacbio reads are recommended for PBJelly? So far, I was using native Pacbio files - i.e. f(iltered.subreads.fasta). However, If I use corrected PacBio reads, would that be beneficial or better? Did anybody used corrected reads and got better results?
sagarutturkar is offline   Reply With Quote
Old 06-27-2013, 11:29 PM   #13
HenrivdGeest
Member
 
Location: Arnhem

Join Date: Feb 2012
Posts: 16
Default

I've never tried pbjelly with clean reads. Since most of the time those reads will be shorter, the chance of finding an overlap is also smaller. If there aren't shorter, I would not see any drawback in using cleaned reads.
In case they are shorter, I would suggest pbjelly to fill gaps with raw reads, and run quiver after that to perform base error correction on the new consensus.


The code change helped a lot, now after filling 7500 gaps, I only got a single " Error in `make-consensus': double free or corruption " error. Previously,I got many more.

However I am still wondering if pbjelly could fire up multiple make-consensus threads at once. Now its doing one-by-one, with nproc=n, but that nproc does not seem to make any difference. A cluster would help here, but I think pbjelly could also just fire up multiple gap filling processes. Does anyone know how to do this?
HenrivdGeest is offline   Reply With Quote
Old 07-23-2013, 10:23 AM   #14
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default

My first attempt to address 90K gaps wasn't so successful.

2013-07-23 14:26:43,585 [INFO] Number of Gaps Addressed 31998
2013-07-23 14:26:43,585 [INFO] No Filling Metrics 31376
2013-07-23 14:26:43,585 [INFO] Filled 208
2013-07-23 14:26:43,585 [INFO] Reduced 414
2013-07-23 14:26:43,585 [INFO] Overfilled 84

What are the main reasons for
(1) [INFO] Skipping Singleton ref0015196
(2) [INFO] Didn't Address ref0002885_0_1
(3) [INFO] Didn't Pass Assembly ref0002885_3_4

I guess (1) is due to a single pacbio read addressing the gap, but not sure about the other two. Anyone know this?
SLB is offline   Reply With Quote
Old 08-09-2013, 09:15 AM   #15
ACEnglish
Junior Member
 
Location: Houston, TX

Join Date: Jun 2010
Posts: 6
Default

Singleton means a scaffold without a gap.
Didn't address means that no reads mapped in a way that they seem to support the gap (into/over the gap)
Didn't pass assembly means a good assembly couldn't be created.
Those numbers look incredibly depreciated from what to expect.

With 90K gaps, and only 30k addressed, you must have low input pacbio coverage.. Yes? If not, then there's an issue worth investigating.
I would also check the logs of some of the 'didn't pass assembly.' Maybe there are clues as to why so many failed.

PBJelly2 will be out at or around September 13th.
ACEnglish is offline   Reply With Quote
Old 11-13-2013, 02:24 AM   #16
zhoufan
Member
 
Location: china

Join Date: Feb 2009
Posts: 13
Default

Hi,
I am trying Jelly 13.10.22 version on ubuntu 10.04 with smrtanalysis 2.0.1 . I am trying the sample data and the support stage have no result with no error. Has anyone met this kind of situation?
zhoufan is offline   Reply With Quote
Old 11-14-2013, 01:53 AM   #17
HenrivdGeest
Member
 
Location: Arnhem

Join Date: Feb 2012
Posts: 16
Default

Yes,I also encounter this.
I think there is a small bug in the sample scripts in the latest release of pbjelly(13.10.22). The mapping quality of the reads is lower than the default filtering settings of the support stage. If you run the support stage with lower settings it should work:
Jelly.py support Protocol.xml -x "--minMapqv=45" # the qv of mapped reads in the sample = 50.
( I got this more or less from the documentation from the pbjelly website at sourceforge)
HenrivdGeest is offline   Reply With Quote
Old 03-05-2018, 11:48 PM   #18
bio_d
Junior Member
 
Location: USA

Join Date: Oct 2017
Posts: 6
Default Running PBJELLY on SLURM

How do I submit PBJELLY script in a cluster(SLURM)?

I mean slurm doesn't allow job submission at head node whereas PBJELLY says the script will itself submit jobs according to the specifications in the <cluster></cluster> tag of the Protocol.xml file.
bio_d is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 02:31 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO