SEQanswers

Old 10-11-2012, 01:22 AM   #21
LSC
Member
 
Location: stanford

Join Date: Jul 2012
Posts: 24
Default

Sorry for the incompleteness of the website. I have been pulled into an emergency project, so I have to postpone the release of the documentation. I hope to have time to finish the manual in a week or so. The paper is on the homepage now. Sorry again for the inconvenience.

Quote:
Originally Posted by kmcarr View Post
LSC,

I would really like to use your software, but it is extremely difficult to do so given the complete lack of documentation. The 'How it works?', 'Tutorial', 'Manual' and 'Filters' links on the website are dead. The 'FAQ' has just one line referring to SpliceMap. There isn't even a README file. Yes, I can run the program, but without any documentation I have no idea whether my results are correct or meaningful.

I installed LSC and ran it against a PacBio long-read data set of 100,000 reads totaling 38 Mbp. My short-read set is 20 million 100 bp Illumina reads. I ran the program with default parameters, and the output is three files: full_LR_SR.map.fa, uncorrected_LR_SR.map.fa and corrected_LR_SR.map.fa. Each file contains ~30,000 reads; the full file contains ~15 Mbp and the other two ~8 Mbp each.

What am I to make of these files? Does this output sound normal? Which output file is useful for further analysis?
Old 12-31-2012, 10:47 AM   #22
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default

Has anyone had any success running LSC to correct PacBio data from Gb-sized genomes? I am currently using it to try to correct 6x coverage of a >2 Gb genome with 30x SR data. At the moment it is in the alignment stage on 40 CPUs, but I am finding it difficult to gauge how long the alignment could take.
Old 12-31-2012, 10:00 PM   #23
Boonie
Junior Member
 
Location: Memphis

Join Date: Mar 2009
Posts: 6
Default LSC: beware the dinucleotide repeats

I am working with a ~1Gb genome and using 40X coverage of mer-trimmed Illumina reads. A test run on 100Mb of PacBio sequence took almost 10 days to complete on 40 cpus. As you know, LSC sorts the Illumina reads by sequence, then normalizes the data with "uniq", then splits the reads into several SR.fa.*.cps files according to the number of cpus. Each sub-file is aligned to the PacBio reads in parallel. What I learned in this test run was that 'sort' grouped reads that contained classes of dinucleotide repeats. Thus the split resulted in a few sub-files that were quite rich in CA repeats, GT repeats, etc. Those files required a few more days to complete the Novoalign step while the rest of the cpus sat idle.

Next time, I would run a small test set of PacBio reads with SR_uniq.fa and copy the .cps subfiles to a new directory as soon as they are produced, then terminate runLSC. Let's say, hypothetically, that I used 48 cpus and sort/uniq/split resulted in four files that were rich in CA, GT, CT, and GA repeats. I would cat the 44 non-repetitive files then re-split into 48 subfiles. Then I'd split each of the four repeat-rich files into 48 subfiles and add them to the non-repetitive files. I'd cat these into a single, new SR_uniq.fa file. The result should be that when LSC runs afresh on the new SR_uniq.fa, the repetitive reads would be distributed evenly among the 48 subfiles. That approach is only a rough estimate of where the repetitive sequences exist in the original file, and is also inelegant due to lack of programming skill but perhaps someone more skilled could find a way to automate the process.
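
The redistribution described above can be sketched in Python. This is only an illustration, not part of LSC: the function names (`read_fasta`, `stride_reorder`, `rewrite_fasta`) are hypothetical, and it assumes the split step sees the reordered file as-is, which is the same assumption the manual cat/re-split procedure makes. Instead of guessing which sub-files are repeat-rich, it deals the sorted records out at a fixed stride, so that a later contiguous split into N pieces gives each piece roughly every N-th record. It holds all records in memory, so a very large short-read set would need a streaming variant.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def stride_reorder(records, n_chunks):
    """Reorder records so that neighbours in the sorted input end up
    about len(records) / n_chunks positions apart in the output."""
    return [records[i]
            for k in range(n_chunks)
            for i in range(k, len(records), n_chunks)]


def rewrite_fasta(in_path, out_path, n_chunks):
    """Read a sorted FASTA file and write it back in stride order.

    n_chunks is assumed to match the number of sub-files (CPUs)
    the reads will later be split into.
    """
    with open(in_path) as handle:
        records = list(read_fasta(handle))
    with open(out_path, "w") as out:
        for header, seq in stride_reorder(records, n_chunks):
            out.write("%s\n%s\n" % (header, seq))
```

For example, `rewrite_fasta("SR_uniq.fa", "SR_uniq.reordered.fa", 48)` would correspond to the 48-CPU case in the post; the output file name is made up for the example.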
Old 01-01-2013, 02:54 PM   #24
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default

Thanks for the information. I would be interested to know how you get on with your second attempt. Out of interest, did the nature of your data set allow you to evaluate the corrected reads from your first test?
Old 01-18-2013, 06:02 AM   #25
juassis
Bioinformatician
 
Location: Brazil

Join Date: Dec 2012
Posts: 8
Default

Hello!
The names of PacBio long reads must be in the format of the following example: ">m111006_202713_42141_c100202382555500000315044810141104_s1_p0/16/3441_3479".
The last two numbers (3441 and 3479 in this example) are the positions of the sub reads.

However, my new PacBio data doesn't contain the last two numbers.

Example:
>m120627_142215_42149_c100335932550000001523020209201251_s1_p0/7
>m120627_142215_42149_c100335932550000001523020209201251_s1_p0/9

How can I get the last two numbers (the subread positions, 3441 and 3479 in the example above)?
thanks
Old 01-24-2013, 04:22 AM   #26
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415
Default

Reads in the correct form for LSC are the result of filtering and trimming by smrtpipe. Your read IDs look like those from raw reads before filtering.
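
A quick way to see which kind of read ID a FASTA file contains is to parse the header lines. The sketch below is based only on the two ID shapes quoted in this thread (`movie/hole/start_end` for subreads and `movie/hole` for the other reads); the regular expression and the name `parse_read_name` are assumptions made for illustration, not something taken from LSC or smrtpipe.

```python
import re

# movie name, hole number, and an optional "start_end" subread range
READ_NAME = re.compile(
    r"^>?(?P<movie>[^/\s]+)/(?P<hole>\d+)"
    r"(?:/(?P<start>\d+)_(?P<end>\d+))?$"
)


def parse_read_name(name):
    """Split a PacBio-style read name into its parts.

    Returns a dict whose "subread" entry is a (start, end) tuple,
    or None when the name carries no subread coordinates.
    """
    match = READ_NAME.match(name.strip())
    if match is None:
        raise ValueError("unrecognised read name: %r" % name)
    start, end = match.group("start"), match.group("end")
    return {
        "movie": match.group("movie"),
        "hole": int(match.group("hole")),
        "subread": (int(start), int(end)) if start is not None else None,
    }
```

Running this over the header lines of a file and counting how many come back with `"subread": None` shows whether the file holds filtered subreads or reads without coordinates.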

Last edited by flxlex; 01-24-2013 at 04:23 AM. Reason: Clarification
Old 02-13-2013, 11:42 AM   #27
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default

Has anyone experienced the following error when getting to the writetmp.py stage of the pipeline?

Traceback (most recent call last):
  File "/home/stby/bin/writetmp.py", line 57, in ?
    SR_cps_dict[readname] = line.strip()
MemoryError


I do have over 400 GB of memory available.
Old 02-18-2013, 12:12 PM   #28
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default

Quote:
Originally Posted by SLB View Post
Has anyone experienced the following error when getting to the writetmp.py stage of the pipeline?

Traceback (most recent call last):
  File "/home/stby/bin/writetmp.py", line 57, in ?
    SR_cps_dict[readname] = line.strip()
MemoryError


I do have over 400 GB of memory available.

Problem solved. It was an issue with the Python version. Although I had specified a newer installation of Python in the runLSC.py script, when it called the writetmp.py script the default Python path pointed to an older version. Something to bear in mind if there are multiple installations of Python on your system.
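
One way to catch this kind of mismatch early is to have each script report which interpreter is actually running it. The sketch below is a generic check, not something LSC ships; the function names and the idea of pasting it near the top of a script are assumptions for illustration. It sticks to `%` formatting so that it also runs on old Python 2 interpreters.

```python
import sys


def interpreter_report():
    """Return the path and version of the interpreter running this code."""
    return "%s (Python %d.%d.%d)" % (
        (sys.executable,) + tuple(sys.version_info[:3])
    )


def require_python(minimum):
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `minimum` is a (major, minor) tuple chosen by the caller.
    """
    if tuple(sys.version_info[:2]) < tuple(minimum):
        raise RuntimeError(
            "need Python %d.%d or newer, running %s"
            % (minimum[0], minimum[1], interpreter_report())
        )
```

Writing `interpreter_report()` to stderr at the start of both the driver script and a helper script shows immediately whether the two are being run by different installations.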
Old 03-20-2013, 01:16 PM   #29
weijenc
Junior Member
 
Location: NY, USA

Join Date: Aug 2012
Posts: 7
Default Paired-End files

Hello,

I can't seem to find the instructions for Illumina paired-end reads. Should I first combine the two files, or is there a way to list both files in the .cfg file?

Also, is Novoalign still required to run LSC?

Thanks,


WJ
Old 03-25-2013, 11:23 AM   #30
joxcargator73
Member
 
Location: Gainesville

Join Date: Dec 2012
Posts: 28
Default

I wonder if there is a version of LSC for Mac users.
Thanks
Old 03-25-2013, 02:06 PM   #31
mjhsieh
Junior Member
 
Location: USA

Join Date: Jan 2013
Posts: 9
Default

Quote:
Originally Posted by joxcargator73 View Post
I wonder if there is a version of LSC for Mac users.
Thanks
It should be runnable (through Terminal.app of course) after some minor modifications.
Old 03-26-2013, 08:13 AM   #32
joxcargator73
Member
 
Location: Gainesville

Join Date: Dec 2012
Posts: 28
Default

I am using PacBio and Illumina data in a de novo assembly, and I am using LSC to correct the PacBio reads. I am trying to edit run.cfg. Could somebody clarify two options in the file:
length of pseudochromosomes (Lpseudochr)
and length of gap sequence between long reads (LgapInpseudochr)?
Thanks
Old 04-30-2013, 01:07 AM   #33
cemonat
Junior Member
 
Location: Montpellier

Join Date: Apr 2013
Posts: 2
Default

Hi,
I'm using pacBio and illumina, and want to correct pacBio with LSC.
I created a WkgDir folder containing a bin folder with all the binaries and a data folder with my LR and SR sequences, and I added run.cfg to WkgDir with the modified paths.
I then launched the following command:

Quote:
/home/cecmonat/sources/LSC/runLSC.py run.cfg
and here are the errors that are returned :

Quote:
['LR_pathfilename ', ' /data/projects/assembling-glab/PacBio_test/C2/filtered_subreads_C2.fasta']
['SR_pathfilename ', ' /data/projects/assembling-glab/SEQUENCES/TOG5681Clean/tog5681Clean_all.fasta']
['I_nonredundant ', ' N']
['Nthread1 ', ' 12']
['Nthread2 ', ' 12']
['temp_foldername ', ' /data/projects/assembling-glab/LSC_temp']
['output_foldername ', ' /data/projects/assembling-glab/LSC_out']
['Lpseudochr ', ' 50000000']
['LgapInpseudochr ', ' 100']
['I_RemoveBothTails ', ' Y']
['MinNumberofNonN ', ' 39']
['MaxN ', ' 1']
=== sort and uniq SR data ===
0:00:00.038417
===split SR:===
0:00:00.047578
===compress SR.aa:===
Traceback (most recent call last):
File "/home/cecmonat/sources/LSC/compressFASTA.py", line 52, in <module>
inseq=open(inseq_filename,'r')
IOError: [Errno 2] No such file or directory: '/data/projects/assembling-glab/LSC_temp/SR.fa.aa'
Traceback (most recent call last):
File "/home/cecmonat/sources/LSC/compressFASTA.py", line 52, in <module>
Traceback (most recent call last):
File "/home/cecmonat/sources/LSC/compressFASTA.py", line 52, in <module>
Traceback (most recent call last):
File "/home/cecmonat/sources/LSC/compressFASTA.py", line 52, in <module>
Traceback (most recent call last):
File "/home/cecmonat/sources/LSC/compressFASTA.py", line 52, in <module>
Traceback (most recent call last):
File "/home/cecmonat/sources/LSC/compressFASTA.py", line 52, in <module>
inseq=open(inseq_filename,'r')
inseq=open(inseq_filename,'r')
IOError: [Errno 2] No such file or directory: '/data/projects/assembling-glab/LSC_temp/SR.fa.ae'
inseq=open(inseq_filename,'r')
IOErrorIOError: inseq=open(inseq_filename,'r')
: [Errno 2] No such file or directory: '/data/projects/assembling-glab/LSC_temp/SR.fa.ac'[Errno 2] No such file or directory: '/data/projects/assembling-glab/LSC_temp/SR.fa.aj'
I do not know what the problem is...
The LSC_temp/ folder contains these files:

Quote:
LR.fa.cps SR.fa.ag.cps.convertNAV.log
LR.fa.idx SR.fa.ah.cps.convertNAV.log
LR.fa.readname SR.fa.ai.cps.convertNAV.log
Notwotails_filtered_subreads_C2.fasta_intact_MS SR.fa.aj.cps.convertNAV.log
SR.fa.aa.cps.convertNAV.log SR.fa.ak.cps.convertNAV.log
SR.fa.ab.cps.convertNAV.log SR.fa.al.cps.convertNAV.log
SR.fa.ac.cps.convertNAV.log SR.fa.cps
SR.fa.ad.cps.convertNAV.log SR.fa.idx
SR.fa.ae.cps.convertNAV.log SR_uniq.fa
SR.fa.af.cps.convertNAV.log
but I do not know why there is an extension "cps.convertNAV.log" ... ?
Any idea ?
Thanks
Old 05-17-2013, 06:35 AM   #34
ashNZ
Member
 
Location: Minas Gerais, Brazil

Join Date: May 2013
Posts: 11
Question

Gday all,

I am trying to run LSC to improve my PacBio data to use for a hybrid de novo assembly of a cattle genome, however I am having trouble getting the program to run!
I am running Ubuntu 12.04, have installed Python 2.6 and when I go to run the program as per step 4 of the tutorial instructions on the LSC website I am getting this error:

bash: ./bin/runLSC.py: Permission denied

Anyone know why this is and how I can fix it??

Cheers
Ash
Old 05-17-2013, 07:12 AM   #35
winsettz
Member
 
Location: US

Join Date: Sep 2012
Posts: 91
Default

ash,

The .py files don't have executable permissions.

For example, I use:

Code:
python /data4/Programs/LSC/LSC_023/runLSC.py run.cfg
I've yet to try setting it up with executable privileges, but that would be

Code:
chmod u+x runLSC.py
which adds (+) executable (x) privileges for the user (u).

I found an interesting indentation related error in LSC 023
Code:
def GetPathAndName(pathfilename):
    ls=pathfilename.split('/')
    filename=ls[-1]
    path='/'.join(ls[0:-1])+'/'
        if path == "/":
            path = "./"
    return path, filename
I had to fix it by revising the indents to

Code:
def GetPathAndName(pathfilename):
    ls=pathfilename.split('/')
    filename=ls[-1]
    path='/'.join(ls[0:-1])+'/'
    # the "if" must sit at the same indent level as the lines around it
    if path == "/":
        path = "./"
    return path, filename
I get the feeling I'm the only one this happened to, but I'm just putting it out there.
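
For comparison, the same splitting can be sketched with the standard library's `os.path` instead of manual string handling. This is an alternative written for illustration, not the code LSC uses, and `get_path_and_name` is a hypothetical name. Like the function above, it returns the directory with a trailing slash and maps a bare filename to `./`. One difference: reading the function above, a file directly under the root directory (such as `/SR.fa`) also ends up with the path `./`, whereas this version keeps `/`.

```python
import os


def get_path_and_name(pathfilename):
    """Split a path into (directory with trailing separator, filename)."""
    directory, filename = os.path.split(pathfilename)
    if not directory:
        # bare filename: treat it as living in the current directory
        return "./", filename
    # joining with "" appends a trailing separator only when one is missing
    return os.path.join(directory, ""), filename
```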

It's worth noting that all the other python files have permission issues. You can give user-executable permissions to all the python files in LSC with

Code:
chmod u+x *.py

Last edited by winsettz; 05-17-2013 at 07:31 AM.
Old 05-20-2013, 09:17 AM   #36
ashNZ
Member
 
Location: Minas Gerais, Brazil

Join Date: May 2013
Posts: 11
Default

Cheers for your answer, it was very helpful. I've run into another problem I haven't been able to work out. After checking the parameters of my config file to make sure that all of the file paths are correct etc... when I run the program I am getting an error saying the program cannot find my .fa data files. It looks like this:

=== sort and uniq SR data ===
awk: cmd. line:1: fatal: cannot open file '/home/Bioinformatics/LSC/example/data/SR.fa' for reading (No such file or directory)
0:00:00.025276

I'm a little confused by this as the SR.fa file is at the path specified! Anyone know what the problem might be here?

Cheers
Ash
Old 05-20-2013, 09:37 AM   #37
winsettz
Member
 
Location: US

Join Date: Sep 2012
Posts: 91
Default

What are the permissions of your SR.fa file? It's possible you may have stripped read permissions (-wx------). Unlikely, but...
Old 05-20-2013, 11:46 AM   #38
ashNZ
Member
 
Location: Minas Gerais, Brazil

Join Date: May 2013
Posts: 11
Default Not sure....

Quote:
Originally Posted by winsettz View Post
What are the permissions of your SR.fa file? It's possible you may have stripped read permissions (-wx------). Unlikely, but...
I'm not sure... how can I check that? I am just using the example package from the 'tutorial' section of the LSC website
Old 05-20-2013, 12:53 PM   #39
winsettz
Member
 
Location: US

Join Date: Sep 2012
Posts: 91
Default

Quote:
===convertNAV SR.aa.cps.nav:===
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
start_pt = int(line_list[3])
IndexError: list index out of range
start_pt = int(line_list[3])
IndexError: list index out of range
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
start_pt = int(line_list[3])
start_pt = int(line_list[3])
start_pt = int(line_list[3])
IndexError: list index out of range
IndexError: list index out of range
IndexError: list index out of range
start_pt = int(line_list[3])
IndexError: list index out of range
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
start_pt = int(line_list[3])
IndexError: list index out of range
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
start_pt = int(line_list[3])
IndexError: list index out of range
start_pt = int(line_list[3])
IndexError: list index out of range
Traceback (most recent call last):
File "/data4/setoc/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
start_pt = int(line_list[3])
IndexError: list index out of range
Traceback (most recent call last):
File "/data4/setoc/Programs/LSC/LSC_023/convertNAV.py", line 49, in <module>
start_pt = int(line_list[3])
IndexError: list index out of range
start_pt = int(line_list[3])
IndexError: list index out of range
6:12:15.153565
===merge_mapping_file SR.aa.cps.nav.map:===
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/merge_mapping_file.py", line 22, in <module>
LR_SR_mapping_file = open(LR_SR_mapping_filename,'r')
IOError: [Errno 2] No such file or directory: 'temp/SR.fa.aa.cps.nav.map'
6:12:15.434721
rm: cannot remove `temp/SR.fa.aa.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.ab.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.ac.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.ad.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.ae.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.af.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.ag.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.ah.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.ai.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.aj.cps.nav.map': No such file or directory
rm: cannot remove `temp/SR.fa.ak.cps.nav.map': No such file or directory
===split LR_SR.map:===
rm: cannot remove `temp/SR.fa.al.cps.nav.map': No such file or directory
Traceback (most recent call last):
File "/data4/Programs/LSC/LSC_023/runLSC.py", line 326, in <module>
LR_SR_map = open(temp_foldername +"LR_SR.map",'r')
IOError: [Errno 2] No such file or directory: 'temp/LR_SR.map'
Looks like list index out of range borks everything up.
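
To narrow down where the IndexError comes from, one option is to scan the file that convertNAV.py reads for lines with too few fields. The sketch below is a guess at a useful diagnostic, not a fix: the traceback only shows that the script reads `line_list[3]`, so the tab separator, the minimum of four fields and the name `find_short_lines` are all assumptions that may need adjusting to the real file format.

```python
def find_short_lines(lines, min_fields=4, sep="\t"):
    """Return (line_number, text) for every line with fewer than
    min_fields separator-delimited fields. Line numbers start at 1."""
    short = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if len(text.split(sep)) < min_fields:
            short.append((number, text))
    return short
```

Passing an open file handle for one of the alignment output files and printing the first few hits would show whether the offending lines are empty, truncated, or in a different format altogether.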
Old 05-20-2013, 01:02 PM   #40
winsettz
Member
 
Location: US

Join Date: Sep 2012
Posts: 91
Default

to check permissions:

ls -la *.fa

-rw-r--r-- 1 user group 51109952 Aug 13 2012 LR.fa
-rw-r--r-- 1 user group 109888896 Aug 13 2012 SR.fa
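
The same check can be done from Python, which is handy when the path comes straight out of a config file. This is a minimal sketch using only the standard library; `describe_access` is a made-up helper name. It also reports a missing path, which is worth ruling out here because the awk message above says "No such file or directory" rather than "Permission denied". Note that `stat.filemode` needs Python 3.3 or newer.

```python
import os
import stat


def describe_access(path):
    """Summarise whether `path` exists and whether we can read it."""
    if not os.path.exists(path):
        return "%s: no such file or directory" % path
    # same ten-character string that `ls -l` prints, e.g. -rw-r--r--
    mode = stat.filemode(os.stat(path).st_mode)
    if os.access(path, os.R_OK):
        verdict = "readable"
    else:
        verdict = "NOT readable"
    return "%s: %s, %s by the current user" % (path, mode, verdict)
```

Calling it on the exact string from the SR_pathfilename line of run.cfg, including any stray spaces, checks the path that the program will actually try to open.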

Last edited by winsettz; 05-20-2013 at 01:09 PM.
All times are GMT -8.