SEQanswers

SEQanswers > Sequencing Technologies/Companies > Pacific Biosciences


Old 07-30-2012, 12:34 PM   #1
LSC
Member
 
Location: stanford

Join Date: Jul 2012
Posts: 24
LSC - a fast PacBio long read error correction tool.

Hello SEQanswers Community,

Further Info: http://www.stanford.edu/~kinfai/LSC/LSC.html

We at the Wong Lab have developed a new tool for error correction of PacBio data. It is very sensitive and can improve PacBio reads to a 5% error rate. It is also very fast: it takes only 10 hours (with 8 threads) for ~200k subreads, and it needs only 10-15 GB of hard disk space for temporary files.

In its current form it supports PacBio reads and any type of short reads (from any NGS platform). The current version requires novoalign (single-threaded version, free for academic use).

It is designed for the Linux platform. If you use another platform, leave us a note and we'll see what we can do.

Instructions are on the website, which is still a work in progress; if you run into any trouble, don't hesitate to leave me a note here.

Please give it a try and let me know if you have any issues. We are actively developing this tool, so we welcome all of your comments and concerns! In particular, we are trying to replace novoalign with another aligner, which could save 50% of the running time (5 hours in the example above). Ease of use is very important to us, so let us know if anything annoys you.

Last edited by LSC; 07-30-2012 at 01:10 PM.
LSC is offline   Reply With Quote
Old 07-31-2012, 03:24 AM   #2
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 395
Default

I thought I would have a look at the paper mentioned as a preprint, but the link (http://www.stanford.edu/~kinfai/LSC/LSC.pdf) returns a 'Page not found' error...
flxlex is offline   Reply With Quote
Old 07-31-2012, 09:56 AM   #3
LSC
Member
 
Location: stanford

Join Date: Jul 2012
Posts: 24
Default

Sorry, the paper is still in review and I have only just set up the website. I will fix the problem soon.
LSC is offline   Reply With Quote
Old 08-07-2012, 05:55 AM   #4
ZFHans
Member
 
Location: Leiden

Join Date: Jun 2009
Posts: 10
Default

Hi LSC,

I'm trying to improve a 1.6 GB genome with PacBio data. Celera read correction is slow, so I welcome your effort. I am trying to run LSC 0.2 but have encountered problems.

First, in some of the scripts that make up LSC, the first line is #!/home/stow/swtree/bin/python2.6. Changing this to #!/usr/bin/python got rid of some error messages.

Secondly, I installed novoalign v2.08 as suggested to do the alignments. In the runLSC.py script the aligner is called with no option for the output format, so novoalign produces its native format. The next script, however, expects (I assume) SAM format. So I added -o SAM to the option list on line 207 of runLSC.py (I also had to add the path to novoalign, because it would not run otherwise). That got me to the next problem, in convertNAV.py: this script looks at the first character of each line in the nav file, at lines 78 and 127 of the script. In my version of the nav file the header character is @ instead of #, so I changed this. Now the desired .map file is produced, but with only one column of numbers. I know I have short reads aligned, so I think I should have more columns. Could you please comment on this? I paste below an example of my SAM output, which is different from the example in your script:


Code:
@HD	VN:1.0	SO:unsorted
@PG	ID:novoalign	PN:novoalign	VN:V2.08.02	CL:novoalign -r All -F FA -o SAM -d /mnt/scrap_disk/temp2/pseudochr_LR.fa.cps.nix -f /mnt/scrap_disk/temp2/SR.fa.ai.cps
@SQ	SN:Pac1	AS:pseudochr_LR.fa.cps.nix	LN:50000716
@SQ	SN:Pac2	AS:pseudochr_LR.fa.cps.nix	LN:50000772
@SQ	SN:Pac3	AS:pseudochr_LR.fa.cps.nix	LN:50002188
@SQ	SN:Pac4	AS:pseudochr_LR.fa.cps.nix	LN:50000094
@SQ	SN:Pac5	AS:pseudochr_LR.fa.cps.nix	LN:50001433
@SQ	SN:Pac6	AS:pseudochr_LR.fa.cps.nix	LN:50001526
@SQ	SN:Pac7	AS:pseudochr_LR.fa.cps.nix	LN:50001210
@SQ	SN:Pac8	AS:pseudochr_LR.fa.cps.nix	LN:50000056
@SQ	SN:Pac9	AS:pseudochr_LR.fa.cps.nix	LN:50000143
@SQ	SN:Pac10	AS:pseudochr_LR.fa.cps.nix	LN:50002588
@SQ	SN:Pac11	AS:pseudochr_LR.fa.cps.nix	LN:50001867
@SQ	SN:Pac12	AS:pseudochr_LR.fa.cps.nix	LN:50000245
@SQ	SN:Pac13	AS:pseudochr_LR.fa.cps.nix	LN:28473695
ILLUMINA-52179E:60:FC70G0LAAXX:6:77:6307:2493	16	Pac10	21159148	3	8S67M19S	*	0	0	GAGTATACTCTCATCACATCAGTCAGAGCTGAGAGCTCTGATGAGAGTGACGTCTCAGACAGAGTCAGTGCTCTGATAGCTGACAGTGAGATAG	*	PG:Z:novoalign	AS:i:242	UQ:i:242	NM:i:0	MD:Z:67	CC:Z:Pac2	CP:i:6239884	ZS:Z:R	ZN:i:2	NH:i:2	HI:i:1	IH:i:2
ILLUMINA-52179E:60:FC70G0LAAXX:6:77:6307:2493	256	Pac2	6239884	3	14S72M8S	*	0	0	CTATCTCACTGTCAGCTATCAGAGCACTGACTCTGTCTGAGACGTCACTCTCATCAGAGCTCTCAGCTCTGACTGATGTGATGAGAGTATACTC	*	PG:Z:novoalign	AS:i:242	UQ:i:242	NM:i:1	MD:Z:32G39	ZS:Z:R	ZN:i:2	NH:i:2	HI:i:2	IH:i:2
ILLUMINA-52179E:60:FC70G0LAAXX:6:77:6754:2491	4	*	0	0	*	*	0	0	CTCTATATCATGACGAGCATGTACTATACATAGCTGTGCAGCATCTAGAGTGTATCAGAGCACACAC	*	PG:Z:novoalign	ZS:Z:NM
ILLUMINA-52179E:60:FC70G0LAAXX:6:77:6775:2489	4	*	0	0	*	*	0	0	AGTATATCTAGCATAGCTAGCACTCACTGTCATCTGTCATACATACTATATATATGTATATAGCTCTCTGAGCTAGACTGAGACTCTGATCAGACATCATGTATGAGATGTG	*	PG:Z:novoalign	ZS:Z:NM
ILLUMINA-52179E:60:FC70G0LAAXX:6:77:6822:2491	4	*	0	0	*	*	0	0	TGATACTATAGTGAGAGATACTACATGATATCACTGCTCTCTG	*	PG:Z:novoalign	ZS:Z:NM
ILLUMINA-52179E:60:FC70G0LAAXX:6:77:7018:2483	0	Pac10	4706429	2	28M1I16M1I29M1S	*	0	0	ATAGTATCACTGCATACTATCATCTCAGCTGCTCTGCACTGCTGACTGTACTCGCTGCAGTATATCTATGATGTAT	*	PG:Z:novoalign	AS:i:122	UQ:i:122	NM:i:2	MD:Z:73	CC:Z:Pac7	CP:i:31267643	ZS:Z:R	ZN:i:2	NH:i:2	HI:i:1	IH:i:2
ILLUMINA-52179E:60:FC70G0LAAXX:6:77:7018:2483	256	Pac7	31267643	2	6M1I21M1I47M	*	0	0	ATAGTATCACTGCATACTATCATCTCAGCTGCTCTGCACTGCTGACTGTACTCGCTGCAGTATATCTATGATGTAT	*	PG:Z:novoalign	AS:i:122	UQ:i:122	NM:i:3	MD:Z:43G30	ZS:Z:R	ZN:i:2	NH:i:2	HI:i:2	IH:i:2
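The shebang fix described above can be applied to all of LSC's scripts at once. A minimal sketch (the directory path is hypothetical; adjust it to your install location):

```python
import glob
import os

def fix_shebangs(script_dir, new_shebang="#!/usr/bin/env python\n"):
    """Rewrite a hard-coded python shebang on the first line of each script."""
    for path in glob.glob(os.path.join(script_dir, "*.py")):
        with open(path) as f:
            lines = f.readlines()
        # Only touch files whose first line is a python shebang
        if lines and lines[0].startswith("#!") and "python" in lines[0]:
            lines[0] = new_shebang
            with open(path, "w") as f:
                f.writelines(lines)

# Example (path hypothetical):
# fix_shebangs("/usr/local/LSC_0.2/bin")
```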
Could we combine this thread with the same one in the bioinformatics section?

Many thanks, Hans Jansen
ZFHans is offline   Reply With Quote
Old 08-07-2012, 08:10 AM   #5
adaptivegenome
Super Moderator
 
Location: US

Join Date: Nov 2009
Posts: 437
Default

I see the paper is not available but I clicked "How it Works" and that link is also broken. Can you fix?

Link: http://www.stanford.edu/~kinfai/LSC/LSC_howitworks.html
adaptivegenome is offline   Reply With Quote
Old 08-07-2012, 02:24 PM   #6
LSC
Member
 
Location: stanford

Join Date: Jul 2012
Posts: 24
Default

Quote:
Originally Posted by adaptivegenome View Post
I see the paper is not available but I clicked "How it Works" and that link is also broken. Can you fix?

Link: http://www.stanford.edu/~kinfai/LSC/LSC_howitworks.html
The paper is in review now (almost the final round of revision). I will post it as soon as this final revision is submitted. Sorry for the inconvenience.
LSC is offline   Reply With Quote
Old 08-07-2012, 02:24 PM   #7
LSC
Member
 
Location: stanford

Join Date: Jul 2012
Posts: 24
Default

Quote:
Originally Posted by ZFHans View Post
Hi LSC,

I'm trying to improve a 1.6 GB genome with Pacbio data. [...] I paste below an example of my SAM output which is different from the example in your script. [full post and SAM output snipped; see post #4 above]

Many thanks, Hans Jansen
Hi Hans Jansen,
Your feedback is really helpful. Although LSC works well on my computer cluster, I know something may go wrong when it is run on other systems. Your test is a great way for me to find such bugs.
1) Your change to the python path is correct. I will fix it in the coming version.
2) LSC uses novoalign's native output format, not SAM format. Please don't change it; try my original setting with the native format again. In addition, if BWA or bowtie2 could output ALL possible mappable hits, LSC could save over 50% of its running time by using them instead (novoalign is somewhat slow). Do you know of any way to make BWA or bowtie2 output all hits (including detailed indel information)?
LSC is offline   Reply With Quote
Old 08-07-2012, 10:32 PM   #8
ZFHans
Member
 
Location: Leiden

Join Date: Jun 2009
Posts: 10
Default

Hi LSC,

Thanks for your reply. I'll try the native format again, but could you tell me which version of novoalign you used? Could it be that Novocraft changed something in their native format?

Thanks,

Hans
ZFHans is offline   Reply With Quote
Old 08-07-2012, 11:00 PM   #9
LSC
Member
 
Location: stanford

Join Date: Jul 2012
Posts: 24
Default

Quote:
Originally Posted by ZFHans View Post
Hi LSC,

Thanks for your reply. I'll try the native format again, but could you tell me which version of novoalign you used? Could it be that Novocraft changed something in their native format?

Thanks,

Hans
novoalign (V2.07.10) works well with LSC.
LSC is offline   Reply With Quote
Old 08-07-2012, 11:24 PM   #10
ZFHans
Member
 
Location: Leiden

Join Date: Jun 2009
Posts: 10
Default

Hi LSC,

Thanks for your quick reply. If my current run with 2.08 fails, I'll try 2.07.

In the meantime I looked at the bowtie2 manual http://bowtie-bio.sourceforge.net/bo...all-alignments and found this mode:

-a mode: search for and report all alignments

-a mode is similar to -k mode except that there is no upper limit on the number of alignments Bowtie 2 should report. Alignments are reported in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. See the SAM specification for details.

Some tools are designed with this reporting mode in mind. Bowtie 2 is not! For very large genomes, this mode is very slow.
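For reference, the 'secondary' bit mentioned in the manual excerpt can be checked with a couple of lines of Python (a minimal sketch, not part of LSC):

```python
def is_secondary(flag: int) -> bool:
    # Bit 0x100 (decimal 256) in the SAM FLAG field marks a secondary
    # alignment; bowtie2 -a sets it on every reported hit beyond the first.
    return bool(flag & 0x100)

# Flag values as seen in the SAM excerpt earlier in this thread:
# 256 -> secondary hit; 16 -> primary hit on the reverse strand; 4 -> unmapped
```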

Is this of any use to you?

Regards,

Hans
ZFHans is offline   Reply With Quote
Old 08-08-2012, 05:41 AM   #11
ZFHans
Member
 
Location: Leiden

Join Date: Jun 2009
Posts: 10
Default

Hi LSC,

As it turned out, it was my mistake all along. I'm using Quake-corrected Illumina reads as SR input. The fasta headers of these reads contain spaces, and that was causing the problems in convertNAV.py. I corrected this by removing the spaces from the headers, and now the novocraft native output format is understood by convertNAV.py. The script now continues to correct_nonredundant.py, but then gives the following error:
Code:
Traceback (most recent call last):
  File "/usr/local/LSC_0.2/bin/correct_nonredundant.py", line 280, in <module>
    n_rep = int(NSR.split('_')[1])
IndexError: list index out of range
This is probably still some problem with too many fields. Could you indicate how the headers of the input files should look (both LR and SR)?
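One way to strip the spaces, as described above, is to keep only the first whitespace-delimited token of each fasta header. A sketch (LSC's expected header format is not documented, and correct_nonredundant.py appears to also parse an underscore-separated field, so this only reproduces the space removal):

```python
def clean_fasta_headers(in_path, out_path):
    """Keep only the first whitespace-delimited token of each '>' header line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # ">read1 quake corrected" becomes ">read1"
                dst.write(line.split()[0] + "\n")
            else:
                dst.write(line)
```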

Many thanks in advance,

Hans
ZFHans is offline   Reply With Quote
Old 08-14-2012, 05:53 AM   #12
Tuinhof
Member
 
Location: Netherlands

Join Date: Jul 2012
Posts: 10
Default

Hi LSC,

I work together with Hans Jansen, he installed the previous version.
Since it is not possible to get to the tutorial pages, I was wondering how to install the newest version. Or can I just copy the adjusted python scripts into the existing folder?

Regards,
Nynke Tuinhof
Tuinhof is offline   Reply With Quote
Old 08-14-2012, 01:59 PM   #13
LSC
Member
 
Location: stanford

Join Date: Jul 2012
Posts: 24
Default

Yes, you just need to copy the scripts over the existing folder.
Quote:
Originally Posted by Tuinhof View Post
Hi LSC,

I work together with Hans Jansen, he installed the previous version.
Since it is not possible to get to the tutorial pages, I was wondering how to install the newest version. Or can I just copy the adjusted python scripts into the existing folder?

Regards,
Nynke Tuinhof
LSC is offline   Reply With Quote
Old 09-14-2012, 01:28 PM   #14
shanebrubaker
Member
 
Location: California

Join Date: Nov 2009
Posts: 13
More Info on LSC

Hi, I am also very interested in LSC. I would like to see the paper and manual if they are available.

Does anyone have time comparisons of LSC vs. PacBioToCA vs. SmrtPipe?
shanebrubaker is offline   Reply With Quote
Old 09-14-2012, 02:08 PM   #15
shanebrubaker
Member
 
Location: California

Join Date: Nov 2009
Posts: 13
Default

I also noticed that you say it corrects the reads to a 5% error rate, but the Schatz work seems to mention a 0.1% error rate. Is there a reason for that?

Thanks,
Shane
shanebrubaker is offline   Reply With Quote
Old 09-14-2012, 02:51 PM   #16
LSC
Member
 
Location: stanford

Join Date: Jul 2012
Posts: 24
Default

Quote:
Originally Posted by shanebrubaker View Post
Hi, I am also very interested in LSC. I would like to see the paper and manual if they are available.

Does anyone have time comparisons of LSC vs. PacBioToCA vs. SmrtPipe?
The paper was accepted last week and is in press now. The preprint should be on the homepage next week.
LSC is offline   Reply With Quote
Old 09-14-2012, 02:54 PM   #17
LSC
Member
 
Location: stanford

Join Date: Jul 2012
Posts: 24
Default

Quote:
Originally Posted by shanebrubaker View Post
I also noticed that you say it corrects the reads to a 5% error rate, but the Schatz work seems to mention a 0.1% error rate. Is there a reason for that?

Thanks,
Shane
In the paper, you will see the error rate drop below 1% when you have enough short-read (SGS) coverage. For regions without any short-read coverage, the error rate remains high, and those regions pull up the average error rate over the whole read set. Thus, the more short reads you have, the better LSC performs.
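The averaging effect can be illustrated with a toy calculation (the numbers below are made up for illustration, not taken from the paper):

```python
# Fraction of long-read bases with enough short-read coverage (assumed)
covered = 0.8
err_covered = 0.01    # corrected regions reach <1% error
err_uncovered = 0.15  # uncovered regions stay near the raw error rate (assumed)

# The reported average is a coverage-weighted mix of the two regimes
avg_err = covered * err_covered + (1 - covered) * err_uncovered
print(avg_err)  # 0.8*0.01 + 0.2*0.15 = 0.038, i.e. ~4% on average
```

So a modest uncovered fraction can dominate the average even when corrected regions are highly accurate.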
LSC is offline   Reply With Quote
Old 09-20-2012, 12:26 AM   #18
ZFHans
Member
 
Location: Leiden

Join Date: Jun 2009
Posts: 10
Default

Hi Shane,

At the moment I am trying to compare LSC vs. PacBioToCA, and, if time and hardware permit, SmrtPipe. I noticed that PacBioToCA reduces the dataset from 1 GB to around 400 MB. I haven't checked the error rate yet. We have 30x coverage in short reads.
I am still struggling with LSC 0.2.1. It runs fine on a test set, but when it runs on the whole short-read set (49 GB) I get a read error in awk (awk: read error (Bad address)). It doesn't fail consistently at one point in the file: sometimes this happens after processing 50 MB, and once it reached 30 GB. I computed md5 checksums of the file several times and they are identical. Any suggestions are appreciated.

Thanks,

Hans Jansen
ZFHans is offline   Reply With Quote
Old 09-20-2012, 06:13 AM   #19
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 979
Default

LSC,

I would really like to use your software, but it is extremely difficult to do so given the complete lack of documentation. The 'How it works?', 'Tutorial', 'Manual' and 'Filters' links on the website are dead. The 'FAQ' has just one line referring to SpliceMap. There isn't even a README file. Yes, I can run the program, but without any documentation I have no idea whether my results are correct or meaningful.

I installed LSC and ran it against a PacBio long read data set consisting of 100,000 reads, totaling 38Mbp. My short read set is 20 million 100bp Illumina reads. I ran the program with default parameters, and the output is 3 files: full_LR_SR.map.fa, uncorrected_LR_SR.map.fa and corrected_LR_SR.map.fa. Each file contains ~30,000 reads; the full file contains ~15Mbp and the other two ~8Mbp each.

What am I to make of these files? Does this output sound normal? Which output file is useful for further analysis?
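To sanity-check output files like the ones described above, a small counter can reproduce read and base tallies (a sketch, not part of LSC; the file name in the example comes from the post above):

```python
def fasta_stats(path):
    """Count sequences and total bases in a fasta file."""
    n_reads = n_bases = 0
    with open(path) as f:
        for line in f:
            if line.startswith(">"):
                n_reads += 1
            else:
                n_bases += len(line.strip())
    return n_reads, n_bases

# Example:
# print(fasta_stats("corrected_LR_SR.map.fa"))
```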
kmcarr is offline   Reply With Quote
Old 09-20-2012, 06:47 AM   #20
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 2,998
Default

I am not sure which journal this paper has been accepted at, but is it not possible by now to post something (perhaps a provisional PDF) at the link ("how it works") that was included in a previous post (and is still showing a "Page not found" error)?

Other links (http://www-stat.stanford.edu/~kinfai/LSC.html) appear to lead to a "Not found" error. This one is not working either (http://www-stat.stanford.edu/~kinfai/LSC_download.html).
GenoMax is offline   Reply With Quote






Powered by vBulletin® Version 3.8.6
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.