07-15-2014, 07:57 AM   #1
Masurca ignoring input FASTA

I have been trying to assemble some sequences using Masurca, and it runs to completion, but I recently noticed something amiss.

For input, I have three paired-end Illumina libraries, three single-end Illumina libraries, and a couple hundred individual FASTA sequences.

The FASTA sequences were converted to *.FRG format. The individual sequences range in length from ~150bp to over 40kbp.

Almost without exception, the FASTA sequences do not seem to be passed through to the final output assemblies. I would understand (although be a bit surprised) if Masurca could not extend the longer FASTA sequences, but I'm a little disappointed/concerned that known, large, contiguous blocks of sequence are not being passed through to my final assemblies.
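A quick way to put numbers on this is to test whether each input sequence occurs verbatim in any assembled contig. The following is only a minimal sketch, assuming plain uncompressed FASTA files for both the input sequences and the final assembly; the function names are made up for illustration, and the check is an exact substring match on either strand, so a sequence that differs from the assembly by a single base is reported as missing. An alignment-based comparison would be more tolerant.

```python
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a plain-text FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks).upper()
                parts = line[1:].split()
                name, chunks = (parts[0] if parts else ""), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


# Only A, C, G, T and N are complemented; other symbols pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_missing(input_fasta: str, assembly_fasta: str) -> List[str]:
    """Names of input sequences not found verbatim (either strand) in any contig."""
    contigs = [seq for _, seq in read_fasta(assembly_fasta)]
    missing = []
    for name, seq in read_fasta(input_fasta):
        rc = reverse_complement(seq)
        if not any(seq in contig or rc in contig for contig in contigs):
            missing.append(name)
    return missing
```

Calling `find_missing` with the original FASTA file and the final assembly FASTA returns the list of sequence names that were not carried through intact. All contigs are held in memory, which is fine for a few hundred input sequences but may need rethinking for a very large assembly.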

Has anyone else run into this issue? I've seen it with v2.0.1.4 and v2.2.1. My attempts to contact the program authors have not succeeded so far.

An example config file:
PE= D1 533 105 /ifs/bulk/rdouglas/bowtie1_Bseqs_unaligned_perfect/run01_1_paired_notB73perfect.fastq /ifs/bulk/rdouglas/bowtie1_Bseqs_unaligned_perfect/run01_2_paired_notB73perfect.fastq
PE= D2 747 363 /ifs/bulk/rdouglas/bowtie1_Bseqs_unaligned_perfect/run02_1_paired_notB73perfect.fastq /ifs/bulk/rdouglas/bowtie1_Bseqs_unaligned_perfect/run02_2_paired_notB73perfect.fastq
PE= D3 550 83 /ifs/bulk/rdouglas/Masurca/paired_R1_B_repeat.fastq.gz /ifs/bulk/rdouglas/Masurca/paired_R2_B_repeat.fastq.gz
PE= D4 95 15 /ifs/bulk/rdouglas/bowtie1_Bseqs_unaligned_perfect/unpaired_notB73perfect.fastq
PE= D5 290 20 /ifs/bulk/rdouglas/seq_496/unpaired_output_R1_Q20.fastq
PE= D6 290 20 /ifs/bulk/rdouglas/seq_496/unpaired_output_R2_Q20.fastq
OTHER= /ifs/bulk/rdouglas/Masurca/B_GSS.frg
OTHER= /ifs/bulk/rdouglas/Masurca/nate_ellis_seqs.frg
OTHER= /ifs/bulk/rdouglas/Masurca/theuri_seqs.frg

#this is k-mer size for deBruijn graph values between 25 and 101 are supported, auto will compute the optimal size based on the read data and GC content
#set this to 1 for Illumina-only assemblies and to 0 if you have 2x or more long (Sanger, 454) reads
#this parameter is useful if you have too many jumping library mates. Typically set it to 60 for bacteria and something large (300) for mammals
#these are the additional parameters to Celera Assembler.  do not worry about performance, number of processors or batch sizes -- these are computed automatically. for mammals do not set cgwErrorRate above 0.15!!!
CA_PARAMETERS = ovlMerSize=30 cgwErrorRate=0.25 ovlMemory=4GB
#minimum count k-mers used in error correction 1 means all k-mers are used.  one can increase to 2 if coverage >100
#auto-detected number of cpus to use
#this is mandatory jellyfish hash size
#this specifies if we do (1) or do not (0) want to trim long runs of homopolymers (e.g. GGGGGGGG) from 3' read ends, use it for high GC genomes
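To sanity-check which libraries and extra files a config like the one above actually declares, the entries can be pulled out with a few lines of Python. This is a rough sketch, not part of Masurca: the `Library` class and `parse_config` function are hypothetical names, and the field layout (prefix, mean, standard deviation, then one or two read files for `PE=`; file paths for `OTHER=`) is assumed from the lines shown in this post.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Library:
    prefix: str       # library prefix, e.g. "D1"
    mean: int         # first number after the prefix (assumed: mean fragment size)
    stdev: int        # second number (assumed: standard deviation)
    files: List[str]  # one file for unpaired reads, two for paired reads


def parse_config(text: str) -> Tuple[List[Library], List[str]]:
    """Collect PE= libraries and OTHER= files; skip comments and other keys."""
    libraries: List[Library] = []
    others: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key = key.strip()
        fields = value.split()
        if key == "PE" and len(fields) >= 4:
            libraries.append(
                Library(fields[0], int(fields[1]), int(fields[2]), fields[3:])
            )
        elif key == "OTHER" and fields:
            others.extend(fields)
    return libraries, others
```

Printing `len(lib.files)` for each returned library shows at a glance which `PE=` entries carry a single read file, and the second return value lists the `.frg` files, so each path can then be checked with `os.path.exists` before starting a run.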
rndouglas
