
**nilshomer** · 11-17-2010, 07:59 PM

Originally posted by ohofmann

Nils, congratulations on getting the publication out!

I'm about to give this a try on an odd data set: 2 kb of genomic sequence at an average (but far from uniform) coverage of around 100,000X. It's a sequencing mixture, and the lowest variant frequency we'd like to be able to detect is around 0.5% (after error correction), or about 500 observations.

Other than the biological samples we also have a mix of known genomic frequencies and defined indel regions to optimize parameters. Can you think of a realistic set of starting parameters?

It wasn't designed for such high coverage so all bets are off.
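The 0.5% cutoff and the 500 observations in the quoted question are tied together by the coverage. A minimal sketch of that arithmetic, assuming a single position at the stated average depth (the function names are illustrative and not part of SRMA):

```python
def expected_observations(coverage: float, allele_fraction: float) -> float:
    """Expected number of reads carrying a variant at one position."""
    return coverage * allele_fraction


def min_detectable_fraction(coverage: float, min_observations: int) -> float:
    """Smallest allele fraction that still yields min_observations reads."""
    return min_observations / coverage


if __name__ == "__main__":
    # Numbers taken from the question: ~100,000X coverage, 0.5% cutoff.
    print(expected_observations(100_000, 0.005))
    print(min_detectable_fraction(100_000, 500))
```

Because the coverage is described as far from uniform, the same 0.5% cutoff corresponds to fewer supporting reads wherever the local depth drops below the average.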

**apratap** · 06-28-2011, 09:44 AM

Hi Nils

Just wondering whether SRMA can be used for rescuing orphaned reads. We have a variable-insert library because we are sequencing the 5' and 3' ends of transcripts, so the distance between the mates (<--- --->) depends on the length of the transcript. To map the reads initially I am using Mosaik, which I believe does a better job with variable-insert mate-pair data.

After mapping we still see 40% orphaned reads where one read maps and the other doesn't. I am wondering if SRMA can rescue these reads.

Thanks!
-Abhi
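For reference, an orphan rate like the 40% mentioned above can be measured from the SAM flags. This is a rough sketch, assuming the aligner sets the standard flag bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped) and writes both mates to the file; the function name is made up for illustration.

```python
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def orphan_fraction(sam_lines):
    """Fraction of paired primary records in pairs where exactly one mate mapped."""
    total = 0
    orphans = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        flag = int(fields[1])
        if not flag & PAIRED or flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        # Exactly one of "this read unmapped" / "mate unmapped" is set.
        if bool(flag & UNMAPPED) != bool(flag & MATE_UNMAPPED):
            orphans += 1
    return orphans / total if total else 0.0
```

It can be fed any iterable of SAM text lines, for example an open SAM file or the decoded output of a BAM-to-SAM conversion.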

**nilshomer** · 06-29-2011, 08:28 AM

No, SRMA is not for read rescue. It is for re-aligning the reads to create a better consensus.

**apratap** · 06-29-2011, 09:47 AM

Ok good to know. I will start a new thread for my question then.

Best,
-Abhi

**ymc** · 08-03-2012, 01:51 AM

Dead project now? Are there other alternatives that work on the whole genome?

**nilshomer** · 08-13-2012, 06:09 PM

Originally posted by ymc

Dead project now? Are there other alternatives that work on the whole genome?

It's not a dead project, feel free to post questions and bug reports etc.

**adaptivegenome** · 08-14-2012, 04:39 AM

Originally posted by ymc

Dead project now? Are there other alternatives that work on the whole genome?

I have used it and it is fast. I have sometimes had trouble with files in the 100 GB range, but generally it works fine.

We have also parallelized the GATK implementation of local realignment (LR), if you are interested. I am not sure which is better at realigning. I remember comparing SRMA and GATK LR: there are differences, but it was not clear to me whether one was consistently better than the other. I suspect Nils would be a better source of info on that.

**ymc** · 08-16-2012, 05:11 AM

Tried several bams with 0.1.16 but all I got was this:

at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)

**nilshomer** · 08-16-2012, 06:53 AM

Originally posted by ymc

Tried several bams with 0.1.16 but all I got was this:

at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)
at java.util.ArrayList$SubList.add(ArrayList.java:965)

Could you post the full error message?

**colindaven** · 08-17-2012, 12:00 AM

I have been interested in this tool for some time but never got it working:
Input is a sorted bam.

java -Xmx16g -jar srma-0.1.15.jar I=491_full_s.bam O=srma_491.bam R=../NC_002516.fna
[Fri Aug 17 10:00:54 CEST 2012] srma.SRMA INPUT=[491_full_s.bam] OUTPUT=[srma_491.bam] REFERENCE=../NC_002516.fna OFFSET=20 MIN_MAPQ=0 MINIMUM_ALLELE_PROBABILITY=0.1 MINIMUM_ALLELE_COVERAGE=3 MAXIMUM_TOTAL_COVERAGE=100 CORRECT_BASES=false USE_SEQUENCE_QUALITIES=true QUIET_STDERR=false MAX_HEAP_SIZE=8192 MAX_QUEUE_SIZE=65536 GRAPH_PRUNING=false NUM_THREADS=1 TMP_DIR=/tmp/colin2 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
java.util.NoSuchElementException
at java.util.Scanner.nextLine(Scanner.java:1503)
at net.sf.picard.reference.FastaSequenceIndex.parseIndexFile(FastaSequenceIndex.java:131)
at net.sf.picard.reference.FastaSequenceIndex.<init>(FastaSequenceIndex.java:55)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:95)
at srma.SRMA.doWork(SRMA.java:131)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
at srma.SRMA.main(SRMA.java:98)
Please report bugs to [email protected]

The fasta index file looks like this:

more ../NC_002516.fna.fai
NC_002516.2 6264404 58 70 71

Cheers for any help.

**ymc** · 08-18-2012, 10:05 AM

There are thousands of lines of these error messages, so the full stderr output would be too long to post. You can replicate my problem by downloading the paired-end reads from

ftp://ftp.1000genomes.ebi.ac.uk/vol1...sequence_read/

and then aligning them with bwa. I got the same bug with SRR098401_*.filt.fastq.gz and SRR035330_*.filt.fastq.gz
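When stderr runs to thousands of near-identical lines, one way to make it postable is to collapse runs of consecutive duplicates while keeping everything else, including the first line of the trace, which normally names the exception. A small sketch (the helper name and the output annotation format are arbitrary choices, not anything SRMA produces):

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into one annotated line."""
    collapsed = []
    for text, run in groupby(line.rstrip("\n") for line in lines):
        count = sum(1 for _ in run)
        if count == 1:
            collapsed.append(text)
        else:
            collapsed.append(f"{text}  [repeated {count} times]")
    return collapsed
```

Calling it on the saved stderr, for example `collapse_repeats(open("srma.stderr"))` with a hypothetical file name, returns a list short enough to paste into a post.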

**nilshomer** · 08-18-2012, 12:39 PM

Originally posted by colindaven

I have been interested in this tool for some time but never got it working:
Input is a sorted bam.

java -Xmx16g -jar srma-0.1.15.jar I=491_full_s.bam O=srma_491.bam R=../NC_002516.fna
[Fri Aug 17 10:00:54 CEST 2012] srma.SRMA INPUT=[491_full_s.bam] OUTPUT=[srma_491.bam] REFERENCE=../NC_002516.fna OFFSET=20 MIN_MAPQ=0 MINIMUM_ALLELE_PROBABILITY=0.1 MINIMUM_ALLELE_COVERAGE=3 MAXIMUM_TOTAL_COVERAGE=100 CORRECT_BASES=false USE_SEQUENCE_QUALITIES=true QUIET_STDERR=false MAX_HEAP_SIZE=8192 MAX_QUEUE_SIZE=65536 GRAPH_PRUNING=false NUM_THREADS=1 TMP_DIR=/tmp/colin2 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
java.util.NoSuchElementException
at java.util.Scanner.nextLine(Scanner.java:1503)
at net.sf.picard.reference.FastaSequenceIndex.parseIndexFile(FastaSequenceIndex.java:131)
at net.sf.picard.reference.FastaSequenceIndex.<init>(FastaSequenceIndex.java:55)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:95)
at srma.SRMA.doWork(SRMA.java:131)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
at srma.SRMA.main(SRMA.java:98)
Please report bugs to [email protected]

The fasta index file looks like this:

more ../NC_002516.fna.fai
NC_002516.2 6264404 58 70 71

Cheers for any help.

It looks like your FASTA index is broken. Can you try rebuilding?
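The usual way to rebuild is to delete the .fai and run `samtools faidx` on the FASTA again. Before doing that, it can help to see what is wrong with the existing file. The sketch below checks only the basic layout of a .fai (five tab-separated columns: name, length, offset, bases per line, bytes per line); it is an assumption-laden sanity check, not a reimplementation of what Picard parses, and the function name is made up.

```python
def check_fai(path):
    """Return a list of layout problems found in a FASTA index (.fai) file."""
    with open(path, "rb") as handle:
        data = handle.read()
    if not data:
        return ["index file is empty"]

    problems = []
    if b"\r" in data:
        problems.append("file contains carriage returns (Windows line endings?)")
    if not data.endswith(b"\n"):
        # Not necessarily fatal, but worth ruling out.
        problems.append("last line has no trailing newline")

    text = data.decode("ascii", "replace")
    for lineno, line in enumerate(text.splitlines(), 1):
        fields = line.split("\t")
        if len(fields) != 5:
            problems.append(
                f"line {lineno}: expected 5 tab-separated fields, found {len(fields)}"
            )
            continue
        try:
            length, offset, linebases, linewidth = map(int, fields[1:])
        except ValueError:
            problems.append(f"line {lineno}: non-integer value in columns 2-5")
            continue
        if length <= 0 or offset < 0 or linebases <= 0 or linewidth < linebases:
            problems.append(f"line {lineno}: implausible numeric values")
    return problems
```

The line quoted above has the right number of columns, but the post does not show whether they are separated by tabs or spaces, or whether the file contains blank lines, which is the kind of thing this check would flag.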

Originally posted by ymc

There are thousands of lines of these error messages, so the full stderr output would be too long to post. You can replicate my problem by downloading the paired-end reads from

ftp://ftp.1000genomes.ebi.ac.uk/vol1...sequence_read/

and then aligning them with bwa. I got the same bug with SRR098401_*.filt.fastq.gz and SRR035330_*.filt.fastq.gz

I am sorry, but please try reducing your read set to a manageable test case. Otherwise, I charge $5K USD/hour.
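One simple way to shrink the read set is to keep only the first N records of each FASTQ file, then realign and see whether the error still appears. A minimal sketch, assuming gzipped four-line FASTQ records and that both mate files list reads in the same order; the function and file names are placeholders.

```python
import gzip
from itertools import islice


def head_fastq(src, dst, n_records):
    """Copy the first n_records FASTQ records from src to dst (both gzipped).

    Returns the number of complete records written.
    """
    written = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        while written < n_records:
            record = list(islice(fin, 4))  # one record = 4 lines
            if len(record) < 4:  # end of file or truncated record
                break
            fout.writelines(record)
            written += 1
    return written


if __name__ == "__main__":
    # Use the same count for both mate files so the pairs stay in sync.
    for mate in ("1", "2"):
        head_fastq(f"reads_{mate}.fastq.gz", f"subset_{mate}.fastq.gz", 100_000)
```

If the subset still fails, halving N repeatedly narrows the problem down to a test case small enough to share.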

**madonjoe** · 04-17-2014, 08:29 AM

Empty VCF file

I tried to use SRMA to realign my reads and then ran variant calling. SRMA ran fine, but I got an empty VCF file. Is there anything I can do to fix this problem?
