
**dpryan** · 08-27-2014, 11:55 PM

Looks nice. Out of curiosity, can it handle chimeric (i.e., non-linear) alignments when converting from BAM to fastq? That's been a real weak point of a lot of other tools.

**Brian Bushnell** · 08-28-2014, 08:35 AM

No, the conversion from sam/bam is stateless and 'dumb' - each line in the sam file will generate a FASTQ read, so secondary and chimeric alignments would cause problems.
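To make the problem concrete, here is a minimal Python sketch (not Reformat's code) of a SAM-to-FASTQ conversion that avoids the duplicate-read issue by skipping secondary and supplementary (chimeric) lines. The function name `sam_to_fastq` is hypothetical; the flag bits are the ones defined in the SAM specification. It assumes the primary line carries the full, unclipped sequence and it ignores pairing.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary (chimeric) alignment
REVERSE = 0x10         # SAM flag bit: read stored reverse-complemented

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_lines):
    """Yield one 4-line FASTQ record per primary SAM line (hypothetical helper)."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra lines for the same read would duplicate it
        if flag & REVERSE:
            # SAM stores reverse-strand reads reverse-complemented; undo that.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

A purely stateless converter would emit every line, so a read with one supplementary alignment would show up twice in the FASTQ output.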

**dpryan** · 08-28-2014, 11:10 AM

Consider this the first feature request, then.

**Nanu** · 10-12-2014, 08:33 PM

When I use the reformat.sh command from the BBTools package, I get the following error:
java -ea -Xmx200m -cp /home/himanshu/Downloads/me2/bbmap/current/ jgi.ReformatReads -in=reads.fna qfin=reads.qual out=reads.fasta
Executing jgi.ReformatReads [-in=reads.fna, qfin=reads.qual, out=reads.fasta]

Input is being processed as unpaired
Exception in thread "Thread-1" java.lang.AssertionError
at stream.FastaQualReadInputStream3.makeRead(FastaQualReadInputStream3.java:257)
at stream.FastaQualReadInputStream3.toReadList(FastaQualReadInputStream3.java:147)
at stream.FastaQualReadInputStream3.toReads(FastaQualReadInputStream3.java:113)
at stream.FastaQualReadInputStream3.fillBuffer(FastaQualReadInputStream3.java:97)
at stream.FastaQualReadInputStream3.hasMore(FastaQualReadInputStream3.java:56)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:745)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:737)

Please help me

**Brian Bushnell** · 10-13-2014, 07:38 AM

This was caused by the bases and qualities having different lengths. Either the fna and qual file do not go together, or their order is different, or one of the files is misformatted.

"reads.fna" should already be in fasta format, though, which is what you specified as the output format. If you want fastq, the output filename needs to end with "fastq".

I suggest you post the first sequence in the reads.fna and reads.qual so we can see what's going on.
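A quick way to check the three possibilities above before posting is to compare the two files directly. This is a rough Python sketch, not part of BBTools; the helper names `parse_records` and `compare_fasta_qual` are made up for illustration, and it assumes the .qual file holds whitespace-separated integer scores under FASTA-style headers.

```python
def parse_records(lines):
    """Parse FASTA-style text into {name: [payload lines]}, preserving order."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return records


def compare_fasta_qual(fasta_lines, qual_lines):
    """Return (same_order, problems) for a FASTA file and its .qual file.

    problems is a list of (name, n_bases, n_quals); n_quals is None when
    the read is missing from the .qual file.
    """
    bases = {n: len("".join(v)) for n, v in parse_records(fasta_lines).items()}
    quals = {n: len(" ".join(v).split()) for n, v in parse_records(qual_lines).items()}
    same_order = list(bases) == list(quals)
    problems = [
        (name, n_bases, quals.get(name))
        for name, n_bases in bases.items()
        if quals.get(name) != n_bases
    ]
    return same_order, problems
```

Usage would look like `compare_fasta_qual(open("reads.fna"), open("reads.qual"))`; an empty problem list with `same_order` true suggests the files do go together.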

**Marisa_Miller** · 01-27-2015, 09:43 AM

Hi Brian,

Does the subsampling tool randomly subsample reads? The program I am using now (Geneious) only takes the first specified % of reads and does not randomize.

Thank you,
Marisa

**Brian Bushnell** · 01-27-2015, 10:14 AM

Yes, it randomly subsamples. You can set the RNG seed with "sampleseed=NUMBER" if you want deterministic random sampling; by default, the seed is random.

Reformat can also give the first X reads with the "reads=X" flag, or combine the reads=X and samplerate=Y to subsample a fraction of the first X reads, etc.
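For anyone curious what seeded subsampling amounts to, here is a small Python sketch. It is an illustration only: it assumes each read is kept independently with probability `samplerate`, which may not match Reformat's internals exactly, and the parameter names simply mirror the flags mentioned above.

```python
import random
from itertools import islice


def subsample(reads, samplerate, sampleseed=None, max_reads=None):
    """Keep each of the first max_reads reads with probability samplerate."""
    rng = random.Random(sampleseed)   # sampleseed=None -> different result each run
    first = islice(reads, max_reads)  # max_reads=None -> consider every read
    return [read for read in first if rng.random() < samplerate]
```

With a fixed `sampleseed` the same reads come back on every run, which is what makes the sampling reproducible; the number kept is only approximately `samplerate` times the input size.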

**jpummil** · 02-24-2015, 11:35 AM

Hey Brian!

So, to subsample a set of PE reads to reduce overall file size (creating a quick-running data set for a workshop), this would suffice?

reformat.sh in1=x1.fq in2=x2.fq out1=y1.fq out2=y2.fq reads=-1 samplerate=0.1 int=f

It would: keep pairings intact, give me 1/10 of the data overall, and ensure no interleaving (though I expect specifying the two input files at the beginning would do this as well).

I already did a quick quality trimming with:
reformat.sh in1=x1.fastq in2=x2.fastq out1=y1.fastq out2=y2.fastq outsingle=singletons.fq qtrim=rl trimq=10 minlength=50

Edit: Seems to have worked!

-rw-rw-r-- 1 jpummil jpummil 2.0G Feb 24 13:06 L001_R1_001_Qt.fastq
-rw-rw-r-- 1 jpummil jpummil 2.0G Feb 24 13:06 L001_R2_001_Qt.fastq

-rw-rw-r-- 1 jpummil jpummil 199M Feb 24 13:43 L001_R1_001_Sub.fastq
-rw-rw-r-- 1 jpummil jpummil 199M Feb 24 13:43 L001_R2_001_Sub.fastq

And it ran just fine in SPAdes.

**Brian Bushnell** · 02-24-2015, 12:31 PM

Yep, that's the correct approach. The "reads=-1" flag is not necessary (that's the default), and "int=f" also gets forced automatically when you have dual input files.
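The reason pairings survive is that the keep/discard decision has to be made once per pair rather than once per read. A minimal Python sketch of that idea follows; it is an assumption about the general technique, not a description of Reformat's code, and `subsample_pairs` is a hypothetical name.

```python
import random


def subsample_pairs(reads1, reads2, samplerate, sampleseed=None):
    """Subsample two parallel read lists, keeping mates together."""
    rng = random.Random(sampleseed)
    kept1, kept2 = [], []
    for r1, r2 in zip(reads1, reads2):
        # One random draw per pair, applied to both mates.
        if rng.random() < samplerate:
            kept1.append(r1)
            kept2.append(r2)
    return kept1, kept2
```

Both outputs always have the same length and the same order, which is consistent with the two 199M files produced from the 2.0G inputs above.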

**rodf** · 03-17-2015, 03:24 PM

addslash=t problem

Hi Brian,

I tried to use reformat to add /1 and /2 to paired read names, but a space was added between the name and the slash, and the assembler does not accept that.

i.e.
want name/1 but get name /1

How can I fix this?

Thanks,
Rod

**Brian Bushnell** · 03-17-2015, 03:36 PM

Hi Rod,

There is currently no way to fix that. I specifically made it that way to replicate the name structure of real Illumina reads. For paired reads, many tools or formats (such as sam) require both to have exactly matching names, excluding anything after the first whitespace (such as a 1 or 2). Most aligners, therefore, trim everything after the first whitespace, though with BBMap you can disable this with the "keepnames" flag.

What are you doing that requires no space? I could add an option for that, but I'm interested in why.
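To show why the space matters, here is a tiny Python sketch of the whitespace trimming described above. The function `trimmed_name` and the read names are made up for illustration; it only mimics what a tool does when it cuts a header at the first whitespace.

```python
def trimmed_name(header):
    """Return a read header cut at the first whitespace, as many aligners do."""
    parts = header.split(None, 1)
    return parts[0] if parts else ""


# With a space before the slash, both mates reduce to the same name.
with_space = (trimmed_name("read7 /1"), trimmed_name("read7 /2"))

# Without the space, the suffix is part of the name and the mates differ.
without_space = (trimmed_name("read7/1"), trimmed_name("read7/2"))
```

So "name /1" keeps mates identically named for tools that expect matching names, while "name/1" keeps them distinct for tools that expect unique names.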

**rodf** · 03-17-2015, 03:59 PM

Hi Brian,

I'm using the Mira assembler to assemble Illumina MiSeq reads I got from NCBI/SRA. I use "fastq-dump --split-files -F xxxx.sra" to extract the reads and get two files. The two reads in each pair have exactly the same name, and Mira needs /1 and /2 appended to the read names; it gives an error if it detects reads with the same name.

Thanks for your quick reply,
Rod

**Brian Bushnell** · 03-23-2015, 12:03 PM

Rod,

You can now use the flags "addslash=t slashspace=f" together to accomplish that.

As an unrelated note, reformat now supports the "stoptag" flag, so it can process a sam file and add the stop coordinate of the read to the optional tags, prefixed by "YS:i:".
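For reference, a stop coordinate can be derived from the POS and CIGAR fields of a SAM line. The Python sketch below is an assumption about what such a tag would contain (a 1-based, inclusive end position on the reference); the exact convention Reformat uses is not stated here, and the helper names are hypothetical.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def stop_coordinate(pos, cigar):
    """Return the last reference position covered (1-based, inclusive)."""
    ref_len = sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in _REF_CONSUMING)
    return pos + ref_len - 1


def add_stop_tag(sam_line):
    """Append a YS:i: tag with the stop coordinate to a mapped SAM line."""
    line = sam_line.rstrip("\n")
    fields = line.split("\t")
    pos, cigar = int(fields[3]), fields[5]
    if cigar == "*" or pos == 0:
        return line  # unmapped read: nothing to add
    return "\t".join(fields + [f"YS:i:{stop_coordinate(pos, cigar)}"])
```

Soft clips, hard clips, and insertions do not consume reference bases, so they do not move the stop coordinate.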

**darthsequencer** · 06-11-2016, 12:48 PM

Percent identity filter?

Hi, is there a way to filter sam/bam files by percent identity?


Introducing Reformat, a fast read format converter
