**wolma** · 06-25-2014, 07:33 AM

I don't think there's a command-line switch for it, but this is just a Perl script, so it shouldn't be too hard to modify it to do what you want.
I haven't tested this, but the fastx_barcode_splitter.pl file contains this function:

Code:

sub match_sequences {

.. lots of truncated lines ..
.. but ending in ..

		$best_barcode_ident = 'unmatched' 
			if ( (!defined $best_barcode_ident) || $best_barcode_mismatches_count>$allowed_mismatches) ;

		print STDERR "sequence $seq_bases matched barcode: $best_barcode_ident\n" if $debug;

		$counts{$best_barcode_ident}++;

		#get the file associated with the matched barcode.
		#(note: there's also a file associated with 'unmatched' barcode)
		my $file = $files{$best_barcode_ident};

		write_record($file);
	}
}

I think if you just enclose the write_record($file); in an if clause like this:

Code:

if ($best_barcode_ident ne 'unmatched') {
    write_record($file);
}

it should help. The unmatched output file will still be generated, but nothing should be written into it.
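
For readers who don't read Perl, the same idea can be sketched in Python. This is a simplified stand-in, not a port of fastx_barcode_splitter.pl: `count_mismatches`, `split_reads`, and the `(header, seq, plus, qual)` record layout are hypothetical names chosen for illustration, and barcodes are assumed to sit at the start of each read. The read is still classified and counted; only the write is skipped for unmatched reads.

```python
import io


def count_mismatches(barcode, seq):
    """Mismatches between the barcode and the start of the read.

    Bases missing from a read shorter than the barcode count as mismatches.
    """
    prefix = seq[:len(barcode)]
    differing = sum(a != b for a, b in zip(barcode, prefix))
    return differing + (len(barcode) - len(prefix))


def split_reads(records, barcodes, outputs, allowed_mismatches=1):
    """Write each matched read to its barcode's output; skip unmatched reads.

    records: iterable of (header, seq, plus, qual) tuples (assumed layout).
    barcodes: dict mapping an identifier to a barcode sequence.
    outputs: dict mapping an identifier to a writable text handle.
    """
    counts = {}
    for header, seq, plus, qual in records:
        best_ident, best_mm = None, None
        for ident, barcode in barcodes.items():
            mm = count_mismatches(barcode, seq)
            if best_mm is None or mm < best_mm:
                best_ident, best_mm = ident, mm
        if best_ident is None or best_mm > allowed_mismatches:
            best_ident = "unmatched"
        # The read is counted either way, as in the Perl function.
        counts[best_ident] = counts.get(best_ident, 0) + 1
        # The guard: nothing is ever written for unmatched reads.
        if best_ident != "unmatched":
            outputs[best_ident].write(f"{header}\n{seq}\n{plus}\n{qual}\n")
    return counts
```

With in-memory handles such as `io.StringIO()` for `outputs`, the "unmatched" handle stays empty while `counts` still reports how many reads failed to match.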

As I said, this is untested, but I hope it helps,
Wolfgang

**a.cardilini** · 06-25-2014, 06:23 PM

Thanks Wolfgang,

that works great! I no longer get the unmatched.fq file printed out.

Unfortunately, it is still pretty slow because it is processing these reads. Do you think it is possible to skip the processing of unmatched reads, or is this likely to cause problems with running the script? This python script is largely illegible to me so I am not sure how intertwined the unmatched stuff is with the match stuff.

Thanks again for your help, I really appreciate it.

All the best,
Adam

**Brian Bushnell** · 06-25-2014, 10:38 PM

Adam,

That's Perl, not Python... and if you want it to run faster, you might need to write or find a version written in a compiled language like C or Java, rather than an interpreted language like Perl or Python. It depends on whether it is CPU-limited or I/O-limited; run "top" and see whether the CPU load is 100% while the script is running. If it is, then you're CPU-limited and probably need a different language to speed it up.
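
The CPU-versus-I/O question can also be checked from inside a program instead of by watching "top". The sketch below is an illustration in Python, not part of any FASTX tool, and `cpu_fraction` is a hypothetical helper name. It compares the CPU time a piece of work consumed with the wall-clock time it took. A ratio near 1 suggests the work is CPU-limited, while a ratio well below 1 suggests it spends most of its time waiting, for example on disk.

```python
import time


def cpu_fraction(work):
    """Run work() and return CPU time divided by wall-clock time.

    time.process_time() counts CPU time used by this process and excludes
    time spent sleeping or blocked; time.perf_counter() measures elapsed time.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0
```

For example, `cpu_fraction(lambda: time.sleep(0.2))` comes out close to 0 because sleeping uses almost no CPU, whereas a tight computation loop on an otherwise idle machine typically comes out close to 1.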

And you probably can't speed it up by skipping reads. You have to process a read (or barcode) at least once in order to determine whether or not it matches one of your bins!

If it runs fast enough when you run it once, rather than 576 times, you may be able to trick it by padding your short barcodes with extra characters, so that all are the same length.

**wolma** · 06-26-2014, 12:34 AM

Adam,
there is no way to skip processing of the reads. As Brian correctly points out, you need to look at them to see if they match. My suggestion should save some time by minimizing disk writes, but that's all you can do.

Another option would be to in fact write the unmatched reads, then use only this file as input in the next round. With such a subtraction approach, the unmatched reads file would become smaller at every step, so even though your first round may take very long, subsequent steps would run faster.
Depending on how similar your barcodes are, this would also eliminate the risk of accidentally assigning the same read to two different barcodes.
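
The subtraction approach can be sketched as follows. This is a Python illustration of the control flow only, not a wrapper around fastx_barcode_splitter.pl: `mismatches`, `run_round`, and `subtractive_split` are hypothetical names, reads are assumed to be `(header, seq)` tuples with the barcode at the start of `seq`, and barcode identifiers are assumed to be unique across rounds.

```python
def mismatches(barcode, seq):
    """Mismatches between barcode and read prefix; missing bases count too."""
    prefix = seq[:len(barcode)]
    return sum(a != b for a, b in zip(barcode, prefix)) + len(barcode) - len(prefix)


def run_round(reads, barcodes, allowed_mismatches=0):
    """Split reads into per-barcode bins plus a list of unmatched reads."""
    matched = {ident: [] for ident in barcodes}
    unmatched = []
    for read in reads:
        seq = read[1]
        best = min(barcodes, key=lambda i: mismatches(barcodes[i], seq), default=None)
        if best is not None and mismatches(barcodes[best], seq) <= allowed_mismatches:
            matched[best].append(read)
        else:
            unmatched.append(read)
    return matched, unmatched


def subtractive_split(reads, rounds, allowed_mismatches=0):
    """Run one round per barcode set, feeding each round only the leftovers."""
    bins = {}
    remaining = list(reads)
    for barcodes in rounds:
        matched, remaining = run_round(remaining, barcodes, allowed_mismatches)
        bins.update(matched)
    return bins, remaining
```

Because a read leaves the pool as soon as it is assigned, a read starting with `ACGTAC` is claimed by a first-round barcode `ACGTAC` and never reaches a later round containing the shorter barcode `ACGT`, even though it would match that one too.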

**gringer** · 06-26-2014, 02:03 PM

Or modify the algorithm so that it works with barcodes of different lengths. If you're already changing the code, you might as well make that little fix as well.
/usr/bin/fastx_barcode_splitter.pl
edit: or not so little.... If you can give me a bit more information about how your barcoding system works (e.g. do you have a separate barcode file and sequence file? do barcodes always appear in the first 10 bases? Do long barcodes always start at the same place in the sequence?), I might be able to crank out something that works.
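
As a rough idea of what matching barcodes of different lengths in a single pass could look like, here is a minimal Python sketch. It is not the FASTX algorithm and not gringer's code: `assign_barcode` is a hypothetical name, barcodes are assumed to start at the first base of the read, and trying longer barcodes first is an assumed tie-breaking rule for cases where a short barcode is a prefix of a long one.

```python
def assign_barcode(seq, barcodes, allowed_mismatches=0):
    """Return the identifier of the first acceptable barcode, or None.

    Each barcode is compared against a read prefix of its own length,
    so the barcodes do not need to share a common length. Longer barcodes
    are tried first (an assumed tie-breaking rule).
    """
    by_length = sorted(barcodes.items(), key=lambda kv: len(kv[1]), reverse=True)
    for ident, barcode in by_length:
        prefix = seq[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is too short to contain this barcode
        mm = sum(a != b for a, b in zip(barcode, prefix))
        if mm <= allowed_mismatches:
            return ident
    return None
```

Whether this rule is appropriate depends on the answers to the questions above, in particular on whether barcodes always begin at the same position in the read.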

Splitting fastq file by barcodes without producing unmatched.fq file?
