**jdanderson** · 10-04-2010, 05:43 PM

Hello All,

I am updating my progress in case this may help someone in the future.

As previously mentioned, I used the FASTX Toolkit on the export.txt and extended.txt files from Illumina pipeline 1.6 with minimal success, and I suspected a formatting error in these files. I just tried the same Barcode Splitting module on the sequence.txt file (prior to reformatting to Sanger FASTQ) and it seems to have worked fine. The one caveat is that there appear to be more reads in the unmatched file than I had expected (199,524 out of 28,223,602, or 0.7%), but perhaps this is normal. For reference, I used the NuGen Ovation and Encore Kits for library prep.

Regards,
Johnathon

**KevinLam** · 10-06-2010, 06:39 PM

Sorry to hijack your thread, but would the FASTX Toolkit be able to demultiplex SOLiD reads as well?

**hyjkim** · 10-06-2010, 07:01 PM

The FASTX Toolkit does not work for SOLiD data. I wrote some Perl scripts to demultiplex some SOLiD data a few months back. The code and the syntax weren't pretty. If you're interested, I can dig the scripts up and post them.

**jdanderson** · 10-06-2010, 07:01 PM

Hello Kevin,

I am not sure. I cannot tell directly from the documentation; however, I don't see any mention of color space reads. Maybe you could query the Hannon Lab if you don't get an immediate answer on here ([email protected]).

-
Johnathon

**2007lab** · 05-06-2011, 09:12 AM

Bump for the SOLiD part of this thread.
Once I run solid2fastq.pl to convert my csfasta and qual files to a fastq.gz file, can I use FASTX to do QC on my SOLiD PE reads?

**upendra_35** · 08-26-2011, 03:36 PM

Hi jdanderson,
I think your command looks good, and I suspect the problem is with the barcode file. Try opening the barcode file with vi and see if there is anything weird going on. Sometimes you see ^M at the end of each line (a Windows-style carriage return); if so, you can fix this manually and re-run the command. Good luck....
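
For anyone who would rather not fix the file by hand, here is a minimal sketch of the same cleanup in Python. It only assumes that the ^M characters are carriage returns (`\r`); the function name and the file names in the usage comment are made up for illustration.

```python
from pathlib import Path


def strip_carriage_returns(src, dst):
    """Copy src to dst without carriage returns; return how many were removed."""
    data = Path(src).read_bytes()
    cleaned = data.replace(b"\r", b"")  # "\r" is what vi displays as ^M
    Path(dst).write_bytes(cleaned)
    return len(data) - len(cleaned)


# Example call (hypothetical file names):
#   removed = strip_carriage_returns("barcodes.txt", "barcodes.unix.txt")
#   print(f"removed {removed} carriage return(s)")
```

A return value of 0 would suggest that line endings are not the problem and the barcode file needs a closer look for other issues.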

**carmeyeii** · 01-14-2013, 08:16 PM

Hi everyone,

I've been using the FastX Barcode Splitter successfully, but regarding the --partial option, I have realized I'm losing some reads with a particular problem:

With --partial 1

The barcode

Code:

CGCGTCAGCATTGTTCATAC

will pick up the read

Code:

GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACT

since it is missing just one base at the left end to match the barcode exactly.

However, the read:

Code:

CCGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACT

will not be taken as matching the barcode, since it has one extra base at the beginning. Unfortunately, there are many reads that fall into this category, and they do not all begin with the same extra base ('C' in this example).

Do you use anything else to get around this?

Thanks!
Carmen

**chadn737** · 01-14-2013, 09:02 PM

A quick and dirty solution would be to trim off the first base of all your reads and then just use the FASTX barcode splitter with --partial.
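
As a rough illustration of the trimming step, here is a small Python sketch. It assumes plain 4-line FASTQ records (header, sequence, `+`, quality) and removes the first base together with its quality score; the function name is made up. Within the toolkit itself, `fastx_trimmer -f 2` (first base to keep) should do the same job.

```python
def trim_first_base(lines, n=1):
    """Yield FASTQ lines with the first n bases (and qualities) removed.

    Assumes plain 4-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")
        if i % 4 in (1, 3):  # sequence line and quality line
            line = line[n:]
        yield line


# Example call (hypothetical file names):
#   with open("reads.fastq") as src, open("reads.trimmed.fastq", "w") as dst:
#       for out in trim_first_base(src):
#           dst.write(out + "\n")
```

Trimming the quality line along with the sequence keeps the two the same length, which downstream tools generally expect.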

**carmeyeii** · 01-15-2013, 11:45 AM

Thank you, chadn!

Of course this was the easiest solution.

The barcode is:

Code:

REVERSEPRIMER	CGCGTCAGCATTGTTCATAC

Read 1 begins with a perfect match to the barcode.

Code:

@HWI-M00149:16:000000000-A12VK:1:2114:17873:29127 2:N:0:
CGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN

Read 2 has an extra base at the beginning, followed by a perfect match to the barcode.

Code:

@HWI-M00149:16:000000000-A12VK:1:2114:17873:29128 2:N:0:
ACGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN

Read 3 is missing the first base of the barcode.

Code:

@HWI-M00149:16:000000000-A12VK:1:2114:17873:29129 2:N:0:
GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN

By trimming the first base of every read, we are left with:

Code:

Read 1 [now missing 1 base at the beginning]

GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN

Read 2 [now perfect match]

CGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN

Read 3 [now missing 2 bases at the beginning]

CGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN

and by using

Code:

--mismatch 4 --partial 4

all reads will be matched to the barcode.

The value of 4 doesn't make sense to me, as I thought 2 would be enough, but this is the only thing that gets it to work, so...

Thanks a lot!

Carmen
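
For checking by hand which placement of the barcode a given read corresponds to, a short Python sketch like the one below may help. It is not how the FASTX barcode splitter scores matches, only an independent check; the function names and the `max_shift` parameter are made up. It slides the barcode by up to `max_shift` bases in either direction and reports the placement with the fewest mismatches.

```python
def mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_shift(read, barcode, max_shift=1):
    """Return (shift, mismatches) for the best barcode placement, or None.

    shift > 0: the read has `shift` extra bases before the barcode.
    shift < 0: the read is missing the first `-shift` bases of the barcode.
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            target = barcode
            window = read[shift:shift + len(barcode)]
        else:
            target = barcode[-shift:]
            window = read[:len(target)]
        if len(window) < len(target):
            continue  # read too short for this placement
        key = (mismatches(window, target), abs(shift))
        # Prefer fewer mismatches, then the smaller absolute shift.
        if best is None or key < best[0]:
            best = (key, shift)
    if best is None:
        return None
    return best[1], best[0][0]
```

With the barcode above, a read that starts with the full barcode gives a shift of 0, a read with one extra leading base gives +1, and a read missing the first barcode base gives -1, each with zero mismatches.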

**vivi7** · 05-14-2014, 01:49 AM

fastx_barcodes_splitter issue with run

Hi,

I saw the post and I hope maybe some of you can help me

When I run fastx_barcode_splitter.pl with this script

/usr/local/bin/fastx_barcode_splitter.pl --bcfile ./Barcodes9nt.txt --prefix ./Rescued9nt --suffix .fq --bol

On the command line it looks like it is running (no error message, no > sign); see the attachment for a screenshot.
However, it is not running at all. I can see with top that it is not using any memory or CPU, and it has been 'running' for days on a very small file without producing any results.
The input file is in the STDIN folder, as it is supposed to be.

I would be very grateful if you could suggest what might be wrong.
Thanks in advance
Vivi

**smitra** · 01-25-2016, 08:44 AM

Hi vivi7,
I guess you need to provide your FASTQ or FASTA file on standard input. You haven't provided that, so the script is just waiting for input.
Use it like this:

Code:

cat File.fastq | /usr/local/bin/fastx_barcode_splitter.pl --bcfile mybarcodes.txt ...other options if you want.

**smitra** · 01-25-2016, 08:50 AM

Hi Everybody,
I came back to this thread again as I am getting a very similar problem to the first post by jdanderson.

My command runs without errors:

cat test_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 1

But none of the output files contain any reads except for the unmatched file.

We got this data from Mr.DNA as a single raw FASTQ file with 10 samples together, which I need to split. Johnathon's later suggestion didn't help.
Can anybody help please?
Thanks,
smitra

**GenoMax** · 01-25-2016, 09:04 AM

Can you post a few lines of your FASTQ file and the mapping file?

**smitra** · 01-25-2016, 09:08 AM

Thanks for replying GenoMax

Code:

#SampleID	BarcodeSequence
AP1E	CGTAACCA
AP25E	CGTACCCA
AP5D	CGTAAGAA
AP8C	CGTAGATA
P29F	CGTAGGCT
P30N	CGTATTCA
P31B	CGTCAAGA
P35C	CGTATTTC
V2A	CGTCCAGG
V3J	CGTCACAG

And the FASTQ file looks like this (I assume the barcode is the first 8 bases of each read, with one N in the first position):

mitras$ less test_R1.fastq

Code:

@M02542:124:000000000-AKFBJ:1:1101:13841:1000 1:N:0:5

NGTACCCAAGGGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACANNCNNGTCGAACGGTAGCNCAGAGAGCTTGCTCTNGGNTGACGAGTGGCGGACGGGNGANTAATGTCTGGGAAACTGCCCGATGGAGGGGGATANCTACTGGANANNGNNGCTAATACCGCATAACGNCGCAAGACCAAAGAGGGNGANNTCAGGGCCTCTTGNCATCGGATGNNCCCAGATGGGATNGGCTTGTAGGTGAGGTAAGNGCTCACGCNGGCGACGATCCCTAGCTTGGNNGNGAGG

+

#8ABCFGGGGGGGGEEGGGGGGGG<FGGGFFGFGFGFGGEG@FGEEGGCFGGGGG?##:##6:CFFGGGDG<CG#:CCFFGEGGGGFAFG#:<#:BBFF7FFGDGGGGGGGD#8+#+:BFGGGGGGGCFFGDGG<FGGGECCGDEGGGF@#611:D,>>#6##6##66<1CF@7FFFGEGF7E#41=8=EGFFG7*?CF>>#22##2*2;@;8C8CFC<#/2AC=E*:5##/2:CFCG+8**+#*1*1552<+*+0+8D6D4+#1**)**)*#*15/*//7>5:5<.*,*)0)##1#..73

@M02542:124:000000000-AKFBJ:1:1101:12174:1002 1:N:0:5

NGTAACCAAGGGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACANNCNNGTCGAGGGGCAGCATTTCAGTTTGCTTGCNAANTGGAGATGGCGACCGGCGNACNGGTGAGTAACACGTATCCAACCTGCCGATAACTCNGGGATAGCNTNNCNNAAGAAAGATTGATACCCNATGGTATAATCAGACCGNATGGTCTTATTATTAAANAATTTCGGTNNTCGATGGGGATGNGTTCCATTAGGCAGTTGGTGTGTTAATGNCGCACCAAACCTTCCTGTGANNGNGTTT

+

#8ACCGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGDGGGGGGGGGFGGGGGGG##:##6:CFGFDEGGGGDGGGFGGGFGGGGGGGG#:C#66=,CFFFGGGG@FGEE7#++#:BBFFGGGFCFGGGGGGCGDGGGFGGGGGGGGC=#8@<<<FGG#5##8##86DCF<FCCC:BFCFFF#6>F>FGG92;@CFFGF@#116*=CF<CG?@CFFFG#3;5375:CG##212**<5C5/::#11:91A>+<>C6CE<FC:*****0:FB<#1*)//75<F30762*-2)**##1#0)0.

But as you can see, I have an N, so maybe I need to allow 1 mismatch for the barcode.
So I tried this command:

cat test_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 1

Thanks for helping

smitra
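
One way to check independently whether the reads really start with these barcodes is to count mismatches by hand. The Python sketch below does that. It is not part of the FASTX Toolkit, the function names are made up, and it assumes a whitespace-separated mapping file in which lines starting with `#` are comments and an N in the read counts as a mismatch.

```python
def parse_barcodes(text):
    """Parse 'name barcode' lines, skipping blank lines and '#' comments."""
    barcodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, seq = line.split()[:2]
        barcodes[name] = seq.upper()
    return barcodes


def closest_barcode(read, barcodes):
    """Return (name, mismatches) for the barcode nearest the start of the read."""
    scored = []
    for name, seq in barcodes.items():
        prefix = read[:len(seq)]
        # Bases missing from a too-short read are counted as mismatches.
        mm = sum(a != b for a, b in zip(prefix, seq)) + (len(seq) - len(prefix))
        scored.append((mm, name))
    mm, name = min(scored)  # ties are broken alphabetically by name
    return name, mm
```

For the two reads posted above, this should report AP25E and AP1E respectively, each with one mismatch (the leading N). If that holds across the file, the barcodes are where they are expected to be, and the mapping file or the command options are the more likely place to look.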

FASTX Toolkit barcode splitter issue

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News