SEQanswers

Old 10-03-2010, 07:18 PM   #1
jdanderson
Member
 
Location: Davis, CA

Join Date: Sep 2010
Posts: 45
Default FASTX Toolkit barcode splitter issue

Hello All,

I've been trying to use the FASTX Toolkit barcode splitter to demultiplex my Illumina reads. The following command runs without any errors:

[cat /home/johnathon/jda_ev_extended.txt | fastx_barcode_splitter.pl --bcfile /home/johnathon/mybarcodes.txt --bol --mismatches 1 --prefix /home/johnathon/split_bc/jda_ev_split_bc --suffix ".txt"]

But none of the output files contain any reads except for the mismatched file.

The following is the mybarcodes.txt file:

[#I hope the following is the appropriate format for this txt file, it should contain the barcode identifier and the barcode sequence itself in a tab delimited fashion--Johnathon David Anderson
BC1 ACCC
BC2 CGTA
BC3 GAGT
BC4 TTAG]


However, when I look at the extended.txt file I can see the right barcodes on the 5' end. I have also tried to use the export.txt file, to no avail; apparently it is not formatted appropriately. I get an error message saying that the first character is an "S" instead of an "<" or an "@".

I have not converted these files from Solexa to Sanger FASTQ. Could this be the issue?

For my first data set, which was not barcoded, I used the export2std command of the MAQ fq_all2std.pl script to convert the export.txt file. It worked just fine and I was able to visualize the data in IGV. I haven't had much success with the MAQ ill2sanger patch. If the format is the issue with the FASTX Toolkit, can anyone recommend a user-friendly conversion script? I am using Solexa pipeline 1.6.

Is anyone familiar with the FASTX Toolkit? Is the problem probably that the Illumina files need to be converted to Sanger FASTQ first?

Any guidance would be most appreciated.

Regards,
Johnathon
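
On the conversion question, here is a minimal sketch of what a quality-string conversion could look like. It assumes the qualities are Phred scores with an ASCII offset of 64, which is how Illumina pipeline versions 1.3 through 1.7 are generally described (older Solexa-scaled scores would need a different formula). The function name phred64_to_phred33 is made up for illustration and is not part of MAQ or the FASTX Toolkit.

```python
def phred64_to_phred33(quality):
    """Convert one FASTQ quality string from Phred+64 to Phred+33 (Sanger).

    Assumption: the input really is Phred+64. A character below '@'
    (ASCII 64) cannot occur in that encoding, so it is treated as an error.
    """
    converted = []
    for char in quality:
        score = ord(char) - 64          # recover the Phred score
        if score < 0:
            raise ValueError(
                "character %r is not valid Phred+64; "
                "the file may already be Sanger-encoded" % char
            )
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)


if __name__ == "__main__":
    # 'h' is Q40 in Phred+64 and becomes 'I' in Phred+33.
    print(phred64_to_phred33("hhhhB"))
```

Only the fourth line of each FASTQ record would go through this function; the header, sequence, and "+" lines stay as they are.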
Old 10-04-2010, 06:43 PM   #2
jdanderson
Member
 
Location: Davis, CA

Join Date: Sep 2010
Posts: 45
Default

Hello All,

I am updating my progress in case this may help someone in the future.

As previously mentioned, I used the FASTX Toolkit on the export.txt and extended.txt files from Illumina pipeline 1.6 with minimal success, and I suspected a formatting error in these files. I just tried the same barcode splitter on the sequence.txt file (prior to reformatting to Sanger FASTQ) and it seems to have worked fine, with the caveat that there appear to be more reads in the unmatched file than I had expected (199,524 out of 28,223,602, or 0.7%), but perhaps this is normal. For reference, I used the NuGen Ovation and Encore kits for library prep.

Regards,
Johnathon
Old 10-06-2010, 07:39 PM   #3
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 197
Default

Sorry to hijack your thread, but would the FASTX Toolkit be able to demultiplex SOLiD reads as well?
Old 10-06-2010, 08:01 PM   #4
hyjkim
Member
 
Location: Santa Cruz

Join Date: Apr 2010
Posts: 18
Default

The FASTX Toolkit does not work for SOLiD data. I wrote some Perl scripts to demultiplex some SOLiD data a few months back. The code and the syntax weren't pretty. If you're interested, I can dig the scripts up and post them.
Old 10-06-2010, 08:01 PM   #5
jdanderson
Member
 
Location: Davis, CA

Join Date: Sep 2010
Posts: 45
Default

Hello Kevin,

I am not sure. I cannot tell directly from the documentation; however, I don't see any mention of color space reads. Maybe you could query the Hannon Lab if you don't get an immediate answer on here (gordon@cshl.edu).

-
Johnathon
Old 05-06-2011, 10:12 AM   #6
2007lab
Member
 
Location: NYC

Join Date: Mar 2009
Posts: 14
Default

Bump for the SOLiD part of this thread.
Once I run solid2fastq.pl to convert my csfasta and qual files to a fastq.gz file, can I use fastx to do QC on my SOLiD PE reads?
Old 08-26-2011, 04:36 PM   #7
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102
Default

Hi jdanderson,
Your command looks good to me, and I suspect the problem is with the barcode file. Try opening the barcode file with vi and see if there is anything weird going on. Sometimes you see ^M at the end of each line; if so, you can fix this manually and re-run the command. Good luck....
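
The ^M check can also be scripted. Below is a rough sketch, not part of the FASTX Toolkit, that flags carriage returns and lines that do not have exactly two tab-separated fields (the first post describes the barcode file as tab delimited; the splitter itself may be more lenient about whitespace). The function names check_barcode_lines and strip_carriage_returns are invented for this example.

```python
def check_barcode_lines(lines):
    """Return a list of (line_number, problem) tuples for a barcode file.

    `lines` should be read with open(path, newline="") so that Python
    does not silently translate "\r\n" into "\n".
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.endswith("\r\n") or line.endswith("\r"):
            problems.append((number, "carriage return at end of line"))
        text = line.rstrip("\r\n")
        if not text or text.startswith("#"):
            continue  # blank lines and comments carry no barcode
        fields = text.split("\t")
        if len(fields) != 2:
            problems.append((number, "expected 2 tab-separated fields"))
        elif not fields[1] or set(fields[1].upper()) - set("ACGTN"):
            problems.append((number, "unexpected characters in barcode"))
    return problems


def strip_carriage_returns(lines):
    """Return the same lines with Unix line endings only."""
    return [line.rstrip("\r\n") + "\n" for line in lines]


if __name__ == "__main__":
    example = ["#name\tbarcode\n", "BC1\tACCC\r\n", "BC2 CGTA\n"]
    for number, problem in check_barcode_lines(example):
        print("line %d: %s" % (number, problem))
```

If problems are reported, writing the output of strip_carriage_returns to a new file and pointing --bcfile at it would be one way to retry.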
Old 01-14-2013, 08:16 PM   #8
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

Hi everyone,

I've been using the FastX Barcode Splitter successfully, but regarding the --partial option, I have realized I'm losing some reads with a particular problem:

With --partial 1

The barcode

Code:
CGCGTCAGCATTGTTCATAC
will pick up the read

Code:
GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACT
since it is missing just one base at the left end to match the barcode exactly.

However, the read:

Code:
CCGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACT
will not be taken as matching the barcode, since it has one extra base at the beginning. Unfortunately, there are many reads that fall into this category, but not all of them begin with the extra 'G'.

Do you use anything else to get around this?

Thanks!
Carmen
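
To make the three cases above concrete, here is a small sketch of barcode matching that tolerates a shift in either direction. It is only an illustration of the idea and is not how fastx_barcode_splitter.pl implements --partial; the function best_offset and its parameters are hypothetical.

```python
def best_offset(read, barcode, max_shift=1, max_mismatches=0):
    """Return the shift at which `barcode` matches the start of `read`.

    shift > 0: the read has `shift` extra bases before the barcode.
    shift < 0: the read is missing the first `-shift` bases of the barcode.
    Returns None when no shift within +/- max_shift gives a match.
    """
    # Try shift 0 first, then the smallest shifts in either direction.
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        if shift >= 0:
            barcode_part = barcode
            read_part = read[shift:shift + len(barcode)]
        else:
            barcode_part = barcode[-shift:]
            read_part = read[:len(barcode_part)]
        if len(read_part) != len(barcode_part):
            continue  # read too short to compare at this shift
        mismatches = sum(a != b for a, b in zip(read_part, barcode_part))
        if mismatches <= max_mismatches:
            return shift
    return None


if __name__ == "__main__":
    barcode = "CGCGTCAGCATTGTTCATAC"
    print(best_offset("GCGTCAGCATTGTTCATACAAAGCTACTTAG", barcode))   # missing base
    print(best_offset("CCGCGTCAGCATTGTTCATACAAAGCTACTTAG", barcode))  # extra base
```

A read with one extra leading base is reported with shift 1 regardless of which base it is, so the extra base does not need to be known in advance.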
Old 01-14-2013, 09:02 PM   #9
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

A quick and dirty solution would be to trim off the first base of all your reads and then just use the FastX barcode splitter with --partial.
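
For illustration, trimming the first base could look like the sketch below. It assumes a strict four-line FASTQ layout (header, sequence, "+", quality) with no wrapped lines, and the function name trim_first_base is made up for this example; a dedicated trimming tool would do the same job.

```python
import sys


def trim_first_base(fastq_lines):
    """Yield FASTQ lines with the first base and first quality value removed.

    Assumes every record is exactly four lines: header, sequence, '+', quality.
    """
    for index, line in enumerate(fastq_lines):
        text = line.rstrip("\r\n")
        if index % 4 in (1, 3):   # sequence line and quality line
            text = text[1:]       # drop the first character of each
        yield text + "\n"


if __name__ == "__main__":
    # Usage sketch: python trim_first_base.py < reads.fastq > trimmed.fastq
    sys.stdout.writelines(trim_first_base(sys.stdin))
```

The sequence and quality lines have to be trimmed together; otherwise their lengths no longer agree and downstream tools will reject the file.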
Old 01-15-2013, 11:45 AM   #10
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

Thank you, chadn!

Of course this was the easiest solution.

The barcode is:
Code:
REVERSEPRIMER	CGCGTCAGCATTGTTCATAC
Read 1 begins with a perfect match to the barcode.

Code:
@HWI-M00149:16:000000000-A12VK:1:2114:17873:29127 2:N:0:
CGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
Read 2 has an extra base at the beginning, followed by a perfect match to the barcode.

Code:
@HWI-M00149:16:000000000-A12VK:1:2114:17873:29128 2:N:0:
ACGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
Read 3 is missing the first base of the barcode.

Code:
@HWI-M00149:16:000000000-A12VK:1:2114:17873:29129 2:N:0:
GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
By trimming the first base of every read,

we are left with

Code:
Read 1 [now missing 1 base at the beginning]

GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN

Read 2 [now perfect match]

CGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN

Read 3 [now missing 2 bases at the beginning]

CGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
and by using

Code:
--mismatch 4 --partial 4
all reads will be matched to the barcode.

The value of 4 doesn't make sense to me, as I thought it would be 2, but this is the only thing that gets it to work, so...

Thanks a lot!

Carmen
Old 05-14-2014, 02:49 AM   #11
vivi7
Member
 
Location: Aarhus, Denmark

Join Date: Mar 2014
Posts: 10
Smile fastx_barcodes_splitter issue with run

Hi,

I saw the post and I hope maybe some of you can help me

When I run fastx_barcode_splitter.pl with this script

/usr/local/bin/fastx_barcode_splitter.pl --bcfile ./Barcodes9nt.txt --prefix ./Rescued9nt --suffix .fq --bol

On the command line it looks like it is running (no error message, no > sign); see the attachment for a screenshot.
However, it is not running at all: I can see with top that it is not using any memory or CPU, and it has been ‘running’ for days on a very small file without producing any results.
The input file is in the STDIN folder, as it is supposed to be.

I would be very grateful if you could suggest what might be wrong.
Thanks in advance
Vivi
Old 01-25-2016, 08:44 AM   #12
smitra
Member
 
Location: Singapore

Join Date: May 2013
Posts: 20
Default

Hi vivi7,
I guess you need to provide your fastq or fasta file on standard input. You haven't provided one, so the script is waiting for input.
Use it like this:
Code:
cat File.fastq | /usr/local/bin/fastx_barcode_splitter.pl --bcfile mybarcodes.txt ...other options if you want.
Old 01-25-2016, 08:50 AM   #13
smitra
Member
 
Location: Singapore

Join Date: May 2013
Posts: 20
Default

Hi Everybody,
I came back to this thread because I am getting a very similar problem to the one in the first post by jdanderson.

My command runs without errors:
Quote:
cat test_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 1
But none of the output files contain any reads except for the mismatched file.

We got this data from Mr.DNA as a single raw fastq file containing 10 samples, which I need to split. Johnathon's later suggestion didn't help.
Can anybody help please?
Thanks,
smitra
Old 01-25-2016, 09:04 AM   #14
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,992
Default

Can you post a few lines of your fastq file and the mapping file?
Old 01-25-2016, 09:08 AM   #15
smitra
Member
 
Location: Singapore

Join Date: May 2013
Posts: 20
Default

Thanks for replying GenoMax

Code:
#SampleID	BarcodeSequence
AP1E	CGTAACCA
AP25E	CGTACCCA
AP5D	CGTAAGAA
AP8C	CGTAGATA
P29F	CGTAGGCT
P30N	CGTATTCA
P31B	CGTCAAGA
P35C	CGTATTTC
V2A	CGTCCAGG
V3J	CGTCACAG
But the fastq file looks like this (I assume the bold red part is the barcode, with one N):

mitras$ less test_R1.fastq

Code:
@M02542:124:000000000-AKFBJ:1:1101:13841:1000 1:N:0:5

NGTACCCAAGGGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACANNCNNGTCGAACGGTAGCNCAGAGAGCTTGCTCTNGGNTGACGAGTGGCGGACGGGNGANTAATGTCTGGGAAACTGCCCGATGGAGGGGGATANCTACTGGANANNGNNGCTAATACCGCATAACGNCGCAAGACCAAAGAGGGNGANNTCAGGGCCTCTTGNCATCGGATGNNCCCAGATGGGATNGGCTTGTAGGTGAGGTAAGNGCTCACGCNGGCGACGATCCCTAGCTTGGNNGNGAGG

+

#8ABCFGGGGGGGGEEGGGGGGGG<FGGGFFGFGFGFGGEG@FGEEGGCFGGGGG?##:##6:CFFGGGDG<CG#:CCFFGEGGGGFAFG#:<#:BBFF7FFGDGGGGGGGD#8+#+:BFGGGGGGGCFFGDGG<FGGGECCGDEGGGF@#611:D,>>#6##6##66<1CF@7FFFGEGF7E#41=8=EGFFG7*?CF>>#22##2*2;@;8C8CFC<#/2AC=E*:5##/2:CFCG+8**+#*1*1552<+*+0+8D6D4+#1**)**)*#*15/*//7>5:5<.*,*)0)##1#..73

@M02542:124:000000000-AKFBJ:1:1101:12174:1002 1:N:0:5

NGTAACCAAGGGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACANNCNNGTCGAGGGGCAGCATTTCAGTTTGCTTGCNAANTGGAGATGGCGACCGGCGNACNGGTGAGTAACACGTATCCAACCTGCCGATAACTCNGGGATAGCNTNNCNNAAGAAAGATTGATACCCNATGGTATAATCAGACCGNATGGTCTTATTATTAAANAATTTCGGTNNTCGATGGGGATGNGTTCCATTAGGCAGTTGGTGTGTTAATGNCGCACCAAACCTTCCTGTGANNGNGTTT

+

#8ACCGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGDGGGGGGGGGFGGGGGGG##:##6:CFGFDEGGGGDGGGFGGGFGGGGGGGG#:C#66=,CFFFGGGG@FGEE7#++#:BBFFGGGFCFGGGGGGCGDGGGFGGGGGGGGC=#8@<<<FGG#5##8##86DCF<FCCC:BFCFFF#6>F>FGG92;@CFFGF@#116*=CF<CG?@CFFFG#3;5375:CG##212**<5C5/::#11:91A>+<>C6CE<FC:*****0:FB<#1*)//75<F30762*-2)**##1#0)0.


But as you can see, I have an N, so maybe I need to allow 1 mismatch for the barcode.
So I tried this command:
Quote:
cat test_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 1
Thanks for helping
smitra

Last edited by GenoMax; 01-25-2016 at 09:10 AM. Reason: added CODE tags to improve readability
Old 01-25-2016, 09:15 AM   #16
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,992
Default

I suppose you could do two things. Remove the 1st base (if it is always N, which is kind of odd, see below) from all reads and remove the first C from your barcode file.

Hypothesis: the first base is an N because every sequence in this case actually starts with C (followed by GT). I am surprised that this worked from the 2nd base onwards. Having low nucleotide diversity like this is not recommended.
Old 01-25-2016, 09:25 AM   #17
smitra
Member
 
Location: Singapore

Join Date: May 2013
Posts: 20
Default

Thanks... but how can I do that? Is there a quick way? Do you suggest using fastx_trimmer?
Old 01-25-2016, 09:32 AM   #18
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,992
Default

Can you try replacing the first C with an N in your barcode file and see if fastx-splitter would accept that as a valid pattern and do demultiplexing?

If that does not work, then you could use bbduk from the BBMap suite (with the forcetrimleft=1 option) or the HEADCROP:1 option of Trimmomatic to remove that first base (which is N) from all reads.
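
Before trimming anything, it may help to check what the reads actually start with. The sketch below tallies the first few bases of every read and reports the closest barcode for a given prefix. It assumes a strict four-line FASTQ layout and a two-column barcode file; the helper names read_barcodes, prefix_counts, and closest_barcode are invented for this example and do not come from any of the tools mentioned in this thread.

```python
from collections import Counter


def read_barcodes(barcode_lines):
    """Map barcode sequence -> sample name, skipping comments and blank lines."""
    barcodes = {}
    for line in barcode_lines:
        fields = line.split()
        if len(fields) == 2 and not fields[0].startswith("#"):
            barcodes[fields[1]] = fields[0]
    return barcodes


def prefix_counts(fastq_lines, length):
    """Count the first `length` bases of every read (strict 4-line FASTQ)."""
    counts = Counter()
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # the sequence line of each record
            counts[line.strip()[:length]] += 1
    return counts


def closest_barcode(prefix, barcodes):
    """Return (sample, mismatches) for the barcode nearest to `prefix`."""
    def distance(barcode):
        differing = sum(a != b for a, b in zip(prefix, barcode))
        return differing + abs(len(prefix) - len(barcode))

    best = min(barcodes, key=distance)
    return barcodes[best], distance(best)


if __name__ == "__main__":
    barcodes = read_barcodes(["#SampleID\tBarcodeSequence\n",
                              "AP1E\tCGTAACCA\n", "AP25E\tCGTACCCA\n"])
    reads = ["@r1\n", "NGTACCCAAGGGTTTG\n", "+\n", "#8ABCFGGGGGGGGEE\n"]
    for prefix, count in prefix_counts(reads, 8).most_common():
        print(prefix, count, closest_barcode(prefix, barcodes))
```

If the most common prefixes sit one mismatch away from the expected barcodes, the leading N is probably the only obstacle; if they look unrelated, the barcode may not be at the start of the read at all.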
Old 01-25-2016, 09:37 AM   #19
smitra
Member
 
Location: Singapore

Join Date: May 2013
Posts: 20
Default

First I removed the leading N from the reads, as you suggested earlier, and also removed the first C from the barcode file:

So now
Code:
 mitras$ less test_out_R1.fastq 

@M02542:124:000000000-AKFBJ:1:1101:13841:1000 1:N:0:5
GTACCCAAGGGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACANNCNNGTCGAACGGTAGCNCAGAGAGCTTGCTCTNGGNTGACGAGTGGCGGACGGGNGANTAATGTCTGGGAAACTGCCCGATGGAGGGGGATANCTACTGGANANNGNNGCTAATACCGCATAACGNCGCAAGACCAAAGAGGGNGANNTCAGGGCCTCTTGNCATCGGATGNNCCCAGATGGGATNGGCTTGTAGGTGAGGTAAGNGCTCACGCNGGCGACGATCCCTAGCTTGGNNGNGAGG
+
8ABCFGGGGGGGGEEGGGGGGGG<FGGGFFGFGFGFGGEG@FGEEGGCFGGGGG?##:##6:CFFGGGDG<CG#:CCFFGEGGGGFAFG#:<#:BBFF7FFGDGGGGGGGD#8+#+:BFGGGGGGGCFFGDGG<FGGGECCGDEGGGF@#611:D,>>#6##6##66<1CF@7FFFGEGF7E#41=8=EGFFG7*?CF>>#22##2*2;@;8C8CFC<#/2AC=E*:5##/2:CFCG+8**+#*1*1552<+*+0+8D6D4+#1**)**)*#*15/*//7>5:5<.*,*)0)##1#..73
@M02542:124:000000000-AKFBJ:1:1101:12174:1002 1:N:0:5
GTAACCAAGGGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACANNCNNGTCGAGGGGCAGCATTTCAGTTTGCTTGCNAANTGGAGATGGCGACCGGCGNACNGGTGAGTAACACGTATCCAACCTGCCGATAACTCNGGGATAGCNTNNCNNAAGAAAGATTGATACCCNATGGTATAATCAGACCGNATGGTCTTATTATTAAANAATTTCGGTNNTCGATGGGGATGNGTTCCATTAGGCAGTTGGTGTGTTAATGNCGCACCAAACCTTCCTGTGANNGNGTTT
+
8ACCGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGDGGGGGGGGGFGGGGGGG##:##6:CFGFDEGGGGDGGGFGGGFGGGGGGGG#:C#66=,CFFFGGGG@FGEE7#++#:BBFFGGGFCFGGGGGGCGDGGGFGGGGGGGGC=#8@<<<FGG#5##8##86DCF<FCCC:BFCFFF#6>F>FGG92;@CFFGF@#116*=CF<CG?@CFFFG#3;5375:CG##212**<5C5/::#11:91A>+<>C6CE<FC:*****0:FB<#1*)//75<F30762*-2)**##1#0)0.
And the barcode file:
Quote:
#SampleID BarcodeSequence
AP1E GTAACCA
AP25E GTACCCA
AP5D GTAAGAA
AP8C GTAGATA
P29F GTAGGCT
P30N GTATTCA
P31B GTCAAGA
P35C GTATTTC
V2A GTCCAGG
V3J GTCACAG
I still have the same problem:
Quote:
N85567:testdata mitras$ cat test_out_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile_new.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 0
Barcode Count Location
unmatched 1000 /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/unmatched
total 1000

Last edited by GenoMax; 01-25-2016 at 09:50 AM.
Old 01-25-2016, 09:42 AM   #20
smitra
Member
 
Location: Singapore

Join Date: May 2013
Posts: 20
Default

The same happens if I replace the first C with an N in the barcode file:
Quote:
#SampleID BarcodeSequence
AP1E NGTAACCA
AP25E NGTACCCA
AP5D NGTAAGAA
AP8C NGTAGATA
P29F NGTAGGCT
P30N NGTATTCA
P31B NGTCAAGA
P35C NGTATTTC
V2A NGTCCAGG
V3J NGTCACAG
Quote:
N85567:testdata mitras$ cat test_out_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile_new.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 0
Barcode Count Location
unmatched 1000 /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/unmatched
total 1000
All times are GMT -8.