  • jdanderson
    Member
    • Sep 2010
    • 45

    FASTX Toolkit barcode splitter issue

    Hello All,

    I've been trying to use the FASTX Toolkit barcode splitter to demultiplex my Illumina reads. The following command runs without any errors:

    Code:
    cat /home/johnathon/jda_ev_extended.txt | fastx_barcode_splitter.pl --bcfile /home/johnathon/mybarcodes.txt --bol --mismatches 1 --prefix /home/johnathon/split_bc/jda_ev_split_bc --suffix ".txt"

    But none of the output files contain any reads except the unmatched file.

    The following is my mybarcodes.txt file:

    Code:
    #I hope the following is the appropriate format for this .txt file: it should contain the barcode identifier and the barcode sequence itself, tab-delimited. --Johnathon David Anderson
    BC1 ACCC
    BC2 CGTA
    BC3 GAGT
    BC4 TTAG
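The barcode file format described in the comment above (an identifier and a sequence per line, `#` lines ignored) can be sanity-checked before running the splitter. This is a minimal Python sketch, assuming identifier and barcode may be separated by any whitespace and that barcodes contain only A, C, G and T; `load_barcodes` is a hypothetical helper, and the same-length check is my own addition, so the FASTX script's real validation may differ. Note that Python's `str.splitlines()` also breaks on a lone carriage return, so this sketch is more tolerant of Windows or old-Mac line endings than a stricter parser would be.

```python
import re

def load_barcodes(text):
    """Parse barcode-file text into (identifier, barcode) pairs.

    Assumed format (hypothetical, mirrors the comment in the thread):
    '#' lines are comments; other non-blank lines hold an ID and a barcode
    separated by whitespace.
    """
    barcodes = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        ident, seq = fields[0], fields[1].upper()
        if not re.fullmatch(r"[ACGT]+", seq):
            raise ValueError(f"line {lineno}: bad barcode {seq!r}")
        barcodes.append((ident, seq))
    # Extra sanity check (my assumption): all barcodes share one length.
    lengths = {len(seq) for _, seq in barcodes}
    if len(lengths) > 1:
        raise ValueError(f"barcodes have different lengths: {sorted(lengths)}")
    return barcodes

example = "#comment line\nBC1\tACCC\nBC2\tCGTA\nBC3\tGAGT\nBC4\tTTAG\n"
print(load_barcodes(example))
```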


    However, when I look at the extended.txt file I can see the correct barcodes at the 5' end. I have also tried the export.txt file, to no avail; apparently it is not in the expected format. I get an error message saying that the first character is an "S" instead of a ">" or an "@".

    I have not converted these files from Solexa to Sanger FASTQ. Could this be the issue?

    For my first data set, which was not barcoded, I used the export2std command of MAQ's fq_all2std.pl script to convert the export.txt file. It worked fine and I was able to visualize the data in IGV. I haven't had much success with MAQ's ill2sanger command; if the missing conversion is the issue with the FASTX Toolkit, can anyone recommend a user-friendly script for the conversion? I am using Solexa pipeline 1.6.
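For the Solexa-to-Sanger question, the two common quality conversions can be sketched in Python. This is a minimal sketch assuming the quality strings use an ASCII offset of 64. For Illumina pipeline 1.3 and later (which should cover the 1.6 pipeline mentioned here) the scores are already Phred-scaled, so the Sanger (offset 33) character is a fixed shift of 31; older Solexa-scaled scores first need the log-odds conversion Q_phred = 10*log10(10^(Q_solexa/10) + 1). The function names are hypothetical; check which encoding your files actually use before converting.

```python
import math

def phred64_to_sanger(qual):
    """Illumina 1.3+ style: Phred scores at ASCII offset 64 -> offset 33."""
    return "".join(chr(ord(c) - 31) for c in qual)

def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled (log-odds) score to the Phred scale, rounded."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))

def solexa64_to_sanger(qual):
    """Old Solexa style: Solexa scores at offset 64 -> Phred scores at offset 33."""
    return "".join(chr(solexa_to_phred(ord(c) - 64) + 33) for c in qual)
```

The two functions differ only for low scores; at high quality the Solexa and Phred scales converge, which is why the fixed-shift version is often "close enough" but not exact for old Solexa data.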

    Is anyone familiar with the FASTX Toolkit? Is the problem probably that the Illumina files need to be converted to Sanger FASTQ first?

    Any guidance would be most appreciated.

    Regards,
    Johnathon
  • jdanderson
    Member
    • Sep 2010
    • 45

    #2
    Hello All,

    I am updating my progress in case this may help someone in the future.

    As previously mentioned, I used the FASTX Toolkit on the export.txt and extended.txt files from Illumina pipeline 1.6 with minimal success, and I suspected a formatting problem in these files. I have now tried the same barcode splitter on the sequence.txt file (before any conversion to Sanger FASTQ), and it seems to have worked. The only caveat is that there appear to be more reads in the unmatched file than I expected (199,524 out of 28,223,602, or 0.7%), but perhaps this is normal. For reference, I used the NuGen Ovation and Encore kits for library prep.
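The unmatched percentage quoted above can be double-checked with a couple of lines of Python, using only the two counts from the post:

```python
unmatched = 199_524   # reads in the unmatched file (from the post)
total = 28_223_602    # total reads in sequence.txt (from the post)
fraction = unmatched / total
print(f"{fraction:.2%} of reads were unmatched")
```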

    Regards,
    Johnathon


    • KevinLam
      Senior Member
      • Nov 2009
      • 204

      #3
      Sorry to hijack your thread, but would the FASTX Toolkit be able to demultiplex SOLiD reads as well?
      http://kevin-gattaca.blogspot.com/


      • hyjkim
        Member
        • Apr 2010
        • 18

        #4
        The FASTX Toolkit does not work for SOLiD data. I wrote some Perl scripts to demultiplex SOLiD data a few months back. The code and the syntax weren't pretty, but if you're interested, I can dig the scripts up and post them.


        • jdanderson
          Member
          • Sep 2010
          • 45

          #5
          Hello Kevin,

          I am not sure. I cannot tell directly from the documentation; however, I don't see any mention of color-space reads. Maybe you could ask the Hannon Lab if you don't get an answer here ([email protected]).

          -
          Johnathon


          • 2007lab
            Member
            • Mar 2009
            • 14

            #6
            Bumping the SOLiD part of this thread.
            Once I run solid2fastq.pl to convert my csfasta and qual files to a fastq.gz file, can I use FASTX to do QC on my SOLiD PE reads?


            • upendra_35
              Senior Member
              • Apr 2010
              • 102

              #7
              Hi jdanderson,
              Your command looks good to me, and I suspect the problem is with the barcode file. Try opening the barcode file with vi and see if there is anything weird going on. Sometimes you see ^M at the end of each line; if so, you can fix this manually and re-run the command. Good luck!
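The ^M check can also be scripted. Below is a minimal Python sketch (the function names and the in-place rewrite are my own, hypothetical choices) that reports whether a file uses Windows CRLF, classic Mac CR-only, or Unix LF line endings and rewrites it with LF endings. It overwrites the file, so run it on a copy if unsure.

```python
import sys

def detect_line_endings(data: bytes) -> str:
    """Classify the line endings found in raw file bytes."""
    crlf = data.count(b"\r\n")
    cr_only = data.count(b"\r") - crlf   # lone CR (classic Mac)
    lf_only = data.count(b"\n") - crlf   # lone LF (Unix)
    if not (crlf or cr_only or lf_only):
        return "none"
    if crlf and not cr_only and not lf_only:
        return "CRLF (Windows)"
    if cr_only and not crlf and not lf_only:
        return "CR (classic Mac)"
    if lf_only and not crlf and not cr_only:
        return "LF (Unix)"
    return "mixed"

def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF first, then any remaining lone CR, to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

if __name__ == "__main__":
    # Hypothetical usage: python fix_eol.py mybarcodes.txt  (rewrites in place)
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            raw = fh.read()
        print(path, detect_line_endings(raw))
        with open(path, "wb") as fh:
            fh.write(normalize_line_endings(raw))
```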


              • carmeyeii
                Senior Member
                • Mar 2011
                • 137

                #8
                Hi everyone,

                I've been using the FastX Barcode Splitter successfully, but with the --partial option I have realized I'm losing some reads because of a particular problem:

                With --partial 1

                The barcode

                Code:
                CGCGTCAGCATTGTTCATAC
                will pick up the read

                Code:
                GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACT
                since it is missing just one base at the left end to match the barcode exactly.

                However, the read:

                Code:
                CCGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACT
                will not be taken as matching the barcode, since it has one extra base at the beginning. Unfortunately, there are many reads that fall into this category, but not all of them begin with the extra 'G'.
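To see why the read with an extra leading base is lost, here is a minimal Python model of start-of-read (--bol) matching with partial overlap. It rests on my assumption about how the splitter scores partial matches: with --partial N it also tries the barcode with up to N of its leading bases removed, compares each candidate only against the first bases of the read, and counts every removed base as one mismatch. The function names are hypothetical and this is not the script's actual code.

```python
def hamming(a: str, b: str) -> int:
    """Count differing positions of two equal-length strings (N is a mismatch)."""
    return sum(x != y for x, y in zip(a, b))

def best_bol_cost(read: str, barcode: str, partial: int) -> int:
    """Lowest cost of a barcode at the start of a read, assuming each base
    removed from the front of the barcode counts as one mismatch."""
    costs = []
    for cut in range(partial + 1):
        candidate = barcode[cut:]             # barcode missing `cut` leading bases
        fragment = read[:len(candidate)]      # compare only the read's first bases
        costs.append(hamming(fragment, candidate) + cut)
    return min(costs)

barcode = "CGCGTCAGCATTGTTCATAC"
missing_one = "GCGTCAGCATTGTTCATACAAAGCTAC"    # first barcode base absent
extra_one = "CCGCGTCAGCATTGTTCATACAAAGCTAC"    # one extra base before the barcode
print(best_bol_cost(missing_one, barcode, partial=1))
print(best_bol_cost(extra_one, barcode, partial=1))
```

Under this model a read that is missing leading barcode bases can still be matched, while a single extra base in front of the barcode shifts every position, so no candidate lines up.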

                Do you use anything else to get around this?

                Thanks!
                Carmen


                • chadn737
                  Senior Member
                  • Jan 2009
                  • 392

                  #9
                  A quick and dirty solution would be to trim off the first base of all your reads and then use the FastX barcode splitter with --partial.
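The trimming step can be sketched in a few lines of Python. This minimal version assumes plain four-line FASTQ records (header, sequence, '+', quality) and removes the first base and its quality character from every read; `trim_first_bases` is a hypothetical helper. The FASTX Toolkit's own fastx_trimmer should do the same job with -f 2 (first base to keep), but check its help output for your version.

```python
from typing import Iterable, Iterator

def trim_first_bases(lines: Iterable[str], n: int = 1) -> Iterator[str]:
    """Yield FASTQ lines with the first `n` bases and qualities removed.

    Assumes strict four-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # sequence line and quality line of each record
            line = line[n:]
        yield line + "\n"
```

In a small script this could be driven as `sys.stdout.writelines(trim_first_bases(sys.stdin))` and fed with `cat reads.fastq | ...`, with the output then piped into the barcode splitter.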


                  • carmeyeii
                    Senior Member
                    • Mar 2011
                    • 137

                    #10
                    Thank you, chadn!

                    Of course this was the easiest solution.

                    The barcode is:
                    Code:
                    REVERSEPRIMER	CGCGTCAGCATTGTTCATAC
                    Read 1 begins with a perfect match to the barcode.

                    Code:
                    @HWI-M00149:16:000000000-A12VK:1:2114:17873:29127 2:N:0:
                    CGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    Read 2 has an extra base at the beginning, followed by a perfect match to the barcode.

                    Code:
                    @HWI-M00149:16:000000000-A12VK:1:2114:17873:29128 2:N:0:
                    ACGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    Read 3 is missing the first base of the barcode.

                    Code:
                    @HWI-M00149:16:000000000-A12VK:1:2114:17873:29129 2:N:0:
                    GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    By trimming the first base of every read, we are left with:

                    Code:
                    Read 1 [now missing 1 base at the beginning]
                    GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    Read 2 [now a perfect match]
                    CGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    Read 3 [now missing 2 bases at the beginning]
                    CGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    and by using

                    Code:
                    --mismatch 4 --partial 4
                    all reads will be matched to the barcode.

                    The 4 doesn't make sense to me, as I thought it would be 2, but this is the only setting that gets it to work, so...

                    Thanks a lot!

                    Carmen


                    • vivi7
                      Member
                      • Mar 2014
                      • 10

                      #11
                      fastx_barcode_splitter issue when running

                      Hi,

                      I saw this thread and I hope some of you can help me.

                      When I run fastx_barcode_splitter.pl with this command:

                      /usr/local/bin/fastx_barcode_splitter.pl --bcfile ./Barcodes9nt.txt --prefix ./Rescued9nt --suffix .fq --bol

                      it looks like it is running on the command line (no error message, no > prompt); see the attached screenshot.
                      However, it is not actually running: top shows that it is not using any memory or CPU, and it has been 'running' for days on a very small file without producing any results.
                      The input file is in the STDIN folder, as it is supposed to be.

                      I would be very grateful if you could suggest what might be wrong.
                      Thanks in advance
                      Vivi


                      • smitra
                        Member
                        • May 2013
                        • 20

                        #12
                        Hi vivi7,
                        I think you need to provide your FASTQ or FASTA file; you haven't provided one.
                        Use it like this:
                        Code:
                        cat File.fastq | /usr/local/bin/fastx_barcode_splitter.pl --bcfile mybarcodes.txt ...other options if you want.
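The symptom in the previous post follows from how the splitter gets its input: as the `cat ... |` usage above shows, it takes sequences from standard input, so without a pipe it sits waiting for input from the terminal. Here is a minimal Python sketch of how a filter-style script could detect that case and exit with a message instead; it illustrates the idea only, is not something fastx_barcode_splitter.pl is known to do, and `open_input` is a hypothetical helper.

```python
import sys
from typing import Optional, TextIO

def open_input(path: Optional[str] = None, stdin: TextIO = sys.stdin) -> TextIO:
    """Return the named file, or piped standard input.

    Exits with a message instead of silently waiting when stdin is an
    interactive terminal, i.e. when nothing was piped in.
    """
    if path is not None:
        return open(path)
    if stdin.isatty():
        raise SystemExit("No input: pipe a FASTQ file in, e.g. cat reads.fastq | script")
    return stdin
```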


                        • smitra
                          Member
                          • May 2013
                          • 20

                          #13
                          Hi Everybody,
                          I came back to this thread because I am having a problem very similar to the one in the first post by jdanderson.

                          My command runs without errors:
                          Code:
                          cat test_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 1
                          But none of the output files contain any reads except the unmatched file.

                          We got this data from Mr.DNA as a single raw FASTQ file containing 10 samples, which I need to split. Johnathon's later suggestion didn't help.
                          Can anybody help please?
                          Thanks,
                          smitra


                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            Can you post a few lines of your FASTQ file and the mapping file?


                            • smitra
                              Member
                              • May 2013
                              • 20

                              #15
                              Thanks for replying, GenoMax. Here is the mapping file:

                              Code:
                              #SampleID	BarcodeSequence
                              AP1E	CGTAACCA
                              AP25E	CGTACCCA
                              AP5D	CGTAAGAA
                              AP8C	CGTAGATA
                              P29F	CGTAGGCT
                              P30N	CGTATTCA
                              P31B	CGTCAAGA
                              P35C	CGTATTTC
                              V2A	CGTCCAGG
                              V3J	CGTCACAG
                              And this is what the FASTQ file looks like (I assume the barcode is at the start of each read, with one N):

                              mitras$ less test_R1.fastq

                              Code:
                              @M02542:124:000000000-AKFBJ:1:1101:13841:1000 1:N:0:5
                              NGTACCCAAGGGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACANNCNNGTCGAACGGTAGCNCAGAGAGCTTGCTCTNGGNTGACGAGTGGCGGACGGGNGANTAATGTCTGGGAAACTGCCCGATGGAGGGGGATANCTACTGGANANNGNNGCTAATACCGCATAACGNCGCAAGACCAAAGAGGGNGANNTCAGGGCCTCTTGNCATCGGATGNNCCCAGATGGGATNGGCTTGTAGGTGAGGTAAGNGCTCACGCNGGCGACGATCCCTAGCTTGGNNGNGAGG
                              +
                              #8ABCFGGGGGGGGEEGGGGGGGG<FGGGFFGFGFGFGGEG@FGEEGGCFGGGGG?##:##6:CFFGGGDG<CG#:CCFFGEGGGGFAFG#:<#:BBFF7FFGDGGGGGGGD#8+#+:BFGGGGGGGCFFGDGG<FGGGECCGDEGGGF@#611:D,>>#6##6##66<1CF@7FFFGEGF7E#41=8=EGFFG7*?CF>>#22##2*2;@;8C8CFC<#/2AC=E*:5##/2:CFCG+8**+#*1*1552<+*+0+8D6D4+#1**)**)*#*15/*//7>5:5<.*,*)0)##1#..73
                              @M02542:124:000000000-AKFBJ:1:1101:12174:1002 1:N:0:5
                              NGTAACCAAGGGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACANNCNNGTCGAGGGGCAGCATTTCAGTTTGCTTGCNAANTGGAGATGGCGACCGGCGNACNGGTGAGTAACACGTATCCAACCTGCCGATAACTCNGGGATAGCNTNNCNNAAGAAAGATTGATACCCNATGGTATAATCAGACCGNATGGTCTTATTATTAAANAATTTCGGTNNTCGATGGGGATGNGTTCCATTAGGCAGTTGGTGTGTTAATGNCGCACCAAACCTTCCTGTGANNGNGTTT
                              +
                              #8ACCGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGDGGGGGGGGGFGGGGGGG##:##6:CFGFDEGGGGDGGGFGGGFGGGGGGGG#:C#66=,CFFFGGGG@FGEE7#++#:BBFFGGGFCFGGGGGGCGDGGGFGGGGGGGGC=#8@<<<FGG#5##8##86DCF<FCCC:BFCFFF#6>F>FGG92;@CFFGF@#116*=CF<CG?@CFFFG#3;5375:CG##212**<5C5/::#11:91A>+<>C6CE<FC:*****0:FB<#1*)//75<F30762*-2)**##1#0)0.


                              As you can see, the reads start with an N, so maybe I need to allow 1 mismatch for the barcode.
                              So I tried:
                              Code:
                              cat test_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 1
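To check whether the two reads above should be assigned with --mismatches 1, here is a minimal Python sketch that compares the first eight bases of each read against every barcode in the mapping file, counting N as a mismatch. The barcode list is copied from the mapping file above; the tie rule (a read equally close to two barcodes goes to 'unmatched') and the helper names are my assumptions, not necessarily what the splitter does.

```python
BARCODES = {  # copied from the mapping file in this post
    "AP1E": "CGTAACCA", "AP25E": "CGTACCCA", "AP5D": "CGTAAGAA",
    "AP8C": "CGTAGATA", "P29F": "CGTAGGCT", "P30N": "CGTATTCA",
    "P31B": "CGTCAAGA", "P35C": "CGTATTTC", "V2A": "CGTCCAGG",
    "V3J": "CGTCACAG",
}

def mismatches(a, b):
    """Number of differing positions; N never equals a real base."""
    return sum(x != y for x, y in zip(a, b))

def assign(read, barcodes=BARCODES, max_mm=1):
    """Return the unique closest barcode ID within max_mm mismatches,
    else 'unmatched' (hypothetical tie rule: ties are unmatched)."""
    scored = sorted((mismatches(read[:len(bc)], bc), ident)
                    for ident, bc in barcodes.items())
    best_mm = scored[0][0]
    tied = [ident for mm, ident in scored if mm == best_mm]
    if best_mm > max_mm or len(tied) > 1:
        return "unmatched"
    return tied[0]

for read in ("NGTACCCAAGGGTTTGATCATGGCTCAG", "NGTAACCAAGGGTTTGATCCTGGCTCAG"):
    print(read[:8], assign(read))
```

With these assumptions, both example reads are assigned at one mismatch (to AP25E and AP1E, respectively).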
                              Thanks for helping
                              smitra
                              Last edited by GenoMax; 01-25-2016, 09:10 AM. Reason: added CODE tags to improve readability
