  • newBioinfo
    Member
    • Mar 2012
    • 36

    Demultiplex Illumina reads

    Hi Everyone,
    I am stuck with my Illumina data: I want to remove the barcodes from my reads. My read file looks like this:
    @HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 1:N:0:
    NACAGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    +
    #1=DDDFFCADHHIIIIIIIIGIIIIIIIGIIIIIIIFHIIIIICFHH#######################################################################################################
    and my barcode file looks like this:
    @HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 2:N:0:
    NNNNNANACACA
    +
    ############
    When I use the fastx toolkit to trim the barcodes, I get an error.
    The command I am using is:
    cat lane5_NoIndex_L005_R1_001.fastq | /u2/software/fastx/fastx_toolkit-0.0.13.2/bin/fastx_barcode_splitter.pl --bcfile lane5_NoIndex_L005_R2_001.fastq --bol --prefix x --suffix ".fastq"
    The error I am getting is:
    Error: bad barcode value (2:N:0 at barcode file (lane5_NoIndex_L005_R2_001.fastq) line 1
    I think the reason is the 2:N:0 in the barcode header versus the 1:N:0 in the read header.
    I am not sure how to fix this. If anyone has any idea, could you please help me?

    Thanks!!!!
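A more likely cause of the error, going by the fastx toolkit's documentation: `--bcfile` expects a plain-text barcode table (a sample name and a barcode sequence per line, separated by a tab), not a FASTQ file of index reads, and `--bol` looks for the barcode at the start of each read itself. When the barcodes come as a separate index read, as they appear to here, the two files can instead be walked in parallel. The sketch below is a minimal Python illustration of that idea, assuming that R1 and R2 list the same reads in the same order, that every record is exactly four lines, and that the sample names and barcodes (`sampleA`, `ACACAC`, ...) are hypothetical placeholders; up to one mismatch is tolerated by default.

```python
import io
from itertools import islice
from typing import Dict, Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, "+" line, quality


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (wrapped sequence lines are not supported)."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) < 4 or not lines[0].startswith("@"):
            raise ValueError(f"truncated or malformed record: {lines!r}")
        yield lines[0], lines[1], lines[2], lines[3]


def hamming(a: str, b: str) -> int:
    """Count mismatches between two equal-length strings (N counts as a mismatch)."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_seq: str, barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode best matches the start of index_seq, or None."""
    scored = sorted((hamming(index_seq[:len(bc)], bc), name)
                    for name, bc in barcodes.items() if len(index_seq) >= len(bc))
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # two barcodes equally close: ambiguous
    return scored[0][1]


def demultiplex(reads: TextIO, index_reads: TextIO, barcodes: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[Record]]:
    """Bin R1 records by the barcode found in the matching index (R2) record."""
    bins: Dict[str, List[Record]] = {name: [] for name in barcodes}
    bins["unmatched"] = []
    for read, idx in zip(read_fastq(reads), read_fastq(index_reads)):
        # Compare IDs up to the first space; the "1:N:0:" / "2:N:0:" parts differ by design.
        if read[0].split()[0] != idx[0].split()[0]:
            raise ValueError(f"files out of sync at {read[0]!r}")
        sample = assign_sample(idx[1], barcodes, max_mismatches)
        bins[sample or "unmatched"].append(read)
    return bins


# Toy example: sample names and barcodes are hypothetical.
r1 = io.StringIO("@r1 1:N:0:\nACGTACGT\n+\nIIIIIIII\n@r2 1:N:0:\nTTTTGGGG\n+\nIIIIIIII\n")
r2 = io.StringIO("@r1 2:N:0:\nACACAC\n+\nIIIIII\n@r2 2:N:0:\nGTGTGA\n+\nIIIIII\n")
bins = demultiplex(r1, r2, {"sampleA": "ACACAC", "sampleB": "GTGTGT"})
print({name: len(recs) for name, recs in bins.items()})
# {'sampleA': 1, 'sampleB': 1, 'unmatched': 0}
```

It keeps everything in memory for clarity; for a full lane you would write each record to a per-sample file instead.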
  • JackieBadger
    Senior Member
    • Mar 2009
    • 385

    #2
    Use Trimmomatic... much more versatile and mate-pair aware. Or just use the trim function in text manipulation in Galaxy.
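Trimmomatic's `HEADCROP:<length>` step removes a fixed number of bases from the start of every read. If a barcode really does sit at a fixed position inside the read, the same operation can be sketched in a few lines; the key detail is that the sequence and quality strings are cut by the same amount so the record stays valid. The 6-base length and the record below are hypothetical, and in the files shown in the first post the barcodes appear to be in a separate file, so R1 may not need trimming at all.

```python
from typing import Tuple

Record = Tuple[str, str, str, str]  # header, sequence, "+" line, quality


def headcrop(record: Record, length: int) -> Record:
    """Drop the first `length` bases, trimming the quality string in step."""
    header, seq, plus, qual = record
    if len(seq) != len(qual):
        raise ValueError(f"sequence/quality length mismatch in {header!r}")
    return header, seq[length:], plus, qual[length:]


def format_fastq(record: Record) -> str:
    """Render a record back to four FASTQ lines."""
    return "\n".join(record) + "\n"


# Hypothetical record with a 6-base barcode (ACACAC) at the start of the read.
rec = ("@read1 1:N:0:", "ACACACGGATGCAAGC", "+", "IIIIIIHHHHGGGGFF")
print(format_fastq(headcrop(rec, 6)), end="")
# @read1 1:N:0:
# GGATGCAAGC
# +
# HHHHGGGGFF
```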

    • sklages
      Senior Member
      • May 2008
      • 628

      #3
      Why don't you ask your sequencing provider to demultiplex the data for you? It is no extra work for them, as it is part of the FASTQ processing; you just need to provide some kind of sample description, i.e. a samplesheet.

      All our customers are quite happy that this work has already been done when they get their data :-)

      Sven
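For illustration, a CASAVA 1.8-era sample sheet is a CSV with one row per sample per lane, assumed here to have the columns FCID, Lane, SampleID, SampleRef, Index, Description, Control, Recipe, Operator and SampleProject (check this against your provider's pipeline version). The sketch turns such a sheet into a per-lane index lookup and rejects duplicate indices within a lane, a cheap sanity check before the sheet is handed over. The flowcell ID and lane come from the read header in the first post; the sample names and indices are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict

# Hypothetical sheet: samples and indices are placeholders.
SHEET = """FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
C0RG7ACXX,5,sampleA,,ACACAC,,N,,,projX
C0RG7ACXX,5,sampleB,,GTGTGT,,N,,,projX
"""


def barcodes_by_lane(sheet_text: str) -> Dict[str, Dict[str, str]]:
    """Map lane -> {index sequence: SampleID}, rejecting duplicate indices in a lane."""
    lanes: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(sheet_text)):
        lane, index, sample = row["Lane"], row["Index"].strip().upper(), row["SampleID"]
        if index in lanes[lane]:
            raise ValueError(f"lane {lane}: index {index} used by {lanes[lane][index]} and {sample}")
        lanes[lane][index] = sample
    return dict(lanes)


print(barcodes_by_lane(SHEET))
# {'5': {'ACACAC': 'sampleA', 'GTGTGT': 'sampleB'}}
```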

      • JackieBadger
        Senior Member
        • Mar 2009
        • 385

        #4
        BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq demultiplexing.
        I would do everything with a program where you set the parameters and know what is going in and what should come out.
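One concrete way to know what goes in and what comes out is a read-count check: the per-sample output files plus the unmatched bin should add up exactly to the input. The sketch below assumes uncompressed four-line FASTQ; `count_fastq_records` and `check_conservation` are hypothetical helper names, not part of any toolkit.

```python
import io
from typing import Iterable, TextIO


def count_fastq_records(handle: TextIO) -> int:
    """Count records in a 4-line FASTQ stream (wrapped sequences are not supported)."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; file may be truncated")
    return n_lines // 4


def check_conservation(input_count: int, output_counts: Iterable[int]) -> int:
    """Raise if the demultiplexed outputs do not add up to the input; return the total."""
    total = sum(output_counts)
    if total != input_count:
        raise ValueError(f"{input_count} reads in but {total} reads out")
    return total


# With real data the handle would come from open("lane5_NoIndex_L005_R1_001.fastq").
reads_in = count_fastq_records(io.StringIO("@a\nAC\n+\nII\n@b\nGT\n+\nII\n"))
print(check_conservation(reads_in, [1, 1]))  # 2
```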

        • newBioinfo
          Member
          • Mar 2012
          • 36

          #5
          Originally posted by JackieBadger
          Use Trimmomatic... much more versatile and mate-pair aware. Or just use the trim function in text manipulation in Galaxy.
          Thanks JackieBadger,
          I will try it!

          • newBioinfo
            Member
            • Mar 2012
            • 36

            #6
            Originally posted by JackieBadger
            BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq demultiplexing.
            I would do everything with a program where you set the parameters and know what is going in and what should come out.
            Thanks JackieBadger,
            I am planning to use QIIME, so hopefully I will not encounter such issues.

            Thanks for the help!!!

            • newBioinfo
              Member
              • Mar 2012
              • 36

              #7
              Originally posted by sklages
              Why don't you ask your sequencing provider to demultiplex the data for you? It is no extra work for them, as it is part of the FASTQ processing; you just need to provide some kind of sample description, i.e. a samplesheet.

              All our customers are quite happy that this work has already been done when they get their data :-)

              Sven
              Thanks Sven,
              It would be good if they demultiplexed the data before sending it, but in my case they did not.

              • sklages
                Senior Member
                • May 2008
                • 628

                #8
                Originally posted by JackieBadger
                BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq demultiplexing.
                I would do everything with a program where you set the parameters and know what is going in and what should come out.
                Hmm, it's pretty easy to check what the provider does if you also get the "Undetermined_indices" data files. MiSeq is another matter: the trimming issue is known, and that trimming should not be used (currently). You could also ask for some (demultiplexing) stats to see whether the results are "good", i.e. as expected.

                If you don't trust your sequencing provider at all, you should look for another one ;-)

                What "significant errors" did you encounter in the MiSeq demultiplexing?
                We don't multiplex MiSeq libraries, so I am just curious :-)

                Sven
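A sketch of the kind of check Sven describes, assuming CASAVA 1.8-style headers in which the observed index sequence follows the last colon of the comment field (e.g. `1:N:0:ACACAC`). Tallying the most frequent indices in the Undetermined_indices files shows quickly whether an expected barcode is missing from the sample sheet or whether the undetermined reads are mostly noise. The toy records and indices are hypothetical.

```python
import io
from collections import Counter
from typing import List, TextIO, Tuple


def top_indices(handle: TextIO, n: int = 10) -> List[Tuple[str, int]]:
    """Tally the index sequence recorded in each header of a 4-line FASTQ stream.

    Assumes CASAVA 1.8-style headers, where the index follows the last colon
    of the comment field (e.g. "1:N:0:ACACAC"); an empty index counts as "<none>".
    """
    counts = Counter()
    for line_no, line in enumerate(handle):
        if line_no % 4 != 0:  # only header lines carry the index
            continue
        parts = line.rstrip("\n").split(" ", 1)
        if len(parts) == 2:
            counts[parts[1].rsplit(":", 1)[-1] or "<none>"] += 1
    return counts.most_common(n)


demo = io.StringIO(
    "@r1 1:N:0:ACACAC\nA\n+\nI\n"
    "@r2 1:N:0:ACACAC\nA\n+\nI\n"
    "@r3 1:N:0:GGGGGG\nA\n+\nI\n"
)
print(top_indices(demo))  # [('ACACAC', 2), ('GGGGGG', 1)]
```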

                • NextGenSeq
                  Senior Member
                  • Apr 2009
                  • 482

                  #9
                  Are you sure they did an index read? The names of your files say NoIndex. The only time we ever get data named like this is when an index read wasn't done.
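The comment field of the headers in the first post can be unpacked to see what was recorded. Assuming the CASAVA 1.8 layout `read:is_filtered:control_number:index`, both headers carry an empty index field and differ only in the read number (1 versus 2). As the next reply notes, this alone does not prove whether an index read was run. `ReadComment` and `parse_casava18` are hypothetical names for this sketch.

```python
from typing import NamedTuple


class ReadComment(NamedTuple):
    read: int        # read number within the cluster (1, 2, ...)
    filtered: bool   # True when the flag is "Y", i.e. the read was filtered out
    control: int     # 0 means not a control read
    index_seq: str   # index sequence; empty when none was recorded


def parse_casava18(header: str) -> ReadComment:
    """Parse the comment field of a CASAVA 1.8-style FASTQ header."""
    _, comment = header.rstrip("\n").split(" ", 1)
    read, filtered, control, index_seq = comment.split(":", 3)
    return ReadComment(int(read), filtered == "Y", int(control), index_seq)


for h in ("@HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 1:N:0:",
          "@HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 2:N:0:"):
    print(parse_casava18(h))
# ReadComment(read=1, filtered=False, control=0, index_seq='')
# ReadComment(read=2, filtered=False, control=0, index_seq='')
```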

                  • sklages
                    Senior Member
                    • May 2008
                    • 628

                    #10
                    Originally posted by NextGenSeq
                    Are you sure they did an index read? The names of your files say NoIndex. The only time we ever get data named like this is when an index read wasn't done.
                    No, you'll always get that naming when no index is specified in the samplesheet for that run (irrespective of whether an index read was run).
                    You cannot safely deduce from the naming whether an index read was run or not (at least in the "_NoIndex_" case).

                    Sven

                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      Originally posted by sklages
                      No, you'll always get that naming when no index is specified in the samplesheet for that run (irrespective of whether an index read was run).
                      You cannot safely deduce from the naming whether an index read was run or not (at least in the "_NoIndex_" case).

                      Sven
                      Let us hope that is the case. If the facility did not run this as a multiplexed sample, then the OP is out of luck: this run will have to be repeated.

                      • sklages
                        Senior Member
                        • May 2008
                        • 628

                        #12
                        Originally posted by GenoMax
                        Let us hope that is the case. If the facility did not run this as a multiplexed sample, then the OP is out of luck: this run will have to be repeated.
                        Sure, you are absolutely right.

                        This problem might arise if the customer doesn't mention in their "order" (whatever that order looks like) any indices that need to be demultiplexed, perhaps assuming that this matters only for post-processing and not for the sequencing run itself.
                        ... and the sequencing core organizes its flowcells (FCs) by read length and MP/no MP ...

                        We had a similar post a while ago, where the OP had hand-written a little note on the "order sheet", and as a result the sequencing facility didn't read it as "please do an index read, as my libraries have indices" ...

                        Sven

                        • LVAndrews
                          Member
                          • Sep 2012
                          • 55

                          #13
                          Originally posted by newBioinfo
                          Thanks JackieBadger,
                          I am planning to use QIIME, so hopefully I will not encounter such issues.

                          Thanks for the help!!!
                          If you are using QIIME, then you will have the option to remove unwanted sequences (such as barcodes) during the split_libraries_fastq.py step. See http://qiime.org/scripts/split_libraries_fastq.html for more information.

                          • LVAndrews
                            Member
                            • Sep 2012
                            • 55

                            #14
                            Originally posted by AKrohn
                            If you are using QIIME, then you will have the option to remove unwanted sequences (such as barcodes) during the split_libraries_fastq.py step. See http://qiime.org/scripts/split_libraries_fastq.html for more information.
                            Oops, for this application you want the split_libraries.py script instead.
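Whichever QIIME 1 script is used, it needs a mapping file describing the samples. A minimal sketch of generating one, assuming the QIIME 1 convention of a tab-separated table whose header starts with `#SampleID`, includes `BarcodeSequence` and `LinkerPrimerSequence`, and ends with `Description`, and assuming sample IDs restricted to letters, digits and periods. The sample names and barcodes are hypothetical, the primer column is left empty here, and QIIME's own `validate_mapping_file.py` should have the final word on the result.

```python
import csv
import io
import re
from typing import Dict


def mapping_file(barcodes: Dict[str, str], primer: str = "") -> str:
    """Build a QIIME 1-style mapping file (tab-separated) from {SampleID: barcode}."""
    if len(set(barcodes.values())) != len(barcodes):
        raise ValueError("barcodes must be unique")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Assumed column order: #SampleID first, Description last.
    writer.writerow(["#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description"])
    for sample, bc in barcodes.items():
        if not re.fullmatch(r"[A-Za-z0-9.]+", sample):
            raise ValueError(f"sample ID {sample!r}: use only letters, digits and periods")
        if not re.fullmatch(r"[ACGT]+", bc):
            raise ValueError(f"{sample}: barcode {bc!r} must contain only A, C, G, T")
        writer.writerow([sample, bc, primer, sample])  # Description = sample ID here
    return out.getvalue()


print(mapping_file({"sampleA": "ACACAC", "sampleB": "GTGTGT"}), end="")
```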
