
  • Demultiplex Illumina reads

    Hi Everyone,
    I am stuck with my Illumina data: I want to remove the barcodes from my reads. My read file looks like this:
    @HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 1:N:0:
    NACAGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    +
    #1=DDDFFCADHHIIIIIIIIGIIIIIIIGIIIIIIIFHIIIIICFHH################################
    #######################################################################
    and my barcode files look like this:
    @HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 2:N:0:
    NNNNNANACACA
    +
    ############
    When I use the FASTX-Toolkit to trim the barcodes, I get an error.
    The command I am using is:
    cat lane5_NoIndex_L005_R1_001.fastq | /u2/software/fastx/fastx_toolkit-0.0.13.2/bin/fastx_barcode_splitter.pl --bcfile lane5_NoIndex_L005_R2_001.fastq --bol --prefix x --suffix ".fastq"
    The error I am getting is:
    Error: bad barcode value (2:N:0 at barcode file (lane5_NoIndex_L005_R2_001.fastq) line 1
    I think the reason is the 2:N:0 in the barcode header versus the 1:N:0 in the read header.
    I am not sure how to fix this. If anyone has any idea, please help.

    Thanks!!!!
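A note on the error above: as far as I can tell from the FASTX-Toolkit documentation, --bcfile expects a plain text file of tab-separated name/barcode pairs, and the splitter looks for the barcode inside each read (--bol / --eol). A separate index-read FASTQ does not fit that model, which would explain why "2:N:0:" gets parsed as a barcode value. A minimal Python sketch of the alternative, pairing each read with its index read by header, is below. It assumes both files hold the same records in the same order, four lines per record (the wrapped sequence shown in the post is taken to be a display artifact), and exact barcode matches only; the function names and the barcode-to-sample mapping are hypothetical.

```python
import io
from collections import defaultdict


def read_fastq(handle):
    """Yield (header, sequence, quality) from a FASTQ handle with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def demultiplex(reads, index_reads, barcode_to_sample):
    """Bin reads by the barcode found in the matching index read (exact match only)."""
    bins = defaultdict(list)
    # Note: zip stops at the shorter file, so check record counts separately.
    for read, index in zip(read_fastq(reads), read_fastq(index_reads)):
        # Compare only the part before the first space: the "1:N:0:" and
        # "2:N:0:" suffixes are expected to differ between the two files.
        if read[0].split()[0] != index[0].split()[0]:
            raise ValueError(f"read/index headers out of sync: {read[0]} vs {index[0]}")
        sample = barcode_to_sample.get(index[1], "unmatched")
        bins[sample].append(read)
    return bins


# Tiny in-memory demo; the sample name and the second record are made up.
demo_reads = io.StringIO(
    "@HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 1:N:0:\nNACAGAGG\n+\n#1=DDDFF\n"
    "@HWI-ST1035:115:C0RG7ACXX:5:1101:1300:2041 1:N:0:\nTACGGAGG\n+\nCCCFFFFF\n"
)
demo_index = io.StringIO(
    "@HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 2:N:0:\nNNNNNANACACA\n+\n############\n"
    "@HWI-ST1035:115:C0RG7ACXX:5:1101:1300:2041 2:N:0:\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
)
demo_bins = demultiplex(demo_reads, demo_index, {"ACGTACGTACGT": "sample_1"})
for sample_name, records in demo_bins.items():
    print(sample_name, len(records))
```

Index reads full of N, like the one in the post, land in "unmatched" here; a real run would probably want to allow a mismatch or two, which this sketch does not attempt.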

  • #2
    Use Trimmomatic... much more versatile and mate-pair aware. Or just use the trim function in text manipulation in Galaxy.



    • #3
      Why don't you ask your sequencing provider to demultiplex the data for you? It is not more work for them, as it is part of the fastq processing; you just need to provide some kind of sample description, a samplesheet.

      All our customers are quite happy that this work has already been done when they get their data :-)

      Sven



      • #4
        BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq demultiplexing.
        I would do everything with a program where you set the parameters and know what is going in and what should come out.



        • #5
          Originally posted by JackieBadger View Post
          Use Trimmomatic... much more versatile and mate-pair aware. Or just use the trim function in text manipulation in Galaxy.
          Thanks jackieBadger,
          I will try it!



          • #6
            Originally posted by JackieBadger View Post
            BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq demultiplexing.
            I would do everything with a program where you set the parameters and know what is going in and what should come out.
            Thanks JackieBadger,
            I am planning to use QIIME, so hopefully I will not encounter such issues.



            Thanks for the help!!!



            • #7
              Originally posted by sklages View Post
              Why don't you ask your sequencing provider to demultiplex the data for you? It is not more work for them, as it is part of the fastq processing; you just need to provide some kind of sample description, a samplesheet.

              All our customers are quite happy that this work has already been done when they get their data :-)

              Sven
              Thanks Sven,
              It would be good if they demultiplexed the data before sending it, but in my case they did not.



              • #8
                Originally posted by JackieBadger View Post
                BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq demultiplexing.
                I would do everything with a program where you set the parameters and know what is going in and what should come out.
                Hmm, it's pretty easy to check what the provider does if you also have the "Undetermined_indices" data files. MiSeq is another matter: the trimming issue is known, and the trimming should not be used (currently). You could also ask for some demultiplexing stats to see whether the results are "good" or as expected.

                If you don't trust your sequencing provider at all, you should look for another one ;-)

                What "significant errors" did you encounter in the MiSeq demultiplexing?
                We are not multiplexing MiSeq libs, so I am just curious :-)

                Sven
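If only the raw index-read file is available, a rough version of such stats can be computed directly. The sketch below is a hedged illustration rather than anything the provider's pipeline produces: it assumes a FASTQ with four lines per record, tallies every index sequence, and reports the share containing at least one N. The function names are hypothetical.

```python
import io
from collections import Counter


def count_barcodes(index_handle):
    """Tally index sequences from a FASTQ handle with 4 lines per record."""
    counts = Counter()
    for line_number, line in enumerate(index_handle):
        if line_number % 4 == 1:  # the sequence is the second line of each record
            counts[line.strip()] += 1
    return counts


def fraction_with_n(counts):
    """Return the fraction of index reads containing at least one N base call."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    with_n = sum(n for barcode, n in counts.items() if "N" in barcode)
    return with_n / total


# In-memory demo; with a real file, pass open("index_reads.fastq") instead
# (that file name is only a placeholder).
demo_index_reads = io.StringIO(
    "@read1 2:N:0:\nNNNNNANACACA\n+\n############\n"
    "@read2 2:N:0:\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
    "@read3 2:N:0:\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
)
demo_counts = count_barcodes(demo_index_reads)
for barcode, n in demo_counts.most_common(10):
    print(barcode, n)
print("fraction with N:", fraction_with_n(demo_counts))
```

A handful of dominant barcodes matching the expected sample indices, plus a small tail of N-containing or unexpected sequences, is roughly what a sane multiplexed lane should look like.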



                • #9
                  Are you sure they did an index read? The names of your files say NoIndex. The only time we ever get data named like this is when an index read wasn't done.



                  • #10
                    Originally posted by NextGenSeq View Post
                    Are you sure they did an index read? The names of your files say NoIndex. The only time we ever get data named like this is when an index read wasn't done.
                    No, you'll always get that naming when no index is specified in the samplesheet for that run (irrespective of whether an index read was run).
                    You cannot safely deduce from the naming whether an index read was run or not (at least for the "_NoIndex_" case).

                    Sven



                    • #11
                      Originally posted by sklages View Post
                      No, you'll always get that naming when no index is specified in the samplesheet for that run (irrespective of whether an index read was run).
                      You cannot safely deduce from the naming whether an index read was run or not (at least for the "_NoIndex_" case).

                      Sven
                      Let us hope that is the case. If the facility did not run this as a multiplex sample then OP is out of luck. This run will have to be repeated.



                      • #12
                        Originally posted by GenoMax View Post
                        Let us hope that is the case. If the facility did not run this as a multiplex sample then OP is out of luck. This run will have to be repeated.
                        Sure, you are absolutely right.

                        This problem might arise if the customer doesn't mention any indices that need to be demultiplexed in their "order" (whatever that order looks like), maybe assuming that this is relevant only for the post-processing and not for the sequencing run itself.
                        ... and the sequencing core organizes their FCs with respect to read length and MP/no MP ...

                        We had a similar post a while ago, where the OP had hand-written a little note on the "order sheet", and as a result the sequencing facility didn't recognize it as "please do an index read, as my libraries have indices" ...

                        Sven



                        • #13
                          Originally posted by newBioinfo View Post
                          Thanks JackieBadger,
                          I am planning to use QIIME, so hopefully I will not encounter such issues.



                          Thanks for the help!!!
                          If you are using QIIME, then you will have the option to remove unwanted sequences (such as barcodes) during the split_libraries_fastq.py step. See http://qiime.org/scripts/split_libraries_fastq.html for more information.



                          • #14
                            Originally posted by AKrohn View Post
                            If you are using QIIME, then you will have the option to remove unwanted sequences (such as barcodes) during the split_libraries_fastq.py step. See http://qiime.org/scripts/split_libraries_fastq.html for more information.
                            Oops, for this application you want the split_libraries.py script instead.
