
  • How to pair mates after de-barcoding by fastx BarCode splitter?

    I have been having some issues doing a paired end analysis simply due to the way I am processing things.

    I currently use custom Illumina adaptors with in-line barcodes, i.e. the barcode is the first 7 nucleotides sequenced in a read (in either the first read or the paired-end read).

    Briefly, I have designed a customized protocol that ligates a different barcode to each end of an insert, i.e. in a paired-end sequence the barcodes at the two ends will differ.

    After de-barcoding, the reads are split into different files. Even though the files can be combined after de-barcoding, there are still issues.

    Because the fastx barcode splitter works on a read-by-read basis and may fail to recognize the barcode in some reads, it disrupts the read order, so a read in the read1 file no longer lines up with its mate in the read2 FASTQ file.

    I still want to conduct a paired-end analysis and was wondering if there is a way of re-pairing the reads after de-barcoding.

  • #2
    I wrote a tool for exactly this purpose, repair.sh. It will work as long as you have sufficient memory to store all the reads.

    That said, my normal recommendation is to not use fastx, but I don't know of another tool that replicates this functionality.
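For readers wondering what re-pairing involves, here is a minimal Python sketch of the general idea: hold one file in memory keyed by read name, then stream the other file and look up each mate. It is not the repair.sh implementation. It assumes 4-line FASTQ, mate names that differ only by a trailing /1 or /2 or by a comment after whitespace, and unique read names; the function names and output paths are hypothetical.

```python
import re
from itertools import islice


def pair_key(header):
    """Reduce a FASTQ header line to a key shared by both mates (assumed naming)."""
    name = header[1:].split()[0]         # drop '@' and any comment after whitespace
    return re.sub(r"/[12]$", "", name)   # drop a trailing /1 or /2


def read_fastq(path):
    """Yield (key, record) for each 4-line FASTQ record in the file."""
    with open(path) as handle:
        while True:
            record = [line.rstrip("\r\n") + "\n" for line in islice(handle, 4)]
            if len(record) < 4:
                break
            yield pair_key(record[0]), record


def repair(read1_path, read2_path, out1_path, out2_path, singles_path):
    """Write mates in matching order; reads without a mate go to singles_path."""
    mates = dict(read_fastq(read2_path))  # the whole read 2 file is held in memory
    pairs = singles = 0
    with open(out1_path, "w") as out1, open(out2_path, "w") as out2, \
            open(singles_path, "w") as single_out:
        for key, record in read_fastq(read1_path):
            mate = mates.pop(key, None)
            if mate is None:
                single_out.writelines(record)
                singles += 1
            else:
                out1.writelines(record)
                out2.writelines(mate)
                pairs += 1
        for record in mates.values():     # read 2 records that never found a mate
            single_out.writelines(record)
            singles += 1
    return pairs, singles
```

The memory requirement mentioned above shows up in the `mates` dictionary, which grows with the size of the read 2 file.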



    • #3
      Originally posted by Brian Bushnell:
      I wrote a tool for exactly this purpose, repair.sh. It will work as long as you have sufficient memory to store all the reads.

      That said, my normal recommendation is to not use fastx, but I don't know of another tool that replicates this functionality.
      Dear Brian,
      I could not get the script to run.
      It gives the following error:
      Error: could not find or load main class jgi.SplitPairsAndsingles

      I downloaded the latest version (v33.89) as well as one version earlier (v33.73b).

      Is the script 'SplitPairsAndSingles' missing, or am I messing up somewhere?



      • #4
        That's strange. I just downloaded it myself, and bbmap/current/jgi/SplitPairsAndSingles.class is present. Can you copy your exact command line and the complete error message?

        Also, after extracting, did you move any of the files around? You can execute the shellscripts from wherever you want, but they have to stay in their original location in /bbmap/. In other words, if the shellscripts are here:
        /path/bbmap/*sh

        The Java code must be here:
        /path/bbmap/current/

        It's also possible that you have some stuff in an environment variable that is causing a problem. Try this:

        echo $JAVA_OPTIONS

        ...and see if that prints anything.
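The two checks above (file layout and environment variable) can also be scripted. This is a small Python sketch, not part of BBMap; the function name is hypothetical, and the file names are the ones mentioned in this thread.

```python
import os
from pathlib import Path


def check_bbmap_layout(bbmap_dir, environ=os.environ):
    """Report whether the shellscript and the Java class sit where they are expected."""
    root = Path(bbmap_dir)
    return {
        # The shellscript should be directly inside the bbmap directory.
        "shellscript_present": (root / "repair.sh").is_file(),
        # The compiled class should be under bbmap/current/jgi/.
        "class_present": (root / "current" / "jgi" / "SplitPairsAndSingles.class").is_file(),
        # Anything set here is passed along and may interfere with Java startup.
        "java_options": environ.get("JAVA_OPTIONS", ""),
    }
```

Calling `check_bbmap_layout("/path/bbmap")` returns a dictionary; a `False` value or a non-empty `java_options` string points at the likely culprit.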



        • #5
          You may want to try Pairfq as another solution. It works with multi-line FASTA/Q, and the files can be compressed (gzip or bzip2). The specific command you want is "pairfq makepairs", and there is more information about the usage on the wiki.

          There is also a standalone script called "pairfq_lite.pl" in the "scripts" directory that has no dependencies (except Perl). If you are trying to pair a very large number of reads (more than 20 million), I recommend you follow the full install instructions and run pairfq with the "--index" option, or interleave your reads prior to trimming. Both of these options will drastically reduce the memory usage.
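To illustrate the interleaving option, here is a minimal Python sketch (not Pairfq code) that merges two FASTQ files into one file in which each read is immediately followed by its mate. It assumes 4-line FASTQ and that the two inputs are still in matching order, which is why it has to happen before any step that can drop reads; the function names are hypothetical.

```python
from itertools import islice


def fastq_records(handle):
    """Yield 4-line FASTQ records as lists of newline-terminated lines."""
    while True:
        record = [line.rstrip("\r\n") + "\n" for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield record


def interleave(read1_path, read2_path, out_path):
    """Write read 1 and read 2 records alternately; return the number of pairs."""
    pairs = 0
    with open(read1_path) as r1, open(read2_path) as r2, open(out_path, "w") as out:
        # zip() stops at the shorter file, so only complete pairs are written.
        for rec1, rec2 in zip(fastq_records(r1), fastq_records(r2)):
            out.writelines(rec1)
            out.writelines(rec2)
            pairs += 1
    return pairs
```

Only one pair is held in memory at a time, so the memory use does not depend on the number of reads.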



          • #6
            Thanks for the help.
            My friend actually wrote a script which hopefully is helpful to others.
            The script determines barcodes that are inline and 5' of the sequence.
            It takes the raw forward read and the mate read fastq files as input.
            It determines the barcode from the read file and puts the read and its mate in order in two files read1 and read2 (according to the barcode). Thus, it preserves the order and de-barcodes simultaneously.

            The barcode file should be tab-delimited, e.g.:

            BC1 AGTCGAG
            BC2 GCTGACG
            ... .....


            Usage of this Perl script is:
            perl [Script path]/5'bc_splitter_for_paired_end_sequence.pl -b [Barcode Filepath] -l [Barcode Length] -m [Allowed number of mismatches] -o [output suffix.fastq] -1 [READ1 Filepath] -2 [READ 2 filepath]

            Hope this helps someone.
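For readers without access to the attachment, the approach described in this post can be sketched in Python as follows. This is not the attached Perl script. It assumes 4-line FASTQ with both input files in the same order, takes the barcode from the 5' end of read 1 only, trims it from read 1, and writes the mate unchanged; the output file naming, the "unmatched" bin, and the rule of rejecting ties are illustrative choices.

```python
import os
from itertools import islice


def load_barcodes(path):
    """Read 'name<whitespace>sequence' lines (tab-delimited files qualify)."""
    barcodes = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                barcodes[fields[0]] = fields[1].upper()
    return barcodes


def records(handle):
    """Yield 4-line FASTQ records as lists of newline-stripped strings."""
    while True:
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield record


def assign_barcode(seq, barcodes, length, max_mismatches):
    """Return the unique best-matching barcode name, or None."""
    prefix = seq[:length].upper()
    if len(prefix) < length:
        return None
    hits = sorted(
        (sum(a != b for a, b in zip(prefix, bc)), name)
        for name, bc in barcodes.items()
    )
    hits = [hit for hit in hits if hit[0] <= max_mismatches]
    if not hits or (len(hits) > 1 and hits[0][0] == hits[1][0]):
        return None  # no match, or a tie between two barcodes
    return hits[0][1]


def split_pairs(read1_path, read2_path, barcodes, length, max_mismatches,
                out_dir, suffix="out.fastq"):
    """De-barcode read 1 and write each pair, still in order, to per-barcode files."""
    outputs, counts = {}, {}
    try:
        with open(read1_path) as r1, open(read2_path) as r2:
            for rec1, rec2 in zip(records(r1), records(r2)):
                name = assign_barcode(rec1[1], barcodes, length, max_mismatches)
                if name is None:
                    name = "unmatched"
                else:
                    # Trim the barcode from both the sequence and the quality line.
                    rec1 = [rec1[0], rec1[1][length:], rec1[2], rec1[3][length:]]
                if name not in outputs:
                    outputs[name] = tuple(
                        open(os.path.join(out_dir, f"{name}_read{i}_{suffix}"), "w")
                        for i in (1, 2)
                    )
                for out, rec in zip(outputs[name], (rec1, rec2)):
                    out.write("\n".join(rec) + "\n")
                counts[name] = counts.get(name, 0) + 1
    finally:
        for pair in outputs.values():
            for out in pair:
                out.close()
    return counts
```

Because each read and its mate are written in the same loop iteration, the two output files for a barcode always stay in matching order.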


            • #7
              Originally posted by abyss:
              Thanks for the help.
              My friend actually wrote a script which hopefully is helpful to others.
              The script determines barcodes that are inline and 5' of the sequence.
              It takes the raw forward read and the mate read fastq files as input.
              It determines the barcode from the read file and puts the read and its mate in order in two files read1 and read2 (according to the barcode). Thus, it preserves the order and de-barcodes simultaneously.
              Note that this script is for a different task than the one described in the original post. Here, the reads must be in order (and must be 4-line FASTQ), and the output is the de-barcoded reads. This wouldn't work for the general task of re-pairing paired-end files, as discussed above, but I'm glad you found a solution.
              Last edited by SES; 11-13-2014, 02:13 PM.



              • #8
                I agree, the script preserves the order of the read and the mate from the original FASTQ files while de-barcoding, rather than re-ordering the read and the mate after de-barcoding. But I will definitely take a look at the solution that you sent. It might be useful for other purposes. Thanks for posting.
