  • Splitting NuGen barcodes from paired-end sequences

    Hi all,

    Does anybody know of software that splits NuGen barcodes and supports PAIRED-END reads?

    Thanks,

    Ester

  • #2
    Hi Ester,
    I wrote a small program for that issue.

    The tool filters the reads by searching for the barcode only in the first read. If it is found, the barcode is removed and the read is written to the output. Note that the input files must be in order.
    Small example:

    java -Xmx4g -jar DemultiplexNUGEN.jar -i1 laneX_1.fastq laneY_1.fastq ... -i2 laneX_2.fastq laneY_2.fastq ... -b ATTG -o1 ATTG_demultiplex_1.fastq -o2 ATTG_demultiplex_2.fastq -s

    Hope I could help. Please keep me informed if it works.

    Alex
    Last edited by axgraf; 08-10-2011, 06:18 AM.
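
For readers without the jar, the filtering described in this post can be sketched roughly as follows. This is not the code of DemultiplexNUGEN.jar: it assumes the barcode sits at the very start of read 1 and must match exactly, it handles a single pair of input streams rather than several files per switch, and the names `read_fastq` and `demultiplex_pairs` are made up for illustration.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of four lines (newline stripped)."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def demultiplex_pairs(reads1, reads2, barcode, out1, out2):
    """Keep pairs whose read 1 starts with `barcode` (assumed position).

    The barcode and its quality values are trimmed from read 1; read 2 is
    written unchanged. Both inputs must list the pairs in the same order.
    Returns the number of pairs written.
    """
    kept = 0
    n = len(barcode)
    for rec1, rec2 in zip(read_fastq(reads1), read_fastq(reads2)):
        header1, seq1, plus1, qual1 = rec1
        if not seq1.startswith(barcode):
            continue
        out1.write("\n".join([header1, seq1[n:], plus1, qual1[n:]]) + "\n")
        out2.write("\n".join(rec2) + "\n")
        kept += 1
    return kept


# Tiny in-memory demonstration with invented reads.
r1 = io.StringIO("@a/1\nATTGCCGG\n+\nIIIIHHHH\n@b/1\nGGGGCCGG\n+\nIIIIHHHH\n")
r2 = io.StringIO("@a/2\nTTTTAAAA\n+\nIIIIHHHH\n@b/2\nCCCCAAAA\n+\nIIIIHHHH\n")
o1, o2 = io.StringIO(), io.StringIO()
kept = demultiplex_pairs(r1, r2, "ATTG", o1, o2)
```

Because `zip` stops at the shorter input, a real script would probably also want to check that both files contain the same number of records.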



    • #3
      Splitting NuGen barcodes from paired-end sequences

      Hi Alex,

      Thanks for your help.

      I tried to run your program with the following command:

      java -Xmx4g -jar DemultiplexNUGEN.jar -i1 s_7_1_sequence.txt -i2 s_7_2_sequence.txt -b ACCC -o1 test.1
      -o2 test.2 -s

      and got the following error:

      de.genzentrum.lafuga.NotFastqFormatException: Read1 has not the same identifier as read2
      at de.genzentrum.lafuga.trimmer.Demultiplex.iterateFastqPairedEnd(Demultiplex.java:96)
      at de.genzentrum.lafuga.main.MainPairedEnd.main(MainPairedEnd.java:70)

      The input files:

      >head s_7_1_sequence.txt
      @HWI-ST611_0176:7:1:1226:2054#0/1
      NGTACTCGTCCACGTCGTTCTCAGAGAGAATATTCTCTCTCCACACATCAGCAGTTAAGGAGGATGTGAAGACAATCTTTTCAACACTATCGGTCTGAGC
      +HWI-ST611_0176:7:1:1226:2054#0/1
      BYWYW[ZZZZcccccc_cccccccccccccc_ccccccccccccccccc\ccc_ccc\cccc_\cccccVccac______YUcUc\^^^\^^XZ^[X\\\
      @HWI-ST611_0176:7:1:1161:2111#0/1
      GAGTAGGCCACGCNTTCACGGTTCGTATTCGTGCTGGAAATCAGAATCAAACGAGCTTTTACCCTTTTGTTCCACACGAGATTTCTGTTCTCGTTGAGCT
      +HWI-ST611_0176:7:1:1161:2111#0/1
      gggeggggggcccBccccccggggfdgeggdbdddgfgfgdgggggeefgegeggbeegedea[gfedaagZeed]]bb`eedfegXgggabaddYaeca
      @HWI-ST611_0176:7:1:1197:2111#0/1
      GAGCCGCCCGCTCTCTGCTTTCCAAGCCTTTGCGATCTGCTTAAGCAGCTTTGACACCAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTC
      arkady Melon_2011/data> head s_7_2_sequence.txt
      @HWI-ST611_0176:7:1:1226:2054#0/2
      CAAATGGTGGATTTGGAGGTTAGAGGAACAATTAATGTCGTCGAGGCTTGTGCTCAGACCGATAGTGTTGAAAAGATTGTCTTCACATCCTCCTTAACTG
      +HWI-ST611_0176:7:1:1226:2054#0/2
      gggggggegggggggggggggggdgggggggggggggggggggggggge^cd`cddfeeffbe`d`dddd]eee_XddacaddW[aca`cadcbeMdcbT
      @HWI-ST611_0176:7:1:1161:2111#0/2
      GGTGGGCCGATCCGGGCGGAAGACATTGTCAGGTGGGGAGTTTGGCTGGGGGCGGCACATCTGTTAAAAGATAACGCAGGTGTTCTAAGATGAGCTCAAC
      +HWI-ST611_0176:7:1:1161:2111#0/2
      fhdgbgggddfefffegfggfbggggddegeea^eedd^deeebecee^cadUXd\TV]`a[]bdfeeda\VadaabcdcK^V\E]U[TY]Ybbbdb[d\
      @HWI-ST611_0176:7:1:1197:2111#0/2
      GGTGTCAAAGCTGCTTAAGCAGATCGCAAAGGCTTGGAAAGCAGAGAGCGGGCGGCTCAGATCGGAAGGGCGTCGTGTAGGGAAAGAGGGGAGATTTCGG


      Can you help with this?

      Thanks again,
      Ester



      • #4
        Hi Ester,
        The tool compared the identifiers of the reads and stopped because the names weren't the same.
        I missed the fact that paired-end reads can have a "/1" and "/2" at the end of the identifier, which aren't present in our reads.

        I changed the code so that it should work for your files.

        Alex
        Attached Files
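
The kind of comparison described here can be sketched as below. It only illustrates the idea of ignoring a trailing "/1" or "/2" when matching mates and is not the change that was made in the attached jar; `pair_key` and `same_pair` are hypothetical names.

```python
def pair_key(header):
    """Return the read name without the leading '@' and without a trailing
    '/1' or '/2' mate suffix, so both mates map to the same key."""
    name = header[1:] if header.startswith("@") else header
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def same_pair(header1, header2):
    """True if two FASTQ header lines belong to the same read pair."""
    return pair_key(header1) == pair_key(header2)


# Headers taken from the files shown earlier in the thread.
h1 = "@HWI-ST611_0176:7:1:1226:2054#0/1"
h2 = "@HWI-ST611_0176:7:1:1226:2054#0/2"
h3 = "@HWI-ST611_0176:7:1:1161:2111#0/2"
```

With these headers, `same_pair(h1, h2)` is true and `same_pair(h1, h3)` is false. Headers without a suffix pass through unchanged, so files like the ones used at Alex's institute would still compare equal.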



        • #5
          Hi Alex,

          Still having problems:


          java.lang.NullPointerException
          at java.io.File.<init>(Unknown Source)
          at de.genzentrum.lafuga.trimmer.Demultiplex.iterateFastqPairedEnd(Demultiplex.java:74)
          at de.genzentrum.lafuga.main.MainPairedEnd.main(MainPairedEnd.java:70)


          Thanks,

          Ester



          • #6
            Did you use the same parameters as in the last post?
            It seems to me that the -o2 switch was not set.

            If I use the same parameters and the same sequences as in the last post, I can run it successfully.

            If you copy the parameters out of your last post, the "-o2 test.2 -s"
            line is missing.

            That could have caused the exception, since no output file name would have been set for read 2.

            Otherwise I need the exact parameters that you used.

            Alex



            • #7
              Hi Alex,

              You are right. It was my mistake.
              The program ran, but the output file is missing the read name after the +:

              arkady Melon_2011/data> more test.1
              @HWI-ST611_0176:7:1:2764:2469#0/1
              AGGAGTCCGGTATTGTTATTTATTGTCACTGCCTCCCCGTGTCAGGATTGGGTAGATCGGAAGAGCGGTTCTGCAGGAATGCCGAGACCGATACCG
              +
              gggfggggggdggggggggggggggggegeTedcdeggdfgccZegada`ecXabZX_``\`bMYY`aM^\ZX[S^dabXbBBBBBBBBBBBBBBB
              @HWI-ST611_0176:7:1:5412:2350#0/1
              CCGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATC
              +
              gggfg_gegggggegggdggfggggegggggeggaggd\eefcdbdd[edd`ddeX\\aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

              Thanks again,

              Ester
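
For context, FASTQ allows the "+" line either to stand alone or to repeat the identifier from the "@" line, so both forms describe the same record, although a downstream tool may still expect one or the other. A small sketch of writing and checking both forms follows; `format_fastq` and `plus_line_ok` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
def format_fastq(name, seq, qual, repeat_name=False):
    """Format one FASTQ record; optionally repeat the name after '+'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    plus = "+" + name if repeat_name else "+"
    return "@{}\n{}\n{}\n{}\n".format(name, seq, plus, qual)


def plus_line_ok(header, plus):
    """Accept a bare '+' or a '+' followed by the same name as the header."""
    if not plus.startswith("+"):
        return False
    return plus == "+" or plus[1:] == header[1:]


# Invented record, written in both styles.
bare = format_fastq("read1/1", "ACGT", "IIII")
full = format_fastq("read1/1", "ACGT", "IIII", repeat_name=True)
```

Here `bare` carries a lone "+" on its third line and `full` carries "+read1/1"; `plus_line_ok` accepts either when given the matching header.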



              • #8
                You can also use novobarcode, part of Novoalign, to split reads in "buckets" based on barcodes.
                --------------------------------------
                Elia Stupka
                Co-Director and Head of Unit
                Center for Translational Genomics and Bioinformatics
                San Raffaele Scientific Institute
                Via Olgettina 58
                20132 Milano
                Italy
                ---------------------------------------



                • #9
                  You are right.
                  Sorry for that. Up to now this tool has only been used here at our institute.
                  I changed it.
                  I hope everything is fine now.

                  Alex
                  Attached Files



                  • #10
                    Now it works fine.
                    Thanks a lot,
                    Ester



                    • #11
                      Originally posted by axgraf
                      Hi Ester,
                      The tool compared the identifiers of the reads and stopped because the names weren't the same.
                      I missed the fact that paired-end reads can have a "/1" and "/2" at the end of the identifier, which aren't present in our reads.

                      I changed the code so that it should work for your files.

                      Alex
                      Dear Alex,
                      We ran into a similar problem. Our input is in the CASAVA 1.8 format, which differs slightly from the earlier one (in the position of the "1" and "2").
                      Our input looks like this:
                      @HWUSI-EAS174:6:FC:1:1:1153:945 1:Y:0:
                      GGGAGGTCGAGGCTGTAGTGAGCTGGGATCGTACCATTTCTCTCATTACGAGATCGGAAGAGCGTGGTGTTGGGACTGAGTGTAGATCTCGGTGGGCGGC
                      +
                      25+=70.6;1@@;,;A?=?:19)7;*+++5+?=;+.7;<)3>61*?=;:=BD?B@?222=?8+BB###################################
                      @HWUSI-EAS174:6:FC:1:1:1288:931 1:Y:0:
                      GAGGTCGGCTTGGAGTCAGAAAGCTCGGGGCATTGTCTCAGGTCTGTTGCTTCCTAGGAGTGTGAACGATGAGGAAGTTCCTGCATCGCTGAGGACTCAG
                      +
                      ?+@=6;2;@==B;54;=;=+:785+--/77B?B?D#################################################################
                      @HWUSI-EAS174:6:FC:1:1:1305:938 1:Y:0:
                      GGGTTCGCTCGGTGAACTGCACGCCCTTTGAAATGTCTCCTCTCGATTTGGGTGTTTTACTTGATTTTTCTTATATCTTACATCTTTTCTTTAGTCTGTC
                      +
                      ####################################################################################################
                      @HWUSI-EAS174:6:FC:1:1:1528:951 1:Y:0:
                      CGCAAGGACAAAAAACCAAATACTGCATGTTCTCAATCATAGGTGGGAATTGAACAATGAGAACACAGGGACACAGGAACACTCAGATCGGAAGAGCGTC
                      +
                      IIIHIHBBIGIIIIIIIIHHBIHIIIIEBIGIIIIDGGBGGGGDGGADGEIIIIGDGEGIHFGHI<IHDGE@HHBIFF@FIFBHHIEG@@HEDDEE>B>3
                      @HWUSI-EAS174:6:FC:1:1:1551:943 1:Y:0:
                      GGAGGCTGCTTTTAGGCCTACTATGGGTGTTAAATTTTTTACTCTCTCAAACACCGGGCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCG
                      +
                      ED=D4FEEE?8BB@B4FBFEE4BFD/:0:4B?;8B*45402921;+86=4CCE?EDDB+DA<ACAD@<GB<0><6?>:4??C>1?###############
                      @HWUSI-EAS174:6:FC:1:1:1588:935 1:Y:0:
                      CCGTGATAGTTTTTAGGTGTTAGACACCCCACCTTAAGCTTGTACCTGAAAGCTTTATCTCGTTATAAATAATTCACTGTAATTTAGGGGAGGTATGTCC
                      +
                      2+85::1:77)::1:=+9=@@32,@=3<;99@@@F=@4B8B?7C:B?CAB=??8E734282B==77241@##############################

                      So when I run the Java program, it still shows "Read 1 has not the same identifier as read 2". Could you please help me solve this?

                      Thanks so much
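
In CASAVA 1.8 headers the read number moves into a second, space-separated field ("1:Y:0:" above), so the part before the first space is what the two mates share. One way to compare such headers, sketched under that assumption and not taken from the tool itself, is to key on that first field. The function name `pair_key_any` and the read 2 header in the example are assumptions; only read 1 headers appear in this post.

```python
def pair_key_any(header):
    """Return a mate-independent read name for old and CASAVA 1.8 headers.

    Old style:   @name/1            -> name
    CASAVA 1.8:  @name 1:Y:0:       -> name
    """
    name = header[1:] if header.startswith("@") else header
    fields = name.split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


# Read 1 header from the post; the read 2 header is an assumed counterpart.
c1 = "@HWUSI-EAS174:6:FC:1:1:1153:945 1:Y:0:"
c2 = "@HWUSI-EAS174:6:FC:1:1:1153:945 2:Y:0:"
mates_match = pair_key_any(c1) == pair_key_any(c2)
```

With the assumed read 2 header, `mates_match` is true, and old-style names ending in "/1" or "/2" still reduce to the same key.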

