
  • #16
    I suppose you could do two things. Remove the 1st base (if it is always N, which is kind of odd, see below) from all reads and remove the first C from your barcode file.

    Hypothesis: the first base is called as N because every sequence in this case actually starts with C (followed by GT). I am surprised that base calling worked from the 2nd base onwards. Low nucleotide diversity like this is not recommended.



    • #17
      Thanks, but how can I do that? Is there a quick way? Would you suggest using fastx_trimmer?



      • #18
        Can you try replacing the first C with an N in your barcode file and see if fastx_barcode_splitter.pl accepts that as a valid pattern and does the demultiplexing?

        If that does not work, you could use bbduk from the BBMap suite (with the forcetrimleft=1 option) or the HEADCROP:1 option of trimmomatic to remove that first base (which is N) from all reads.
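A minimal sketch of the same trimming step in plain awk, for cases where neither bbduk nor trimmomatic is at hand. It assumes a standard 4-line-per-record FASTQ with unwrapped sequence lines and a tab-separated barcode file; the function names and file names are hypothetical and not part of any toolkit.

```shell
# Drop the first base and its quality score from every FASTQ record.
# Lines 2 and 4 of each 4-line record are the sequence and the qualities.
trim_first_base() {
    awk 'NR % 4 == 2 || NR % 4 == 0 { print substr($0, 2); next }
         { print }'
}

# Drop the first base of each barcode (column 2), leaving header lines
# that start with "#" unchanged.
strip_first_barcode_base() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
                !/^#/ && NF >= 2 { $2 = substr($2, 2) }
                { print }'
}

# Hypothetical usage (placeholder file names):
# trim_first_base < reads_R1.fastq > reads_R1.trimmed.fastq
# strip_first_barcode_base < barcodes.txt > barcodes.trimmed.txt
```

Trimming the reads and the barcode file together keeps the two in register: a barcode such as CGTAACCA becomes GTAACCA and is compared against reads that now start at what was the second base.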



        • #19
          First I removed the leading N from the reads, as you suggested earlier, and also removed the first C from the barcode file.

          So now
          Code:
           mitras$ less test_out_R1.fastq 
          
          @M02542:124:000000000-AKFBJ:1:1101:13841:1000 1:N:0:5
          GTACCCAAGGGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACANNCNNGTCGAACGGTAGCNCAGAGAGCTTGCTCTNGGNTGACGAGTGGCGGACGGGNGANTAATGTCTGGGAAACTGCCCGATGGAGGGGGATANCTACTGGANANNGNNGCTAATACCGCATAACGNCGCAAGACCAAAGAGGGNGANNTCAGGGCCTCTTGNCATCGGATGNNCCCAGATGGGATNGGCTTGTAGGTGAGGTAAGNGCTCACGCNGGCGACGATCCCTAGCTTGGNNGNGAGG
          +
          8ABCFGGGGGGGGEEGGGGGGGG<FGGGFFGFGFGFGGEG@FGEEGGCFGGGGG?##:##6:CFFGGGDG<CG#:CCFFGEGGGGFAFG#:<#:BBFF7FFGDGGGGGGGD#8+#+:BFGGGGGGGCFFGDGG<FGGGECCGDEGGGF@#611:D,>>#6##6##66<1CF@7FFFGEGF7E#41=8=EGFFG7*?CF>>#22##2*2;@;8C8CFC<#/2AC=E*:5##/2:CFCG+8**+#*1*1552<+*+0+8D6D4+#1**)**)*#*15/*//7>5:5<.*,*)0)##1#..73
          @M02542:124:000000000-AKFBJ:1:1101:12174:1002 1:N:0:5
          GTAACCAAGGGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACANNCNNGTCGAGGGGCAGCATTTCAGTTTGCTTGCNAANTGGAGATGGCGACCGGCGNACNGGTGAGTAACACGTATCCAACCTGCCGATAACTCNGGGATAGCNTNNCNNAAGAAAGATTGATACCCNATGGTATAATCAGACCGNATGGTCTTATTATTAAANAATTTCGGTNNTCGATGGGGATGNGTTCCATTAGGCAGTTGGTGTGTTAATGNCGCACCAAACCTTCCTGTGANNGNGTTT
          +
          8ACCGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGDGGGGGGGGGFGGGGGGG##:##6:CFGFDEGGGGDGGGFGGGFGGGGGGGG#:C#66=,CFFFGGGG@FGEE7#++#:BBFFGGGFCFGGGGGGCGDGGGFGGGGGGGGC=#8@<<<FGG#5##8##86DCF<FCCC:BFCFFF#6>F>FGG92;@CFFGF@#116*=CF<CG?@CFFFG#3;5375:CG##212**<5C5/::#11:91A>+<>C6CE<FC:*****0:FB<#1*)//75<F30762*-2)**##1#0)0.
          And the barcode file:
          #SampleID BarcodeSequence
          AP1E GTAACCA
          AP25E GTACCCA
          AP5D GTAAGAA
          AP8C GTAGATA
          P29F GTAGGCT
          P30N GTATTCA
          P31B GTCAAGA
          P35C GTATTTC
          V2A GTCCAGG
          V3J GTCACAG
          I am still having the same problem:
          N85567:testdata mitras$ cat test_out_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile_new.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 0
          Barcode Count Location
          unmatched 1000 /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/unmatched
          total 1000
          Last edited by GenoMax; 01-25-2016, 09:50 AM.



          • #20
            And also the same is happening if I replace the first C with an N in barcode file
            #SampleID BarcodeSequence
            AP1E NGTAACCA
            AP25E NGTACCCA
            AP5D NGTAAGAA
            AP8C NGTAGATA
            P29F NGTAGGCT
            P30N NGTATTCA
            P31B NGTCAAGA
            P35C NGTATTTC
            V2A NGTCCAGG
            V3J NGTCACAG
            N85567:testdata mitras$ cat test_out_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile_new.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 0
            Barcode Count Location
            unmatched 1000 /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/unmatched
            total 1000



            • #21
              Is your barcode file in tab-separated format? It is hard to tell from the post.



              • #22
                Yes, it is tab-separated. I just copied and pasted it here, which is why it looks like that.



                • #23
                  Originally posted by smitra
                  And also the same is happening if I replace the first C with an N in barcode file
                  The N idea does not work but otherwise I am able to split your example reads into two files. What version of fastx_toolkit are you using?



                  • #24
                    Barcode file for the 2 sequences you posted earlier:
                    Code:
                    #SampleID       BarcodeSequence
                    tse1    GTACCCAA
                    tse2    GTAACCAA
                    Command I used:

                    Code:
                    $ cat test.fq | fastx_barcode_splitter.pl --bcfile bar --prefix /path_to/ --bol --mismatches 0
                    Barcode Count   Location
                    tse1    1       /path_to/tse1
                    tse2    1       /path_to/tse2
                    unmatched       0       /path_to/unmatched



                    • #25
                      Hmm, that looks good. Maybe I will re-create the barcode file.
                      My version is fastx_toolkit-0.0.14.
                      Thanks for helping.



                      • #26
                        If you made the barcode file on a PC/Mac it may contain some additional (invisible) characters. Use the dos2unix utility (or mac2unix / dos2unix -c mac for files with old Mac-style line endings) to remove those characters, or just create the file on the server.
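If dos2unix is not installed, a rough equivalent can be sketched with tr and awk. This is only an illustration under the assumption that the barcode file has two whitespace-separated columns (ID and barcode); the function names and file names are hypothetical.

```shell
# Turn carriage returns into newlines (handles both CRLF and CR-only
# line endings), then rewrite every non-empty line as "ID<TAB>barcode".
clean_barcode_file() {
    tr '\r' '\n' | awk 'NF >= 2 { print $1 "\t" $2 }'
}

# Report lines that do not have exactly two tab-separated fields or that
# still contain a carriage return; exits non-zero if any are found.
check_barcode_file() {
    awk -F'\t' 'NF != 2 || /\r/ { printf "line %d looks malformed\n", NR; bad = 1 }
                END { exit bad }'
}

# Hypothetical usage (placeholder file names):
# clean_barcode_file < barcodes_from_excel.txt > barcodes.txt
# check_barcode_file < barcodes.txt && echo "barcode file looks fine"
```

Running the check before demultiplexing makes it easier to tell a formatting problem in the barcode file apart from a genuine absence of matching reads.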



                        • #27
                          Yes, re-creating the barcode file works. There must have been some problem with the tabs.
                          Now I get a more realistic result:
                          N85567:testdata mitras$ cat test_out_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile_new.txt --prefix /Volumes/Promise\ Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 0
                          Barcode Count Location
                          AP1E 63 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/AP1E
                          AP25E 57 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/AP25E
                          AP5D 39 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/AP5D
                          AP8C 27 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/AP8C
                          P29F 40 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/P29F
                          P30N 40 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/P30N
                          P31B 38 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/P31B
                          P35C 57 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/P35C
                          V2A 37 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/V2A
                          V3J 25 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/V3J
                          unmatched 577 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/unmatched
                          Thank you so very much



                          • #28
                            Now that I am able to match reads with the barcode splitter, I tried different combinations; keeping the N in the sequences and using --mismatches 0 also works.
                            But the problem is that only about half of my reads match.
                            N85567:testdata mitras$ cat test_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile.txt --prefix /Volumes/Promise\ Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/ --suffix "_R1.fastq" --bol --mismatches 0
                            Barcode Count Location
                            AP1E 61 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/AP1E_R1.fastq
                            AP25E 55 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/AP25E_R1.fastq
                            AP5D 37 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/AP5D_R1.fastq
                            AP8C 27 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/AP8C_R1.fastq
                            P29F 40 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/P29F_R1.fastq
                            P30N 40 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/P30N_R1.fastq
                            P31B 35 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/P31B_R1.fastq
                            P35C 55 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/P35C_R1.fastq
                            V2A 36 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/V2A_R1.fastq
                            V3J 25 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/V3J_R1.fastq
                            unmatched 589 /Volumes/Promise Pegasus/IFR/Cristina/Mr.DNA_2016/fastq_files/testdata/unmatched_R1.fastq
                            total 1000
                            Having 589/1000 reads unmatched is not acceptable.
                            When I checked a few of the reads they looked fine;
                            I see no reason for the first few reads not to be matched (the barcode file is copied again below, with the barcodes that should match the first two reads in bold).
                            Code:
                            N85567:unmatched_try mitras$ less unmatched.fastq 
                            
                            @M02542:124:000000000-AKFBJ:1:1101:13841:1000 1:N:0:5
                            NGTACCCAAGGGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACANNCNNGTCGAACGGTAGCNCAGAGAGCTTGCTCTNGGNTGACGAGTGGCGGACGGGNGANTAATGTCTGGGAAACTGCCCGATGGAGGGGGATANCTACTGGANANNGNNGCTAATACCGCATAACGNCGCAAGACCAAAGAGGGNGANNTCAGGGCCTCTTGNCATCGGATGNNCCCAGATGGGATNGGCTTGTAGGTGAGGTAAGNGCTCACGCNGGCGACGATCCCTAGCTTGGNNGNGAGG
                            +
                            #8ABCFGGGGGGGGEEGGGGGGGG<FGGGFFGFGFGFGGEG@FGEEGGCFGGGGG?##:##6:CFFGGGDG<CG#:CCFFGEGGGGFAFG#:<#:BBFF7FFGDGGGGGGGD#8+#+:BFGGGGGGGCFFGDGG<FGGGECCGDEGGGF@#611:D,>>#6##6##66<1CF@7FFFGEGF7E#41=8=EGFFG7*?CF>>#22##2*2;@;8C8CFC<#/2AC=E*:5##/2:CFCG+8**+#*1*1552<+*+0+8D6D4+#1**)**)*#*15/*//7>5:5<.*,*)0)##1#..73
                            @M02542:124:000000000-AKFBJ:1:1101:12174:1002 1:N:0:5
                            NGTAACCAAGGGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACANNCNNGTCGAGGGGCAGCATTTCAGTTTGCTTGCNAANTGGAGATGGCGACCGGCGNACNGGTGAGTAACACGTATCCAACCTGCCGATAACTCNGGGATAGCNTNNCNNAAGAAAGATTGATACCCNATGGTATAATCAGACCGNATGGTCTTATTATTAAANAATTTCGGTNNTCGATGGGGATGNGTTCCATTAGGCAGTTGGTGTGTTAATGNCGCACCAAACCTTCCTGTGANNGNGTTT
                            +
                            #8ACCGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGDGGGGGGGGGFGGGGGGG##:##6:CFGFDEGGGGDGGGFGGGFGGGGGGGG#:C#66=,CFFFGGGG@FGEE7#++#:BBFFGGGFCFGGGGGGCGDGGGFGGGGGGGGC=#8@<<<FGG#5##8##86DCF<FCCC:BFCFFF#6>F>FGG92;@CFFGF@#116*=CF<CG?@CFFFG#3;5375:CG##212**<5C5/::#11:91A>+<>C6CE<FC:*****0:FB<#1*)//75<F30762*-2)**##1#0)0.
                            #SampleID BarcodeSequence
                            AP1E CGTAACCA
                            AP25E CGTACCCA
                            AP5D CGTAAGAA
                            AP8C CGTAGATA
                            P29F CGTAGGCT
                            P30N CGTATTCA
                            P31B CGTCAAGA
                            P35C CGTATTTC
                            V2A CGTCCAGG
                            V3J CGTCACAG
                            I have no idea why this is getting so messy, and having only half of the reads matched is not good enough. So I am keeping this post alive for any further help. Thanks a lot. smitra
                            Last edited by GenoMax; 01-26-2016, 06:44 AM. Reason: added CODE tags to improve readability



                            • #29
                              Getting rid of the first base from all reads (hopefully that is consistently N, have you spot checked?) and then removing the first C from your barcode file may be the way to go.

                              That said, the example reads you posted above have a lot of N's in the middle of the reads (which is a bad sign, likely indicative of low nucleotide diversity during those cycles). Are you going to be able to use this data if it does get demultiplexed?

                              If these are amplicons and such you should look into primer schemes that stagger the start by adding random bases so as to overcome the low nucleotide diversity issue.
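The spot check of the first base can be done over the whole file rather than by eye. A minimal sketch, assuming a 4-line-per-record FASTQ with unwrapped sequences; the function name and file name are hypothetical.

```shell
# Tabulate which base occurs at position 1 of every read.
# Output is "base<TAB>count", one line per distinct first base.
first_base_counts() {
    awk 'NR % 4 == 2 { n[substr($0, 1, 1)]++ }
         END { for (b in n) print b "\t" n[b] }' | sort
}

# Hypothetical usage (placeholder file name):
# first_base_counts < test_R1.fastq
```

If the output contains a single line for N, the first base is consistently N and trimming it from every read is safe; any other lines show how many reads would lose a real base call.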



                              • #30
                                Dear GenoMax,
                                Thanks. Yes, I have tried both getting rid of the first base from all reads (yes, it is consistently N) and keeping the reads as they are. In both cases my matching success rate is about 50%. Yes, I know the first few reads have many N's in the middle, but those will be discarded later in the QC protocol. At this first stage I am trying to keep as many reads as I can for each sample. Do you think I should try allowing more mismatches or the --partial option? The first reads I copied should still have been matched.
                                Getting really confused with this function.
                                Thanks,
                                mitra
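One way to cross-check the splitter's counts is to tally exact prefix matches independently. The sketch below is not how fastx_barcode_splitter.pl is implemented; it only counts reads whose first bases are identical to a barcode, assuming a tab-separated barcode file and a 4-line-per-record FASTQ. The function name and file names are hypothetical.

```shell
# Count reads whose leading bases exactly equal one of the barcodes.
# $1 is the barcode file (ID<TAB>barcode); the FASTQ is read from stdin.
count_exact_prefix_matches() {
    awk -F'\t' '
        # First input: the barcode file. Remember each barcode and its length.
        NR == FNR {
            if ($0 !~ /^#/ && NF >= 2) { sample[$2] = $1; lens[length($2)] = 1 }
            next
        }
        # Second input: FASTQ. Line 2 of each record is the sequence.
        FNR % 4 == 2 {
            hit = "unmatched"
            for (l in lens) {
                prefix = substr($0, 1, l + 0)
                if (prefix in sample) { hit = sample[prefix]; break }
            }
            count[hit]++
        }
        END { for (s in count) print s "\t" count[s] }
    ' "$1" - | sort
}

# Hypothetical usage (placeholder file names):
# count_exact_prefix_matches barcodes.txt < test_R1.fastq
```

Under strictly exact comparison a read starting with NGTACCCA is not equal to the barcode CGTACCCA, so comparing this tally with the splitter's own table shows whether the unmatched reads differ from every barcode at some position or whether something else is going on.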
