  • Bacms
    Junior Member
    • Aug 2014
    • 6

    Demultiplexing Illumina RNASeq paired reads

    Hello everyone,

    BGI normally provides us with demultiplexed reads, but this time we received our fastq files before demultiplexing. Can anyone recommend software to perform the demultiplexing? Also, where can I get the fastq files for the Illumina barcodes?

    Thank you very much in advance.

    Bruno
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

    You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.


    • gmarco
      Member
      • Oct 2012
      • 36

      #3
      You should use bcl2fastq from Illumina to demultiplex your data. Download and use the version that matches the sequencing instrument used to generate the data.


      • Bacms
        Junior Member
        • Aug 2014
        • 6

        #4
        Originally posted by GenoMax
        What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

        You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.
        This is a HiSeq run (2000, I believe, but I need to double-check), and I do see barcodes in the fastq IDs. Does that mean the data has effectively been demultiplexed and just needs to be split?
        Here is the head of one of the files:
        head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_1.fq
        Code:
        @FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/1
        NCCCAAACGCGCGTGACTTCACAATAATTAGCCCGTACCTGCTGGTTACGTGGCGGCACCGTGTACAATACCCTAGGCATCAGGGTTAGGCATGGTTACT
        +
        BP\ceeeegggggghiiiiiiiiihiiiiihiiiiiiiiiiiiiifgggggeeeccaccaccaacdcccccbccccbccccccbc[`accccccc`bccc
        @FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/1
        NCCCACCAAAACCGGAAAATGCAGGCCCTGTCGTCTCGCGTGAACATCGCGGCCAAGCCCCAGCGCGCTCAGCGCCTGGTGGTCCGCGCCGAGGAGGTTA
        +
        BP\ccecegggggiihhhiegghhhhihihgiihhiiihighfhiihfggecaacca_acccccZ]]]aaXb]]aX]ac]^_]bccccccc]_a___QW`
        @FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/1
        NAACCAGGCGAACGGTTGGCGTCGGGATTCGGGACGCAAGCATGGCGCTGACCAGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCCGAAGCT
        head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_2.fq
        Code:
        @FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/2
        CTCCGGTGTCAAGTAACCATGCCTAACCCTGATGCCTAGGGTATTGTACACGGTGCCGCCACGTAACCAGCAGGTACGGGCTAATTATTGTGAAGTCACG
        +
        _bbeeecegggggihiiiiiiiiiiiiiiiiiiiiiihhiicffhhhhhighieghhhhiggeeeecddccccccccccccccbbcdddcdcbdbbbbcc
        @FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/2
        CGGGGCGCAGGATCTTCACCAGCGAGCCGCGCTTGGGGCCGACCTCCTTCTTGGGGGCAGCCTTAACCTCCTCGGCGCGGACCACCAGGCGCTGAGCGCG
        +
        ab_ceeeef`geghhiiihhiiihiihhiigeeca`accccccccccccc]bbcacW[acccccbbccccccb__cccaaccc^aa[[_`accca^baac
        @FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/2
        CCTGGTCAGCGCCATGCTTGCGTCCCGAATCCCGACGCCAACCGTTCGCCTGGTTCAGATCGGAAGAGCGTCGTGTAGGGA
        Last edited by Bacms; 01-13-2015, 08:54 AM.
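
The header lines above pack several fields into one string. A minimal Python sketch of pulling them apart, assuming every header follows the pattern seen in these examples (`@<id>:<lane>:<tile>:<x>:<y>#<barcode>/<read>`). The field names, including treating the first field as a flowcell ID, are my own labels and not something the thread confirms:

```python
import re

# Pattern for the header style shown in the post:
# @<flowcell>:<lane>:<tile>:<x>:<y>#<barcode>/<read>
# The group names are illustrative labels (assumption), not an official spec.
HEADER_RE = re.compile(
    r"^@(?P<flowcell>[^:]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"#(?P<barcode>[ACGTN]+)/(?P<read>[12])$"
)

def parse_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER_RE.match(line.strip())
    return match.groupdict() if match else None

fields = parse_header("@FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/1")
print(fields["barcode"], fields["read"])  # GTGGCCAT 1
```

Headers written by newer Illumina software use a different layout, so the pattern would need adjusting for those.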


        • dolphing
          Junior Member
          • Dec 2010
          • 3

          #5
          If the data had been demultiplexed, all reads in a fastq file would have the same barcode.


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            @Bacms: Unless you have access to the original flowcell folder, de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

            One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?


            • Bacms
              Junior Member
              • Aug 2014
              • 6

              #7
              Originally posted by GenoMax
              @Bacms: Unless you have access to the original flowcell folder de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

              One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?
              This is the only data we got from BGI. They normally do the demultiplexing, but this was at the end of the agreement between BGI and our University, and apparently demultiplexing was not included in the cost of the contract, even though they had been doing it for a year. I wrote a quick Python script to look for the barcode sequence in the read ID (perfect matching). The diversity of barcodes in the sample is ridiculous and includes other Illumina barcodes that we did not use, so I suspect some cross-contamination with someone else's samples. I need to pull those sequences and see what they match.

              The main question is whether I also need to cut the barcode sequence from the sequence itself or not?
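
For anyone wanting to reproduce the barcode tally described above, here is a minimal sketch of what such a script might look like. It is not the script from the post. It assumes strict 4-line fastq records and that the barcode sits between `#` and `/` in the ID line; the records are toy data made up for illustration:

```python
import io
from collections import Counter

def count_barcodes(handle):
    """Count the barcodes found in the ID line of each 4-line fastq record."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 0:  # ID lines are every fourth line, starting with the first
            # "@...#GTGGCCAT/1" -> "GTGGCCAT"
            barcode = line.strip().rsplit("#", 1)[-1].split("/", 1)[0]
            counts[barcode] += 1
    return counts

# Toy records (hypothetical IDs and sequences, only to show the call)
toy = io.StringIO(
    "@toy:1:1:1:1#GTGGCCAT/1\nACGT\n+\nIIII\n"
    "@toy:1:1:1:2#ATGTCAAT/1\nACGT\n+\nIIII\n"
    "@toy:1:1:1:3#GTGGCCAT/1\nACGT\n+\nIIII\n"
)
print(count_barcodes(toy).most_common())  # [('GTGGCCAT', 2), ('ATGTCAAT', 1)]
```

On a real file, pass an open file handle instead of the `StringIO` object. Sorting by count makes it easy to see whether the unexpected barcodes are a long tail of rare ones or are present in substantial numbers.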


              • mastal
                Senior Member
                • Mar 2009
                • 666

                #8
                Barcodes only appear in a read when the insert is short enough that sequencing runs into the Illumina adapter and continues through the first part of the adapter into the barcode.

                If you trim your reads with something like Trimmomatic, the barcodes will be removed when Illumina adapter sequences are removed.

                As for the many different barcodes in the file: besides perfect matches, demultiplexing usually allows a one-base mismatch to the barcode sequence, and you are usually still left with a small number of reads that match none of the barcodes because they contain too many sequencing errors.
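
The one-mismatch rule can be sketched as a Hamming-distance lookup. This is an illustration only, not how any particular demultiplexer implements it. The sample names are hypothetical, and the two barcodes are simply ones visible in the headers posted earlier, not a confirmed sample sheet:

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, expected, max_mismatches=1):
    """Return the closest sample, or None if nothing is close enough or it is a tie."""
    if not expected:
        return None
    distances = {sample: hamming(observed, bc) for sample, bc in expected.items()}
    best = min(distances.values())
    winners = [sample for sample, d in distances.items() if d == best]
    if best > max_mismatches or len(winners) > 1:
        return None
    return winners[0]

# Hypothetical sample sheet (sample names are made up for illustration)
expected = {"sampleA": "GTGGCCAT", "sampleB": "ATGTCAAT"}

print(assign_barcode("GTGGCCAT", expected))  # sampleA (exact match)
print(assign_barcode("GTGGCCAA", expected))  # sampleA (one mismatch)
print(assign_barcode("TCCCGAAT", expected))  # None (too far from both)
```

Allowing one mismatch is only unambiguous when the expected barcodes differ from each other at three or more positions, which is why ties are returned as `None` here.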


                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Originally posted by Bacms
                  The main question is whether I also need to cut the barcode sequence from the sequence itself or not?
                  In Illumina sequencing, the barcode sequence is *never* part of the actual read (when the reads are pre-processed, which yours appear to be). Did you get files with generic names like lane1_undetermined*? What you could have is adapter contamination in the reads, which an appropriate trimming program can take care of.

                  If you have written a Python script to enumerate the tags, then use it to separate the reads (4 lines per record) into per-barcode files. Remember to keep R1 and R2 in the same order in the two files so that the pairs stay in sync.

                  Note: If unexpected barcodes are present (after allowing for one error, as mastal pointed out), there may be some other issue going on.
                  Last edited by GenoMax; 01-14-2015, 09:47 AM.
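
A rough sketch of the splitting step, reading the two files in lockstep so that mates always land at the same position in their output files. It assumes 4-line records, IDs ending in `/1` and `/2`, and the barcode between `#` and `/`. The function names, the output naming scheme, and the `undetermined` bin are my own choices for illustration:

```python
import os
from itertools import islice

def read_records(handle):
    """Yield fastq records as lists of 4 lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated fastq record")
        yield record

def barcode_of(header):
    """'@...#GTGGCCAT/1' -> 'GTGGCCAT'"""
    return header.strip().rsplit("#", 1)[-1].split("/", 1)[0]

def split_pairs(r1_path, r2_path, out_dir, barcode_to_sample):
    """Write each pair to <sample>_1.fq and <sample>_2.fq; return pairs per sample."""
    outputs = {}
    counts = {}
    try:
        with open(r1_path) as r1, open(r2_path) as r2:
            for rec1, rec2 in zip(read_records(r1), read_records(r2)):
                # Mates must share the same ID once the trailing /1 or /2 is dropped
                if rec1[0].strip()[:-2] != rec2[0].strip()[:-2]:
                    raise ValueError("R1/R2 out of sync at " + rec1[0].strip())
                # Exact lookup; barcodes not in the mapping go to 'undetermined'
                sample = barcode_to_sample.get(barcode_of(rec1[0]), "undetermined")
                if sample not in outputs:
                    outputs[sample] = (
                        open(os.path.join(out_dir, sample + "_1.fq"), "w"),
                        open(os.path.join(out_dir, sample + "_2.fq"), "w"),
                    )
                outputs[sample][0].writelines(rec1)
                outputs[sample][1].writelines(rec2)
                counts[sample] = counts.get(sample, 0) + 1
    finally:
        for f1, f2 in outputs.values():
            f1.close()
            f2.close()
    return counts
```

It would be called with the two input paths, an existing output directory, and a dict mapping barcode to sample name. The sketch stops at the end of the shorter file without complaint, so it is worth comparing line counts of the two inputs first. One output file pair stays open per sample, which is fine for a handful of samples but not for hundreds of stray barcodes, hence the single `undetermined` bin.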
