  • Demultiplexing Illumina RNASeq paired reads

    Hello everyone,

    BGI normally provides us with demultiplexed reads, but this time we received our fastq files before they were demultiplexed. Can anyone recommend software to perform the demultiplexing? Also, where can I get the fastq files for the Illumina barcodes?

    Thank you very much in advance.

    Bruno

  • #2
    What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

    You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.



    • #3
      You should use bcl2fastq from Illumina to demultiplex your data. Download and use the version appropriate for the sequencing instrument that produced the data.



      • #4
        Originally posted by GenoMax
        What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

        You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.
        This is a HiSeq run (2000, I believe, but I need to double-check), and I do see barcodes in the Fastq ID. Does that mean the data has effectively been demultiplexed and just needs to be split?
        Here is the head of one of the files:
        head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_1.fq
        Code:
        @FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/1
        NCCCAAACGCGCGTGACTTCACAATAATTAGCCCGTACCTGCTGGTTACGTGGCGGCACCGTGTACAATACCCTAGGCATCAGGGTTAGGCATGGTTACT
        +
        BP\ceeeegggggghiiiiiiiiihiiiiihiiiiiiiiiiiiiifgggggeeeccaccaccaacdcccccbccccbccccccbc[`accccccc`bccc
        @FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/1
        NCCCACCAAAACCGGAAAATGCAGGCCCTGTCGTCTCGCGTGAACATCGCGGCCAAGCCCCAGCGCGCTCAGCGCCTGGTGGTCCGCGCCGAGGAGGTTA
        +
        BP\ccecegggggiihhhiegghhhhihihgiihhiiihighfhiihfggecaacca_acccccZ]]]aaXb]]aX]ac]^_]bccccccc]_a___QW`
        @FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/1
        NAACCAGGCGAACGGTTGGCGTCGGGATTCGGGACGCAAGCATGGCGCTGACCAGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCCGAAGCT
        head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_2.fq
        Code:
        @FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/2
        CTCCGGTGTCAAGTAACCATGCCTAACCCTGATGCCTAGGGTATTGTACACGGTGCCGCCACGTAACCAGCAGGTACGGGCTAATTATTGTGAAGTCACG
        +
        _bbeeecegggggihiiiiiiiiiiiiiiiiiiiiiihhiicffhhhhhighieghhhhiggeeeecddccccccccccccccbbcdddcdcbdbbbbcc
        @FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/2
        CGGGGCGCAGGATCTTCACCAGCGAGCCGCGCTTGGGGCCGACCTCCTTCTTGGGGGCAGCCTTAACCTCCTCGGCGCGGACCACCAGGCGCTGAGCGCG
        +
        ab_ceeeef`geghhiiihhiiihiihhiigeeca`accccccccccccc]bbcacW[acccccbbccccccb__cccaaccc^aa[[_`accca^baac
        @FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/2
        CCTGGTCAGCGCCATGCTTGCGTCCCGAATCCCGACGCCAACCGTTCGCCTGGTTCAGATCGGAAGAGCGTCGTGTAGGGA
        Last edited by Bacms; 01-13-2015, 08:54 AM.



        • #5
          Reads in a demultiplexed fastq file would all have the same barcode, so this file still needs to be demultiplexed.



          • #6
            @Bacms: Unless you have access to the original flowcell folder, de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

            One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?



            • #7
              Originally posted by GenoMax
              @Bacms: Unless you have access to the original flowcell folder, de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

              One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?
              This is the only data we got from BGI. They normally do the demultiplexing, but this run came at the end of the agreement between BGI and our University, and apparently demultiplexing was not included in the cost of the contract, even though they had been doing it for a year.

              I wrote a quick Python script that looks for the barcode sequence in the ID (perfect matching), and the diversity of barcodes in the sample is ridiculous. It includes barcodes that Illumina provides but that we did not use, so I suspect some cross-contamination with someone else's samples. I need to pull those sequences and see what they match to.

              The main question is whether I also need to cut the barcode sequence from the sequence itself or not?
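A tally of this kind could look roughly like the sketch below. It is not the script mentioned in this post; it only assumes the header layout visible in the files above (`@<name>#<barcode>/<read>`) and unwrapped FASTQ with exactly 4 lines per record. The names `HEADER_RE`, `barcode_from_header` and `count_barcodes` are made up for illustration.

```python
import re
from collections import Counter

# Assumed header layout, taken from the example reads in this thread:
# @<flowcell>:<lane>:<tile>:<x>:<y>#<barcode>/<read number>
HEADER_RE = re.compile(r"^@(?P<name>[^#\s]+)#(?P<barcode>[ACGTN]+)/(?P<read>[12])$")


def barcode_from_header(header):
    """Return the barcode embedded in a FASTQ ID line, or None if absent."""
    match = HEADER_RE.match(header.strip())
    return match.group("barcode") if match else None


def count_barcodes(lines):
    """Tally barcodes over an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for index, line in enumerate(lines):
        if index % 4 == 0:  # first line of each record is the ID line
            counts[barcode_from_header(line)] += 1
    return counts


# Two records copied (sequences shortened) from the first file in this thread.
example = [
    "@FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/1", "NCCC", "+", "BP\\c",
    "@FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/1", "NCCC", "+", "BP\\c",
]
print(count_barcodes(example).most_common())
```

For a real file, passing the open file handle to `count_barcodes` should work the same way, and sorting the result with `most_common()` makes it easy to see whether the expected barcodes dominate.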



              • #8
                Barcodes only appear in the read sequence when the insert is short enough that sequencing runs into the Illumina adapter and continues through the first part of the adapter into the barcode.

                If you trim your reads with something like Trimmomatic, the barcodes will be removed when Illumina adapter sequences are removed.

                As for having a lot of different barcodes in the file: as well as perfect matches, demultiplexing usually allows a one-base mismatch to the barcode sequence, and at the end you are usually left with a small number of reads that don't match any of the barcodes because they have too many sequencing errors.
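The one-mismatch rule described above can be sketched as a Hamming-distance check. This is only an illustration, not how any particular demultiplexer implements it: the `expected` list is a hypothetical sample sheet (the two barcodes are borrowed from the example reads, not from the actual experiment), and the function names are invented here.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, expected, max_mismatches=1):
    """Return the single expected barcode within max_mismatches, else None.

    An observed barcode that is close to more than one expected barcode
    is treated as ambiguous and left unassigned.
    """
    hits = [
        barcode for barcode in expected
        if len(barcode) == len(observed)
        and hamming(observed, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


expected = ["GTGGCCAT", "ATGTCAAT"]  # hypothetical sample sheet
print(assign_barcode("GTGGCCAT", expected))  # exact match
print(assign_barcode("GTGGCCAA", expected))  # one mismatch to GTGGCCAT
print(assign_barcode("TCCCGAAT", expected))  # no close match, returns None
```

Allowing one mismatch is only safe when every pair of expected barcodes differs in at least three positions; otherwise a single error can make a read equally close to two samples, which is why ambiguous hits are rejected here.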



                • #9
                  Originally posted by Bacms
                  The main question is whether I also need to cut the barcode sequence from the sequence itself or not?
                  In Illumina sequencing, the barcode sequence is *never* part of the actual read (when the reads are pre-processed, which yours appear to be). Did you get files with generic names like lane1_undetermined*? What you could have is adapter contamination in the reads, which an appropriate trimming program can take care of.

                  Since you have already written a Python script to enumerate the tags, extend it to split the reads (4 lines per record) into separate files per barcode. Remember to keep R1 and R2 in the same order in the two files so the pairs stay in sync.

                  Note: If unexpected barcodes are still present (after allowing for one error, as Mastal pointed out), there may be some other issue going on.
                  Last edited by GenoMax; 01-14-2015, 09:47 AM.
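A minimal sketch of the splitting step suggested above, assuming unwrapped FASTQ (4 lines per record), the `#<barcode>/<read>` header layout shown earlier in the thread, and exact barcode matching. `split_pairs`, `read_records` and `barcode_of` are hypothetical names, and the output naming (`<barcode>_1.fq` / `<barcode>_2.fq`) is just one possible choice. R1 and R2 are read in lockstep and written together, so pair order is preserved by construction.

```python
import itertools
import os


def read_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped FASTQ)."""
    while True:
        record = list(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def barcode_of(header):
    """Extract the barcode between '#' and '/' in the ID line."""
    return header.rsplit("#", 1)[1].split("/", 1)[0].strip()


def split_pairs(r1_path, r2_path, out_dir, wanted):
    """Write each read pair to per-barcode R1/R2 files; return pair counts.

    Pairs whose barcode is not in `wanted` go to 'undetermined' files.
    `out_dir` must already exist.
    """
    handles = {}
    counts = {}
    try:
        with open(r1_path) as r1, open(r2_path) as r2:
            pairs = itertools.zip_longest(read_records(r1), read_records(r2))
            for rec1, rec2 in pairs:
                if rec1 is None or rec2 is None:
                    raise ValueError("R1 and R2 have different numbers of reads")
                # Everything before the trailing /1 or /2 must be identical.
                if rec1[0].rsplit("/", 1)[0] != rec2[0].rsplit("/", 1)[0]:
                    raise ValueError("R1/R2 out of sync at " + rec1[0].strip())
                barcode = barcode_of(rec1[0])
                key = barcode if barcode in wanted else "undetermined"
                if key not in handles:
                    handles[key] = (
                        open(os.path.join(out_dir, key + "_1.fq"), "w"),
                        open(os.path.join(out_dir, key + "_2.fq"), "w"),
                    )
                handles[key][0].writelines(rec1)
                handles[key][1].writelines(rec2)
                counts[key] = counts.get(key, 0) + 1
    finally:
        for out1, out2 in handles.values():
            out1.close()
            out2.close()
    return counts
```

It would be called with the two input files, an existing output directory and the set of barcodes from your own sample sheet, for example `split_pairs(r1, r2, "split", {"GTGGCCAT", "ATGTCAAT"})`, where those two barcodes are only placeholders taken from the example reads. Sending everything else to a single "undetermined" pair of files keeps the number of open file handles small even when the input contains many stray barcodes.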
