  • Barcodes in HiSEQ FASTQ Files

    Hi,

    For HiSEQ paired-end barcoded data, I am splitting FASTQs into multiple
    sample-based FASTQs. Where exactly are the barcodes stored in the sequence string of the FASTQ files? At the beginning? At the end? At the beginning for the first mate and at the end for the
    second mate?

    For example, if

    mate_1=ATCGTAGA.....................TTAGACGA
    mate_2=GCATGATG.....................ATCGATAG
    which sub-strings are barcodes?

    Where are the barcodes stored in qseq.txt files?

    Is there a web-page/white paper that explains this clearly?

    Thanks,

  • #2
    Hi wdt,

    I have exactly the same question. Did you get an answer, or are you still looking?
    If you did, could you let me know the answer or where to find it? I would be very grateful.
    Thanks a lot,

    • #3
      Barcoding can be done in a couple of different ways. If you are using Illumina barcodes, they are generally read as a separate sequencing read. You will not see this read in the final sequence data. The Illumina CASAVA (pre-processing/de-multiplexing) pipeline uses this third read (and a fourth, if you are using dual indexing) when de-multiplexing the samples.

      If you are using "inline" or custom barcodes, it will be your responsibility to do the de-multiplexing, since the barcode sequence will be part of the actual read.

      There is a primer at this link: http://www.umassmed.edu/uploadedFile...Sequencing.pdf
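
For the inline case, the splitting step might look roughly like the sketch below. It is only an illustration under stated assumptions: the barcode sits at the very start of the read, matching is exact (no mismatches allowed), and the barcode-to-sample table and the names `read_fastq` and `demultiplex_inline` are hypothetical. Where the barcode really sits depends on how the library was built.

```python
from collections import defaultdict


def read_fastq(lines):
    """Yield (header, sequence, quality) from plain 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def demultiplex_inline(records, barcodes):
    """Bin reads by a barcode assumed to sit at the start of the read.

    barcodes maps barcode sequence -> sample name (hypothetical table).
    The barcode and its quality values are trimmed from matching reads;
    reads matching no barcode go to the "unassigned" bin.
    """
    bins = defaultdict(list)
    for header, seq, qual in records:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                bins[sample].append((header, seq[n:], qual[n:]))
                break
        else:
            bins["unassigned"].append((header, seq, qual))
    return dict(bins)
```

With index-read barcodes (the Illumina case described above), none of this is needed, because the barcode never appears in the read sequence.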

      • #4
        Hi GenoMax,

        Thanks a lot for your reply.
        For our RNA-seq experiment, the libraries were prepared with the Epicentre (an Illumina company) ScriptSeq™ v2 RNA-Seq Library Preparation Kit, using their ScriptSeq™ Index PCR Primers for barcoding, and were then sent to BGI for paired-end sequencing. So, if I understand your reply correctly, the de-multiplexing of my samples will probably be done by BGI with Illumina CASAVA. Is that right?
        Thanks a lot,

        • #5
          Originally posted by Nicolas Nalpas
          Hi GenoMax,

          So from my understanding of your reply, the de-multiplexing of my samples will probably be done by the BGI with Illumina CASAVA, am I right?
          Thanks a lot,

          Correct. For standard Illumina tags, the output from a CASAVA run will be files named in the following format.

          Two files for each sample/tag combination (provided they concatenate the results into one single large file for each read).

          SampleID_TAGSEQ_L00#_R1_001.fastq.gz (read 1)
          SampleID_TAGSEQ_L00#_R2_001.fastq.gz (read 2)

          SampleID - the sample ID you provided
          TAGSEQ - the sequence of the tag
          L00# - the lane number on the flowcell
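
If you need to pull these fields back out of the file names, a regular expression along the following lines should work. This is a sketch, not an official parser: it assumes the tag is a plain string of A/C/G/T/N characters and treats the trailing `001` as a file-chunk number; the group names and the function `parse_casava_name` are made up for illustration.

```python
import re

# Pattern for SampleID_TAGSEQ_L00#_R1_001.fastq.gz style names.
# Assumption: the tag consists only of A/C/G/T/N characters.
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<tag>[ACGTN]+)"
    r"_L(?P<lane>\d{3})_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)


def parse_casava_name(filename):
    """Return the name fields as a dict, or None if the name does not match."""
    match = CASAVA_NAME.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["lane"] = int(info["lane"])  # e.g. "003" -> 3
    info["read"] = int(info["read"])  # 1 for read 1, 2 for read 2
    return info
```

Grouping files by `sample` and `lane` and then pairing `read` 1 with `read` 2 gives the mate pairs for each sample.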

          • #6
            Dear GenoMax,
            I want to add a barcode file (a list of cultivars) to the bowtie2 command. How can I do that?
            Thank you very much,

            • #7
              Originally posted by maivantan
              Dear GenoMax,
              I want to add barcode file, list of cultivars in command for bowtie2, how can i do?
              Thank you very much,

              Can you clarify what you are trying to do? Are your samples already de-multiplexed (i.e., are they in separate files)?

              • #8
                Yes,
                I sequenced 5 cultivars and they are in separate files.
                I also prepared a key.txt for the 5 barcodes.
                So do I need to add the key.txt file to bowtie2?

                One more question: after aligning and running samtools, I found that the reference is not available (marked in red).

                user$ ./bowtie2 -x ~/tan_analysis/rice1 -U ~/tan_analysis/analysis20140111/20140111_A1_PE1.fastq -S wrc20140118.sam
                2398641 reads; of these:
                2398641 (100.00%) were unpaired; of these:
                2094522 (87.32%) aligned 0 times
                228676 (9.53%) aligned exactly 1 time
                75443 (3.15%) aligned >1 times
                12.68% overall alignment rate

                CHROM POS ID REF ALT QUAL FILTER INFO FORMAT wrc20140118.sorted.bam.bam
                chr01 27400 . N G 68 . DP=22;VDB=3.856020e-04;AF1=1;AC1=2;DP4=0,0,0,8;MQ=29;FQ=-51 GT:PL:GQ 1/1:101,24,0:45
                chr01 27401 . N C,G 75 . DP=22;VDB=1.934810e-04;AF1=1;AC1=2;DP4=0,0,0,9;MQ=28;FQ=-51 GT:PL:GQ 1/1:108,24,0,104,10,101:45
                chr

                Please give me your suggestions

                • #9
                  Are you trying to do multi-sample SNP calling with samtools?

                  Your alignment rate looks low in the example posted above.
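
As a sanity check, the overall rate can be recomputed from the counts in the bowtie2 summary posted above. The sketch below assumes the unpaired-read summary layout shown in that post; the function name `alignment_rate` is hypothetical.

```python
import re


def alignment_rate(summary):
    """Recompute the overall alignment rate (%) from an unpaired bowtie2 summary."""
    total = int(re.search(r"(\d+) reads; of these:", summary).group(1))
    once = int(
        re.search(r"(\d+) \([\d.]+%\) aligned exactly 1 time", summary).group(1)
    )
    multi = int(
        re.search(r"(\d+) \([\d.]+%\) aligned >1 times", summary).group(1)
    )
    # Reads aligned once plus reads aligned more than once, over all reads.
    return 100.0 * (once + multi) / total


summary = """2398641 reads; of these:
2398641 (100.00%) were unpaired; of these:
2094522 (87.32%) aligned 0 times
228676 (9.53%) aligned exactly 1 time
75443 (3.15%) aligned >1 times
12.68% overall alignment rate"""

print(f"{alignment_rate(summary):.2f}%")  # prints 12.68%
```

That is (228676 + 75443) / 2398641, so roughly 87% of the reads did not align at all.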

                  • #10
                    Yes, I am trying to do multi-sample SNP calling with samtools.

                    I don't know why the overall alignment rate is so low.

                    Please give me your suggestions.

                    • #11
                      Is the rice variety you are sequencing very different from the one you used as the reference? Did you do any QC on your reads (FastQC, trimming, etc.) before doing the alignments?

                      If you have already completed your alignments as independent files, you can use them as shown in the samtools mpileup example to do SNP calls across multiple files. Is your question about key.txt related to @RG records (if a BAM file contains multiple samples)?

                      • #12
                        The rice varieties that I sequenced are indica and the reference is japonica; I don't think they are very different. I did not do any QC before doing the alignments.

                        I aligned only one file, so I would like to ask how I can align all the files together in bowtie2.

                        My second question is: do I need to add the barcode file to the bowtie2 command?

                        Thank you very much.
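
One possible way to handle already de-multiplexed files is to run bowtie2 once per file and tag each output with a read group, so the samples stay distinguishable in later multi-sample SNP calling. The sketch below only builds the command lines and does not run them. The `-x`, `-U` and `-S` options are the ones used earlier in the thread, and `--rg-id` and `--rg` are bowtie2's read-group options. The sample names, file names and the function `bowtie2_commands` are hypothetical. As far as I know, bowtie2 has no option that takes a barcode file, so key.txt would not be passed to it.

```python
def bowtie2_commands(index, samples):
    """Build one bowtie2 command line per de-multiplexed FASTQ file.

    index   : bowtie2 index basename (value for -x)
    samples : dict mapping sample name -> FASTQ path (hypothetical names)
    """
    commands = []
    for name, fastq in samples.items():
        args = [
            "bowtie2",
            "-x", index,
            "-U", fastq,           # unpaired reads, as in the thread's example
            "--rg-id", name,       # read-group ID written to the SAM header
            "--rg", f"SM:{name}",  # sample name field of the @RG record
            "-S", f"{name}.sam",
        ]
        commands.append(" ".join(args))
    return commands


# Hypothetical cultivar names and file names, for illustration only.
samples = {"cultivar1": "cultivar1.fastq", "cultivar2": "cultivar2.fastq"}
for command in bowtie2_commands("rice1", samples):
    print(command)
```

Each resulting alignment file can then be sorted and passed together to samtools mpileup, as suggested in #11.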
