  • Barcodes in HiSeq FASTQ Files

    Hi,

    For HiSeq paired-end barcoded data, I am splitting FASTQs into multiple
    sample-based FASTQs. Where exactly are the barcodes stored in the sequence string of the FASTQ files: at the beginning, at the end, or at the beginning for the first mate and at the end for the
    second mate?

    For example, if

    mate_1=ATCGTAGA.....................TTAGACGA
    mate_2=GCATGATG.....................ATCGATAG
    which substrings are the barcodes?

    Where are the barcodes stored in qseq.txt files?

    Is there a web-page/white paper that explains this clearly?

    Thanks,

  • #2
    Hi wdt,

    I have exactly the same question. Did you get an answer, or are you still looking?
    If you did, could you let me know the answer or where to find it? I would be very grateful.
    Thanks a lot,

    • #3
      Barcoding can be done in a couple of different ways. If you are using Illumina barcodes, they are generally read as a separate sequencing read (the index read). You will not see this read in the final sequence data. The Illumina CASAVA (pre-processing/de-multiplexing) pipeline uses this third read (and a fourth, if you are using dual indexing) when de-multiplexing the samples.
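Although the index read is not emitted as sequence, CASAVA 1.8-style FASTQ headers usually record it as the last colon-separated field after the space. The sketch below pulls that field out. It assumes the 1.8 header layout (`<read>:<is_filtered>:<control_number>:<index>`); the function name `index_from_header` and the example header are illustrative, not taken from this thread, and older pipelines or facilities may write headers differently.

```python
from typing import Optional


def index_from_header(header: str) -> Optional[str]:
    """Return the index sequence from a CASAVA 1.8-style header, else None.

    Assumed layout (not confirmed for any particular facility's output):
    @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    parts = header.rstrip().split(" ")
    if len(parts) < 2:
        # No space-separated second block, e.g. an older header style.
        return None
    fields = parts[1].split(":")
    if len(fields) < 4 or not fields[3]:
        return None
    return fields[3]


# Illustrative header; the values are made up for the example.
example = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
print(index_from_header(example))
```

If this returns `None` for your files, the headers are probably in a different format and the index has to be taken from the file name or the sample sheet instead.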

      If you are using "inline" or custom barcodes then it will be your responsibility to do the demultiplexing since the barcode sequence will be part of the actual read.

      There is a primer at this link: http://www.umassmed.edu/uploadedFile...Sequencing.pdf
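For the inline case, a minimal demultiplexing sketch might look like the following. It assumes the barcode sits at the very start of each read and must match exactly (no mismatch tolerance), and it trims the barcode from both the sequence and the quality string. The helper names `read_fastq` and `demultiplex`, the barcodes, and the sample names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group non-empty lines into 4-line FASTQ records."""
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(stripped) - 3, 4):
        yield (stripped[i], stripped[i + 1], stripped[i + 2], stripped[i + 3])


def demultiplex(records: Iterable[Record],
                barcodes: Dict[str, str]) -> Dict[str, List[Record]]:
    """Bin records by exact barcode match at the start of the read.

    barcodes maps barcode sequence -> sample name (both hypothetical here).
    Reads with no matching barcode go to the 'unassigned' bin untouched.
    """
    bins: Dict[str, List[Record]] = {s: [] for s in barcodes.values()}
    bins["unassigned"] = []
    for header, seq, plus, qual in records:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                # Trim the barcode from the sequence and its qualities.
                bins[sample].append((header, seq[n:], plus, qual[n:]))
                break
        else:
            bins["unassigned"].append((header, seq, plus, qual))
    return bins


fastq_lines = [
    "@r1\n", "ATCGTTTT\n", "+\n", "IIIIHHHH\n",
    "@r2\n", "GCATAAAA\n", "+\n", "IIIIHHHH\n",
    "@r3\n", "NNNNAAAA\n", "+\n", "IIIIHHHH\n",
]
bins = demultiplex(read_fastq(fastq_lines), {"ATCG": "sampleA", "GCAT": "sampleB"})
for sample, recs in bins.items():
    print(sample, len(recs))
```

For paired-end data the same bin would normally be applied to both mates of a pair, and real data usually needs some tolerance for sequencing errors in the barcode; a dedicated demultiplexing tool is the safer choice for production use.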

      • #4
        Hi GenoMax,

        Thanks a lot for your reply.
        For our RNA-seq experiment, the libraries were prepared using the Epicentre (an Illumina company) ScriptSeq™ v2 RNA-Seq Library Preparation Kit, with their ScriptSeq™ Index PCR Primers for barcoding. The libraries were then sent to the BGI for paired-end sequencing. So from my understanding of your reply, the de-multiplexing of my samples will probably be done by the BGI with Illumina CASAVA, am I right?
        Thanks a lot,

        • #5
          Originally posted by Nicolas Nalpas
          Hi GenoMax,

          So from my understanding of your reply, the de-multiplexing of my samples will probably be done by the BGI with Illumina CASAVA, am I right?
          Thanks a lot,

          Correct. For standard Illumina tags, the output of a CASAVA run will be files named in the following format.

          Two files for each sample/tag combination (provided they concatenate the results into one single large file for each read).

          SampleID_TAGSEQ_L00#_R1_001.fastq.gz (read 1)
          SampleID_TAGSEQ_L00#_R2_001.fastq.gz (read 2)

          SampleID - you provided
          TAGSEQ - sequence of tag
          L00# - would be the lane number on the flowcell.
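If you need to group such files by sample in a script, the naming scheme above can be parsed with a regular expression. This is a sketch under the assumption that the tag is a plain A/C/G/T/N string and that the trailing `001` is a three-digit file counter; the function name `parse_casava_name` and the example file name are made up for illustration.

```python
import re
from typing import Dict, Optional, Union

# Pattern for SampleID_TAGSEQ_L00#_R1_001.fastq.gz as described above.
# Assumption: the tag contains only A/C/G/T/N characters.
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<tag>[ACGTN]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)


def parse_casava_name(filename: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split a CASAVA-style FASTQ file name into its parts, or return None."""
    match = CASAVA_NAME.match(filename)
    if match is None:
        return None
    return {
        "sample": match.group("sample"),
        "tag": match.group("tag"),
        "lane": int(match.group("lane")),
        "read": int(match.group("read")),
        "chunk": int(match.group("chunk")),
    }


# Hypothetical file name; sample IDs may themselves contain underscores.
print(parse_casava_name("My_Sample_ACGTAC_L003_R1_001.fastq.gz"))
```

Files that share the same sample, tag, and lane but differ in the read number are the two mates of one paired-end run.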

          • #6
            Dear GenoMax,
            I want to add a barcode file (a list of cultivars) to my bowtie2 command. How can I do this?
            Thank you very much,

            • #7
              Originally posted by maivantan
              Dear GenoMax,
              I want to add a barcode file (a list of cultivars) to my bowtie2 command. How can I do this?
              Thank you very much,

              Can you clarify what it is you are trying to do? Are your samples already de-multiplexed (i.e., are they in separate files)?

              • #8
                Yes,
                I have 5 cultivars and they are in separate files.
                I also prepared a key.txt file for the 5 barcodes.
                So do I need to add the key.txt file to bowtie2?

                One more question: after aligning and running samtools, I found that the reference is not available (red color).

                user$ ./bowtie2 -x ~/tan_analysis/rice1 -U ~/tan_analysis/analysis20140111/20140111_A1_PE1.fastq -S wrc20140118.sam
                2398641 reads; of these:
                2398641 (100.00%) were unpaired; of these:
                2094522 (87.32%) aligned 0 times
                228676 (9.53%) aligned exactly 1 time
                75443 (3.15%) aligned >1 times
                12.68% overall alignment rate

                CHROM POS ID REF ALT QUAL FILTER INFO FORMAT wrc20140118.sorted.bam.bam
                chr01 27400 . N G 68 . DP=22;VDB=3.856020e-04;AF1=1;AC1=2;DP4=0,0,0,8;MQ=29;FQ=-51 GT:PL:GQ 1/1:101,24,0:45
                chr01 27401 . N C,G 75 . DP=22;VDB=1.934810e-04;AF1=1;AC1=2;DP4=0,0,0,9;MQ=28;FQ=-51 GT:PL:GQ 1/1:108,24,0,104,10,101:45
                chr

                Please give me your suggestions

                • #9
                  Are you trying to do multi-sample SNP calling with samtools?

                  Your alignment rate looks low in the example posted above.

                  • #10
                    Yes, I am trying to do multi-sample SNP calling with samtools.

                    I don't know why the overall alignment rate is so low.

                    Please give me your suggestions.

                    • #11
                      Is the rice variety you are sequencing very different from the one you used as the reference? Did you do any QC on your reads (FastQC, trimming, etc.) before doing the alignments?

                      If you have already completed your alignments as independent files you can use them as shown in the samtools mpileup example to do SNP calls across multiple files. Is your question about key.txt related to @RG records (if a BAM file contains multiple samples)?
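One possible workflow for separate per-sample files is sketched below: align each file on its own, tag it with a read group naming the sample, sort to BAM, then pass all BAMs to a single calling step. The script only builds and prints the command lines; it does not run them. The sample names, file names, and `reference.fa` are hypothetical. The `bcftools mpileup | bcftools call` form assumes a recent samtools/bcftools installation; older releases used `samtools mpileup` piped into `bcftools view` instead.

```python
import shlex
from typing import Dict, List


def bowtie2_cmd(index_prefix: str, fastq: str, sample: str, out_sam: str) -> List[str]:
    """bowtie2 command for one unpaired FASTQ, tagged with an @RG sample name."""
    return ["bowtie2", "-x", index_prefix, "-U", fastq,
            "--rg-id", sample, "--rg", f"SM:{sample}", "-S", out_sam]


def sort_cmd(in_sam: str, out_bam: str) -> List[str]:
    """samtools command that coordinate-sorts a SAM file into a BAM file."""
    return ["samtools", "sort", "-o", out_bam, in_sam]


def joint_call_cmds(reference_fasta: str, bams: List[str], out_vcf: str) -> List[List[str]]:
    """Two commands meant to be joined by a pipe: pileup over all BAMs, then calling."""
    mpileup = ["bcftools", "mpileup", "-Ou", "-f", reference_fasta, *bams]
    call = ["bcftools", "call", "-m", "-v", "-o", out_vcf]
    return [mpileup, call]


# Hypothetical sample -> FASTQ mapping; replace with your own files.
samples: Dict[str, str] = {
    "cultivar1": "cultivar1.fastq",
    "cultivar2": "cultivar2.fastq",
}

bams: List[str] = []
for sample, fastq in samples.items():
    print(shlex.join(bowtie2_cmd("rice1", fastq, sample, f"{sample}.sam")))
    print(shlex.join(sort_cmd(f"{sample}.sam", f"{sample}.sorted.bam")))
    bams.append(f"{sample}.sorted.bam")

mpileup, call = joint_call_cmds("reference.fa", bams, "calls.vcf")
print(shlex.join(mpileup) + " | " + shlex.join(call))
```

With this layout bowtie2 itself never needs a barcode or key file: the files are already de-multiplexed, and the sample identity travels with each BAM through its read group.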

                      • #12
                        The rice varieties that I sequenced are indica and the reference is japonica; I think they are not very different. I did not do any QC before doing the alignments.

                        I aligned only one file, so I would like to ask how I can align all the files together in bowtie2.

                        My second question is: do I need to add the barcode file to the bowtie2 command?

                        Thank you very much.
