  • Do Illumina FASTQ files contain barcodes and adapters?

    Hi all,
    I saw this thread: http://seqanswers.com/forums/showthread.php?t=8176. It says that Illumina FASTQ files contain no barcode. But the Trimmomatic manual describes a function that removes adapters and barcodes. Why is that needed?

  • #2
    At times you may have adapter dimers in your reads (no insert or short inserts with adapters at ends of reads) that need to be removed/trimmed. Not sure where you are seeing the barcode removal function.

    • #3
      Thanks for your reply. Sorry, I meant the adapter removal function. I still have some questions.
      1. What does "insert" mean? Does it refer to the index reads (barcodes)?
      2. My samples are already demultiplexed and I can't find the barcode sequence in my FASTQ files, so my reads contain only real sequence, right?
      3. I ran FastQC on my samples, and over the first 10 bases the per-base sequence content lines are not parallel, which may indicate contamination. At first I thought it was adapter or barcode sequence. If my reads are real reads, what else could cause this?

      • #4
        1) Insert is the length of the molecule that was sequenced. If it was shorter than read length, adapters will show up.

        2) They may still contain adapter sequence. A run without a single read containing an adapter would be... unusual.

        3) Sounds like you have adapter sequence in the first part of your read. The main factors that could cause this otherwise are:
        a) Nonrandom fragment cleavage
        b) Some kind of short spike-in sequence or contaminant
        c) Miscalibration, if for example there is no PhiX spike-in.

        It may help if you post the base-frequency histogram or consensus of the abnormal bases.
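To make point 1 concrete, here is a minimal shell sketch of adapter read-through. The helper name `simulate_read` is hypothetical, the insert sequences are made up, and `AGATCGGAAGAGC` is used only as an illustrative adapter prefix; it is a toy model, not how any real tool works.

```shell
# Hypothetical helper: simulate_read INSERT ADAPTER READ_LENGTH
# Prints the first READ_LENGTH bases of INSERT followed by ADAPTER,
# i.e. what is reported when the sequencer reads past a short insert.
simulate_read() {
    insert=$1
    adapter=$2
    read_len=$3
    printf '%s%s\n' "$insert" "$adapter" | cut -c "1-$read_len"
}

# Insert (8 bp) shorter than the read (12 bp): the last 4 bases are adapter.
simulate_read ACGTACGT AGATCGGAAGAGC 12            # prints ACGTACGTAGAT

# Insert (16 bp) longer than the read (12 bp): no adapter bases appear.
simulate_read ACGTACGTACGTACGT AGATCGGAAGAGC 12    # prints ACGTACGTACGT
```

Note that in this model the adapter always ends up at the right-hand (3') end of the read, after the insert.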

        • #5
          Thank you very much, Brian. I don't know how to paste the figure inline, so I have posted it as an attachment: per_base_sequence_content.png.
          Based on what you said, should we trim adapters from every sample we analyze? I only have the FASTQ files and don't know how long the adapter is. Is it enough to trim reads based on the FastQC results?

          • #6
            Based on the FastQC plot this is probably RNA-seq data. It is common to have this pattern at the beginning of the reads and does not indicate any problem. This seems to be due to the 'random' primers which are used in the library generation, which may not be quite as random as one would hope. There are multiple threads on SeqAnswers that discuss this phenomenon.

            See this thread for options on programs to do adapter trimming: http://seqanswers.com/forums/showthread.php?t=40692
            Last edited by GenoMax; 04-06-2014, 01:54 PM.
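If you want to look at the numbers behind that FastQC plot yourself, a rough sketch like the following tabulates base composition for the first few positions. The helper name `base_composition` is hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```shell
# Hypothetical helper: base_composition FASTQ [N]
# Prints the position, then the fraction of A, C, G and T at that position,
# for the first N bases (default 15) of every read.
base_composition() {
    awk -v n="${2:-15}" '
        NR % 4 == 2 {                          # sequence lines only
            len = length($0)
            if (len > n) len = n
            for (i = 1; i <= len; i++) {
                count[i, substr($0, i, 1)]++   # per-position base counts
                total[i]++
            }
            if (len > maxlen) maxlen = len
        }
        END {
            for (i = 1; i <= maxlen; i++)
                printf "%d\t%.2f\t%.2f\t%.2f\t%.2f\n", i,
                    count[i, "A"] / total[i], count[i, "C"] / total[i],
                    count[i, "G"] / total[i], count[i, "T"] / total[i]
        }' "$1"
}

# Usage (reads.fq is a placeholder file name):
#   base_composition reads.fq 15
```

With random-primer bias, the fractions typically differ from the rest of the read over the first dozen or so positions and then level off.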

            • #7
              By the way, there actually is a barcode in the Illumina FASTQ file, but it is in the header line (note the GCNACT at the end of the header), NOT in the sequence, unless something is special or wrong with the library:

              @HWI-STXXX:XXX:XXXXXXCXX:8:1101:1214:2233 1:N:0:GCNACT
              GCTCTCTGTTTACTCTCTTAATTTTTAAAGAGTTTGTAGTGTTTTATCTTATCTACACAGTGTTGACGTAAGCTTTCGAGATGTCGGATAAGGANNNNNNN
              +
              ?@=+BDD>,==ADAFEEFDHE<FFHIF9D9CB?C991**:?:CDG9D?F>0???4?*?B##########################################

              So if you ever suspect that your users gave you the wrong multiplexing information, you can easily show them which barcodes they really used by looking at the undetermined reads:
              zcat lane?_Undetermined_L00?_R1_001.fastq.gz | grep '^@HWI' | cut -d: -f10 | grep -v "N" | sort | uniq -c | sort -r -n -k1
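A slightly more defensive variant of the same idea, as a sketch: it looks only at every fourth line (the header) instead of matching on the instrument name, and takes the last colon-separated field as the barcode. The function name `count_barcodes` is hypothetical, and it assumes gzipped FASTQ files with four lines per record and headers in the format shown above.

```shell
# Hypothetical helper: count_barcodes FASTQ_GZ...
# Counts barcodes found in the header lines, most frequent first,
# skipping barcodes that contain an N.
count_barcodes() {
    gzip -dc "$@" |
        awk -F: 'NR % 4 == 1 && $NF !~ /N/ { print $NF }' |
        sort | uniq -c | sort -k1,1nr
}

# Usage (same file pattern as in the one-liner above):
#   count_barcodes lane?_Undetermined_L00?_R1_001.fastq.gz
```

Because it does not depend on the header starting with @HWI, this version should also work for instruments with a different name prefix.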

              • #8
                Adaptor question again. Thanks

                Hi, I'm using a Java-based program; a screenshot of its interface is attached. Could anyone tell me whether I should enter CAGATC as the adapter? Thanks a lot!

                • #9
                  No, that's just the barcode, and it is too short to use for trimming. Adapter sequences are normally over 30 bp and may contain a barcode.

                  • #10
                    Thanks a lot! One more question: my data looks like the attached plot. Do I still need to trim the adapter? I'm very new to this field. Are the first 13~15 bp random primer sequence, or something else?

                    • #11
                      The wiggles at the beginning are probably due to nonrandom fragmentation, but if you have a reference or assembly, I encourage you to map the reads and plot the error rate as a function of position. You can do this with BBMap using the "mhist" option:

                      (this command is for Linux; on Windows it would be different)
                      bbmap.sh in=reads.fq ref=reference.fa mhist=mhist.txt nodisk

                      If the histogram does not show a higher than expected error rate at the beginning of the read, there's no reason to trim. As for adapters, those are on the right end of the read and more common with longer reads. The graph you posted does not show evidence of adapter contamination, but it wouldn't necessarily be noticeable on that graph, anyway; if you know the adapter sequences it wouldn't hurt to trim them. If not, don't worry about it. Though if you have the insert size distribution, it would help to post it. Were these paired or single-ended reads?
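If you do know (or can guess) the adapter sequence, a quick sanity check before deciding whether trimming is worthwhile is to count how many reads contain it. This is only a rough sketch: the helper name `adapter_fraction` is hypothetical, it assumes an uncompressed four-line-per-record FASTQ, and it finds only exact matches of the string you give it, so partial adapters at the very end of a read are missed.

```shell
# Hypothetical helper: adapter_fraction FASTQ ADAPTER
# Reports how many reads contain an exact copy of ADAPTER.
adapter_fraction() {
    awk -v adapter="$2" '
        NR % 4 == 2 {                                # sequence lines only
            total++
            if (index($0, adapter) > 0) hits++
        }
        END { printf "%d of %d reads contain %s\n", hits, total, adapter }
    ' "$1"
}

# Usage (reads.fq and the adapter prefix are placeholders):
#   adapter_fraction reads.fq AGATCGGAAGAGC
```

A dedicated trimming tool handles partial and mismatched adapters properly; this count only gives a lower bound on how common the adapter is.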

                      • #12
                        Originally posted by xuenjun1 View Post
                        Thanks a lot! One more question, could u tell me if my data looks like this. Do I still need to trim the adaptor? I'm very new in this field. Is the first 13~15bp random primer? or something else?

                        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                        • #13
                          Thanks a lot! The reads are single-end, and I just want to align them to Arabidopsis. I have just got the results from the untrimmed reads, and they look OK.

                          • #14
                            Thanks. I had read a few posts and it still wasn't clear, but now I understand.

                            • #15
                              The following applies to the MiSeq only (I do not know about the other platforms, so please check).

                              By default, the FASTQ files from the MiSeq do not contain the barcodes after demultiplexing. You don't actually need this information for further analysis once demultiplexing is done.
                              Having said that, if you really need these sequences for some reason, you can force the system to provide them by changing the MiSeqReportor.exe.config file in C:\Illumina\MiSeq Reporter.

                              Just add this tag to the appSettings section:

                              <add key="CreateFastqForIndexReads" value="1" />

                              Then restart the service and requeue your run for analysis.

                              Hope this helps with your issue.
