  • #16
    Hi, you've been so helpful, thanks.

    So, the adapter thing was what I suspected, and I think it's sorted. However, the barcodes are an unexpected problem. When I test the adapter sequences, even when using substrings, there are none in the trimmed reads. But there are thousands of barcodes (5687733).

    The command line for trimmomatic is a mess, but here's what I used:

    java -classpath trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE readsR1.fastq readsR2.fastq forward_paired.fastq forward_unpaired.fastq reverse_paired.fastq reverse_unpaired.fastq CROP:90 MINLEN:90

    I didn't use ILLUMINACLIP because I first used IlluQC from NGSToolkit, which was supposed to get rid of that. From what I understand, I need to run trimmomatic again with other settings, but I'm worried about something. Apparently, cutting the last 10 bases is not enough for 5687733 reads. So, if I want to keep the read length at 90, I'll lose all of these reads once the barcodes are cut off, because they will fall below MINLEN. Maybe I should use a lower minimum length? It's a bit of a dilemma, because my coverage isn't that great...

    (Here's a really basic question: how do I create a fasta file with the adapter and barcode sequences to use with ILLUMINACLIP? Or is there a file I can download with TruSeq adapters and barcodes?)
    Thanks

    Sandra

    • #17
      I'm sorry, I'd forgotten your reply (it was a long time ago!):

      "The latest version of trimmomatic comes with a file containing those adapter sequences, so it should work fine with your files in the ILLUMINACLIP step."

      So, I just need to call the TruSeq2-PE or TruSeq3-PE file in the command line? I understand they depend on the machine used, so I'll try to find out which one is better.

      Thanks

      • #18
        Removing primers, adaptors, how to know if it's good?

        The TruSeq2 and 3 are different versions of Illumina sample prep kits.
        The adapter sequences that appear in your reads will be the reverse complement of the sequences in the fasta file.

        For de novo assembly it is better to clean the reads. Is there a particular reason why you need all your reads to remain the same length?
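
        To make the reverse-complement point concrete, here is a minimal sketch in Python. It assumes a plain A/C/G/T/N alphabet, and the function name is only illustrative. The example adapter is one of the two sequences quoted in post #26 of this thread.

```python
# Translation table for complementing bases (assumption: only A/C/G/T/N occur).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Adapter sequence as quoted in post #26 of this thread.
adapter = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
adapter_rc = reverse_complement(adapter)
print(adapter_rc)
```

        When checking trimmed reads for leftover adapter, it is safest to search for both orientations rather than only the sequence as written in the fasta file.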

        • #19
          Originally posted by SS Santos
          Hi, you've been so helpful, thanks.

          So, the adapter thing was what I suspected, and I think it's sorted. However, the barcodes are an unexpected problem. When I test the adapter sequences, even when using substrings, there are none in the trimmed reads. But there are thousands of barcodes (5687733).

          Thanks

          Sandra
          Be careful about using the terms "adapter" and "barcode" interchangeably.

          Barcodes (in a multiplexed sample) are used to identify individual samples in a mixture. Generally a sequence provider will de-multiplex your samples (if they were indeed multiplexed). As part of the de-multiplexing process the "barcodes" are identified, sorted, and inserted into the sequence ID by the Illumina pipeline software. They are also "read" as a separate read in Illumina sequencing.

          An example barcode (after de-multiplexing has been done) is the ATCACG that follows the "#" in the read ID below (shown in red in the original post; the example is taken from the Wikipedia article on the FASTQ format).

          @HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
          TTAATTGGTAAATAAATCTCCTAATAGCTTAGATNTTACCTTNNNNNNNNNNTAGTTTCTTGAGATTTGTTGGGGGAGACATTTTTGTGATTGCCTTGAT
          +HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
          efcfffffcfeefffcffffffddf`feed]`]_Ba_^__[YBBBBBBBBBBRTT\]][]dddd`ddd^dddadd^BBBBBBBBBBBBBBBBBBBBBBBB
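
          As a rough illustration of where the barcode lives, the sketch below pulls it out of the read ID rather than the read itself. It assumes either the older "#INDEX/1" style shown above or the newer style in which the index is the last colon-separated field after a space; the function name is hypothetical.

```python
def barcode_from_header(header):
    """Return the index sequence stored in a FASTQ read ID, or None.

    Handles two header layouts (an assumption about the input):
      old: @...:21141#ATCACG/1
      new: @...:197393 1:Y:18:ATCACG
    """
    if "#" in header:
        # Old style: index sits between '#' and the '/1' or '/2' read suffix.
        return header.rsplit("#", 1)[1].split("/", 1)[0]
    if " " in header:
        # New style: the comment after the space has four ':'-separated fields.
        fields = header.split(" ", 1)[1].split(":")
        if len(fields) == 4:
            return fields[3]
    return None

old_id = "@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1"
print(barcode_from_header(old_id))
```

          If this returns the expected index for your reads, the barcode is only a label in the ID line and there is nothing to trim from the sequence.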

          • #20
            Hey,

            Just reading through this thread. A few have suggested that you might be reading into the index/adapter sequence at the 3' end of the library. Because you performed size selection (500 bp) and a 2x101 bp run, it is unlikely that you read into the adapter sequence if your size selection was accurate: your library insert size should be approx. 370 bp (500 bp - 130 bp of adapter sequence).

            Have you determined the mean insert size of your library? That's my 2 cents.
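
            The arithmetic behind that estimate, as a small sketch. The numbers are the ones quoted above, and the helper name is made up for illustration.

```python
selected_fragment_bp = 500   # size-selected fragment, adapters included
adapter_bp = 130             # combined adapter length quoted above
read_len = 101               # 2x101 bp run

# Expected insert size: 500 - 130 = 370 bp.
insert_bp = selected_fragment_bp - adapter_bp

def adapter_bases_in_read(insert_bp, read_len):
    """Bases of adapter a read runs into; zero unless the read is longer than the insert."""
    return max(0, read_len - insert_bp)

print(insert_bp, adapter_bases_in_read(insert_bp, read_len))
```

            Read-through only becomes an issue for fragments whose insert is shorter than the read length, for example an 80 bp insert would leave 21 adapter bases at the end of a 101 bp read.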

            • #21
              Originally posted by SS Santos
              Or is there a file I can download with TruSeq adapters and barcodes?)


              Sandra
              You will find the sequences in this thread: http://seqanswers.com/forums/showthr...p?t=198&page=6 (post #113)

              • #22
                Hi

                GenoMax: you are right, I do have the barcode sequence in the first line of each read as you show in red, I just didn't know that it means it's not included in the read itself. If that's true it's good news.

                Maria: I used grep -c to look for the 1st adapter and then for the reverse complement of the second one. Should I have looked for the complement of the 1st adapter instead?

                I need the reads with the same length for one of the assemblers I'm testing, it doesn't run otherwise.

                Snorberg: I'm sorry, but how do I determine the mean insert size?
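
                For reference, the grep -c check described above can be sketched in Python so that only sequence lines are searched and both orientations are counted in one pass. This assumes a standard four-line FASTQ layout with upper-case bases; the function names and the toy reads are made up for illustration.

```python
import io

# Complement table for a plain A/C/G/T/N alphabet (assumption: upper-case reads).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def read_sequences(handle):
    """Yield the sequence line (second of every four) from a FASTQ handle."""
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip()

def count_adapter_hits(handle, adapter):
    """Return (forward_hits, revcomp_hits): reads containing the adapter
    as written, and reads containing its reverse complement."""
    revcomp = adapter.translate(COMPLEMENT)[::-1]
    forward_hits = 0
    revcomp_hits = 0
    for seq in read_sequences(handle):
        if adapter in seq:
            forward_hits += 1
        if revcomp in seq:
            revcomp_hits += 1
    return forward_hits, revcomp_hits

# Toy three-read FASTQ (made-up reads, not data from this thread).
toy_fastq = io.StringIO(
    "@r1\nAAAAGATCGGAAGAGC\n+\nIIIIIIIIIIIIIIII\n"
    "@r2\nTTTTGCTCTTCCGATC\n+\nIIIIIIIIIIIIIIII\n"
    "@r3\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
)
hits = count_adapter_hits(toy_fastq, "GATCGGAAGAGC")
print(hits)
```

                Restricting the search to sequence lines avoids accidental matches in quality strings or read IDs, which a plain grep -c over the whole file would also count.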

                • #23
                  Hi,
                  So, all the barcodes are NOT included in the read sequences but only in their IDs. We only need to trim adapters but not barcodes.

                  • #24
                    Originally posted by zshuhua
                    Hi,
                    So, all the barcodes are NOT included in the read sequences but only in their IDs. We only need to trim adapters but not barcodes.
                    If you used Illumina's barcodes then the answer is yes. Please verify that your sequence provider has done the demultiplexing for you (as they should).

                    If you used custom barcodes that are "in-line" with the sequence read then you will need to do the demultiplexing yourself.

                    Illumina barcodes (1D or 2D) are read as a separate read(s) and are not part of the actual sequence.
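
                    For the in-line case, a very rough sketch of what doing the demultiplexing yourself involves is shown below. It assumes the barcode is the first few bases of the read, that all barcodes have the same length, and that only exact matches count. The barcodes, sample names, and function name are hypothetical.

```python
def demultiplex(records, barcodes):
    """Sort reads into per-sample bins by their leading in-line barcode.

    records:  iterable of (read_id, sequence, quality) tuples
    barcodes: dict mapping barcode sequence -> sample name (equal lengths assumed)
    Matched reads have the barcode removed from sequence and quality;
    unmatched reads are kept unchanged under 'undetermined'.
    """
    barcode_len = len(next(iter(barcodes)))
    bins = {name: [] for name in barcodes.values()}
    bins["undetermined"] = []
    for read_id, seq, qual in records:
        sample = barcodes.get(seq[:barcode_len])
        if sample is None:
            bins["undetermined"].append((read_id, seq, qual))
        else:
            bins[sample].append((read_id, seq[barcode_len:], qual[barcode_len:]))
    return bins

# Hypothetical barcodes and reads, for illustration only.
example_barcodes = {"ACGT": "sample_A", "TTGG": "sample_B"}
example_reads = [
    ("r1", "ACGTAAAA", "IIIIJJJJ"),
    ("r2", "TTGGCCCC", "IIIIJJJJ"),
    ("r3", "GGGGTTTT", "IIIIJJJJ"),
]
bins = demultiplex(example_reads, example_barcodes)
print({name: len(reads) for name, reads in bins.items()})
```

                    Dedicated demultiplexing tools also tolerate mismatches in the barcode, which this exact-match sketch does not.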
                    Last edited by GenoMax; 05-09-2013, 08:51 AM.

                    • #25
                      Originally posted by SS Santos
                      Hi

                      GenoMax: you are right, I do have the barcode sequence in the first line of each read as you show in red, I just didn't know that it means it's not included in the read itself. If that's true it's good news.
                      That is correct. See my response to "zshuhua" above.

                      • #26
                        Adapter ILLUMINA CLIP file format

                        Originally posted by SS Santos
                        So they sent me this:

                        Adapters sequence:
                        5' P-GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
                        5' ACACTCTTTCCCTACACGACGCTCTTCCGATCT

                        sample barcode sequence
                        IST4113 TAGCTT
                        IST4129 AGTTCC
                        IST4134 CTTGTA
                        IST439 AGTCAA

                        Do I create a text file with this? How can I use it as an input for trimming/filtering tools?

                        Thanks

                        Did you solve this issue? I have the same problem. Do you create the file in fasta format or as a plain txt file?
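
                        One possible way to build such a file, sketched in Python. ILLUMINACLIP takes its adapter sequences as a fasta file, which is itself just plain text: a ">" name line followed by the sequence. The record names and the output file name below are placeholders, and the "P-" prefix in the quoted adapter is assumed to be a 5' phosphate label rather than part of the sequence, so it is left out.

```python
def to_fasta(named_seqs):
    """Format (name, sequence) pairs as FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in named_seqs)

def write_fasta(path, named_seqs):
    """Write the FASTA text to a file at the given path."""
    with open(path, "w") as handle:
        handle.write(to_fasta(named_seqs))

# Adapter sequences as quoted in the post above; the names are placeholders.
adapters = [
    ("adapter_1", "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"),
    ("adapter_2", "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"),
]

fasta_text = to_fasta(adapters)
print(fasta_text)
# To save it: write_fasta("adapters.fa", adapters)
```

                        The resulting file could then be passed to trimmomatic as, for example, ILLUMINACLIP:adapters.fa:2:30:10, where the three numbers are the thresholds used in the Trimmomatic documentation's own example and may need tuning. Going by the earlier replies in this thread, the sample barcodes do not need to be in the file if the provider has already demultiplexed the reads.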
