  • #16
    Hi, you've been so helpful, thanks.

    So, the adapter thing was what I suspected, and I think it's sorted. However, the barcodes are an unexpected problem. When I test the adapter sequences, even when using substrings, there are none in the trimmed reads. But there are thousands of barcodes (5687733).

    The command line for trimmomatic is a mess, but here's what I used:

    java -classpath trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE readsR1.fastq readsR2.fastq forward_paired.fastq forward_unpaired.fastq reverse_paired.fastq reverse_unpaired.fastq CROP:90 MINLEN:90

    I didn't use ILLUMINACLIP because I first used IlluQC from NGSToolkit, which was supposed to get rid of the adapters. From what I understand, I need to run trimmomatic again with other settings, but I'm worried about something. Apparently, cutting the last 10 bases is not enough for 5687733 reads. So, if I want to keep the read length at 90, I'll lose all of these reads after it cuts off the barcodes. Maybe I should use a lower minimum length? It's a bit of a dilemma, because my coverage isn't that great...

    (Here's a really basic question: how do I create a fasta file with the adapter and barcode sequences to use with ILLUMINACLIP? Or is there a file I can download with TruSeq adapters and barcodes?)
    Thanks

    Sandra
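
To make the dilemma above concrete, here is a minimal Python sketch of what CROP:90 followed by MINLEN:90 does to reads of different lengths. It is not Trimmomatic code: the function names are made up for illustration, and it assumes that CROP keeps the first N bases of a read and that MINLEN drops any read shorter than N, as the Trimmomatic manual describes.

```python
def crop(seq: str, length: int) -> str:
    """Keep at most the first `length` bases (assumed behaviour of CROP:<length>)."""
    return seq[:length]


def passes_minlen(seq: str, min_length: int) -> bool:
    """True if the read is at least `min_length` bases long (assumed MINLEN:<min_length>)."""
    return len(seq) >= min_length


def crop_then_filter(reads, length=90, min_length=90):
    """Apply the two steps in order and return only the surviving reads."""
    cropped = (crop(read, length) for read in reads)
    return [read for read in cropped if passes_minlen(read, min_length)]


# Hypothetical reads: a full 101-base read, and one that an earlier
# clipping step has already shortened to 80 bases.
full_read = "A" * 101
clipped_read = "C" * 80
survivors = crop_then_filter([full_read, clipped_read])
print([len(read) for read in survivors])
```

Only the 101-base read survives, cut down to 90 bases. Any read that a clipping step shortens below 90 is discarded, so lowering MINLEN keeps more reads at the cost of uniform read length.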


    • #17
      I'm sorry, I'd forgotten your reply (it was a long time ago!):

      "The latest version of trimmomatic comes with a file containing those adapter sequences, so it should work fine with your files in the ILLUMINACLIP step."

      So, I just need to call the TruSeq2-PE or TruSeq3-PE file in the command line? I understand they depend on the machine used, so I'll try to find out which one is better.

      Thanks


      • #18
        Removing primers, adaptors, how to know if it's good?

        The TruSeq2 and 3 are different versions of Illumina sample prep kits.
        The adapter sequences that appear in your reads will be the reverse complement of the sequences in the fasta file.

        For de novo assembly it is better to clean the reads. Is there a particular reason why you need all your reads to remain the same length?
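
The reverse-complement point above can be sketched in a few lines of Python. The helper name `reverse_complement` is my own, and the adapter used is one of the two sequences quoted later in this thread; treat it as an illustration rather than a definitive adapter list.

```python
# Translation table for the complement step; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the whole sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# One of the adapter sequences quoted later in this thread.
adapter = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
adapter_rc = reverse_complement(adapter)
print(adapter_rc)
```

When checking trimmed reads by hand, it is worth searching for both `adapter` and `adapter_rc`, since only one orientation is expected to show up in a given read.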


        • #19
          Originally posted by SS Santos
          Hi, you've been so helpful, thanks.

          So, the adapter thing was what I suspected, and I think it's sorted. However, the barcodes are an unexpected problem. When I test the adapter sequences, even when using substrings, there are none in the trimmed reads. But there are thousands of barcodes (5687733).

          Thanks

          Sandra
          Be careful about using the terms "adapter" and "barcode" interchangeably.

          Barcodes (in a multiplexed sample) are used to identify individual samples in a mixture. Generally a sequencing provider will de-multiplex your samples (if they were indeed multiplexed). As part of the de-multiplexing process the "barcodes" are identified, sorted, and inserted into the sequence ID by the Illumina pipeline software. They are also "read" as a separate read in Illumina sequencing.

          An example barcode (after de-multiplexing has been done) is the ATCACG that follows the "#" in the read ID below (shown in red in the original post; the example is taken from the Wikipedia article on FASTQ format).

          @HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
          TTAATTGGTAAATAAATCTCCTAATAGCTTAGATNTTACCTTNNNNNNNNNNTAGTTTCTTGAGATTTGTTGGGGGAGACATTTTTGTGATTGCCTTGAT
          +HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
          efcfffffcfeefffcffffffddf`feed]`]_Ba_^__[YBBBBBBBBBBRTT\]][]dddd`ddd^dddadd^BBBBBBBBBBBBBBBBBBBBBBBB
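
As a rough illustration of where the barcode lives, the sketch below pulls the index and read number out of a read ID like the one above. The function name and regular expression are my own assumptions, and the pattern only covers this older `#<index>/<read>` style; newer Illumina headers put the index after a space and are not handled here.

```python
import re

# Older Illumina style: @<instrument and coordinates>#<index>/<read number>
# The index is either a base sequence or a number (0 when there is no indexing).
_OLD_STYLE_ID = re.compile(r"^@\S+#(?P<index>[ACGTN]+|\d+)/(?P<read>[12])$")


def index_from_read_id(read_id: str):
    """Return (index, read_number), or None if the ID is not in the old style."""
    match = _OLD_STYLE_ID.match(read_id.strip())
    if match is None:
        return None
    return match.group("index"), int(match.group("read"))


read_id = "@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1"
parsed = index_from_read_id(read_id)
print(parsed)
```

The barcode comes from the ID line alone; the sequence line underneath is never consulted.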


          • #20
            Hey,

            Just reading through this thread. A few people have suggested that you might be reading into the index/adapter sequence at the 3' end of the library. Because you performed size selection (500 bp) and a 2x101 bp run, it is unlikely that you read into the adapter sequence if your size selection was accurate: your library insert size should be approx. 370 bp (500 bp minus 130 bp of adapter sequence).

            Have you determined the mean insert size of your library? That's my 2 cents.
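
The arithmetic above can be written out as a small sketch. The function names are hypothetical, and the model is deliberately simple: a read runs into the adapter only when the insert is shorter than the read length.

```python
def expected_insert_size(fragment_bp: int, adapter_bp: int) -> int:
    """Insert size = size-selected fragment length minus total adapter length."""
    return fragment_bp - adapter_bp


def adapter_bases_read(insert_bp: int, read_length: int) -> int:
    """Number of adapter bases a read would cover once it runs past the insert."""
    return max(0, read_length - insert_bp)


insert = expected_insert_size(500, 130)
print(insert, adapter_bases_read(insert, 101))  # 370 0
print(adapter_bases_read(80, 101))              # 21
```

With a 370 bp insert, a 101-base read ends well inside the insert. Adapter sequence would only appear for fragments that slipped through size selection with inserts shorter than 101 bp, such as the 80 bp example.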


            • #21
              Originally posted by SS Santos
              Or is there a file I can download with TruSeq adapters and barcodes?)


              Sandra
              You will find the sequences in this thread: http://seqanswers.com/forums/showthr...p?t=198&page=6 Post#113


              • #22
                Hi

                GenoMax: you are right, I do have the barcode sequence in the first line of each read as you show in red, I just didn't know that it means it's not included in the read itself. If that's true it's good news.

                Maria: I used grep -c to look for the 1st adapter and then for the reverse complement of the second one. Should I have looked for the complement of the 1st adapter instead?

                I need the reads with the same length for one of the assemblers I'm testing, it doesn't run otherwise.

                Snorberg: I'm sorry, but how do I determine the mean insert size?
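
For the grep -c check mentioned above, here is a hedged Python sketch that counts reads containing a probe sequence in either orientation. The function names and the tiny in-memory FASTQ are made up for illustration, and the code assumes plain 4-line FASTQ records.

```python
import io

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_reads_containing(handle, query: str) -> int:
    """Count FASTQ records whose sequence line contains `query`.

    Assumes plain 4-line records, so the sequence is every line whose
    zero-based index is 1 modulo 4.
    """
    return sum(
        1
        for line_number, line in enumerate(handle)
        if line_number % 4 == 1 and query in line
    )


adapter = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # quoted later in this thread
probe = adapter[-13:]                          # a 13-base substring of the adapter

# Two hypothetical reads: the first carries the reverse complement of the probe.
reads = ["TTGACC" + reverse_complement(probe) + "GGTA", "ACGTACGTACGTACGTACGT"]
fastq_text = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(reads, 1)
)

forward_hits = count_reads_containing(io.StringIO(fastq_text), probe)
revcomp_hits = count_reads_containing(io.StringIO(fastq_text), reverse_complement(probe))
print(forward_hits, revcomp_hits)
```

Unlike a bare grep -c on the whole file, this looks only at sequence lines, so a barcode or adapter-like string that appears in a read ID or quality line is not counted.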


                • #23
                  Hi,
                  So, all the barcodes are NOT included in the read sequences but only in their IDs. We only need to trim adapters but not barcodes.


                  • #24
                    Originally posted by zshuhua
                    Hi,
                    So, all the barcodes are NOT included in the read sequences but only in their IDs. We only need to trim adapters but not barcodes.
                    If you used Illumina's barcodes then the answer is yes. Please verify that your sequence provider has done the demultiplexing for you (as they should).

                    If you used custom barcodes that are "in-line" with the sequence read then you will need to do the demultiplexing yourself.

                    Illumina barcodes (1D or 2D) are read as a separate read(s) and are not part of the actual sequence.
                    Last edited by GenoMax; 05-09-2013, 08:51 AM.
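
For the in-line case described above, doing the demultiplexing yourself could look roughly like the sketch below. Everything here is an assumption made for illustration: the barcodes and sample names are invented, the barcode is taken to sit at the very start of the read, all barcodes have the same length, and only exact matches are assigned. Dedicated demultiplexing tools also handle mismatches and quality strings.

```python
from collections import defaultdict


def demultiplex_inline(reads, barcode_to_sample):
    """Bin reads by an exact-match barcode at the start of each read.

    Assigned reads have the barcode stripped; everything else goes,
    untouched, into the "unassigned" bin.
    """
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all barcodes have the same length")
    barcode_length = lengths.pop()

    bins = defaultdict(list)
    for seq in reads:
        sample = barcode_to_sample.get(seq[:barcode_length])
        if sample is None:
            bins["unassigned"].append(seq)
        else:
            bins[sample].append(seq[barcode_length:])
    return dict(bins)


# Hypothetical in-line barcodes and reads.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTTTTGGG", "TGCAAACCCG", "GGGGAAAATT"]
bins = demultiplex_inline(reads, barcodes)
print(bins)
```

None of this is needed for standard Illumina index reads, which, as noted above, are read separately and are not part of the sequence.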


                    • #25
                      Originally posted by SS Santos
                      Hi

                      GenoMax: you are right, I do have the barcode sequence in the first line of each read as you show in red, I just didn't know that it means it's not included in the read itself. If that's true it's good news.
                      That is correct. See my response to "zshuhua" above.


                      • #26
                        Adapter ILLUMINA CLIP file format

                        Originally posted by SS Santos
                        So they sent me this:

                        Adapters sequence:
                        5' P-GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
                        5' ACACTCTTTCCCTACACGACGCTCTTCCGATCT

                        sample barcode sequence
                        IST4113 TAGCTT
                        IST4129 AGTTCC
                        IST4134 CTTGTA
                        IST439 AGTCAA

                        Do I create a text file with this? How can I use it as an input for trimming/filtering tools?

                        Thanks

                        Did you solve this issue? I have the same problem. Do you create the file in FASTA format or as a plain .txt file?
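
On the file-format question, FASTA is itself plain text, so the extension matters less than the layout: a ">" name line followed by the sequence. A minimal sketch that builds such a file from the two adapter sequences quoted above follows. The record names `adapter_1` and `adapter_2` and the file name mentioned afterwards are placeholders of mine, and the "5' P-" prefix is left out on the assumption that it marks a 5' phosphate rather than bases.

```python
def to_fasta(named_sequences) -> str:
    """Format (name, sequence) pairs as FASTA text, one record per pair."""
    lines = []
    for name, seq in named_sequences:
        lines.append(f">{name}")
        lines.append(seq.upper())
    return "\n".join(lines) + "\n"


# Adapter sequences as quoted in this thread; the names are placeholders.
adapters = [
    ("adapter_1", "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"),
    ("adapter_2", "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"),
]
fasta_text = to_fasta(adapters)
print(fasta_text, end="")
```

Saving that text as, say, custom_adapters.fa gives a file whose name can be supplied as the first field of ILLUMINACLIP; the remaining fields (mismatch and clip thresholds) are described in the Trimmomatic manual. Going by the replies above, the sample barcodes should not need to be in this file if they are standard Illumina indexes, since those are not part of the read sequence.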
