  • fewer reads R1 than R2 bowtie2

    Dear all,
    Perhaps I am asking something that has already been discussed at length, but I have been unable to find the right answer.
    Does anyone know how to run a mapping step with Bowtie2 when the R1 and R2 files from Illumina contain different numbers of reads? I have heard about using Trim_galore before the alignment step, but I need to trim several primers, and after trimming with cutadapt, R1 and R2 have different numbers of reads. I suspect that running Trim_galore after that could trim away useful sequence...
    Any advice would be appreciated.
    Thanks a lot!

  • #2
    If you can reformat your R1/R2 files so that they contain only paired reads, and put the unpaired reads in an extra fastq file, you can do...

    -q -1 R1.fastq -2 R2.fastq -U unpaired.fastq

    On my end, I've had good luck with Trimmomatic, which will give you a separate file for the unpaired reads (1P, 2P, 1U, 2U)
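
A minimal sketch of how the Trimmomatic outputs could be passed to Bowtie2 using the options shown above. The file names and the index name `my_index` are hypothetical; `-x`, `-1`, `-2`, `-U`, `-S` and `-q` are standard bowtie2 options, and `-U` accepts a comma-separated list of files.

```python
def bowtie2_command(index, r1_paired, r2_paired, unpaired, sam_out):
    """Build a bowtie2 argument list for paired reads plus optional orphan reads."""
    cmd = ["bowtie2", "-q", "-x", index, "-1", r1_paired, "-2", r2_paired]
    if unpaired:
        # bowtie2 takes several unpaired files as one comma-separated value
        cmd += ["-U", ",".join(unpaired)]
    cmd += ["-S", sam_out]
    return cmd


# Hypothetical file names following the 1P/2P/1U/2U naming mentioned above
cmd = bowtie2_command(
    "my_index",
    "sample_1P.fastq",
    "sample_2P.fastq",
    ["sample_1U.fastq", "sample_2U.fastq"],
    "sample.sam",
)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once bowtie2 is on the PATH and the index has been built.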



    • #3
      Hi ctseto! Thanks for your reply!
      I had a look at Trimmomatic. However, it didn't work as well as I expected: some primer sequences still remained in my fastq files.
      That's why I decided to give cutadapt a chance, but I don't know how to "reformat" my R1 and R2 files...
      Thanks!



      • #4
        Not familiar with cutadapt, but I'll look into it. From https://github.com/marcelm/cutadapt/...ster/README.md

        If you use one of the read-discarding options, then the --paired-output option is needed to keep the two files synchronized. First trim the forward read, writing output to temporary files:

        cutadapt -a ADAPTER_FWD --minimum-length 20 --paired-output tmp.2.fastq -o tmp.1.fastq reads.1.fastq reads.2.fastq
        Then trim the reverse read, using the temporary files as input:

        cutadapt -a ADAPTER_REV --minimum-length 20 --paired-output trimmed.1.fastq -o trimmed.2.fastq tmp.2.fastq tmp.1.fastq
        Finally, remove the temporary files:

        rm tmp.1.fastq tmp.2.fastq
        I assume this is what you tried?

        An inelegant way of pulling out the reads that still have mates would be to build an index of the read names that exist in both R1 and R2, then extract those reads from R1 and R2 and write a new pair of files containing only the matching reads.
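
The read-name approach described above could be sketched roughly like this. It assumes plain 4-line FASTQ records, read names that differ between mates only by a trailing `/1` or `/2` (or by a comment after whitespace), and that trimming did not reorder the reads. All function and variable names are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) for each 4-line FASTQ record."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return
        yield tuple(lines)


def pair_key(header):
    """Reduce '@name/1' or '@name 1:N:0:...' to the name shared by both mates."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def resync(r1, r2, out1, out2, out_unpaired):
    """Write reads present in both inputs to out1/out2, everything else to out_unpaired."""
    # First pass: index the read names and keep those found in both files
    shared = {pair_key(rec[0]) for rec in read_fastq(r1)} & {
        pair_key(rec[0]) for rec in read_fastq(r2)
    }
    r1.seek(0)
    r2.seek(0)
    # Second pass: route each record; original order is preserved within each file
    for src, dst in ((r1, out1), (r2, out2)):
        for rec in read_fastq(src):
            target = dst if pair_key(rec[0]) in shared else out_unpaired
            target.write("\n".join(rec) + "\n")
    return len(shared)


# Tiny in-memory demo: read 'a' lost its mate in R2, read 'c' lost its mate in R1
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nGGGG\n+\nIIII\n")
r2 = io.StringIO("@b/2\nCCCC\n+\nIIII\n@c/2\nTTTT\n+\nIIII\n")
out1, out2, orphans = io.StringIO(), io.StringIO(), io.StringIO()
n_pairs = resync(r1, r2, out1, out2, orphans)
```

With real data the handles would be files opened with `open(...)`. Only the set of read names is held in memory, not the reads themselves.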
        Last edited by ctseto; 11-14-2013, 09:09 AM.



        • #5
          This is why I prefer Trimmomatic. It handles paired-end reads more elegantly. I suggest you give it a try.



          • #6
            Exactly!
            Older version....
            Thanks a lot ctseto!
            However, I will give Trimmomatic another chance... thanks!



            • #7
              Might be worth checking your Trimmomatic's list of adapters to be on the safe side.

              That would be the TruSeq3-PE.fa file in your adapters directory in the Trimmomatic directory. (Current version is 0.30)

              Knowing what adapters/primers/kits you are using upstream of your NGS, and armed with something like Illumina's Customer Sequence Letter (http://support.illumina.com/download...es_letter.ilmn), you should have enough information (too much, even) to put together a list of adapters to trim off.

              A figure illustrating the schema of Trimmomatic is in its manual (http://www.usadellab.org/cms/uploads...nual_V0.30.pdf)

              From Trimmomatic's notes:
              These sequences have not been extensively tested, and depending on specific issues which may occur in library preparation, other sequences may work better for a given dataset.

              To make a custom version of fasta, you must first understand how it will be used. Trimmomatic uses two strategies for adapter trimming: Palindrome and Simple

              With 'simple' trimming, each adapter sequence is tested against the reads, and if a sufficiently accurate match is detected, the read is clipped appropriately.

              'Palindrome' trimming is specifically designed for the case of 'reading through' a short fragment into the adapter sequence on the other end. In this approach, the appropriate adapter sequences are 'in silico ligated' onto the start of the reads, and the combined adapter+read sequences, forward and reverse are aligned. If they align in a manner which indicates 'read-through', the forward read is clipped and the reverse read dropped (since it contains no new data).

              Naming of the sequences indicates how they should be used. For 'Palindrome' clipping, the sequence names should both start with 'Prefix', and end in '/1' for the forward adapter and '/2' for the reverse adapter. All other sequences are checked using 'simple' mode. Sequences with names ending in '/1' or '/2' will be checked only against the forward or reverse read. Sequences not ending in '/1' or '/2' will be checked against both the forward and reverse read. If you want to check for the reverse-complement of a specific sequence, you need to specifically include the reverse-complemented form of the sequence as well, with another name.
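
As a rough illustration of the naming rules quoted above (not Trimmomatic's own code), a small helper can show how a sequence name in a custom adapter fasta would be interpreted. The names `MyPrimer/2` and `MyPrimer` are hypothetical examples.

```python
def classify_adapter(name):
    """Return (mode, reads_checked) for an adapter name, following the quoted naming rules."""
    has_mate_suffix = name.endswith(("/1", "/2"))
    # Palindrome mode needs both the 'Prefix' start and a '/1' or '/2' ending
    mode = "palindrome" if name.startswith("Prefix") and has_mate_suffix else "simple"
    if name.endswith("/1"):
        reads_checked = "forward"
    elif name.endswith("/2"):
        reads_checked = "reverse"
    else:
        reads_checked = "both"
    return mode, reads_checked


for adapter_name in ("PrefixPE/1", "PrefixPE/2", "MyPrimer/2", "MyPrimer"):
    print(adapter_name, classify_adapter(adapter_name))
```

Running a check like this over the names in a custom fasta before trimming is a cheap way to confirm that each primer will be tested against the intended read.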
              Last edited by ctseto; 11-14-2013, 11:46 AM.

