  • fewer reads R1 than R2 bowtie2

    Dear all,
    Perhaps I am asking something that has already been discussed many times, but I have been unable to find the right answer.
    Does anyone know how to run a mapping step with Bowtie2 when the R1 and R2 files from Illumina contain different numbers of reads? I have heard something about using Trim_galore before the alignment step, but I have to trim several primers, and after trimming with cutadapt R1 and R2 have different numbers of reads. I suspect that running Trim_galore after that could trim away useful sequence...
    Any advice would be appreciated.
    Thanks a lot!

  • #2
    If you can reformat your R1/R2 files so that they contain only paired reads, and put the unpaired reads in an extra fastq file, you can pass these options to Bowtie2:

    -q -1 R1.fastq -2 R2.fastq -U unpaired.fastq

    On my end, I've had good luck with Trimmomatic, which writes the paired and unpaired reads to separate files (1P, 2P, 1U, 2U).
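For anyone scripting this, here is a minimal sketch of how the options above could be assembled into a full Bowtie2 call from Trimmomatic-style outputs. The function name, the index name and all file names are hypothetical placeholders; the only Bowtie2 options used are -x, -q, -1, -2, -U (which accepts a comma-separated list of files) and -S.

```python
def bowtie2_command(index, r1_paired, r2_paired, unpaired, out_sam):
    """Build a bowtie2 argument list for paired reads plus optional unpaired reads.

    index     -- basename of a bowtie2 index (hypothetical, built beforehand)
    r1_paired -- FASTQ with forward reads that still have a mate (e.g. the 1P file)
    r2_paired -- FASTQ with reverse reads that still have a mate (e.g. the 2P file)
    unpaired  -- list of FASTQ files with orphaned reads (e.g. the 1U and 2U files)
    out_sam   -- path of the SAM file to write
    """
    cmd = ["bowtie2", "-x", index, "-q", "-1", r1_paired, "-2", r2_paired]
    if unpaired:
        # -U takes one comma-separated list of unpaired read files
        cmd += ["-U", ",".join(unpaired)]
    cmd += ["-S", out_sam]
    return cmd


if __name__ == "__main__":
    # Placeholder names only; nothing is executed here, the command is just printed.
    print(" ".join(bowtie2_command(
        "my_index", "sample_1P.fastq", "sample_2P.fastq",
        ["sample_1U.fastq", "sample_2U.fastq"], "sample.sam")))
```

The returned list can be handed to subprocess.run once the index and the input files actually exist.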


    • #3
      Hi ctseto! Thanks for your reply!
      I had a look at Trimmomatic. However, it didn't work as well as I expected: some primer sequences still remained in my fastq files.
      That's why I decided to give cutadapt a chance, but I don't know how to "reformat" my R1 and R2 files...
      Thanks!


      • #4
        Not familiar with cutadapt, but I'll look into it. From https://github.com/marcelm/cutadapt/...ster/README.md

        If you use one of the read-discarding options, then the --paired-output option is needed to keep the two files synchronized. First trim the forward read, writing output to temporary files:

        cutadapt -a ADAPTER_FWD --minimum-length 20 --paired-output tmp.2.fastq -o tmp.1.fastq reads.1.fastq reads.2.fastq

        Then trim the reverse read, using the temporary files as input:

        cutadapt -a ADAPTER_REV --minimum-length 20 --paired-output trimmed.1.fastq -o trimmed.2.fastq tmp.2.fastq tmp.1.fastq

        Finally, remove the temporary files:

        rm tmp.1.fastq tmp.2.fastq

        I assume this is what you tried?

        An inelegant way of pulling out the reads that still have a mate would be to build an index of the read names that exist in both R1 and R2, then extract those reads from R1 and R2 and write them to a new pair of files containing only the matching reads.
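As a rough illustration of that read-name approach, here is a minimal Python sketch. It assumes plain (uncompressed) four-line FASTQ records, read names that match between R1 and R2 once an optional trailing /1 or /2 is removed, and that both files keep the original read order (which holds if each was only filtered). The function names and output files are hypothetical, and all read names are held in memory.

```python
from itertools import islice


def read_fastq(path):
    """Yield (name, record_lines) for every 4-line FASTQ record in path."""
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))
            if not record or not record[0].strip():
                break  # end of file (or trailing blank line)
            if len(record) < 4:
                raise ValueError(f"truncated FASTQ record in {path}")
            # Name = header without '@', up to the first whitespace,
            # with a trailing /1 or /2 mate suffix removed.
            name = record[0][1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, record


def repair_pairs(r1_path, r2_path, out_r1, out_r2, out_unpaired):
    """Write reads present in both inputs to out_r1/out_r2, the rest to out_unpaired."""
    # First pass: index the read names and keep those found in both files.
    shared = ({name for name, _ in read_fastq(r1_path)}
              & {name for name, _ in read_fastq(r2_path)})
    counts = {"paired": 0, "unpaired": 0}
    # Second pass: route each record to the paired or the unpaired output.
    with open(out_r1, "w") as p1, open(out_r2, "w") as p2, \
            open(out_unpaired, "w") as orphans:
        for path, paired_out, is_r1 in ((r1_path, p1, True), (r2_path, p2, False)):
            for name, record in read_fastq(path):
                if name in shared:
                    paired_out.writelines(record)
                    if is_r1:
                        counts["paired"] += 1  # count each pair once
                else:
                    orphans.writelines(record)
                    counts["unpaired"] += 1
    return counts
```

The three output files then correspond to the -1, -2 and -U inputs mentioned earlier in the thread.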
        Last edited by ctseto; 11-14-2013, 09:09 AM.


        • #5
          This is why I prefer Trimmomatic. It handles paired end reads more elegantly. I suggest you give it a try.


          • #6
            Exactly!
            Older version....
            Thanks a lot ctseto!
            However, I will give Trimmomatic another chance... thanks!


            • #7
              Originally posted by jordi
              Exactly!
              Older version....
              Thanks a lot ctseto!
              However, I will give Trimmomatic another chance... thanks!
              Might be worth checking your Trimmomatic's list of adapters to be on the safe side.

              That would be the TruSeq3-PE.fa file in your adapters directory in the Trimmomatic directory. (Current version is 0.30)

              Knowing which adapters/primers/kits you are using upstream of your NGS, and armed with something like Illumina's Customer Sequence Letter (http://support.illumina.com/download...es_letter.ilmn), you should have enough information (too much, even) to put together a list of adaptors to trim off.

              A figure illustrating the schema of Trimmomatic is in its manual (http://www.usadellab.org/cms/uploads...nual_V0.30.pdf)

              From Trimmomatic's notes
              These sequences have not been extensively tested, and depending on specific issues which may occur in library preparation, other sequences may work better for a given dataset.

              To make a custom version of fasta, you must first understand how it will be used. Trimmomatic uses two strategies for adapter trimming: Palindrome and Simple

              With 'simple' trimming, each adapter sequence is tested against the reads, and if a sufficiently accurate match is detected, the read is clipped appropriately.

              'Palindrome' trimming is specifically designed for the case of 'reading through' a short fragment into the adapter sequence on the other end. In this approach, the appropriate adapter sequences are 'in silico ligated' onto the start of the reads, and the combined adapter+read sequences, forward and reverse are aligned. If they align in a manner which indicates 'read-through', the forward read is clipped and the reverse read dropped (since it contains no new data).

              Naming of the sequences indicates how they should be used. For 'Palindrome' clipping, the sequence names should both start with 'Prefix', and end in '/1' for the forward adapter and '/2' for the reverse adapter. All other sequences are checked using 'simple' mode. Sequences with names ending in '/1' or '/2' will be checked only against the forward or reverse read. Sequences not ending in '/1' or '/2' will be checked against both the forward and reverse read. If you want to check for the reverse-complement of a specific sequence, you need to specifically include the reverse-complemented form of the sequence as well, with another name.
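To make the naming rules quoted above concrete, here is a small Python sketch that reads the sequence names of a custom adapter FASTA and reports how each one would be treated under my reading of those rules. The adapter names and the ACGT sequences in the example are made-up placeholders, not real adapters, and a name that starts with 'Prefix' but lacks a '/1' or '/2' suffix is treated as 'simple' here, which is an assumption.

```python
def classify_adapter(name):
    """Return (mode, target) for one adapter sequence name.

    mode   -- 'palindrome' if the name starts with 'Prefix' and ends in /1 or /2,
              otherwise 'simple'
    target -- 'forward' for names ending in /1, 'reverse' for /2, else 'both'
    """
    if name.endswith("/1"):
        target = "forward"
    elif name.endswith("/2"):
        target = "reverse"
    else:
        target = "both"
    mode = "palindrome" if name.startswith("Prefix") and target != "both" else "simple"
    return mode, target


def adapter_names(fasta_text):
    """Extract the sequence names (header lines without '>') from FASTA text."""
    return [line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")]


# Hypothetical custom adapter file; the sequences are placeholders only.
EXAMPLE_FASTA = """>PrefixMyKit/1
ACGTACGT
>PrefixMyKit/2
TGCATGCA
>MyPrimer/1
ACGTTTTT
>MyAdapter
GGGGACGT
"""

if __name__ == "__main__":
    for adapter in adapter_names(EXAMPLE_FASTA):
        mode, target = classify_adapter(adapter)
        print(f"{adapter}: {mode} mode, checked against {target} read(s)")
```

Running a check like this over a custom file before trimming makes it easier to spot a primer that is only being tested against one read of the pair.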
              Last edited by ctseto; 11-14-2013, 11:46 AM.
