
  • #16
    Hi, so I'm working with some similar data. Something I found is that a lot of trimming tools aren't really set up for paired-end data. I have a pipeline for trimming and aligning reads. It goes basically like this:


    # Start with two paired-end Illumina FASTQ files. Keep only the reads that passed Illumina's quality filter
    # (flag N in the header; Y means the read was filtered out) and write them to the filtered files.
    # grep -A 3 prints a "--" line between non-adjacent matches, so those lines are stripped.
    grep -A 3 '^@.* [^:]*:N:[^:]*:' "$INPUT1" | grep -v '^--$' > "$FILTERED1"
    grep -A 3 '^@.* [^:]*:N:[^:]*:' "$INPUT2" | grep -v '^--$' > "$FILTERED2"

    # fastq-mcf handles paired-end reads well; it was the best tool I could find for paired-end trimming.
    # I don't remember what all the parameters do, but there's a great resource out there describing this tool.
    fastq-mcf -o "$OUTPUT1" -o "$OUTPUT2" -l 16 -q 15 -w 4 -x 10 -u -P 33 "$ADAPTERS" "$FILTERED1" "$FILTERED2"

    # Align with bowtie and write a SAM file.
    bowtie -t -p 8 --sam "$REF_GENOME" -1 "$OUTPUT1" -2 "$OUTPUT2" "$ALIGNED_OUTPUT"

    # Make a sorted, indexed BAM file from the bowtie alignment, which can be used for all sorts of things.
    # This is the older samtools sort syntax, where the last argument is an output prefix and .bam is appended;
    # newer samtools releases expect: samtools sort -o "$SORTED_BAM.bam" -
    samtools view -bS "$ALIGNED_OUTPUT" | samtools sort - "$SORTED_BAM"
    samtools index "$SORTED_BAM.bam" "$SORTED_BAM.bam.bai"
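
To see what the filtering step keeps, here is a small sketch on a made-up three-read FASTQ (the read names and the temporary file are hypothetical, and GNU grep is assumed). In CASAVA 1.8+ style headers the field after the read number is Y when the read was filtered out and N when it passed, so the pattern keeps the N reads. It also shows the "--" separator that grep -A 3 prints between non-adjacent matches, which has to be removed to keep the FASTQ valid.

```shell
#!/usr/bin/env bash
# Hypothetical three-read FASTQ: read 1 and read 3 passed the filter (N), read 2 failed (Y).
tmp=$(mktemp)
printf '%s\n' \
  '@M1:1:FC:1:1:100:200 1:N:0:ATCACG' 'ACGT' '+' 'IIII' \
  '@M1:1:FC:1:1:100:201 1:Y:0:ATCACG' 'TTTT' '+' '####' \
  '@M1:1:FC:1:1:100:202 1:N:0:ATCACG' 'GGCC' '+' 'IIII' > "$tmp"

# Keep each passing header plus the three lines after it,
# then drop the "--" group separators that grep inserts.
kept=$(grep -A 3 '^@.* [^:]*:N:[^:]*:' "$tmp" | grep -v '^--$')
echo "$kept"

# Each FASTQ record is 4 lines, so the read count is the line count divided by 4.
kept_reads=$(( $(echo "$kept" | wc -l) / 4 ))
echo "reads kept: $kept_reads"

rm -f "$tmp"
```

Because both mates of a pair carry the same filter flag, running the same filter on both files should keep the pairs in sync.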



    That's pretty much how I'm doing it for my data, and it works pretty well. As for those nasty overrepresented sequences, I'm guessing you're doing quality assessment with FastQC, which is a great tool. In my case, I did RNA-seq on bacterial genomes, so my read depth is really high because the genome is small. Add some highly expressed genes to that and FastQC flags overrepresented sequences. I'm basically ignoring them in my data, but think about how overrepresented sequences apply to your data and whether they actually matter.

    Hope this helps.



    • #17
      Hello everyone,


      I am working with TruSeq paired-end data (150 bp). I have a question about the adapter file provided with Trimmomatic for adapter trimming.

      According to the adapter file provided with Trimmomatic, "TruSeq3-PE-2.fa", the reverse complement of the index adapter sequence is used to trim reads from the R2 file, and the universal adapter is used to trim reads from the R1 file:
      >PrefixPE/1
      TACACTCTTTCCCTACACGACGCTCTTCCGATCT
      >PrefixPE/2
      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
      >PE1
      TACACTCTTTCCCTACACGACGCTCTTCCGATCT
      >PE1_rc
      AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
      >PE2
      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
      >PE2_rc
      AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

      However, in my data the index adapter sequence itself appears in the R1 file, and the reverse complement of the universal adapter appears in the R2 file.

      The Illumina support team also confirmed this.
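
One way to check which adapter shows up in which file is to count the reads that contain the start of each adapter. This is only a minimal sketch, assuming uncompressed four-line FASTQ records; the `count_hits` helper and the one-read example files are made up for illustration and stand in for real R1/R2 data.

```shell
#!/usr/bin/env bash
# First 34 bases of the two adapter sequences quoted in this post.
INDEX_ADAPTER='AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
UNIVERSAL_RC='AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'

# count_hits SEQ FASTQ: number of reads whose sequence line contains SEQ.
# The sequence is line 2 of every 4-line FASTQ record.
count_hits() {
  awk -v seq="$1" 'NR % 4 == 2 && index($0, seq) { n++ } END { print n + 0 }' "$2"
}

# Hypothetical one-read files: 8 bases of insert followed by adapter read-through.
qual=$(printf '%042d' 0 | tr 0 I)   # dummy quality string, 42 characters long
r1=$(mktemp); r2=$(mktemp)
printf '%s\n' '@read1/1' "ACGTACGT${INDEX_ADAPTER}" '+' "$qual" > "$r1"
printf '%s\n' '@read1/2' "TTGGCCAA${UNIVERSAL_RC}" '+' "$qual" > "$r2"

r1_index=$(count_hits "$INDEX_ADAPTER" "$r1")
r1_universal=$(count_hits "$UNIVERSAL_RC" "$r1")
r2_index=$(count_hits "$INDEX_ADAPTER" "$r2")
r2_universal=$(count_hits "$UNIVERSAL_RC" "$r2")

echo "R1: index adapter in $r1_index read(s), universal revcomp in $r1_universal"
echo "R2: index adapter in $r2_index read(s), universal revcomp in $r2_universal"

rm -f "$r1" "$r2"
```

On real data, the two counts per file would indicate which adapter is read through at the 3' end of each mate.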


      Therefore I prepared my adapter file as follows (I'm using the full sequence):
      >PrefixPE/1
      AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
      >PrefixPE/2
      AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
      >PE1
      AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
      >PE1_rc
      CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
      >PE2
      AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
      >PE2_rc
      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

      Here PrefixPE/1 and PE1 are the index adapter, PrefixPE/2 and PE2 are the reverse complement of the universal adapter, and PE1_rc and PE2_rc are the reverse complements of PE1 and PE2.
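
The _rc entries in the custom file can be checked against their forward sequences with a short shell sketch. The `revcomp` and `check` helpers are made up for illustration, and the sketch assumes the `rev` utility is available (it ships with util-linux and with macOS).

```shell
#!/usr/bin/env bash
# revcomp SEQ: reverse the string, then swap A<->T and C<->G.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Sequences copied from the custom adapter file in this post.
PE1='AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG'
PE1_RC='CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
PE2='AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT'
PE2_RC='AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'

# check NAME SEQ EXPECTED: report whether EXPECTED is the reverse complement of SEQ.
check() {
  if [ "$(revcomp "$2")" = "$3" ]; then
    echo "$1: _rc entry is the reverse complement"
  else
    echo "$1: _rc entry does NOT match"
  fi
}

check PE1 "$PE1" "$PE1_RC"
check PE2 "$PE2" "$PE2_RC"
```

The same helper can be pointed at any other adapter pair before it goes into a Trimmomatic adapter file.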

      Please let me know whether the adapter file I prepared is fine, or whether the Trimmomatic adapter file is better and should always be used.
      I tried both my custom file and the file provided with Trimmomatic, and when I checked the results with FastQC, both had removed the adapters.

      Please correct me or let me know if I'm missing something!
      I appreciate your help and guidance!
      Thanks,
      Candida
      Last edited by candida; 05-03-2017, 12:32 AM. Reason: Delete Post

