  • Need advice for preprocessing and assembly

    Hi,

    I have two paired-end files and I'm trying to merge them in order to assemble them. I have two strategies, but I don't know which one to use, and I'm not really sure whether either of them is correct.

    1) First method

    - Join them with fastq-join.
    The output is 3 files: one with the reads joined from the original pair (JOIN), one with the unjoined reads from file 1 (PE_1), and one with the unjoined reads from file 2 (PE_2).
    - Assemble with velveth:
    velveth Dir 31 -shortPaired -separate -fastq PE_1.fq PE_2.fq -short -fastq JOIN.fq
    - velvetg etc.


    2) Second method
    - Combine them with velvet-shuffleSequences_fastq.pl.
    The output is one file containing both reads in interleaved format (OUT).
    - Assemble with velveth:
    velveth Dir 31 -shortPaired -fastq OUT.fq
    - velvetg etc.

    Has anyone done this before?
    Thanks

  • #2
    If you want to merge them, I suggest using BBMerge, which has a very low false positive join rate. False joins cause assembly errors.

    bbmerge.sh in1=PE_1.fq in2=PE_2.fq out=merged.fq outu=unmerged.fq

    The "outu" file will contain the unmerged reads, interleaved. However, I encourage you to try assembling twice, once with the original reads and once with the merged + unmerged reads, because merging is not guaranteed to improve an assembly; sometimes it makes it worse.
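To make "false joins" concrete, here is a minimal sketch of overlap-based pair merging. It is NOT BBMerge's algorithm; it is a simplified, ungapped stand-in that assumes the fragment is longer than each read (so read 1's 3' end overlaps the reverse complement of read 2), and the parameter names `min_overlap` and `max_mismatches` are illustrative. The key idea it shows is that a conservative merger refuses to join when it finds no overlap or more than one plausible overlap (for example in repeats), because a wrong join produces a chimeric fragment.

```python
from typing import List, Optional

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 12,
               max_mismatches: int = 1) -> Optional[str]:
    """Return the merged fragment, or None if there is no single confident overlap."""
    r2rc = revcomp(r2)
    hits: List[int] = []
    # Try every overlap length: the last `olen` bases of r1 vs the first `olen` of r2rc.
    for olen in range(min_overlap, min(len(r1), len(r2rc)) + 1):
        a = r1[len(r1) - olen:]
        b = r2rc[:olen]
        mismatches = sum(1 for x, y in zip(a, b) if x != y)
        if mismatches <= max_mismatches:
            hits.append(olen)
    if len(hits) != 1:
        # No overlap, or several equally plausible ones: leave the pair unmerged.
        return None
    olen = hits[0]
    # In the overlap, r1's bases are kept; a real merger would use quality scores.
    return r1 + r2rc[olen:]


if __name__ == "__main__":
    fragment = "GATTACACGTTGCAATCCGA"
    print(merge_pair(fragment[:16], revcomp(fragment[4:])))
```

Note the homopolymer case: a pair like "AAAA..." / "TTTT..." matches at every overlap length, so the sketch returns None rather than guessing.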

    As for "velvet-shuffleSequences_fastq.pl", I'm not sure what the point of that is. Interleaving paired reads won't affect your assembly.
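Interleaving only reorders records; it does not change any read. A minimal sketch of what an interleaving script does, assuming standard 4-line FASTQ records and two files with the same read order (the function names here are hypothetical, not the Perl script's internals):

```python
import io
from typing import Iterator, List, TextIO


def read_fastq(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as [header, sequence, '+', qualities] (4-line format assumed)."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:  # end of file
            return
        if not lines[3]:
            raise ValueError("truncated FASTQ record")
        yield [line.rstrip("\r\n") for line in lines]


def interleave(r1: TextIO, r2: TextIO, out: TextIO) -> int:
    """Write each read 1 immediately followed by its mate; return the number of pairs."""
    mates = read_fastq(r2)
    pairs = 0
    for rec1 in read_fastq(r1):
        rec2 = next(mates, None)
        if rec2 is None:
            raise ValueError("file 2 has fewer reads than file 1")
        out.write("\n".join(rec1) + "\n")
        out.write("\n".join(rec2) + "\n")
        pairs += 1
    if next(mates, None) is not None:
        raise ValueError("file 2 has more reads than file 1")
    return pairs


if __name__ == "__main__":
    r1 = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p2/1\nGGCC\n+\nIIII\n")
    r2 = io.StringIO("@p1/2\nTTTT\n+\nIIII\n@p2/2\nAAAA\n+\nIIII\n")
    out = io.StringIO()
    print(interleave(r1, r2, out))
    print(out.getvalue(), end="")
```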



    • #3
      Thanks for your answer.
      I will try your software. To assemble the merged + unmerged reads, is this the correct command?
      velveth Dir 31 -shortPaired -fastq merged.fq -short -fastq unmerged.fq

      And for velvet-shuffleSequences_fastq.pl, I've read



      • #4
        I'm guessing that information is obsolete, as Velvet can handle paired reads in two files just fine now.

        But the command would be:

        velveth Dir 31 -shortPaired -fastq unmerged.fq -short -fastq merged.fq

        ...since "unmerged" contains paired reads while "merged" contains the single reads. Also, you will probably get a better assembly with a higher K than 31.
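Because -shortPaired with a single file relies on each read being immediately followed by its mate, it can be worth sanity-checking an interleaved file such as unmerged.fq before assembly. A minimal sketch, assuming 4-line FASTQ and mate names that differ only by a trailing /1 and /2 or by the comment after the first space (Illumina "1:N:..." / "2:N:..." style); the helper names are hypothetical:

```python
import io
from typing import TextIO


def mate_name(header: str) -> str:
    """Strip '@', any comment after whitespace, and a trailing /1 or /2."""
    parts = header[1:].split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def check_interleaved(handle: TextIO) -> int:
    """Return the number of pairs, or raise ValueError at the first non-mate pair."""
    # Every 4th line (0, 4, 8, ...) is a header in standard 4-line FASTQ.
    headers = [line.rstrip("\r\n") for i, line in enumerate(handle) if i % 4 == 0]
    if len(headers) % 2:
        raise ValueError(f"odd number of reads: {len(headers)}")
    for i in range(0, len(headers), 2):
        if mate_name(headers[i]) != mate_name(headers[i + 1]):
            raise ValueError(f"reads {i + 1} and {i + 2} are not mates: "
                             f"{headers[i]!r} vs {headers[i + 1]!r}")
    return len(headers) // 2


if __name__ == "__main__":
    demo = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p1/2\nTTTT\n+\nIIII\n")
    print(check_interleaved(demo))
```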



        • #5
          Brian, do you suggest trimming low-quality sequences before or after joining?



          • #6
            I suggest this order:

            1) removal of phiX and other artifact/contaminant reads
            2) adapter trimming
            3) normalization and/or error-correction and/or subsampling (all optional, depends on whether you have too much coverage)
            4) merging (optional)
            5) quality-trimming of unmerged reads only, to around Q10
            6) assembly

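Step 3 mentions normalization for when coverage is too high. As an illustration only, here is a sketch of the simple one-pass "digital normalization" idea (keep a read only if the median count of its k-mers, among reads kept so far, is below a target). This is a swapped-in textbook technique, not BBNorm's actual algorithm, and `k` and `target` are illustrative values. It also ignores reverse complements and treats reads as single-end; a real tool keeps or discards both mates of a pair together.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def normalize(reads: Iterable[str], k: int = 20, target: int = 20) -> List[str]:
    """Keep reads whose median k-mer count (over reads kept so far) is below target."""
    counts: Counter = Counter()
    kept: List[str] = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read shorter than k: skipped in this sketch
        if median([counts[km] for km in kmers]) < target:
            kept.append(read)
            counts.update(kmers)  # only kept reads contribute to coverage
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGTAC"] * 10 + ["GGGGGTTTTT"]
    print(normalize(reads, k=5, target=3))
```

Highly redundant reads stop being kept once their k-mers reach the target depth, while reads from low-coverage regions are still retained.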
            All of this, aside from assembly, can be done with BBTools. Steps 1, 2, and 5 can be done with bbduk.sh and step 3 can be done with bbnorm.sh (error-correction or normalization) or reformat.sh (subsampling).
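For step 1, the idea behind k-mer filtering with `k=31 hdist=1` is that a read is flagged if it shares any k-mer with the reference, allowing one mismatch. The sketch below is a simplified illustration under that assumption, not BBDuk's implementation: it stores every reference k-mer from both strands plus all of their 1-mismatch variants in a Python set, which is practical for a small reference such as phiX but would be memory-hungry for large ones. The function names are hypothetical.

```python
from typing import Iterable, Set

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def build_kmer_table(refs: Iterable[str], k: int, hdist: int = 1) -> Set[str]:
    """All k-mers of the references (both strands), plus 1-mismatch variants if hdist == 1."""
    if hdist not in (0, 1):
        raise ValueError("this sketch only supports hdist 0 or 1")
    table: Set[str] = set()
    for ref in refs:
        for strand in (ref, revcomp(ref)):
            for i in range(len(strand) - k + 1):
                kmer = strand[i:i + k]
                table.add(kmer)
                if hdist == 1:
                    # Add every k-mer at Hamming distance exactly 1.
                    for j, base in enumerate(kmer):
                        for alt in "ACGT":
                            if alt != base:
                                table.add(kmer[:j] + alt + kmer[j + 1:])
    return table


def is_contaminant(read: str, table: Set[str], k: int) -> bool:
    """True if any k-mer of the read is in the table."""
    return any(read[i:i + k] in table for i in range(len(read) - k + 1))


if __name__ == "__main__":
    table = build_kmer_table(["ATGATGATGA"], k=5)
    print(is_contaminant("GGGATGTTGAGGG", table, 5))  # ATGTT is 1 mismatch from ATGAT
```

A tiny `k=5` is used in the demo only to keep it readable; the command in the post uses `k=31`.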

            1)
            bbduk.sh -Xmx1g in1=read1.fq in2=read2.fq out=clean.fq ref=phix174_ill.ref.fa.gz hdist=1 k=31

            2)
            bbduk.sh -Xmx1g in=clean.fq out=trimmed.fq ref=truseq.fq.gz ktrim=r mink=11 hdist=1 k=25

            3) (optional)
            ecc.sh -Xmx29g in=trimmed.fq out=corrected.fq

            4) (optional)
            bbmerge.sh in=corrected.fq out=merged.fq outu=unmerged.fq

            5)
            bbduk.sh -Xmx1g in=unmerged.fq out=qtrimmed.fq qtrim=rl trimq=10 minlength=50

            6)
            velveth Dir 31 -shortPaired -fastq qtrimmed.fq -short -fastq merged.fq
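Step 5 (`qtrim=rl trimq=10 minlength=50`) trims low-quality bases from both ends and then discards reads that became too short. The sketch below uses plain end-trimming (drop bases below Q10 from each end until a good base is reached), which is a simplified stand-in and not necessarily how BBDuk chooses the region to keep. It assumes Phred+33 quality encoding.

```python
from typing import Optional, Tuple


def trim_ends(seq: str, qual: str, trimq: int = 10, min_length: int = 50,
              offset: int = 33) -> Optional[Tuple[str, str]]:
    """Trim bases below trimq from both ends; return None if the rest is too short."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < trimq:  # left end
        start += 1
    while end > start and scores[end - 1] < trimq:  # right end
        end -= 1
    if end - start < min_length:
        return None
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    seq = "N" + "A" * 60 + "NN"
    qual = "#" + "I" * 60 + "##"  # '#' is Q2, 'I' is Q40 in Phred+33
    print(trim_ends(seq, qual))
```

With paired (interleaved) input, a read pair should be kept or dropped as a unit; dropping only one mate would break the pairing that -shortPaired depends on.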

            The reference files phix174_ill.ref.fa.gz and truseq.fq.gz are both included with BBTools, in the "resources" directory. If you do error-correction (which will improve the rate of merging), the "-Xmx29g" flag is just an example; rather than 29g, it should be set to around 85% of the computer's physical memory.
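The "around 85% of physical memory" rule for `-Xmx` is simple arithmetic. A small sketch, assuming a POSIX system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES` (Linux, and typically macOS; not Windows); the function names are hypothetical:

```python
import os


def suggested_xmx(physical_bytes: int, fraction: float = 0.85) -> str:
    """Return a Java -Xmx flag for the given fraction of physical memory, rounded down."""
    mib = int(physical_bytes * fraction) // (1024 ** 2)
    if mib >= 1024:
        return f"-Xmx{mib // 1024}g"
    return f"-Xmx{mib}m"  # small machines: express in megabytes instead


def physical_memory_bytes() -> int:
    """Total physical RAM in bytes (POSIX only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    print(suggested_xmx(physical_memory_bytes()))
```

For example, a machine with 32 GiB of RAM gives `-Xmx27g`.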
