  • [Velvet] multiple files from each library

    I have some questions about running Velvet.
    I want to do a de novo assembly with short-read sequence data.

    Data description:
    - I have FASTQ files generated on an Illumina instrument.
    - These are paired-end reads.
    - There are four DNA fragment-length (insert-size) ranges.
    - There are two lanes (the same samples were loaded on two different lanes).
    - Each FASTQ file contains one end of the reads for one fragment-length range,
      e.g. lane1_2-4kb_R1.fastq was generated from lane 1 and contains the right end of the 2-4 kb fragments.
    - 16 FASTQ files in total (R1 and R2 are the paired ends).
    Insert size    Lane 1    Lane 2
    2-4 kb         R1  R2    R1  R2
    5-7 kb         R1  R2    R1  R2
    8-10 kb        R1  R2    R1  R2
    11 kb          R1  R2    R1  R2

    My questions:
    - The first thing I need to consider is shuffling each pair of files into one merged file (e.g. lane1_2-4kb_R1.fq and lane1_2-4kb_R2.fq into merged.fq).
    Q1. Do I have to distinguish between the two lanes? If not, can I just put all the fq files into one command?
    Q2. Does the command below make sense?

    velveth \
    Dir 31 \
    -shortPaired2 -separate -fastq 02_L7_R1.fq 02_L7_R2.fq \
    -shortPaired2 -separate -fastq 02_L8_R1.fq 02_L8_R2.fq \
    -shortPaired2 -separate -fastq 05_L7_R1.fq 05_L7_R2.fq \
    -shortPaired2 -separate -fastq 05_L8_R1.fq 05_L8_R2.fq \
    -shortPaired2 -separate -fastq 08_L7_R1.fq 08_L7_R2.fq \
    -shortPaired2 -separate -fastq 08_L8_R1.fq 08_L8_R2.fq \
    -shortPaired2 -separate -fastq 11_L7_R1.fq 11_L7_R2.fq \
    -shortPaired2 -separate -fastq 11_L8_R1.fq 11_L8_R2.fq
    Thank you in advance.
    Last edited by syintel87; 10-04-2013, 08:35 AM.

  • #2
    Hi,

    You should be able to merge the files with the same insert length from the two lanes together.
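A minimal Python sketch of the lane merge, assuming the hypothetical file names used in this thread (02_L7_R1.fq, 02_L8_R1.fq, ...); `merge_lanes` and `merge_plan` are illustrative helpers, not part of Velvet. The important detail is that the R1 and R2 files are concatenated in the same lane order, so read n of the merged R1 file is still the mate of read n of the merged R2 file.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Sequence


def merge_lanes(lane_files: Sequence[Path], out_path: Path) -> None:
    """Concatenate FASTQ files byte for byte, in the order given (like `cat`).

    Assumes every input ends with a newline, so no record gets split.
    """
    with open(out_path, "wb") as out:
        for path in lane_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_plan(sizes: Sequence[str],
               lanes: Sequence[str] = ("L7", "L8")) -> Dict[str, List[str]]:
    """Map each merged output name to its per-lane input files.

    R1 and R2 are built from the lanes in the SAME order, which keeps the
    mates in sync. File names are hypothetical, following this thread.
    """
    plan: Dict[str, List[str]] = {}
    for size in sizes:
        for end in ("R1", "R2"):
            out_name = f"{size}_{''.join(lanes)}_{end}.fq"
            plan[out_name] = [f"{size}_{lane}_{end}.fq" for lane in lanes]
    return plan


# Usage (not executed here):
# for out_name, inputs in merge_plan(["02", "05", "08", "11"]).items():
#     merge_lanes([Path(p) for p in inputs], Path(out_name))
```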

    Just do some QC first, with something like FastQC, to check that the data from both lanes are OK and that neither lane has any problems.
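FastQC is the right tool for this; as a rough extra check, a sketch like the one below compares read counts, base counts and mean base quality between the two lanes. It assumes unwrapped 4-line FASTQ records and Phred+33 quality encoding (standard for Illumina 1.8+, but older pipelines used Phred+64); `fastq_records` and `lane_summary` are hypothetical helper names.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from unwrapped 4-line FASTQ records."""
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not record and not line:
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) == 4:
            header, seq, plus, qual = record
            if (not header.startswith("@") or not plus.startswith("+")
                    or len(seq) != len(qual)):
                raise ValueError(f"malformed FASTQ record: {header!r}")
            yield header, seq, qual
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def lane_summary(lines: Iterable[str], offset: int = 33) -> Dict[str, float]:
    """Read count, base count and mean per-base quality for one FASTQ file.

    offset=33 assumes Phred+33 encoding; pass 64 for old Phred+64 data.
    """
    reads = bases = qual_sum = 0
    for _, seq, qual in fastq_records(lines):
        reads += 1
        bases += len(seq)
        qual_sum += sum(ord(c) - offset for c in qual)
    return {
        "reads": reads,
        "bases": bases,
        "mean_quality": qual_sum / bases if bases else 0.0,
    }


# Usage (not executed here): compare the same library across lanes.
# for name in ("02_L7_R1.fq", "02_L8_R1.fq"):
#     with open(name) as fh:
#         print(name, lane_summary(fh))
```

Large differences in read count or mean quality between the lanes would be a reason to look at the full FastQC report before merging.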

    But you then have to specify each pair of files with a different insert length as a separate category.

    You only need to specify -fastq and -separate once in your command.

    You also need to recompile Velvet with 'CATEGORIES=4' (i.e. make 'CATEGORIES=4'), because I think by default it only allows 2 categories.

    Code:
    velveth \
    Dir 31 \
    -fastq -separate -shortPaired 02_L7L8_R1.fq 02_L7L8_R2.fq \
    -shortPaired2 05_L7L8_R1.fq 05_L7L8_R2.fq \
    -shortPaired3 08_L7L8_R1.fq 08_L7L8_R2.fq \
    -shortPaired4 11_L7L8_R1.fq 11_L7L8_R2.fq
    But why do you have 4 different mate pair libraries and no short-insert libraries?
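The category numbering can be generated instead of typed by hand. This sketch only builds the argument list (it does not run velveth) and mirrors the command in the reply: -fastq and -separate given once, then -shortPaired, -shortPaired2, ... per library. `velveth_args` is a hypothetical helper, and `max_categories=2` encodes the reply's assumption that an unmodified build allows two categories.

```python
from typing import List, Sequence, Tuple


def velveth_args(out_dir: str, hash_length: int,
                 libraries: Sequence[Tuple[str, str]],
                 max_categories: int = 2) -> List[str]:
    """Build (but do not run) a velveth argument list with one paired-end
    category per insert-size library: -shortPaired, -shortPaired2, ...

    `libraries` holds (R1, R2) file pairs in the order they should be
    numbered. `max_categories` should match the CATEGORIES value Velvet was
    compiled with (assumed default: 2).
    """
    if len(libraries) > max_categories:
        raise ValueError(
            f"{len(libraries)} libraries need Velvet built with "
            f"CATEGORIES>={len(libraries)} (got {max_categories})"
        )
    # As in the reply: -fastq and -separate are given once, up front.
    args = ["velveth", out_dir, str(hash_length), "-fastq", "-separate"]
    for i, (r1, r2) in enumerate(libraries, start=1):
        flag = "-shortPaired" if i == 1 else f"-shortPaired{i}"
        args += [flag, r1, r2]
    return args


# Usage (not executed here):
# import subprocess
# libs = [(f"{s}_L7L8_R1.fq", f"{s}_L7L8_R2.fq") for s in ("02", "05", "08", "11")]
# subprocess.run(velveth_args("Dir", 31, libs, max_categories=4), check=True)
```

Failing early when there are more libraries than compiled categories is cheaper than discovering the limit from a velveth error halfway through a large run.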



    • #3
      I have a similar but slightly different issue. I have sequenced thousands of regions across the genome. The reads (single-end) are clustered into separate FASTQ files. How can I use Velvet to assemble each of these files on its own? Will the -separate switch work in this case as well?

      Veljo



      • #4
        No, you should not use the -separate switch.

        The -separate switch is for paired-end reads, when the R1 and R2 reads from the same sample are in different files.
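With -separate, velveth pairs read n of the R1 file with read n of the R2 file, so both files must list the mates in the same order. A quick order check, as a sketch assuming strict 4-line records and headers where both mates share the first whitespace-delimited token apart from a trailing /1 or /2 (older Illumina style) or where that token is identical (Casava 1.8 style); `read_ids` and `first_mismatch` are hypothetical names.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield a mate-independent ID for each 4-line FASTQ record.

    Keeps the first whitespace-delimited token of the header and drops a
    trailing /1 or /2: '@X:1#0/2' -> 'X:1#0', '@X:1 1:N:0' -> 'X:1'.
    """
    for i, line in enumerate(lines):
        if i % 4:
            continue  # only every 4th line is a header
        header = line.strip()
        if not header.startswith("@"):
            raise ValueError(f"expected a FASTQ header on line {i + 1}: {header!r}")
        tokens = header[1:].split()
        rid = tokens[0] if tokens else ""
        if rid.endswith(("/1", "/2")):
            rid = rid[:-2]
        yield rid


def first_mismatch(r1_lines: Iterable[str], r2_lines: Iterable[str]
                   ) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record_index, r1_id, r2_id) for the first out-of-sync pair,
    or None if both files list the same reads in the same order.
    If one file is shorter, its missing ID shows up as None."""
    pairs = zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for n, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return n, id1, id2
    return None


# Usage (not executed here):
# with open("02_L7L8_R1.fq") as f1, open("02_L7L8_R2.fq") as f2:
#     print(first_mismatch(f1, f2))  # None means the files are in sync
```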



        • #5
          Yep, thanks for the quick reply! I learned the same when I finally checked velveth -help (lazy me...). But strangely, there is nothing about these switches in the manual.

          So is the only way to pipe those many files into Velvet separately? Or is there a "smarter" way you would recommend?

          Veljo



          • #6

            The other option is simply to combine all the FASTQ files and input a single file.

            Could you explain what you mean by having thousands of regions of the genome sequenced? Is this from physical mapping?



            • #7
              I think -separate isn't in the manual because it is a relatively recent addition to Velvet; previously, for paired-end reads you had to merge the R1 and R2 files into one.
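"Merging" R1 and R2 here means interleaving them: each R1 record followed by its R2 mate in a single file, which is the paired layout velveth reads when -separate is not given. Velvet's distribution includes shuffleSequences scripts for this; the sketch below is an independent illustration, not those scripts, and assumes unwrapped 4-line FASTQ records. `fastq_chunks` and `interleave` are hypothetical names.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List


def fastq_chunks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into 4-line FASTQ records (assumes unwrapped FASTQ)."""
    record: List[str] = []
    for line in lines:
        record.append(line if line.endswith("\n") else line + "\n")
        if len(record) == 4:
            yield record
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def interleave(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines of one interleaved file: R1 record, its R2 mate, ..."""
    for rec1, rec2 in zip_longest(fastq_chunks(r1_lines), fastq_chunks(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        yield from rec1
        yield from rec2


# Usage (not executed here):
# with open("02_R1.fq") as f1, open("02_R2.fq") as f2, \
#         open("02_interleaved.fq", "w") as out:
#     out.writelines(interleave(f1, f2))
# then, for example: velveth Dir 31 -fastq -shortPaired 02_interleaved.fq
```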

              Just use the switches -fastq -short.

              Assuming that the reads in your different files were the same length before adapter and quality trimming, and that you are happy for velveth to use the same k-mer length on all of them, you can just list the files one after the other.
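Following the usage line velveth prints, listing the files one after the other means a single format flag, a single read-type flag, then every file name. A minimal sketch that builds that argument list; `velveth_short_args` is hypothetical, and note that this puts all reads into one shared assembly.

```python
from typing import List, Sequence


def velveth_short_args(out_dir: str, hash_length: int,
                       fastq_files: Sequence[str]) -> List[str]:
    """Build (but do not run) a velveth command that loads every single-end
    FASTQ file into the same -short category with one shared hash length."""
    if not fastq_files:
        raise ValueError("no FASTQ files given")
    return ["velveth", out_dir, str(hash_length), "-fastq", "-short", *fastq_files]


# Usage (not executed here):
# import glob, subprocess
# files = sorted(glob.glob("clusters/*.fq"))  # hypothetical location
# subprocess.run(velveth_short_args("Dir", 31, files), check=True)
```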

              If you type

              $ velveth

              you should get the usage for the velveth command, which should look something like this:

              Code:
              Usage:
              ./velveth directory hash_length {[-file_format][-read_type] filename1 [filename2 ...]} {...} [options]



              • #8
                Yes, it's a kind of physical mapping; long contigs are not expected. And at this stage I _do not want_ any assembly between reads from separate files, even where it would be possible.



                • #9
                  It might make sense to run your assemblies in parallel, then. You can always scaffold or combine them later.
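A minimal sketch of "one assembly per file, in parallel", assuming single-end FASTQ input, one shared hash length, and velveth/velvetg on the PATH. `assembly_commands`, `assemble_all` and the `assemblies/<file stem>` output layout are hypothetical; each file gets its own directory, so reads from different files never end up in the same assembly.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Sequence


def assembly_commands(fastq: str, hash_length: int, out_root: str) -> List[List[str]]:
    """velveth + velvetg commands that assemble ONE single-end FASTQ file in
    its own directory. Directory names come from the file stem, so stems
    must be unique across the input files."""
    out_dir = str(Path(out_root) / Path(fastq).stem)
    return [
        ["velveth", out_dir, str(hash_length), "-fastq", "-short", fastq],
        ["velvetg", out_dir],
    ]


def assemble_all(fastq_files: Sequence[str], hash_length: int,
                 out_root: str = "assemblies", workers: int = 4,
                 run: Callable[..., object] = subprocess.run) -> List[str]:
    """Run one independent assembly per file, `workers` at a time.

    Threads are enough: the heavy work happens in the velvet subprocesses.
    Only the parent directory is created here; velveth is expected to
    create each per-file directory itself.
    """
    Path(out_root).mkdir(parents=True, exist_ok=True)

    def assemble_one(fastq: str) -> str:
        for cmd in assembly_commands(fastq, hash_length, out_root):
            run(cmd, check=True)  # check=True: velvetg is skipped if velveth fails
        return fastq

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() re-raises a failed assembly's exception when its result is collected
        return list(pool.map(assemble_one, fastq_files))


# Usage (not executed here):
# import glob
# assemble_all(sorted(glob.glob("clusters/*.fq")), hash_length=21, workers=8)
```

The `run` parameter exists so the command sequence can be checked without Velvet installed; keep `workers` modest, since each velveth/velvetg process needs its own memory.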
