  • What is an interleaved paired-end file?

    Hello, I am trying to use Megahit to assemble my Illumina HiSeq reads. Here is the Megahit usage line and its description:

    ```
    ./megahit [options] {-1 <pe_1.fq> -2 <pe_2.fq> | --12 <pe12.fq> | -r <se.fq>}
    ```

    `-1/-2`, `--12` and `-r` are parameters for inputting paired-end, interleaved-paired-end and single-end files. They accept files in fasta (*.fasta*, *.fa*, *.fna*) or fastq (*.fastq*, *.fq*) formats. They also support gzip files (with *.gz* extensions) and bzip2 files (with *.bz2* extensions). Please run `./megahit -h` for a detailed usage message.

    What does an "interleaved-paired-end" file mean? I have joined my R1 and R2 fastq files into one file because they have overlaps. I am not sure whether I should use the "--12" parameter (for interleaved files) or the "-r" parameter (for single-end files).

    Thanks

  • #2
    Interleaved files are when the R1 and R2 reads are combined in one file,
    so that for each read pair, the R1 read in the file comes immediately before
    the R2 read, followed by the R1 read for the next read pair, and so on.

    I think if you have merged the reads together, they are probably best described as single-end reads (which would mean the `-r` option).

    • #3
      Hi, thanks.

      Can you paste some sequences from an interleaved file?

      • #4
        "I have joined my R1 and R2 fastq file into one file because they have overlaps."

        If you joined them end-to-end, then that does not account for the actual sequence overlap. If the R1/R2 reads actually overlap, then you need to use bbmerge.sh from BBMap, FLASH, or similar software so that the two reads are combined (taking the sequence overlap into account) into a single long read.

        An example of interleaved PE reads would look like this:

        Code:
        @M10991:61:000000000-A7EML:1:1101:14011:1001 1:N:0:28
        NGCTCCTAGGTCGGCATGATGGGGGAAGGAGAGCATGGGAAGAAATGAGAGAGTAGCAA
        +
        #8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF
        @M10991:61:000000000-A7EML:1:1101:14011:1001 2:N:0:28
        NGCTCCTAGGTCGGCATGACGCTAGCTACGATCGACTACGCTAGCATCGAGAGTAGCAA
        +
        #8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF
        @M10991:61:000000000-A7EML:1:1201:15411:3101 1:N:0:28
        NGCTCCTAGGTCGGCATGATGGGGGAAGGAGAGCATGGGAAGAAATGAGAGAGTAGCAA
        +
        #8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF
        @M10991:61:000000000-A7EML:1:1201:15411:3101 2:N:0:28
        CGCTAGCTACGACTCGACGACAGCGAACACGCGATCGATCGGAAATGAGAGAGTAGCAA
        +
        #8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF

        • #5
          Hi GenoMax,

          So, is interleaved PE a kind of concatenation of the two files, as we do for fasta files in Linux (e.g. cat *.fna > total.fna)? Is that what you mean by "joined them end-to-end then that does not account for the actual sequence overlap"?

          Am I right?

          • #6
            Not quite. As you can see from the example above, the arrangement is NOT what plain concatenation would produce, i.e.
            Code:
            cat R1.fastq R2.fastq > combined.fastq

            which gives you

            read1_R1
            read2_R1
            read3_R1
            read1_R2
            read2_R2
            read3_R2

            but rather

            Code:
            read1_R1
            read1_R2
            read2_R1
            read2_R2
            read3_R1
            read3_R2
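The alternating order listed above can be produced with a short script. The following is a minimal Python sketch, not taken from this thread: the function names `read_fastq` and `interleave` are made up for illustration, and it assumes well-formed, uncompressed FASTQ with exactly four lines per record and both files listing the pairs in the same order.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield one FASTQ record at a time as a list of its 4 lines."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # end of file
            return
        yield record


def interleave(r1_handle, r2_handle, out_handle):
    """Write records alternately: read1_R1, read1_R2, read2_R1, read2_R2, ..."""
    pairs = zip_longest(read_fastq(r1_handle), read_fastq(r2_handle))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        out_handle.writelines(rec1)
        out_handle.writelines(rec2)


# Tiny in-memory demonstration with made-up reads.
r1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nTTTT\n+\nIIII\n")
r2 = io.StringIO("@read1/2\nGGCC\n+\nIIII\n@read2/2\nAAAA\n+\nIIII\n")
out = io.StringIO()
interleave(r1, r2, out)
names = [line.strip() for line in out.getvalue().splitlines()[::4]]
print(names)
```

With real data the three `io.StringIO` objects would be replaced by files opened with `open(...)`. The result is the kind of file that Megahit's `--12` option expects, whereas the `cat` output is not.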

            If your reads overlap in the middle (meaning the number of sequencing cycles per read is > insert length/2), then you would basically get

            Code:
            R1 ---------------------------->
                                 <----------------------------- R2
            After using bbmerge you will get (== represents overlap)

            Merged read -------------------------============----------------------------
            Last edited by GenoMax; 10-05-2016, 05:34 AM.
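To make the merging idea in the diagram concrete, here is a deliberately naive Python sketch. It is not how bbmerge.sh or FLASH work internally: it only accepts an exact match between the end of R1 and the start of the reverse-complemented R2, whereas real mergers typically tolerate mismatches and take base qualities into account. The function names and the `min_overlap` default are assumptions made for this illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1_seq, r2_seq, min_overlap=10):
    """Merge R1 and R2 on their longest exact overlap.

    R2 is reverse-complemented first so that both reads are on the same
    strand. Returns the merged sequence, or None if no overlap of at
    least `min_overlap` bases is found.
    """
    r2_rc = reverse_complement(r2_seq)
    longest = min(len(r1_seq), len(r2_rc))
    for size in range(longest, min_overlap - 1, -1):
        if r1_seq.endswith(r2_rc[:size]):
            return r1_seq + r2_rc[size:]
    return None


# Made-up 30 bp fragment sequenced with 20 bp reads from each end,
# so the two reads share the middle 10 bases.
fragment = "AACCGGTTACGATCGTAGCTAGGATCCTGA"
r1 = fragment[:20]
r2 = reverse_complement(fragment[-20:])
merged = merge_pair(r1, r2)
print(merged == fragment)
```

The merged result is one long read per pair, which is why merged reads are treated as single-end input afterwards.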

            • #7
              Hi, Thanks for the help.

              I have never seen an interleaved fastq file before. I work with several sequencing centers, and for all my Illumina data they always give me separate R1 and R2 files.

              I'm not sure what the advantage of an interleaved fastq file is. Is it that one sample is one file? Or is it generated by other sequencing platforms?

              • #8
                Interleaved files ensure that the R1/R2 reads of each pair are next to each other, so you cannot end up in a situation where the reads are out of sync (e.g. with independent files that may have been trimmed independently/incorrectly).

                That said, not many places use these. JGI (where @Brian works) does, and that is one of the reasons why BBMap has support for this format.
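The out-of-sync problem mentioned above for separate R1/R2 files can be checked before assembly. Below is a minimal Python sketch under a few assumptions: the helper names are invented for this example, records are exactly four lines each, and a read's name is the header text up to the first whitespace, with an optional trailing `/1` or `/2` ignored.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield the read name from each FASTQ header (every 4th line)."""
    for index, line in enumerate(handle):
        if index % 4 == 0:
            name = line[1:].split()[0]  # drop '@' and anything after a space
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_handle, r2_handle):
    """Return the 0-based index of the first pair whose names differ, else None."""
    pairs = zip_longest(read_names(r1_handle), read_names(r2_handle))
    for position, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return position
    return None


# Made-up example: readB was dropped from R2 only, e.g. by trimming R2 alone.
r1 = io.StringIO("@readA/1\nAC\n+\nII\n@readB/1\nGT\n+\nII\n@readC/1\nTT\n+\nII\n")
r2 = io.StringIO("@readA/2\nGG\n+\nII\n@readC/2\nAA\n+\nII\n")
print(first_mismatch(r1, r2))
```

A return value of `None` means every R1 record lines up with an R2 record of the same name. Any other value points at the first pair where the two files diverge.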

                • #9
                  Hi GenoMax,

                  PS: I have a question about BBMap. I just installed it, and I don't know why, but to run some of its scripts, for example "reformat.sh", I have to be in the bbmap folder and type ./reformat.sh. Otherwise, the system tells me "command not found". I am using 64-bit Ubuntu and I have added the BBMap path to the environment.

                  Do you know why this is?

                  • #10
                    If you added the directory containing the BBMap shell scripts to your $PATH (make sure it is that directory itself that was added), then the scripts should work from anywhere on the system.

                    • #11
                      I only have one directory, which is /bbmap/. What else do I need?

                      • #12
                        If the shell scripts are in that directory then that is all you should need. Ensure that all scripts have execute permissions (chmod a+x bbmap/*.sh).
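The checks discussed in the last few posts (script present, execute permission set, directory on $PATH) can be run in one go. This is a small Python sketch using only the standard library. The function name `diagnose` and its messages are made up for this example, and `/bbmap` is simply the directory mentioned above.

```python
import os
import shutil


def diagnose(script_name, script_dir):
    """Return a short message saying why `script_name` may not run as a bare command."""
    script_path = os.path.join(script_dir, script_name)
    if not os.path.isfile(script_path):
        return "script not found in " + script_dir
    if not os.access(script_path, os.X_OK):
        return "script is not executable (try chmod a+x)"
    path_dirs = [
        os.path.abspath(d)
        for d in os.environ.get("PATH", "").split(os.pathsep)
        if d
    ]
    if os.path.abspath(script_dir) not in path_dirs:
        return "directory is not on PATH"
    # shutil.which reports which file the shell would actually pick up.
    return "ok: " + str(shutil.which(script_name))


print(diagnose("reformat.sh", "/bbmap"))
```

Note that a change to $PATH made in a shell startup file only takes effect in shells started afterwards, so the check is best run from a fresh terminal.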

                        • #13
                          If you're going to merge the fastq files with a custom script, make sure not to put read 1 and read 2 in the wrong order, or else the assembler might discard them. At least IDBA_UD does, which makes sense: the assembler sees the second read without having seen the first read and therefore discards it; then the first read doesn't have its second read, and so on.
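A custom interleaving script can be sanity-checked for this ordering problem. The sketch below is an illustration in Python with invented helper names. It assumes four-line records and recognises the mate number either from an Illumina-style comment such as `1:N:0:28` or from a trailing `/1` or `/2` on the read name.

```python
import io


def mate_number(header):
    """Return 1 or 2 for a FASTQ header line, or None if it cannot be determined."""
    fields = header.strip().split()
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        return int(fields[1][0])  # e.g. '@name 1:N:0:28'
    if fields and fields[0].endswith(("/1", "/2")):
        return int(fields[0][-1])  # e.g. '@name/1'
    return None


def pairs_in_order(handle):
    """True if the headers alternate strictly as mate 1, mate 2, mate 1, mate 2, ..."""
    headers = [line for index, line in enumerate(handle) if index % 4 == 0]
    if len(headers) % 2:
        return False  # an unpaired record is left over
    mates = [mate_number(header) for header in headers]
    return all(
        mates[i] == 1 and mates[i + 1] == 2 for i in range(0, len(mates), 2)
    )


# Made-up records: one correctly ordered pair and one swapped pair.
good = io.StringIO("@r1/1\nAC\n+\nII\n@r1/2\nGT\n+\nII\n")
swapped = io.StringIO("@r1/2\nGT\n+\nII\n@r1/1\nAC\n+\nII\n")
print(pairs_in_order(good), pairs_in_order(swapped))
```

This only inspects mate numbers. Confirming that both reads of a pair also share the same name would be a natural extension.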

                          (oh, is the 300 sec waiting time between posts really still necessary? It's quite annoying)
