SEQanswers




Old 10-04-2016, 02:07 PM   #1
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 222
Default What is interleaved-paired-end file

Hello, I am trying to use Megahit to assemble my Illumina HiSeq reads. Here is the Megahit parameter:

```
./megahit [options] {-1 <pe_1.fq> -2 <pe_2.fq> | --12 <pe12.fq> | -r <se.fq>}
```

`-1/-2`, `--12` and `-r` are parameters for inputting paired-end, interleaved-paired-end and single-end files. They accept files in fasta (*.fasta*, *.fa*, *.fna*) or fastq (*.fastq*, *.fq*) formats. They also support gzip files (with *.gz* extensions) and bzip2 files (with *.bz2* extensions). Please run `./megahit -h` for a detailed usage message.

What does an "interleaved-paired-end" file mean? I have joined my R1 and R2 fastq files into one file because they have overlaps. I am not sure whether I should use the "--12" parameter (for interleaved files) or the "-r" parameter (for single-end files).

Thanks
Old 10-04-2016, 02:46 PM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

Interleaved files are when the R1 and R2 reads are combined in one file,
so that for each read pair, the R1 read in the file comes immediately before
the R2 read, followed by the R1 read for the next read pair, and so on.

I think if you have merged the reads together they are probably best described as single end reads.
Old 10-04-2016, 02:56 PM   #3
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 222
Default

Hi, thanks.

Can you paste some sequences from an interleaved file?
Old 10-05-2016, 04:33 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,999
Default

Quote:
I have joined my R1 and R2 fastq file into one file because they have overlaps.
If you joined them end-to-end then that does not account for the actual sequence overlap. If R1/R2 reads actually overlap then you need to use bbmerge.sh from BBMap or FLASH or similar software so the two reads are combined (taking into account the sequence overlap) into a single long read.

An example of interleaved PE reads would be like this

Code:
@M10991:61:000000000-A7EML:1:1101:14011:1001 1:N:0:28
NGCTCCTAGGTCGGCATGATGGGGGAAGGAGAGCATGGGAAGAAATGAGAGAGTAGCAA
+
#8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF
@M10991:61:000000000-A7EML:1:1101:14011:1001 2:N:0:28
NGCTCCTAGGTCGGCATGACGCTAGCTACGATCGACTACGCTAGCATCGAGAGTAGCAA
+
#8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF
@M10991:61:000000000-A7EML:1:1201:15411:3101 1:N:0:28
NGCTCCTAGGTCGGCATGATGGGGGAAGGAGAGCATGGGAAGAAATGAGAGAGTAGCAA
+
#8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF
@M10991:61:000000000-A7EML:1:1201:15411:3101 2:N:0:28
CGCTAGCTACGACTCGACGACAGCGAACACGCGATCGATCGGAAATGAGAGAGTAGCAA
+
#8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF
Old 10-05-2016, 06:07 AM   #5
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 222
Default

Hi GenoMax,

So, interleaved PE is kind of concatenating the two files together, as we do for fasta files in Linux (e.g. cat *.fna > total.fna)? Is this what you mean by "joined them end-to-end then that does not account for the actual sequence overlap"?

Am I right?
Old 10-05-2016, 06:26 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,999
Default

Not quite. As you can see from the example above, the arrangement is NOT the one you would get from plain concatenation:
Code:
cat R1.fastq R2.fastq > combined.fastq

giving you

read1_R1
read2_R1
read3_R1
read1_R2
read2_R2
read3_R2
but rather

Code:
read1_R1
read1_R2
read2_R1
read2_R2
read3_R1
read3_R2
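
To make the second ordering concrete, here is a minimal Python sketch of interleaving. It assumes plain four-line FASTQ records and that R1 and R2 list the pairs in the same order; the function names `read_fastq` and `interleave` are made up for this illustration and do not come from any of the tools mentioned in this thread.

```python
from itertools import islice, zip_longest


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, plus, quality)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1_lines, r2_lines):
    """Yield records in the order read1_R1, read1_R2, read2_R1, read2_R2, ..."""
    pairs = zip_longest(read_fastq(r1_lines), read_fastq(r2_lines))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        yield rec1
        yield rec2
```

Passing the lines of R1.fastq and R2.fastq (with line endings stripped, or kept consistently) gives the R1/R2/R1/R2 order, whereas `cat` would give all of R1 followed by all of R2.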

If your reads overlap in the middle (meaning the number of sequencing cycles per read is > (insert length/2)) then you would basically get

Code:
R1 ---------------------------->
                     <----------------------------- R2
After using bbmerge you will get (== represents overlap)

Merged read -------------------------============----------------------------
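
As a toy illustration of what "taking the overlap into account" means, the sketch below reverse-complements R2 and looks for the longest exact match between the end of R1 and the start of that reverse complement. This is an assumption-laden simplification and not how bbmerge or FLASH work internally: real mergers tolerate mismatches and use quality scores. The names `reverse_complement`, `merge_pair` and `min_overlap` are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(r1_seq, r2_seq, min_overlap=10):
    """Merge a read pair on its longest exact overlap.

    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases (a positive integer) is found.
    """
    r2_rc = reverse_complement(r2_seq)
    longest = min(len(r1_seq), len(r2_rc))
    # Try the longest possible overlap first, then shrink it.
    for n in range(longest, min_overlap - 1, -1):
        if r1_seq[-n:] == r2_rc[:n]:
            return r1_seq + r2_rc[n:]
    return None
```

Joining the two reads end-to-end would instead duplicate the overlapping bases in the middle of the result.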

Last edited by GenoMax; 10-05-2016 at 06:34 AM.
Old 10-05-2016, 06:31 AM   #7
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 222
Default

Hi, Thanks for the help.

I have never seen an interleaved fastq file before. I work with several sequencing centers, and for all my Illumina data they always give me R1 and R2 files.

I'm not sure what the advantage of an interleaved fastq file is. Is it that one sample is one file? Or is it generated by other sequencing platforms?
Old 10-05-2016, 06:33 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,999
Default

Interleaved files ensure that all R1/R2 reads are next to each other and you would not get in a situation where the reads would be out of sync (e.g. for independent files that may be trimmed independently/incorrectly).
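
If you want to confirm that an interleaved file really is in sync, a check along these lines would do. It is a sketch that assumes headers look like the Illumina example earlier in the thread (read name, a space, then a comment) or end in /1 and /2; `read_id` and `find_unpaired` are illustrative names, not functions from BBMap.

```python
def read_id(header):
    """Return the read name without '@', the comment, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def find_unpaired(headers):
    """Return the index of the first pair whose names disagree, else None.

    `headers` is the list of header lines of an interleaved file, in order.
    """
    if len(headers) % 2:
        raise ValueError("odd number of records; cannot be interleaved pairs")
    for i in range(0, len(headers), 2):
        if read_id(headers[i]) != read_id(headers[i + 1]):
            return i // 2
    return None
```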

That said, not many places use these. JGI (where @Brian works) does and that is one of the reasons why BBMap has support for this format.
Old 10-05-2016, 07:02 AM   #9
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 222
Default

Hi GenoMax,

PS, I have a question about BBMap. I just installed it, and I don't know why some scripts, for example "reformat.sh", only run when I am in the bbmap folder and type ./reformat.sh. Otherwise, the system tells me "command is not found". I am using 64-bit Ubuntu and I have added the BBMap path to the environment.

Do you know why this is?
Old 10-05-2016, 07:09 AM   #10
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,999
Default

If you added the directory containing the bbmap shell scripts to your $PATH (make sure it is that directory itself that was added), then the scripts should work from anywhere on the system.
Old 10-05-2016, 07:23 AM   #11
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 222
Default

I only have one directory, which is /bbmap/. What else do I need?
Old 10-05-2016, 08:25 AM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,999
Default

If the shell scripts are in that directory then that is all you should need. Ensure that all scripts have execute permissions (chmod a+x bbmap/*.sh).
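
The two things to check (execute permission and $PATH) can be tested with a small script like the following. This is only a sketch for a Linux system; `diagnose` and its return strings are invented for illustration.

```python
import os


def diagnose(script, directory):
    """Say why `script` in `directory` may not be runnable by name alone."""
    full_path = os.path.join(directory, script)
    if not os.path.isfile(full_path):
        return "not found in directory"
    if not os.access(full_path, os.X_OK):
        return "no execute permission"
    # Compare absolute paths so that trailing slashes do not matter.
    path_dirs = [
        os.path.abspath(d)
        for d in os.environ.get("PATH", "").split(os.pathsep)
        if d
    ]
    if os.path.abspath(directory) not in path_dirs:
        return "directory not on PATH"
    return "ok"
```

For example, `diagnose("reformat.sh", "/bbmap")` would report which of the two conditions fails in the shell it is run from.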
Old 10-06-2016, 07:47 AM   #13
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98
Default

If you're going to merge the fastq files with a custom script: Make sure to not put read-1 and read-2 in the wrong order, else the assembler might discard them (at least IDBA_UD does; which makes sense; the assembler sees the second read, without having the first read and therefore discards it; then the first read doesn't have the second, and so on).
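
A custom script can guard against this by checking the mate number in each header before writing. The sketch below assumes either the newer Illumina style shown earlier in the thread (a comment field starting with "1:" or "2:") or the older "/1" and "/2" suffixes; `mate_number` and `pairs_in_order` are hypothetical helper names.

```python
def mate_number(header):
    """Return 1 or 2 if the header identifies the mate, otherwise None."""
    fields = header.split()
    # Newer Illumina style: "@name 1:N:0:28"
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        return int(fields[1][0])
    # Older style: "@name/1"
    if fields and fields[0].endswith(("/1", "/2")):
        return int(fields[0][-1])
    return None


def pairs_in_order(headers):
    """True if every consecutive pair of headers is read 1 then read 2."""
    mates = [mate_number(h) for h in headers]
    if len(mates) % 2:
        return False
    return all(mates[i:i + 2] == [1, 2] for i in range(0, len(mates), 2))
```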

(oh, is the 300 sec waiting time between posts really still necessary? It's quite annoying)
All times are GMT -8.

