SEQanswers

Old 08-16-2012, 05:07 AM   #1
hosseinv
Junior Member
 
Location: Melbourne

Join Date: Aug 2012
Posts: 8
Merge multiple fq read files

I have multiple Illumina HiSeq 2000 fastq.gz files for each individual, as follows:

sample1 lane1 read 1_001
sample1 lane1 read 1_002
sample1 lane1 read 2_001
sample1 lane1 read 2_002

sample1 lane2 read 1_001
sample1 lane2 read 1_002
sample1 lane2 read 2_001
sample1 lane2 read 2_002

sample1 lane3 read 1_001
sample1 lane3 read 1_002
sample1 lane3 read 2_001
sample1 lane3 read 2_002

They are all for a single individual. What script in the Linux console would be best to merge all the files into one final merged file? The idea is to do the analysis in Galaxy.

Thank you all.

Last edited by hosseinv; 01-27-2015 at 10:18 PM. Reason: Typos
Old 08-16-2012, 05:09 AM   #2
hosseinv
Junior Member
 
Location: Melbourne

Join Date: Aug 2012
Posts: 8

They are paired-end reads.

Last edited by hosseinv; 01-27-2015 at 10:20 PM. Reason: Wrong info
Old 08-16-2012, 05:56 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Galaxy can concatenate files, and it might actually be easier to upload each file to Galaxy one by one (gzip them first), since uploading a single large file is harder.

If you do want to concatenate files at the command line, you can use the command 'cat', as in:

cat fileA fileB fileC > combinedfileABC
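
For paired-end data, the main thing to watch is that the read 1 files and the read 2 files are concatenated in the same order, so that the pairs stay in sync. Gzipped files can be concatenated directly with cat, and the result is still a valid gzip file. Here is a minimal sketch; the function name and the file naming pattern (SAMPLE_L00N_R1_00M.fastq.gz) are assumptions for illustration, not the names from the original post:

```shell
#!/bin/sh
# merge_pairs DIR SAMPLE
# Concatenate every read 1 chunk into one file and every read 2 chunk
# into another. Hypothetical input names: SAMPLE_L00N_R1_00M.fastq.gz
merge_pairs() {
    dir=$1
    sample=$2
    # Shell globs expand in sorted order, so with zero-padded names the
    # R1 and R2 chunks are joined in the same lane-by-lane order.
    cat "$dir/${sample}"_L*_R1_*.fastq.gz > "$dir/${sample}_R1_merged.fastq.gz"
    cat "$dir/${sample}"_L*_R2_*.fastq.gz > "$dir/${sample}_R2_merged.fastq.gz"
}
```

It would be called as, for example, merge_pairs /path/to/fastq sample1. If the real file names are not zero-padded, it is safer to list the files explicitly in the same order for both reads.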
Old 08-16-2012, 06:29 PM   #4
hosseinv
Junior Member
 
Location: Melbourne

Join Date: Aug 2012
Posts: 8

Thanks Peter for your answer.
What I want to know is the order in which to put the files in my cat command. If you look at my example files, you'll see I've got sequences of one individual (sample 1) in 3 lanes, with more than one file for both read 1 (001 and 002) and read 2 (001 and 002) in each lane.
Also, I know how to merge two files in Galaxy, but I don't know how to merge multiple files.

Thanks again.

Last edited by hosseinv; 01-27-2015 at 10:22 PM.
Old 08-16-2012, 08:00 PM   #5
DFJ111
Member
 
Location: Auckland

Join Date: Aug 2012
Posts: 20

Not sure why you have duplicates of sample1 lane1 read 1_002, sample1 lane2 read 1_002, and sample1 lane3 read 1_002. Typo from the sequencing lab? If all you want to do is QC (via FastQC, for example), I don't think it matters what order they are in. Also, if you do want to do FastQC, you should assess each lane separately. If you want to do something else with them, the order of merging (or whether to merge them at all) depends on what that "something else" is.

Also, you can merge multiple files in Galaxy: use "concatenate head-to-tail". I just use cat; it's quicker.
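
To keep each lane separate for QC, the chunks can be joined one lane at a time, with a quick check that the two paired files end up with the same number of reads. This is only a sketch: the helper names and the file naming pattern (SAMPLE_LANE_R1_001.fastq.gz) are made up for illustration, and the read count assumes the usual four lines per FASTQ record.

```shell
#!/bin/sh
# count_reads FILE: number of reads in a gzipped FASTQ file,
# assuming four lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# merge_lane DIR SAMPLE LANE: join the chunks of one lane only, so the
# lane can still be assessed on its own afterwards.
# Hypothetical input names: SAMPLE_LANE_R1_001.fastq.gz and so on.
merge_lane() {
    dir=$1; sample=$2; lane=$3
    for mate in R1 R2; do
        cat "$dir/${sample}_${lane}_${mate}"_*.fastq.gz \
            > "$dir/${sample}_${lane}_${mate}.merged.fastq.gz"
    done
    # Paired files must hold the same number of reads.
    n1=$(count_reads "$dir/${sample}_${lane}_R1.merged.fastq.gz")
    n2=$(count_reads "$dir/${sample}_${lane}_R2.merged.fastq.gz")
    if [ "$n1" != "$n2" ]; then
        echo "read count mismatch in $lane: $n1 vs $n2" >&2
        return 1
    fi
    echo "$lane: $n1 read pairs"
}
```

For example, merge_lane /path/to/fastq sample1 L001 would write one merged read 1 file and one merged read 2 file for that lane and print the number of read pairs.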

Last edited by DFJ111; 08-16-2012 at 08:20 PM.
Old 08-16-2012, 08:58 PM   #6
hosseinv
Junior Member
 
Location: Melbourne

Join Date: Aug 2012
Posts: 8

Thanks DFJ111

It's not a typo. What I want to do is map the sequences against the reference genes and find the polymorphisms.
Old 08-17-2012, 01:06 AM   #7
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Which tool are you planning to use for the mapping, and does it require the paired reads in any specific order (e.g. interleaved in one file) or as separate files (forward and reverse reads)?
Old 08-23-2012, 07:36 PM   #8
hosseinv
Junior Member
 
Location: Melbourne

Join Date: Aug 2012
Posts: 8

Hi maubp

I will be using BWA for mapping, and I think I'll have to treat each set of reads individually; at some stage I can pool the BAM or SAM files together.

Thanks for your attention.
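
One possible shape for that plan is sketched below: map each lane on its own, then pool the per-lane BAM files. The sketch only prints the commands so they can be reviewed before anything runs. It assumes a current BWA with the mem subcommand and samtools 1.x, neither of which is specified in this thread, and the function name and file names are hypothetical.

```shell
#!/bin/sh
# print_mapping_plan REF SAMPLE LANE...
# Print one bwa/samtools command per lane, then one samtools merge
# command that pools the per-lane BAM files. Nothing is executed here.
print_mapping_plan() {
    ref=$1; sample=$2; shift 2
    bams=""
    for lane in "$@"; do
        # Hypothetical inputs: one read 1 and one read 2 file per lane.
        r1="${sample}_${lane}_R1.fastq.gz"
        r2="${sample}_${lane}_R2.fastq.gz"
        bam="${sample}_${lane}.bam"
        # One read group per lane keeps the lane identifiable after merging;
        # bwa mem converts the literal \t in -R to a tab.
        rg="@RG\\tID:${sample}_${lane}\\tSM:${sample}"
        printf '%s\n' "bwa mem -R '$rg' $ref $r1 $r2 | samtools sort -o $bam -"
        bams="$bams $bam"
    done
    printf '%s\n' "samtools merge ${sample}.merged.bam$bams"
}
```

Calling print_mapping_plan ref.fa sample1 L001 L002 L003 prints three mapping commands followed by one merge command. The reference still needs to be indexed with bwa index beforehand.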

Tags
illumina hiseq 2000 reads

All times are GMT -8.