  • hosseinv
    Junior Member
    • Aug 2012
    • 8

    Merge multiple fq read files

    I have multiple Illumina HiSeq 2000 fastq.gz files for each individual, as follows:

    sample1 lane1 read 1_001
    sample1 lane1 read 1_002
    sample1 lane1 read 2_001
    sample1 lane1 read 2_002

    sample1 lane2 read 1_001
    sample1 lane2 read 1_002
    sample1 lane2 read 2_001
    sample1 lane2 read 2_002

    sample1 lane3 read 1_001
    sample1 lane3 read 1_002
    sample1 lane3 read 2_001
    sample1 lane3 read 2_002

    They are all from a single individual. Which script in the Linux console would be best for merging all of these files into one final merged file? The idea is to do the analysis in Galaxy.

    Thank you all.
    Last edited by hosseinv; 01-27-2015, 10:18 PM. Reason: Typos
  • hosseinv
    Junior Member
    • Aug 2012
    • 8

    #2
    They are paired-end reads.
    Last edited by hosseinv; 01-27-2015, 10:20 PM. Reason: Wrong info

    Comment

    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      Galaxy can concatenate files, and it might actually be easier to upload each file to Galaxy one by one (gzip them first), since uploading a single large file is harder.

      If you do want to concatenate files at the command line, you can use the command 'cat', as in:

      cat fileA fileB fileC > combinedfileABC
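
For gzipped FASTQ files, the same `cat` approach should work without decompressing first, since a concatenation of gzip files is itself a valid gzip file. A minimal sketch, assuming made-up demo files (the file names and reads below are hypothetical, not from this thread):

```shell
#!/bin/sh
set -eu

# Hypothetical demo data: two tiny gzipped FASTQ chunks standing in for
# real files such as the _001 and _002 chunks of one lane.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/chunk_001.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/chunk_002.fastq.gz"

# Concatenate the compressed files directly; no decompression step is needed.
cat "$workdir/chunk_001.fastq.gz" "$workdir/chunk_002.fastq.gz" \
  > "$workdir/combined.fastq.gz"

# Decompress to stdout and count lines: 2 reads x 4 lines each.
combined_lines=$(gzip -dc "$workdir/combined.fastq.gz" | wc -l | tr -d ' ')
echo "lines in combined file: $combined_lines"
```

Counting lines afterwards (a multiple of 4 for FASTQ) is a cheap check that nothing was truncated.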

      Comment

      • hosseinv
        Junior Member
        • Aug 2012
        • 8

        #4
        Thanks, Peter, for your answer.
        What I want to know is the order in which to list the files in my cat command. If you look at my example files, you'll see I have sequences from one individual (sample 1) across 3 lanes, with several numbered files for both read 1 (001 and 002) and read 2 (001 and 002) in each lane.
        Also, I know how to merge two files in Galaxy, but not how to merge multiple files.

        Thanks again.
        Last edited by hosseinv; 01-27-2015, 10:22 PM.

        Comment

        • DFJ111
          Member
          • Aug 2012
          • 20

          #5
          Not sure why you have duplicates of sample1 lane1 read 1_002, sample1 lane2 read 1_002, and sample1 lane3 read 1_002. Typo from the sequencing lab? If all you want to do is QC (via FastQC, for example), I don't think it matters what order they are in. Also, if you do want to run FastQC, you should assess each lane separately. If you want to do something else with them, the order of merging (or whether to merge them at all) depends on what that "something else" is.

          Also, you can merge multiple files in Galaxy using "concatenate head-to-tail". I just use cat; it's quicker.
          Last edited by DFJ111; 08-16-2012, 07:20 PM.

          Comment

          • hosseinv
            Junior Member
            • Aug 2012
            • 8

            #6
            Thanks DFJ111

            It's not a typo. What I want to do is map the sequences against the reference genes and find polymorphisms.

            Comment

            • maubp
              Peter (Biopython etc)
              • Jul 2009
              • 1544

              #7
              Which tool are you planning to use for the mapping, and does it require the paired reads in any specific order (e.g. interleaved in one file) or as separate files (forward and reverse reads)?
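
If the mapper takes forward and reverse reads as separate files, the file order matters mainly in one respect: the read 1 files and the read 2 files should be concatenated in the same lane and chunk order, so that each read still lines up with its mate. A rough sketch, assuming Illumina-style file names (hypothetical here; the real names will differ) and tiny generated demo reads:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical file names and demo reads; one read per chunk file.
for lane in L001 L002 L003; do
  for chunk in 001 002; do
    printf '@%s_%s/1\nACGT\n+\nIIII\n' "$lane" "$chunk" \
      | gzip -c > "sample1_${lane}_R1_${chunk}.fastq.gz"
    printf '@%s_%s/2\nTGCA\n+\nIIII\n' "$lane" "$chunk" \
      | gzip -c > "sample1_${lane}_R2_${chunk}.fastq.gz"
  done
done

# The shell expands globs in sorted order, so the R1 and R2 lists line up
# lane by lane and chunk by chunk.
cat sample1_L00?_R1_???.fastq.gz > sample1_R1.fastq.gz
cat sample1_L00?_R2_???.fastq.gz > sample1_R2.fastq.gz

# Sanity check: read names (minus the /1 or /2 suffix) must match in order.
r1_names=$(gzip -dc sample1_R1.fastq.gz | awk 'NR % 4 == 1' | sed 's,/[12]$,,')
r2_names=$(gzip -dc sample1_R2.fastq.gz | awk 'NR % 4 == 1' | sed 's,/[12]$,,')
if [ "$r1_names" = "$r2_names" ]; then
  pairing="in sync"
else
  pairing="OUT OF SYNC"
fi
echo "pairing: $pairing"
```

The name comparison assumes mates carry a trailing /1 or /2; newer Illumina headers put the mate number after a space instead, so the `sed` pattern would need adjusting.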

              Comment

              • hosseinv
                Junior Member
                • Aug 2012
                • 8

                #8
                Hi maubp

                I will be using BWA for mapping, and I think I'll have to treat each of the read files individually; at some stage I can pool the BAM or SAM files together.

                Thanks for your attention.
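
The "map each lane, then pool the BAM files" idea could look roughly like the sketch below. It only writes out a plan of commands for review rather than running anything. Everything specific is an assumption, not something stated in this thread: BWA-MEM as the alignment mode, samtools 1.3 or later, a reference already indexed with `bwa index`, and the hypothetical per-lane file names.

```shell
#!/bin/sh
set -eu

# Assumed inputs (hypothetical names): an indexed reference and one merged
# R1/R2 pair per lane.
ref="reference.fa"
sample="sample1"
plan=$(mktemp)

bams=""
for lane in L001 L002 L003; do
  bam="${sample}_${lane}.bam"
  # One read group per lane keeps lane identity visible after merging.
  printf '%s\n' "bwa mem -R '@RG\tID:${sample}_${lane}\tSM:${sample}' ${ref} ${sample}_${lane}_R1.fastq.gz ${sample}_${lane}_R2.fastq.gz | samtools sort -o ${bam} -" >> "$plan"
  bams="${bams} ${bam}"
done

# Pool the per-lane BAM files into one file for the individual.
printf '%s\n' "samtools merge ${sample}.bam${bams}" >> "$plan"

# Print the plan; run it with `sh` once the paths have been checked.
cat "$plan"
```

Mapping per lane and merging afterwards, instead of concatenating all FASTQ files first, keeps the lane of origin recorded in the read groups, which can help when tracking down lane-specific problems later.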

                Comment
