I have separate read 1 and read 2 fastq files of whole genome data and want to align them. Just to be clear: should I align them separately, or should I concatenate them first and then align them together?
-
Hi dear,
What you have is paired-end data: the read 1 and read 2 fastq files hold the two ends of short fragments with a certain insert size, most often 200-400 bp. Ask your sequencing provider for the insert size of your data. You can use bowtie2, MAQ or BWA to map the reads onto the reference genome; the choice of mapping software depends on the read length and insert size. If you want to run bowtie2, you may consider the following commands:
You will first need to create a bowtie2 index of your reference genome. Run bowtie2-build on the genome fasta file:
bowtie2-build Genome.fa Genome.build
This will create six new files which constitute the index that bowtie2 needs:
Genome.build.1.bt2 Genome.build.4.bt2
Genome.build.2.bt2 Genome.build.rev.1.bt2
Genome.build.3.bt2 Genome.build.rev.2.bt2
We can now map paired reads in fastq format against the Genome.build index:
bowtie2 -q --phred64 -p 1 -I 100 -X 600 --fr -x Genome.build -1 read1.fastq -2 read2.fastq -S OUT.sam
-q: the input reads are fastq files.
--phred64: quality scores are encoded as Phred+64 (older Illumina 1.3+ pipelines). Leave it out for Phred+33 data, which is the bowtie2 default.
-p: number of processors (threads) you want to use.
-I: minimum insert size.
-X: maximum possible insert size.
--fr: the mates are in forward-reverse orientation.
-x: basename of the index built above.
-1: read 1 fastq file
-2: read 2 fastq file
-S: write the output in SAM format to the given file.
OUT.sam is the output file
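If you are not sure which quality encoding your fastq files use, a rough check like the one below can help before you pick the quality flag. It is only a sketch: the function name guess_qual_offset is made up for this post, it relies on the usual ASCII ranges (Phred+33 qualities start at "!", 64-based qualities never go below ";"), and it does not tell Phred+64 apart from the old Solexa+64 scale.

```shell
#!/bin/sh
# guess_qual_offset FILE.fastq
# Prints phred33, phred64 or ambiguous, based on the lowest and highest
# quality characters seen. Assumes 4-line fastq records (no wrapped lines).
guess_qual_offset() {
    awk '
    BEGIN {
        # awk has no ord(), so build a lookup table for printable ASCII
        for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
        min = 127; max = 0
    }
    NR % 4 == 0 {                      # every 4th line is the quality string
        n = length($0)
        for (j = 1; j <= n; j++) {
            c = substr($0, j, 1)
            if (!(c in ord)) continue  # skip stray characters such as CR
            if (ord[c] < min) min = ord[c]
            if (ord[c] > max) max = ord[c]
        }
    }
    END {
        if (min < 59)      print "phred33"    # below ";" only occurs with offset 33
        else if (max > 74) print "phred64"    # above "J" points to offset 64
        else               print "ambiguous"  # range fits both encodings
    }' "$1"
}

# Usage: guess_qual_offset read1.fastq
```

Running it on the first few hundred thousand reads (for example through head) is normally enough; "ambiguous" just means every quality seen fits both encodings.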
Hope this will be helpful.
Best wishes,
Rahul Sharma,
Ph.D
Frankfurt am Main, Germany
-
Yes, there is no need to concatenate the read files; if you do, you will lose the pairing information. Many of the joined reads would also fail to map, because in the genome there is an insert between read 1 and read 2. Look at the following example:
|=================================| (Reference genome)
(1)-------> <-------- (2)
(1)--------> <--------(2)
(1)--------> <--------(2)
=== is the reference genome.
(1)-----> read 1, your read 1.fastq file will contain all the (1)-----> reads.
<-------(2) read 2, your read 2.fastq file will contain all the <------(2) reads.
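Since the mapper matches read 1 and read 2 by their position in the two files, it can be worth confirming that both files list the same reads in the same order. Below is a minimal sketch of such a check; check_pairs is a name I made up, and it assumes 4-line fastq records whose IDs differ at most by a trailing /1 or /2 or by a comment after the first space.

```shell
#!/bin/sh
# check_pairs R1.fastq R2.fastq
# Prints "OK <n> pairs" when both files hold the same read IDs in the same
# order; otherwise reports the first problem and returns a non-zero status.
check_pairs() {
    awk -v r2="$2" '
    # Reduce a header line to the bare read ID.
    function id(s) {
        sub(/^@/, "", s)         # drop the leading @
        sub(/[ \t].*$/, "", s)   # drop any comment after the ID
        sub(/\/[12]$/, "", s)    # drop a trailing /1 or /2
        return s ""
    }
    NR % 4 == 1 {
        n++
        if ((getline line < r2) <= 0) {
            printf "R2 ends before pair %d\n", n; bad = 1; exit
        }
        if (id($0) != id(line)) {
            printf "MISMATCH at pair %d: %s vs %s\n", n, id($0), id(line)
            bad = 1; exit
        }
        for (i = 0; i < 3; i++) getline skip < r2   # rest of the R2 record
    }
    END {
        if (!bad && (getline line < r2) > 0) {
            print "R2 has more reads than R1"; bad = 1
        }
        if (bad) exit 1
        printf "OK %d pairs\n", n
    }' "$1"
}

# Usage: check_pairs read1.fastq read2.fastq
```

If this reports a mismatch, fix the files (for example by re-exporting them) before mapping rather than letting the mapper pair unrelated reads.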
Please look up these terms: paired-end data, mate pairs, insert size, read quality scores, read coverage. They will help you in your analysis.
Best wishes,
Rahul Sharma,
Ph.D
Frankfurt am Main, Germany
-
Oh sorry, my mistake. From your last post I thought you wanted to physically join each read to its mate.
If you have many fastq files (say 30), then yes, you can just concatenate them to create one fastq read library for each read of the pair,
i.e.
Code:
cat *R1.fastq > all.reads.R1.fastq
cat *R2.fastq > all.reads.R2.fastq
-
Hi,
Yes, you can concatenate the files, but not the reads. Please check the IDs in all of your files;
they should be unique before you run the "cat" command.
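One way to do that ID check, as a rough sketch: the function below (dup_ids is a made-up name) lists every read ID that occurs more than once across the files you give it. It assumes 4-line fastq records and takes the ID to be everything on the header line up to the first space.

```shell
#!/bin/sh
# dup_ids FILE1.fastq [FILE2.fastq ...]
# Prints each read ID that appears more than once across the given files,
# one per line. No output means all IDs are unique.
dup_ids() {
    awk '
    FNR % 4 == 1 {                   # header line of each record, per file
        id = $1                      # ID without the trailing comment
        if (seen[id]++ == 1)         # report on the second occurrence only
            print substr(id, 2)      # strip the leading @
    }' "$@"
}

# Usage (only concatenate when nothing is reported):
#   if [ -z "$(dup_ids *R1.fastq)" ]; then
#       cat *R1.fastq > all.reads.R1.fastq
#   fi
```

Note that it keeps all IDs in memory, so for very large libraries you may prefer to run it on one lane or sample at a time.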
You may also consider some data preprocessing: trim the adapters and primers, discard low-quality reads (together with their mates), and discard reads with more than 5-10% N's. This can improve your analysis. Please check the quality of your reads with FastQC or the FASTX tools.
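To illustrate the N filter, here is a small sketch that drops a pair as soon as either mate has too many N's, so the two output files stay in sync. The function name filter_n_pairs, its argument order and the 10% default are my own choices for this example; it assumes 4-line fastq records and that both input files are already in the same order. Dedicated trimming tools do this and more, so treat it as an illustration only.

```shell
#!/bin/sh
# filter_n_pairs IN_R1 IN_R2 OUT_R1 OUT_R2 [MAX_N_FRACTION]
# Keeps a read pair only if both mates have at most MAX_N_FRACTION N bases
# (default 0.10). Prints how many pairs were kept.
filter_n_pairs() {
    : > "$3"; : > "$4"               # start with empty output files
    awk -v r2="$2" -v o1="$3" -v o2="$4" -v max="${5:-0.10}" '
    # Fraction of N bases in a sequence; an empty read counts as all N.
    function nfrac(seq,   t) {
        t = seq
        return length(seq) ? gsub(/[Nn]/, "", t) / length(seq) : 1
    }
    {
        rec1[(NR - 1) % 4] = $0      # collect the 4 lines of the R1 record
        if ((getline line < r2) <= 0) {
            print "R2 ended before R1" > "/dev/stderr"; bad = 1; exit
        }
        rec2[(NR - 1) % 4] = line    # and the matching R2 record
    }
    NR % 4 == 0 {                    # both records complete: decide
        if (nfrac(rec1[1]) <= max && nfrac(rec2[1]) <= max) {
            for (i = 0; i < 4; i++) { print rec1[i] > o1; print rec2[i] > o2 }
            kept++
        }
    }
    END {
        if (bad) exit 1
        printf "kept %d of %d pairs\n", kept, NR / 4
    }' "$1"
}

# Usage: filter_n_pairs read1.fastq read2.fastq clean1.fastq clean2.fastq 0.05
```

The filtered files can then be given to the mapper in place of the original read 1 and read 2 files.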
Best wishes,
Rahul Sharma,
Ph.D
Frankfurt am Main, Germany