I am a beginner at bioinformatics but have some experience with Python and software development.
I am trying to take some Illumina sequence data (complementary DNA made from mRNA, I think) and prepare it for BLAST alignment. It is supposed to be paired-end, but I want to confirm that.
For example, I have the following data files:
J06643_NoIndex_L002_R1_001.fastq
J06643_NoIndex_L002_R1_002.fastq
J06643_NoIndex_L002_R1_003.fastq
J06643_NoIndex_L002_R1_004.fastq
J06643_NoIndex_L002_R1_005.fastq
J06643_NoIndex_L002_R1_006.fastq
J06643_NoIndex_L002_R1_007.fastq
J06643_NoIndex_L002_R1_008.fastq
J06643_NoIndex_L002_R1_009.fastq
J06643_NoIndex_L002_R1_010.fastq
J06643_NoIndex_L002_R1_011.fastq
J06643_NoIndex_L002_R1_012.fastq
J06643_NoIndex_L002_R1_013.fastq
J06643_NoIndex_L002_R1_014.fastq
J06643_NoIndex_L002_R2_001.fastq
J06643_NoIndex_L002_R2_002.fastq
J06643_NoIndex_L002_R2_003.fastq
J06643_NoIndex_L002_R2_004.fastq
J06643_NoIndex_L002_R2_005.fastq
J06643_NoIndex_L002_R2_006.fastq
J06643_NoIndex_L002_R2_007.fastq
J06643_NoIndex_L002_R2_008.fastq
J06643_NoIndex_L002_R2_009.fastq
J06643_NoIndex_L002_R2_010.fastq
J06643_NoIndex_L002_R2_011.fastq
J06643_NoIndex_L002_R2_012.fastq
J06643_NoIndex_L002_R2_013.fastq
J06643_NoIndex_L002_R2_014.fastq
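The file names appear to follow a `<sample>_<index>_L<lane>_R<read>_<chunk>.fastq` pattern. That pattern is an assumption inferred from the list above, not something confirmed by the sequencing provider. A minimal Python sketch, using a hypothetical helper `pair_fastq_files`, that matches each R1 chunk with the R2 chunk of the same lane and number, and reports any file left without a partner:

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from file names such as
# J06643_NoIndex_L002_R1_001.fastq: <prefix>_L<lane>_R<read>_<chunk>.fastq
NAME_RE = re.compile(
    r"^(?P<prefix>.+)_L(?P<lane>\d{3})_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq$"
)


def pair_fastq_files(names):
    """Group FASTQ file names into (R1, R2) pairs keyed by (prefix, lane, chunk).

    Returns (pairs, unmatched): pairs is a sorted list of (R1, R2) tuples,
    unmatched lists files that follow the pattern but have no partner.
    """
    groups = {}
    for name in names:
        m = NAME_RE.match(Path(name).name)
        if m is None:
            continue  # ignore files that do not follow the assumed pattern
        key = (m["prefix"], m["lane"], m["chunk"])
        groups.setdefault(key, {})[m["read"]] = name

    pairs, unmatched = [], []
    for key in sorted(groups):
        reads = groups[key]
        if "1" in reads and "2" in reads:
            pairs.append((reads["1"], reads["2"]))
        else:
            unmatched.extend(reads.values())
    return pairs, unmatched


if __name__ == "__main__":
    names = [f"J06643_NoIndex_L002_R{r}_{i:03d}.fastq" for r in (1, 2) for i in range(1, 15)]
    pairs, unmatched = pair_fastq_files(names)
    print(len(pairs), unmatched)  # 14 []
    print(pairs[0])
```

Keying on the chunk number assumes the sequencer split R1 and R2 into chunks at the same record boundaries. The record-level check is what actually confirms that.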
It would seem logical that R1 is one end of each pair and R2 is the other. However, when I look at the files, I do not see the "/1" and "/2" designations on the read names, which according to this document should be there: http://loblolly.ucdavis.edu/bipod/ft...al_RNA-Seq.pdf
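Newer Illumina pipelines (CASAVA 1.8 and later) typically record the mate number in a second, space-separated field of the header rather than as a "/1" or "/2" suffix. In a header such as `@D3NH4HQ1:710G1KACXX:2:1101:1488:2217 1:N:0:`, the leading `1` would be the mate number and the `N`/`Y` flag indicates whether the read was filtered. That this dataset follows that layout is an assumption. A minimal sketch with a hypothetical `parse_header` function that handles both the old and the newer style:

```python
def parse_header(line):
    """Return (read_id, mate, filtered) for one FASTQ header line.

    Two layouts are handled (assumed, not confirmed for this dataset):
      old style:    @NAME/1
      newer style:  @NAME 1:N:0:INDEX  (mate number is the first comment field,
                    the second field is Y if the read was filtered, else N)
    mate is None when neither layout is recognised.
    """
    if not line.startswith("@"):
        raise ValueError(f"not a FASTQ header: {line!r}")
    fields = line[1:].split()
    name = fields[0]
    if len(fields) > 1 and ":" in fields[1]:
        parts = fields[1].split(":")
        return name, int(parts[0]), parts[1] == "Y"
    if name.endswith(("/1", "/2")):
        return name[:-2], int(name[-1]), False
    return name, None, False


print(parse_header("@D3NH4HQ1:710G1KACXX:2:1101:1488:2217 1:N:0:"))
# ('D3NH4HQ1:710G1KACXX:2:1101:1488:2217', 1, False)
```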
R1_001:
<#0@@#############################################
@D3NH4HQ1:710G1KACXX:2:1101:1488:2217 1:N:0:
GTAAGGGCAAGGGCACTGAGCTATGTCATCTGGGCTCAAATTCTGCTACC
+
B@@FFFFFHHHHHJJJIJJJJJIJJIIGIIIJIJJGIGGIIIGJIEIIIH
@D3NH4HQ1:710G1KACXX:2:1101:1279:2224 1:Y:0:
GGCTTATTTGATACTCATGGTACAGAAGCGACGATCAAATAGATTGAGAA
R2_001:
###4##22ADFHG#####################################
@D3NH4HQ1:710G1KACXX:2:1101:2135:2174 2:N:0:
NNGATGCAGGTGGCNNGGANNNNNNNNCGCCATNNTGCCTNNNNNNNNNN
+
##14A?DBD<CACB##42<########11??FE##00?B@##########
@D3NH4HQ1:710G1KACXX:2:1101:2088:2176 2:N:0:
NNTGTTGTCACTTTNNAGANNNNNNNNTTGCTATNAAGCTNNNNNNNNNN
Does this mean the data are not paired end?
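One way to test whether two files really are mates is to walk them in lockstep and confirm that the N-th record in R1 has the same read ID as the N-th record in R2. A minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines). The helpers `read_ids` and `check_pairing` are hypothetical names:

```python
from itertools import islice, zip_longest


def read_ids(path):
    """Yield the read ID of each record: header text before the first
    whitespace, with any trailing /1 or /2 removed.

    Assumes plain 4-line FASTQ records.
    """
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))
            if not record:
                return
            if len(record) < 4 or not record[0].startswith("@"):
                raise ValueError(f"truncated or malformed record in {path}")
            read_id = record[0][1:].split()[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def check_pairing(r1_path, r2_path):
    """Return (n_matched, first_mismatch).

    first_mismatch is None when every record ID matches and both files have
    the same number of records; otherwise it is the (r1_id, r2_id) tuple
    where they diverge (None stands in for a file that ran out early).
    """
    n = 0
    for id1, id2 in zip_longest(read_ids(r1_path), read_ids(r2_path)):
        if id1 != id2:
            return n, (id1, id2)
        n += 1
    return n, None
```

For example, `check_pairing("J06643_NoIndex_L002_R1_001.fastq", "J06643_NoIndex_L002_R2_001.fastq")` returning `(count, None)` would suggest the two chunks are mate files in matching order.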