**gruberjd** · 04-16-2012, 11:15 AM

I'm not sure what the exact specifics of the new format are, but the 1:N:0 or 2:N:0 in the header denotes what /1 and /2 used to. This Wikipedia page is helpful:

http://en.wikipedia.org/wiki/FASTQ_format

**westerman** · 04-16-2012, 11:17 AM

It would seem logical that R1 is one end of the pair, and that R2 is the other. However, when I look at each set of files, I do not see the "/1" and "/2" designations. (According to this site, they should be there: http://loblolly.ucdavis.edu/bipod/ft...al_RNA-Seq.pdf)

What did your parents (or teachers) tell you about not trusting everything you read on the internet?

The Illumina specs have changed back and forth a couple of times in the last several months. It looks like you received files from the period when they had decided to remove the '/1' and '/2' designations. Instead, look at the first number after the white space:

@D3NH4HQ1:71:D0G1KACXX:2:1101:2088:2176 2:N:0:

The above is an R2 read.
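A minimal Python sketch of that check, assuming the header is laid out as a read name, white space, then `read:filtered:control:index`. The function name `read_number` and the headers in the usage lines are made up for illustration. The fallback for the older `/1` and `/2` suffixes is there for files that still carry them.

```python
import re

# Comment field of the newer header style: "<read>:<Y or N>:<control>:<index>"
_NEW_STYLE = re.compile(r"^(\d+):([YN]):(\d+):(.*)$")


def read_number(header: str) -> int:
    """Return the read number (1 or 2) encoded in a FASTQ header line."""
    parts = header.strip().split(None, 1)  # split on the first run of white space
    if not parts:
        raise ValueError("empty header line")
    name = parts[0]
    comment = parts[1] if len(parts) > 1 else ""

    match = _NEW_STYLE.match(comment)
    if match:
        # Newer style: the first number after the white space is the read number.
        return int(match.group(1))

    # Older style: the read name itself ends in /1 or /2.
    if name.endswith(("/1", "/2")):
        return int(name[-1])

    raise ValueError(f"cannot determine read number from: {header!r}")


# Hypothetical headers, one in each style.
print(read_number("@INSTR:1:FLOWCELL:2:1101:2088:2176 2:N:0:"))
print(read_number("@INSTR_0001:2:1:1000:2000#0/1"))
```

Running this over the first header of each file is a quick way to confirm which file holds which end of the pair.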

**adamba** · 04-16-2012, 11:36 AM

Great. That makes total sense.

The first thing I would like to do is subtract all human sequences from the data. We are only interested in viruses. I have attempted this with the following process. Does this look correct?

2. Each set of R1 and R2 files was concatenated using the following command, producing one R1 fastq file and one R2 fastq file.
a. cat J06643_NoIndex_L002_R1_001.fastq J06643_NoIndex_L002_R1_002.fastq J06643_NoIndex_L002_R1_003.fastq J06643_NoIndex_L002_R1_004.fastq J06643_NoIndex_L002_R1_005.fastq J06643_NoIndex_L002_R1_006.fastq J06643_NoIndex_L002_R1_007.fastq J06643_NoIndex_L002_R1_008.fastq J06643_NoIndex_L002_R1_009.fastq J06643_NoIndex_L002_R1_010.fastq J06643_NoIndex_L002_R1_011.fastq J06643_NoIndex_L002_R1_012.fastq J06643_NoIndex_L002_R1_013.fastq J06643_NoIndex_L002_R1_014.fastq > J06_R1.fastq

3. Illumina adapters and low-quality read ends were trimmed using cutadapt.
a. cutadapt -f fastq -q 20 -a AGATCGGAAGAGC J06_R1.fastq > ./J06_R1_trimmed.fastq

4. Bowtie against hg19 to subtract out all human sequences
a. bowtie --un J06_subtracted.fastq -p 8 --chunkmbs 512 hg19 -1 J06_R1_trimmed.fastq -2 J06_R2_trimmed.fastq J06.sam

**swbarnes2** · 04-16-2012, 12:14 PM

cat *R1*.fastq > J06_R1.fq

Probably would have worked just as well, with a lot less typing.

If you know the virus you expect to see, it might work slightly better if you align against a genome that has human sequence and virus sequence together. You'll have to make the index for that yourself, rather than downloading the pre-made one. You can then filter the .bam for the lines that aligned to virus.

But that won't make a very big difference.
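A minimal Python sketch of the filtering step, assuming the alignments are available as SAM text (for example, a .bam converted to SAM first) rather than read directly from the binary file. The function name `virus_alignments` and the reference name `virus_genome_1` are hypothetical; substitute whatever sequence names were used when building the combined index.

```python
def virus_alignments(sam_lines, virus_refs):
    """Yield SAM alignment lines that are mapped to one of the virus references.

    sam_lines  -- iterable of SAM text lines (header lines are skipped)
    virus_refs -- set of reference sequence names that belong to the virus
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line, not an alignment
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # flag bit 0x4 marks the read as unmapped
        if fields[2] in virus_refs:  # column 3 is the reference name
            yield line


# Made-up records: one human hit, one virus hit, one unmapped read.
example = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tvirus_genome_1\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
for hit in virus_alignments(example, {"virus_genome_1"}):
    print(hit, end="")
```

Only the `r2` record is kept here, since it is the only mapped read whose reference name is in the virus set.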

**adamba** · 04-16-2012, 12:36 PM

Hah, thanks. That would've saved me some time.

How about the cutadapt and bowtie commands?

For cutadapt, is -q 20 appropriate? Did I select the right adapter sequence, and is there a way to make sure of this?

For Bowtie, do I need to alter the "maxins" parameter? My reads are 50 bp, and the default maxins parameter is 250.

Right now, Bowtie is outputting some blank and incomplete reads. Is that normal, and will it screw up the assembly step?

For example, here are the first few lines of the R1 bowtie output:

@D3NH4HQ1:71:D0G1KACXX:2:1101:1233:2172 1:Y:0:
A
+
<
@D3NH4HQ1:71:D0G1KACXX:2:1101:1406:2044 1:Y:0:
AAAA
+
<<<@
@D3NH4HQ1:71:D0G1KACXX:2:1101:1317:2025 1:Y:0:
AGCT
+
<<<?
@D3NH4HQ1:71:D0G1KACXX:2:1101:15237:2000 1:Y:0:

+

@D3NH4HQ1:71:D0G1KACXX:2:1101:15197:2000 1:Y:0:

+

@D3NH4HQ1:71:D0G1KACXX:2:1101:15556:2000 1:Y:0:

+

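One way to see how widespread the blank and very short reads are is to count them before moving on. A minimal Python sketch, assuming plain four-line FASTQ records. The names `read_fastq` and `count_short_reads` and the `min_length` threshold are illustrative and not part of cutadapt or Bowtie.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # stray blank line between records
        seq = handle.readline().rstrip("\r\n")   # may legitimately be empty
        handle.readline()                         # the '+' separator line
        qual = handle.readline().rstrip("\r\n")  # same length as seq
        yield header.rstrip("\r\n"), seq, qual


def count_short_reads(handle, min_length=1):
    """Return (total, short): short counts reads with fewer than min_length bases."""
    total = short = 0
    for _, seq, _ in read_fastq(handle):
        total += 1
        if len(seq) < min_length:
            short += 1
    return total, short


# Made-up records shaped like the output above: 1 base, 4 bases, 0 bases.
data = "@r1 1:Y:0:\nA\n+\n<\n@r2 1:Y:0:\nAAAA\n+\n<<<@\n@r3 1:Y:0:\n\n+\n\n"
print(count_short_reads(io.StringIO(data), min_length=1))
print(count_short_reads(io.StringIO(data), min_length=20))
```

Running the same count on the R1 and R2 files separately also shows whether the two files still contain the same number of records, which matters for paired-end input.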
Help with Illumina Paired-End Data
