**Michael.Ante** · 04-08-2015, 02:37 AM

You can download the virus sequences in FASTA format and use e.g. Bowtie2 to align the reads locally. To do that, you need to build an index first. The log output of Bowtie2 tells you how many reads mapped.
After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
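
The workflow above could be sketched roughly as follows. This assumes paired-end FASTQ input and samtools 1.3 or later; the file names (`viruses.fasta`, `reads_1.fq`, `reads_2.fq`), the index name `virus_index`, and the `run` and `align_to_viruses` helpers are placeholders that do not come from the thread. By default the script only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/sh
# Sketch only: file names and helper names are placeholders.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

align_to_viruses() {
    ref_fasta=$1; r1=$2; r2=$3; out=$4
    # 1. Build the Bowtie2 index from the virus FASTA file.
    run bowtie2-build "$ref_fasta" virus_index
    # 2. Align the paired-end reads; Bowtie2 prints its alignment summary to stderr.
    run bowtie2 -x virus_index -1 "$r1" -2 "$r2" -S "$out.sam"
    # 3. Sort by coordinate and index the BAM, which idxstats requires.
    run samtools sort -o "$out.sorted.bam" "$out.sam"
    run samtools index "$out.sorted.bam"
    # 4. Report mapped/unmapped read counts per reference sequence.
    run samtools idxstats "$out.sorted.bam"
}

align_to_viruses viruses.fasta reads_1.fq reads_2.fq lib4seq
```

For single-end reads, `-U reads.fq` would replace the `-1`/`-2` pair.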

**kaps** · 04-11-2015, 02:38 AM

Thanks Michael,

I have not used Bowtie or samtools before. How do I get started?

**GenoMax** · 04-11-2015, 03:39 AM

1. What format are your blast results in (html, xml, text)? You may be able to parse that result file if all you want to know is how many sequences hit a "virus".

2. If you did the blast locally do you have a sequence file with all "virus" sequences available? You will be able to use that file as an input for bowtie2 and follow the path @Michael.Ante suggested.

3. Are you comfortable using command line (e.g. linux) applications?
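
For point 1, a minimal sketch of parsing a BLAST result file. It assumes the results are in tabular format (`-outfmt 6`, one hit per line with the query ID in column 1); the default pairwise text output would need different parsing. The function name `count_hit_queries`, the file name, the example records, and the total read count are all made up for illustration.

```shell
# Count distinct query sequences with at least one hit.
# Assumes tabular BLAST output (-outfmt 6): query ID in column 1.
count_hit_queries() {
    grep -v '^#' "$1" | cut -f1 | sort -u | wc -l | tr -d ' '
}

# Made-up example data: three hits coming from two distinct reads.
printf 'read1\tvirusA\t98.5\nread1\tvirusB\t95.0\nread2\tvirusA\t99.0\n' > example_hits.tsv

hits=$(count_hit_queries example_hits.tsv)
total_reads=4   # hypothetical total number of reads in the query file

# Percentage of reads with at least one hit (prints 50.0% for the data above).
awk -v h="$hits" -v t="$total_reads" 'BEGIN { printf "%.1f%%\n", 100 * h / t }'
```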

**kaps** · 04-11-2015, 04:57 AM

1. The BLAST results are in txt format.
2. Yes, the sequence file is available (the database file?).
3. I am fairly comfortable with the command line.
4. I would prefer the command prompt option as opposed to logging on to the cluster (my internet is erratic).

**Michael.Ante** · 04-14-2015, 12:23 AM

Hi Kaps,

You should have a look at the Bowtie2 homepage, where the manual explains in detail how the programs work. At the end of the manual is a "Lambda phage example", which overlaps considerably with your problem. It also has a section on downstream analysis with SAMtools.

Cheers,
Michael

**kaps** · 04-19-2015, 11:57 PM

Hi, Michael

Thanks for pointing this out to me!

**kaps** · 04-22-2015, 01:54 AM

Hello Michael, when I try samtools idxstats,
I get the message below:

samtools idxstats lib4seq.sorted.bam
[bam_idxstats] fail to load the index.

What could be the problem?

**Michael.Ante** · 04-22-2015, 02:42 AM

Have you created the index with

Code:

samtools index lib4seq.sorted.bam

?
If yes, what does your BAM file header look like?

Code:

 samtools view -H lib4seq.sorted.bam

**kaps** · 04-22-2015, 06:05 AM

I had not created the index,
I can now see the statistics in the index file!

Thanks
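
To turn the idxstats table into the percentage the thread title asks about, something like the following sketch could be used. `samtools idxstats` prints four tab-separated columns (reference name, reference length, mapped reads, unmapped reads); the helper name `percent_mapped` and the example numbers are made up for illustration.

```shell
# Summarise `samtools idxstats` output read from stdin.
# Columns: reference name, reference length, mapped reads, unmapped reads.
percent_mapped() {
    awk -F '\t' '
        { mapped += $3; unmapped += $4 }
        END {
            total = mapped + unmapped
            if (total > 0)
                printf "%d of %d reads mapped (%.2f%%)\n", mapped, total, 100 * mapped / total
        }'
}

# Made-up idxstats-style input; the "*" line holds reads without a reference.
printf 'virusA\t5000\t150\t0\nvirusB\t7000\t50\t10\n*\t0\t0\t790\n' | percent_mapped

# With the real BAM file (not run here):
#   samtools idxstats lib4seq.sorted.bam | percent_mapped
```

For paired-end data the counts refer to individual reads, not pairs.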

**kaps** · 04-23-2015, 05:43 AM

In a case where my index file has several sequences for different strains/isolates of the same virus, which may be treated as duplicates, how do I restrict bowtie2 so that each read is aligned only once?

**kaps** · 05-11-2015, 11:04 PM

Hello,

After getting the samtools idxstats output (the number of mapped vs. unmapped reads), is it possible to extract/select the reads that mapped from the raw read files (the query)? How is it done?

**GenoMax** · 05-12-2015, 03:20 AM

If you had used the "--un-conc" and "--al-conc" options (http://bowtie-bio.sourceforge.net/bo...output-options), the read pairs that did not align concordantly and those that did could have been written to separate files when you did the alignment.

1. You could repeat the bowtie2 alignment with the above parameters added to your original list (easier) OR
2. Identify the read IDs of the sequences that mapped and use a tool like seqtk to extract the mapped reads (e.g. seqtk subseq in.fq name.lst > out.fq)

Use @Michael.Ante's easy suggestion below
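
Option 2 could look roughly like the sketch below. The helper name `mapped_ids` and the example records are made up; the filter relies on bit 0x4 of the SAM FLAG field (column 2), which marks a read as unmapped. It also assumes that the read names in the FASTQ file match those in the BAM file exactly (no extra `/1` or `/2` suffixes).

```shell
# Print the unique names of mapped reads from SAM-formatted text on stdin.
# A read is unmapped when bit 0x4 of the FLAG field (column 2) is set.
mapped_ids() {
    awk -F '\t' '!/^@/ && int($2 / 4) % 2 == 0 { print $1 }' | sort -u
}

# Made-up SAM-like records: readA (flag 0) and readC (flag 16) are mapped,
# readB (flag 4) is unmapped.
printf 'readA\t0\tvirusA\t10\nreadB\t4\t*\t0\nreadC\t16\tvirusA\t50\n' | mapped_ids

# With the real tools (not run here):
#   samtools view lib4seq.sorted.bam | mapped_ids > name.lst
#   seqtk subseq in.fq name.lst > out.fq
```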

**Michael.Ante** · 05-12-2015, 04:24 AM

You can use samtools view to extract the mapped/unmapped reads by filtering the 'unmapped' flag:

Code:

samtools view -F 4 -bh lib4seq.sorted.bam > lib4seq.sorted.mapped.bam
samtools view -f 4 -bh lib4seq.sorted.bam > lib4seq.sorted.unmapped.bam

Samtools view will help a lot; just have a look at some tutorials, for instance Dave's wiki

**kaps** · 05-19-2015, 02:38 AM

If I want to convert lib4seq.sorted.mapped.bam to FASTQ (creating two files for paired-end reads), do I need to sort this BAM file again?
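
One possible approach, offered as a hedged sketch and not as an answer given in the thread: tools that split a BAM file into paired FASTQ files generally expect mates to sit next to each other, so a coordinate-sorted BAM is usually re-sorted by read name first. The example assumes samtools 1.3 or later (which provides `samtools fastq`); the `run` and `bam_to_paired_fastq` helpers and the output names are placeholders. By default the commands are only printed; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch only: helper names and output names are placeholders.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bam_to_paired_fastq() {
    bam=$1; prefix=$2
    # Sort by read name so that the two mates of a pair are adjacent.
    run samtools sort -n -o "$prefix.namesorted.bam" "$bam"
    # First mates go to _1.fastq, second mates to _2.fastq,
    # reads without a mate in the file go to _single.fastq.
    run samtools fastq -1 "${prefix}_1.fastq" -2 "${prefix}_2.fastq" \
        -s "${prefix}_single.fastq" -0 /dev/null "$prefix.namesorted.bam"
}

bam_to_paired_fastq lib4seq.sorted.mapped.bam lib4seq.mapped
```

Because the BAM file was filtered with `-F 4`, a read whose mate did not map will have lost its partner and ends up in the singleton file.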

Calculating percentage of reads aligning to a subject