SEQanswers

01-25-2021, 01:06 PM   #1
wbsimey
Member
 
Location: san francisco

Join Date: Jul 2010
Posts: 14
mapping hundreds of RADseq fastqs to ref genome.

I would like to map hundreds of fastq.gz single-end RADseq files to a single reference genome.

However, bwa mem is making a map file (.sam) for every fastq file, so I end up with hundreds of individual maps. I want a single map with all fastq files mapped to the reference.
I have tried hundreds of commands without success; here is one failed example:
bwa mem -t 16 ../reference.Arrow.fasta ../fastqs/*.F.fq.gz > RADs_mapped.sam

Any ideas what I am doing wrong? Or is something wrong with my expectation of a single .sam file with all sequences mapped to my reference?
01-25-2021, 01:17 PM   #2
cmbetts
Senior Member
 
Location: Bay Area

Join Date: Jun 2012
Posts: 121

It's treating each of your fastq files as an independent sample and making a sam file for each, which is what most people generally want. If you don't care about keeping them separate, you can merge the fastq files with cat prior to alignment, or merge the many individual results using samtools. If you care about keeping track of which alignment came from which fastq, it's probably going to be easier to keep them separate and deal with the many files programmatically.
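
As an illustration of the first option (merging with cat before alignment), here is a minimal sketch. The scratch directory, the sample file names, and the output name all_samples.fq.gz are made up for the demo; only the .F.fq.gz suffix and the bwa command come from the original post. It relies on the fact that gzip files concatenated end to end still form a valid gzip stream, so nothing needs to be decompressed first. Note that merging this way discards the information about which read came from which file.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory and two tiny single-end FASTQ files,
# created only so this sketch can run on its own.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/sampleA.F.fq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/sampleB.F.fq.gz"

# Concatenated gzip members are still a valid gzip stream,
# so the compressed files can be joined directly with cat.
cat "$workdir"/*.F.fq.gz > "$workdir/all_samples.fq.gz"

# Sanity check: 2 reads x 4 lines per FASTQ record.
merged_lines=$(gzip -dc "$workdir/all_samples.fq.gz" | wc -l | tr -d ' ')
echo "merged FASTQ lines: $merged_lines"

# The merged file could then be aligned in a single run (not executed here):
#   bwa mem -t 16 ../reference.Arrow.fasta all_samples.fq.gz > RADs_mapped.sam
```

The second option, merging after alignment, would use samtools merge on the per-sample BAM files instead.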
01-25-2021, 01:48 PM   #3
wbsimey
Member
 
Location: san francisco

Join Date: Jul 2010
Posts: 14

Thank you, cmbetts!

I am running this bash script, which seems to be working and is producing many .bam files:
Code:
for f in /Projects/RADseq/fastqs/*.gz
do
        # Align one fastq file, convert the SAM output to BAM, then sort it
        bwa mem -t 16 reference.Arrow.fasta "$f" |
        samtools view -b - |
        samtools sort --threads 8 - > "$f.bam"
done
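
One possible variation on the loop above, sketched under assumptions rather than taken from the thread: derive a sample name from each file name and pass it to bwa mem as a read group with -R, so that alignments remain traceable to their fastq even if the .bam files are merged later. The helper names sample_name and build_command and the sampleA file are hypothetical, and the .F.fq.gz suffix is assumed from the command in the first post. The sketch only prints the command it would run, so it can be inspected before anything is aligned.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: strip the directory and the .F.fq.gz suffix from a path.
sample_name() {
    basename "$1" .F.fq.gz
}

# Print (do not run) the alignment pipeline for one fastq file.
# -R attaches a read group so each alignment records its sample of origin.
build_command() {
    local fq=$1
    local sample
    sample=$(sample_name "$fq")
    printf 'bwa mem -t 16 -R "@RG\\tID:%s\\tSM:%s" reference.Arrow.fasta %s | samtools sort --threads 8 -o %s.bam -\n' \
        "$sample" "$sample" "$fq" "$sample"
}

build_command /Projects/RADseq/fastqs/sampleA.F.fq.gz
```

Naming the output after the sample (sampleA.bam rather than sampleA.F.fq.gz.bam) is a design choice that may make the files easier to hand to downstream tools. Recent samtools versions can sort SAM input directly, which is why the separate samtools view step is omitted in this sketch.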
01-26-2021, 04:53 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,092

Any reason you are not using an established pipeline like Stacks?
01-26-2021, 06:51 AM   #5
wbsimey
Member
 
Location: san francisco

Join Date: Jul 2010
Posts: 14

Quote:
Originally Posted by GenoMax
Any reason you are not using an established pipeline like Stacks?
I am using Stacks; they recommend using bwa for reference-based analyses. However, I have not yet been able to get their recommended pipeline to work.

Tags
bwa, fastq.gz, rad-seq, radseq, reference mapping

All times are GMT -8.