SEQanswers

07-21-2011, 06:23 AM   #1
robelb4
Member
Location: Virginia
Join Date: Jul 2011
Posts: 10

Mira assembly - shell script

hello,

I'm a beginner in assembly and shell scripting. I already have paired-end reads generated with Solexa. The MIRA documentation on input file processing (converting reads to the /1 /2 Illumina naming scheme) includes a shell script called prepdata.sh.

But my two read files are not from the NCBI website; they're already unzipped. How can I modify the script so it works on already-unzipped files?
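For reference, the /1 and /2 renaming can be tried on its own. This is a minimal sketch, assuming FASTQ records whose read names start with "SRR" as in NCBI SRA downloads; `add_mate_suffix` is a hypothetical helper name and the record is made up. It uses the same sed substitution as prepdata.sh, which appends the mate number to the first SRR name on each line, so both the `@` header and the `+` line are renamed.

```shell
# Sketch: the /1 and /2 renaming step from prepdata.sh in isolation.
# add_mate_suffix is a hypothetical helper; the FASTQ record is made up.
add_mate_suffix() {
    # $1 = mate number (1 or 2); FASTQ text is read from stdin.
    # Appends "/$1" to the first SRR name found on every line.
    sed -e "s/SRR[0-9.]*/&\/$1/"
}

printf '%s\n' '@SRR030257.1 read1' 'ACGT' '+SRR030257.1 read1' 'IIII' \
    | add_mate_suffix 1
# @SRR030257.1/1 read1
# ACGT
# +SRR030257.1/1 read1
# IIII
```

Because the pattern is not anchored to the start of the line, a quality string that happens to contain "SRR" would be modified as well.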

thanks,
R
07-21-2011, 06:36 AM   #2
gaffa
Member
Location: Gothenburg/Uppsala, Sweden
Join Date: Oct 2010
Posts: 82

I'm not familiar with this, but is this the script in question? (from http://mira-assembler.sourceforge.ne...deToMIRA.html):

Code:
######################################################################
#######
####### Prepare paired-end Solexa downloaded from NCBI
#######
######################################################################

# srrname:    is the SRR name as downloaded from NCBI SRA
# numreads:   maximum number of forward (and reverse) reads to take from
#              each file. Just to avoid bacterial projects with a coverage
#              of 200 or so.
# strainname: name of the strain which was re-sequenced

srrname="SRR030257"
numreads=5000000
strainname="REL8593A"

################################

numlines=$((4 * numreads))

# put "/1" Solexa reads into file
echo "Copying ${numreads} reads from _1 (forward reads)"
zcat "../origdata/${srrname}_1.fastq.gz" | head -n "${numlines}" | sed -e 's/SRR[0-9.]*/&\/1/' > "${strainname}-${numreads}_in.solexa.fastq"

# put "/2" Solexa reads into file
echo "Copying ${numreads} reads from _2 (reverse reads)"
zcat "../origdata/${srrname}_2.fastq.gz" | head -n "${numlines}" | sed -e 's/SRR[0-9.]*/&\/2/' >> "${strainname}-${numreads}_in.solexa.fastq"

# make file with strain names (only header lines, which start with "@SRR")
echo "Creating file with strain names for copied reads (this may take a while)."
grep "^@SRR" "${strainname}-${numreads}_in.solexa.fastq" | cut -f 1 -d ' ' | sed -e 's/@//' -e "s/$/ ${strainname}/" >> "${strainname}-${numreads}_straindata_in.txt"
If so, then it should probably just be a matter of changing the "zcat" commands to "cat" (as well as adjusting the file names, of course; e.g. they shouldn't end in ".gz"). I haven't tried this, but it should work. "zcat" writes the decompressed contents of gzipped files to standard output; "cat" does the same for normal, uncompressed files.
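A minimal sketch of that change, assuming the unzipped files are named `<srrname>_1.fastq` and `<srrname>_2.fastq` (the original names without ".gz"). `prepare_solexa_pair` is a hypothetical function name whose four arguments stand in for the variables at the top of prepdata.sh; it has not been tried on real MIRA input. Here `head` reads each plain file directly, which does the same job as `cat file | head`.

```shell
# Sketch: the prepdata.sh steps adapted for already-unzipped FASTQ files.
# prepare_solexa_pair is a hypothetical name; its four arguments replace
# the variables set at the top of the original script.
prepare_solexa_pair() {
    indir=$1        # directory containing the unzipped read files
    srrname=$2      # SRR name, e.g. SRR030257
    numreads=$3     # maximum number of reads taken from each file
    strainname=$4   # strain name, e.g. REL8593A

    numlines=$((4 * numreads))   # 4 lines per FASTQ record
    out="${strainname}-${numreads}_in.solexa.fastq"
    straindata="${strainname}-${numreads}_straindata_in.txt"

    # "/1" reads: no zcat needed, head reads the plain file directly
    echo "Copying ${numreads} reads from _1 (forward reads)"
    head -n "$numlines" "${indir}/${srrname}_1.fastq" \
        | sed -e 's/SRR[0-9.]*/&\/1/' > "$out"

    # "/2" reads, appended to the same output file
    echo "Copying ${numreads} reads from _2 (reverse reads)"
    head -n "$numlines" "${indir}/${srrname}_2.fastq" \
        | sed -e 's/SRR[0-9.]*/&\/2/' >> "$out"

    # one "<read name> <strain name>" line per read header
    echo "Creating file with strain names for copied reads."
    grep '^@SRR' "$out" | cut -f 1 -d ' ' \
        | sed -e 's/^@//' -e "s/\$/ ${strainname}/" > "$straindata"
}
```

Usage would look like `prepare_solexa_pair ../origdata SRR030257 5000000 REL8593A`. The strain-name file is written with `>` rather than `>>`, so running the function twice does not append duplicate entries.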
07-21-2011, 06:57 AM   #3
robelb4
Member
Location: Virginia
Join Date: Jul 2011
Posts: 10

gaffa,

Thanks, that's what I thought and tried, but I'm getting a 'No such file ...' error. Maybe I misplaced the reads directory.

Now I know the command is correct.
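A 'No such file' error here usually means the path in the script does not match where the reads actually are; relative paths such as ../origdata are resolved from the directory the script is run in. A check before the processing steps makes that easier to spot. This is a minimal sketch with `check_reads` as a hypothetical helper, assuming the `<srrname>_1.fastq` / `<srrname>_2.fastq` naming (the script's file names without ".gz").

```shell
# Sketch: stop early with a readable message when the read files are not
# where the script expects them. check_reads is a hypothetical helper; the
# <srrname>_1.fastq / <srrname>_2.fastq naming is an assumption.
check_reads() {
    indir=$1
    srrname=$2
    status=0
    for mate in 1 2; do
        f="${indir}/${srrname}_${mate}.fastq"
        if [ ! -r "$f" ]; then
            # relative paths are resolved from the current directory
            echo "cannot read $f (current directory: $(pwd))" >&2
            status=1
        fi
    done
    return "$status"
}
```

Placing `check_reads ../origdata SRR030257 || exit 1` before the copy steps would stop the script and report both the expected path and the current directory.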

thanks
Robel