SEQanswers

Old 09-24-2009, 08:57 AM   #181
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default sortpeaks

Yeah, sure.
I had a huge set of human sequence reads that I aligned using bowtie, and I need to convert this bowtie alignment into wig files. I have been using SeparateReads as the first step of that conversion. This worked fine: I got a gi|22XXXXXX|ref|NT_XXXXXX.12|.bg.bowtie file, and also the same with .part.bowtie, after I ran SeparateReads.
On this file (uncompressed) I then ran SortFiles with a -Xmx2G memory heap specified, but after some lines it gives me a memory error.
I also tried running SortFiles on the gzipped separated reads, but that did not work; the file was not recognised, or something like that.

Is it the bowtie-mapped reads that are the problem, so that I might need to use GERALD directly instead?
Or is it a problem with the separate reads/sort reads steps?
Hope this helps. I appreciate any suggestions in this matter.
I found FindPeaks very cool, but unfortunately it is not working for me now.
Ka123$ is offline  
Old 09-24-2009, 09:22 AM   #182
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

I seem to recall that bowtie is able to produce .map files, which would be pre-sorted and directly readable by FindPeaks without breaking them up into chromosomes. That might be a good first pass to try. (Assuming this is SET data. If it's PET data, you'll need to do the pairing anyhow, so SeparateReads wouldn't have been the right path to take.)

I suppose I should also mention that running SortReads.jar on .gz bowtie files *should* work. If you could send me the error you're getting, I may be able to track down the reason why it's not working for you.

And finally, I should probably also mention that bowtie seems to be doing something funny to your chromosome names. I don't use bowtie myself, but someone had previously reported to me that there was an option you can use to get more "sane" chromosome names. I would suggest you take a look - it may help you out downstream.
__________________
The more you know, the more you know you don't know. —Aristotle
apfejes is offline  
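The odd chromosome names mentioned above (gi|...|ref|NT_...|) look like NCBI-style FASTA identifiers carried over from the reference. Separately from whatever bowtie option is being referred to, one way to get shorter names is to post-process them. A minimal Python sketch, assuming the accession is the field that follows "ref"; short_name and the example identifier are made up for illustration:

```python
def short_name(name: str) -> str:
    """Return the accession that follows 'ref' in a gi|...|ref|ACC| style name.

    Assumption: names are pipe-separated and the accession sits right after
    the literal field 'ref'. Anything else is returned unchanged.
    """
    parts = name.split("|")
    if "ref" in parts:
        idx = parts.index("ref")
        # Only use the next field if it exists and is not empty.
        if idx + 1 < len(parts) and parts[idx + 1]:
            return parts[idx + 1]
    return name


# Hypothetical identifier, not taken from the thread.
shortened = short_name("gi|12345|ref|NT_000001.1|")
unchanged = short_name("chr1")
print(shortened, unchanged)
```

Names that do not match the assumed pattern pass through untouched, so running it over already-sane names should be harmless.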
Old 09-24-2009, 09:27 AM   #183
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default

Thanks a lot. I will try all the options you gave me and let you know how they worked for me.
Ka123$ is offline  
Old 09-25-2009, 07:24 AM   #184
nathan.genome
Junior Member
 
Location: udine

Join Date: Sep 2009
Posts: 1
Default hello everybody

Hello everybody,

I am working on a resequencing project. I have a reference genome and a set of Sanger mate pairs from a genotype. I identified a list of structural variations and I want to visualize them. Can I use LookSeq?

thanks
nathan
nathan.genome is offline  
Old 09-28-2009, 02:35 AM   #185
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default

What kind of formats are the BUSTARD and GERALD files from Solexa?
Ka123$ is offline  
Old 09-28-2009, 03:10 AM   #186
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default

If I run separate reads and sort reads directly on the GERALD alignment files, what type of aligner do I need to specify? Specifying GERALD/Eland gives me an error in FindPeaks:
Error: Did not recognize aligner type: GERALD/Eland
Error: Please check that you have not made a spelling mistake when providing the alignment type
I get the same error if I specify only Eland. So what aligner type should be used for GERALD files from Solexa?
Ka123$ is offline  
Old 09-28-2009, 07:52 AM   #187
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Hi Ka123$,

Gerald and bustard are files produced by the Illumina Pipeline, as far as I know, and neither one should contain useful information about the origin of a fragment. Only output from an aligner can be used in the context of peak finding.

For a list of formats accepted by FindPeaks, please see the following page:

http://sourceforge.net/apps/mediawik...e=InputFormats

If you're having an error with Eland files, please let me know what it is, and I'll try to fix it.

Anthony
__________________
The more you know, the more you know you don't know. —Aristotle
apfejes is offline  
Old 09-28-2009, 09:33 AM   #188
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,169
Default

Kal,

Bustard and GERALD are not files with a format in the sense you are asking. Bustard and GERALD are pipelines for processing Illumina short reads data. They generate many different output files with many different formats.

The Bustard pipeline performs base calling starting with signal intensity information. The primary output of the Bustard pipeline is a set of qseq files. These files are in a format peculiar to Illumina and contain the read ID, base calls and quality scores for each read on a single line as a set of tab-separated values. Bustard may output other files (e.g. qval, prb) depending on options supplied when the pipeline is launched.

GERALD is the pipeline for performing alignments using one of two different aligners supplied with the Pipeline software. The first aligner, PhageAlign is only useful for very small genomes and data sets and is almost never used so I will forego any further mention of it. The primary aligner supplied with the Illumina pipeline is Eland. GERALD calls the Eland aligner and passes it a set of configuration parameters. Eland outputs a number of files which all have similar (but slightly different) formats. Some examples of the files generated by Eland are s_N_eland_extended.txt, s_N_eland_multi.txt (where N = lane number from the Illumina run). These files basically list each read, its sequence and quality scores, where it matches the reference sequence and what mismatches exist between the read and the reference. Which files Eland generates and details of their format will be dependent on the arguments used when invoking Eland. GERALD may also be used to output sequence files in FASTQ format.
kmcarr is offline  
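For a concrete picture of the qseq layout described above, here is a minimal Python sketch that parses one line. The 11-column order used here (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag) and the use of "." for uncalled bases are assumptions to verify against your own files; parse_qseq_line and the example values are made up for illustration:

```python
from typing import NamedTuple


class QseqRead(NamedTuple):
    read_id: str
    sequence: str
    quality: str
    passed_filter: bool


def parse_qseq_line(line: str) -> QseqRead:
    """Parse one tab-separated qseq line (assumed 11-column layout)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, flt = fields
    # Build a single read ID from the positional columns.
    read_id = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    # Assumption: '.' marks an uncalled base; convert it to the usual 'N'.
    return QseqRead(read_id, seq.replace(".", "N"), qual, flt == "1")


# Hypothetical example line, not real data.
example = "\t".join(["M1", "7", "3", "12", "100", "200", "0", "1", "AC.T", "abcd", "1"])
read = parse_qseq_line(example)
print(read.read_id, read.sequence)
```

Raising on an unexpected column count is deliberate: a silent mis-parse of a differently laid-out file would be harder to spot downstream.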
Old 09-28-2009, 10:33 AM   #189
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default

Thanks to both kmcarr and apfejes !
I did believe that GERALD generates the Eland-format files. But when I used GERALD files to run separate reads in FindPeaks, with ELAND as the aligner name, it gave me an error saying that it was a wrong aligner name. Hence I needed confirmation of whether what I thought was actually correct.
I don't know why it said that.
Did I have to use GERALD.fa or the export file? Not sure.

Why did I need to use GERALD instead of the aligned files?
The reason is that when I use FindPeaks to convert my aligned files to wig files, I need to go through the separate and sort steps. When I run the separate step on the bowtie-aligned files, I get just one gi|......|.......|.part.bowtie.gz file, which contains the contigs, each named gi|.....|.....| etc., along with their positions with respect to the reference.

Why did I get only one gi|........ file although I have separated it? Whenever I sort this file, either gzipped or gunzipped, I get a Java heap memory error at 2300000 lines read:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.String.substring(Unknown Source)
at java.lang.String.subSequence(Unknown Source)
at java.util.regex.Pattern.split(Unknown Source)
at java.lang.String.split(Unknown Source)
at java.lang.String.split(Unknown Source)
at src.lib.ioInterfaces.BowtieIterator.next(BowtieIterator.java:145)
at src.lib.ioInterfaces.BowtieIterator.next(BowtieIterator.java:20)
at src.lib.ioInterfaces.Generic_AlignRead_Iterator.hasNext(Generic_AlignRead_Iterator.java:103)
at src.fileUtilities.SortFiles.main(SortFiles.java:79)

even though I use -Xmx2G.


So we thought we could use GERALD to separate the reads into individual chromosomes and then sort each individual chromosome instead?

Any suggestions?
Ka123$ is offline  
Old 09-28-2009, 10:46 AM   #190
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Hi ka123$,

kmcarr is right - Gerald is an intermediate program along the way from the sequencing machine to getting results. It's not an appropriate place to look for files to work with FindPeaks.

If your problem is with the sorting and pre-processing, you might consider using the s_N_sorted.txt file produced by Eland. It's pre-sorted, so it should make your life easier.

I should also mention that the "-aligner" format used sets the format and some of the behaviours of FindPeaks. If you've selected "-aligner eland", then FindPeaks expects the files you provide to be in the Eland format. I don't know what format Gerald uses, but I'm certain it's not the same as the output from the Eland aligner.

As for the problem you're seeing, I'm not sure why 2.3M reads would cause an out of memory error, however, I suspect that despite allocating 2Gb of RAM, the machine you're using actually has less than that free. (-Xmx2G sets the maximum the application is allowed to use, not the actual amount available.) I've certainly sorted much larger files than that with the SortFiles program, although I do tend to use a machine with more than 2Gb of Ram so I don't see that problem myself.

I'm happy to try helping, but I think you need to clarify a few things for me. What aligner are you using, and what commands are you using? If we settle on one aligner, I can point you in the right direction as to the work flow you're using, and if I can see the commands you're using, I can check to see if any of the parameters should be changed.

Cheers,

Anthony
__________________
The more you know, the more you know you don't know. —Aristotle
apfejes is offline  
Old 09-28-2009, 01:05 PM   #191
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,169
Default

Quote:
Originally Posted by apfejes
Hi ka123$,
I should also mention that the "-aligner" format used sets the format and some of the behaviours of FindPeaks. If you've selected "-aligner eland", then FindPeaks expects the files you provide to be in the Eland format. I don't know what format Gerald uses, but I'm certain it's not the same as the output from the Eland aligner.
Anthony,

Actually the GERALD output is the appropriate place to look. GERALD.pl is a wrapper script which (among other things) calls the Eland aligner. The output from Eland is then placed in the "GERALD_<DD-MM-YYYY>_<USERNAME>" folder. Included in that output is the s_N_eland_extended.txt, s_N_eland_multi.txt, s_N_export.txt and s_N_sorted.txt. As you stated the s_N_sorted.txt file should be able to be used in FindPeaks directly. (I've never done it myself so I can't speak from experience.)

After looking at your link above, I think the problem may be that Kal needs to specify elandext as the "-aligner" parameter. While the program is still called "Eland", the standard "eland" invocation is essentially deprecated. The program is now almost always invoked (through GERALD) using "eland_extended".

Last edited by kmcarr; 09-28-2009 at 01:11 PM. Reason: Add bit about eland_extended
kmcarr is offline  
Old 09-28-2009, 01:16 PM   #192
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Hi kmcarr - thanks for the clarification. I was under the impression that Gerald was simply one step in the process, rather than a wrapper around the Eland calls. It's getting harder and harder to keep on top of all of the different aligner formats and pipelines.

For the record, I rarely use Eland output of any form myself. We mainly use Maq here and I expect we'll be moving to SAM/BAM based formats in the future.
__________________
The more you know, the more you know you don't know. —Aristotle
apfejes is offline  
Old 09-28-2009, 06:49 PM   #193
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default

First off, thanks so much for all your guidance, from both of you!
I really appreciate it so much!

I previously tried using the bowtie aligner. As bowtie gave me only one separated .gz file that I could not make sense of, we reverted to using the GERALD alignment directly for separating and sorting. But here are the commands I used with the bowtie aligner:

I followed the bowtie commands to do my alignment:
./bowtie -a -v 2 -f h_X_GERALD.fa h_sap (did I have to use the -chr here???)

I used the FindPeaks commands here:
java -jar SeparateReads.jar elandext p_align_copy p_7_ger

(Before, I had problems using this for GERALD and it said the aligner format was not recognised, so according to the blog I used elandext:
java -jar SeparateReads.jar elandext p_align_copy p_7_ger
Error: Couldn't create log file : p_7_ger/SeparateReads.log)

For sorting reads I previously used this command:
java -jar Sort* bowtie g_sort_7 p_7_ger/*.bowtie

(although it ran for some time, it gave me memory problems)
Ka123$ is offline  
Old 09-30-2009, 02:08 PM   #194
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default

Can anyone let me know why the FindPeaks SeparateReads.jar command cannot create a log file when I use the GERALD-aligned files or the bowtie-aligned files?
For the GERALD-aligned files I specified elandext or eland_extended as the aligner type.

The bowtie-aligned files were giving me problems when running FindPeaks to separate and sort, so I am directly converting the GERALD files to wig files, although GERALD is probably not a better choice than the bowtie alignment.
Any suggestions?
Ka123$ is offline  
Old 09-30-2009, 02:11 PM   #195
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Hi Ka123$,

Once again, it would really help if you tell us what the error is that you're seeing. The most common errors are:

- Trying to write to a directory without permissions
- missing a parameter (FindPeaks won't start without it, and throws an error)
- a parameter is incorrect (FindPeaks won't start with an invalid parameter)

If you tell us what error you've got, I might be able to narrow it down.

EDIT:
Is the error above the same one? I think this is probably a path problem. You're trying to write to a directory called p_7_ger in the directory from which you're launching the jar program. Does that directory already exist?

Anthony
__________________
The more you know, the more you know you don't know. —Aristotle
apfejes is offline  
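The path check suggested above can be done before launching the jar. A minimal Python sketch, assuming the output directory is expected to exist and be writable beforehand (whether SeparateReads creates it on its own is not stated in the thread); check_output_dir is a hypothetical helper, not part of FindPeaks:

```python
import os
import tempfile


def check_output_dir(path: str) -> str:
    """Report why a log file might not be creatable in the given directory."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isdir(path):
        return "not a directory"
    if not os.access(path, os.W_OK):
        return "not writable"
    return "ok"


# Demonstration with a temporary directory instead of a real run folder.
with tempfile.TemporaryDirectory() as tmp:
    existing = check_output_dir(tmp)
    absent = check_output_dir(os.path.join(tmp, "p_7_ger"))
print(existing, absent)
```

A result of "missing" for p_7_ger would fit the "Couldn't create log file" message, in which case creating the directory first would be the obvious thing to try.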
Old 09-30-2009, 03:31 PM   #196
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default

Hi Anthony,
So here is what I am doing. We have decided to stick with the GERALD files and convert them to wig (PI's order!).
I checked for unaligned files and there were none.
I have a .export file with an s_#_export.txt, and I ran:
java -Xmx2G -jar SeparateReads.jar elandext 7_XXXXXX_GERALD-YYYY-MM-DD.export G_sep_7
Version: Initializing class SeparateReads $Revision: 1082 $
Version: Initializing class Generic_AlignRead_Iterator $Revision: 1318 $
Version: Initializing class Log_Buffer $Revision: 1145 $
Version: Initializing class ElandExtIterator $Revision: 832 $
Exception in thread "main" java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at src.lib.ioInterfaces.ElandExtIterator.next(ElandExtIterator.java:180)
at src.lib.ioInterfaces.ElandExtIterator.next(ElandExtIterator.java:20)
at src.lib.ioInterfaces.Generic_AlignRead_Iterator.hasNext(Generic_AlignRead_Iterator.java:103)
at src.fileUtilities.SeparateReads.main(SeparateReads.java:69)

It looks like GERALD gives out a .txt file. How can I specify what type of aligner GERALD is? If I use elandext or eland_extended, it does not work.

Is there a way to directly convert a .txt Solexa export file to .wig in FindPeaks?
Ka123$ is offline  
Old 09-30-2009, 03:40 PM   #197
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Hi Ka123$,

Thanks for the detailed report! I've managed to re-create the problem by parsing a data set that is similar. I observed that the iterator crashes on reads marked with "QC", so I've modified the code in order to reject those reads.

I can do two things for you. The first is that I can compile the code for you and send you the latest version via email. The second is that I can check in the code changes so that you can check it out and compile it yourself. Either option is open.

Thanks again for the very helpful bug report!

Anthony

Edit: The code has been checked in to the repository, if you're interested in building from scratch.
__________________
The more you know, the more you know you don't know. —Aristotle

Last edited by apfejes; 09-30-2009 at 03:42 PM.
apfejes is offline  
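As a stop-gap for anyone who cannot rebuild FindPeaks, reads without a usable position could be dropped from the export file before running SeparateReads. This is only a sketch and not the fix that was checked in; the column positions (match chromosome in column 11, position in column 13) and the QC/NM/RM codes are assumptions about the s_N_export.txt layout and should be checked against your own files:

```python
from typing import Iterable, Iterator

# Assumed zero-based column indexes in an export line (verify on real data).
CHROM_COL = 10
POS_COL = 12
# Assumed codes that mark reads with no alignment.
UNALIGNED_CODES = {"QC", "NM", "RM"}


def aligned_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only lines whose position field is a plain integer."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= POS_COL:
            continue  # too short to contain a position at all
        if fields[CHROM_COL] in UNALIGNED_CODES:
            continue  # flagged as unaligned or failed QC
        if not fields[POS_COL].isdigit():
            continue  # an empty position would break integer parsing
        yield line


def make_line(chrom: str, pos: str) -> str:
    """Build a fake 22-column line for the demonstration below."""
    fields = ["x"] * 22
    fields[CHROM_COL] = chrom
    fields[POS_COL] = pos
    return "\t".join(fields) + "\n"


sample = [make_line("chr1.fa", "12345"), make_line("QC", ""), make_line("NM", "")]
kept = list(aligned_lines(sample))
print(len(kept))
```

Checking the position field as well as the code list means a line with an unexpected code but an empty position is still skipped, which is the case that triggered the NumberFormatException in the report.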
Old 09-30-2009, 03:59 PM   #198
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default

Thanks so much, Anthony! If you could compile it and email it to me, that would be great! I appreciate it so much!
Ka123$ is offline  
Old 09-30-2009, 07:15 PM   #199
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Hi Ka123$,

I'm sorry - I can't seem to find your email address. Could you send it to me again? I'll package up a copy for you in the morning.

Anthony
__________________
The more you know, the more you know you don't know. —Aristotle
apfejes is offline  
Old 10-09-2009, 10:45 AM   #200
Ka123$
Member
 
Location: MD

Join Date: Jul 2009
Posts: 27
Default

Hi apfejes,
I sent you my email ID early last week. I was wondering whether you got it. Could you please check again? I am sending you an email about this thread, and you can reply to me on that. Thanks.
Ka123$ is offline  
All times are GMT -8.