Hi!
I'm analyzing a "second-hand" dataset generated using SOLiD 4. It is a transcriptome mate-pair library that is 52 x 37 nt, and I cannot for the life of me find the protocol that was used to generate those specific read lengths. I have F3 and R3 reads, so I am assuming it is a circularization protocol, but I do not know what the size selection parameters were, or how the circles were cut to produce the final fragments. This info would be very valuable for more accurate mapping.
Any knowledge would be greatly appreciated!
Thanks a lot,
Carmen
-
GATK does work on SAM files produced by SHRiMP2 from SOLiD paired-end reads. Here are the steps:
1. Align with SHRiMP2 using the --single-best-mapping and --all-contigs flags.
2. Use Picard to fix the read group.
3. Run GATK.
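For anyone wanting to script this, the three steps above could be strung together roughly as follows. This is only a dry-run sketch in Python that prints the commands rather than running them. Only `--single-best-mapping` and `--all-contigs` come from the post; the other `gmapper-cs` options, the Picard and GATK jar names, the read-group values, the file names and the choice of `UnifiedGenotyper` are assumptions to check against each tool's documentation for your version.

```python
import shlex


def build_pipeline(reads_f3, reads_f5, genome, sample, threads=8):
    """Return the three stages as argument lists. Nothing is executed."""
    shrimp = [
        "gmapper-cs",                    # SHRiMP2 colour-space mapper
        "--single-best-mapping",         # flag named in the post
        "--all-contigs",                 # flag named in the post
        "--sam",                         # assumption: request SAM output
        "-N", str(threads),              # assumption: number of threads
        "-1", reads_f3, "-2", reads_f5,  # assumption: paired input files
        genome,
    ]  # SAM is assumed to go to stdout; redirect it to aligned.sam

    picard = [
        "java", "-jar", "picard.jar",    # assumption: jar name and location
        "AddOrReplaceReadGroups",
        "I=aligned.sam",
        "O=aligned.rg.bam",
        "SORT_ORDER=coordinate",
        "RGID=" + sample,                # placeholder read-group values
        "RGLB=" + sample,
        "RGPL=SOLID",
        "RGPU=unit1",
        "RGSM=" + sample,
    ]

    gatk = [
        "java", "-jar", "GenomeAnalysisTK.jar",  # assumption: jar name
        "-T", "UnifiedGenotyper",        # assumption: which GATK walker to run
        "-R", genome,
        "-I", "aligned.rg.bam",
        "-o", "calls.vcf",
    ]
    return [shrimp, picard, gatk]


for command in build_pipeline("F3.csfasta", "F5.csfasta", "ref.fa", "sample1"):
    print(" ".join(shlex.quote(arg) for arg in command))
```

The BAM would also need indexing before GATK will accept it, and the pairing options for SHRiMP2 depend on the library type, so treat the printed commands as a starting point only.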
-
For the sake of completeness there is PerM, which does handle paired-end reads (as separate files). I stopped using it as it ignores any read containing an 'N'. It is not a gapped aligner. It is still being developed.
-
SHRiMP 2.2.2 does seem to be an alternative.
I can align ~60M single-end 60 bp exome reads to the human genome in about 10 hours using 47 threads. That is about a third of the runtime of novoalign-CS on this machine.
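To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the approximate numbers quoted above, and the novoalign-CS estimate is simply "three times as long", so treat the results as rough.

```python
reads = 60_000_000   # ~60M single-end reads (approximate, from the post)
hours = 10           # approximate SHRiMP wall-clock time
threads = 47

reads_per_second = reads / (hours * 3600)
reads_per_thread_second = reads_per_second / threads

# "About a third of the runtime" implies novoalign-CS takes roughly 3x longer.
novoalign_hours = hours * 3

print(f"{reads_per_second:.0f} reads/s overall")
print(f"{reads_per_thread_second:.1f} reads/s per thread")
print(f"~{novoalign_hours} h estimated for novoalign-CS")
```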
We will be testing SNP calls from alignments in the next few weeks so I can't say anything yet.
-
Originally posted by Zaag:
1) GATK eats my Lifescope 2.5 BAM raw, I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.
2) I think the new version of BWA also stopped supporting colour space.
Have checked Lifescope 2.5 output with GATK, and yes, it appears to work fine with GATK. This is very good news, so thanks for bringing that to my attention!
-
1) GATK eats my Lifescope 2.5 BAM raw, I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.
2) I think the new version of BWA also stopped supporting colour space.
-
Originally posted by h2karen:
With regards to ECC with paired-end runs, I don't think it is currently possible. Seems like ECC analysis has only been customized for forward reads, and a mix of reverse and forward in the paired-end runs will make analysis difficult. Please see link below for complete response to this issue:
http://solid.community.appliedbiosys...om/thread/1182
-
With regards to ECC with paired-end runs, I don't think it is currently possible. Seems like ECC analysis has only been customized for forward reads, and a mix of reverse and forward in the paired-end runs will make analysis difficult. Please see link below for complete response to this issue:
-
Originally posted by colindaven:
We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.
A few notes-
1. Lifescope will be 10000 euros a year from what I've heard.
2. I wouldn't use Bowtie on genomic data, just for transcriptomes.
3. There is some debate on the BWA mailing list on them dropping colour space altogether.
Perhaps you missed Shrimp2 as well, though I am not sure if that can deal with paired end reads-
Good point about SHRiMP2 - yes, it does appear to have a paired-end option and I'll check it out - thanks for pointing that out (my memory of using SHRiMP before is that it needed a lot of compute resources - specifically memory - so it'll be interesting to see how it copes with paired-end data).
Am disappointed to hear that BWA may be dropping colour-space support altogether - though not entirely surprised - it has always appeared to perform particularly poorly with colour-space data, and I see there is some debate that this may be a flaw in their colour-space implementation. Further, with ECC technology now available with the 5500xl, I wonder how many other developers might also drop colour-space support?
I'm curious to know why you would avoid using Bowtie on WGS data - is it purely because of the indel issue? I've found Bowtie tends to give marginally better mappings than BWA - although the resultant BAM files are problematic when it comes to using the GATK pipeline. I'm guessing you're suggesting its suitability for RNA-seq because of TopHat?
Lastly - yes... 10,000 euros a year for Lifescope. Is it worth it? Hmm. Discuss!!! (I suspect the few of us who are ACTUALLY using LifeScope will seriously consider spending that money on alternative commercial options rather than fork out on what is a very fragile and poorly designed piece of software - however, it's fair to say its mapper does do a good job of making the most of colour-space data... at least in terms of coverage...).
Will look forward to what you have to say regarding Novoalign. Has anyone else any experience of using this mapper with SOLiD paired-end data?
-
We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.
A few notes-
1. Lifescope will be 10000 euros a year from what I've heard.
2. I wouldn't use Bowtie on genomic data, just for transcriptomes.
3. There is some debate on the BWA mailing list on them dropping colour space altogether.
Perhaps you missed Shrimp2 as well, though I am not sure if that can deal with paired end reads-
-
Mapping SOLiD colorspace paired end reads
Hi, I'm trying to get a sense of what the current consensus is regarding the best practice for mapping SOLiD colour-space paired-end reads. As far as I can tell the options appear to be rather limited:
1. LifeScope.
Advantages: designed specifically for paired-end colour-space reads and maps the most reads (which may also be a disadvantage...). Disadvantages: Very slow and very resource hungry. Overly complicated command-line interface. Can only run on a limited range of hardware/software. Difficult to leverage into GATK. And to rub salt into the wound, AB are planning to charge for it in the near future.
2. Bowtie.
Advantages: Fast. Resource efficient. Multithreaded. Easy-to-use command-line interface. Disadvantages: Tricky to leverage into GATK. Most importantly, Bowtie cannot accommodate indels. And Bowtie2 will not accept colour-space data.
3. BWA
Advantages: Fast. Efficient. Multithreaded. Accommodates indels. Easy-to-use interface. Easily leveraged into a GATK pipeline (to a point - see below). BUT! No current support for paired-end SOLiD data (mate-pair yes, but not, it would seem, paired-end). The current workaround would appear to be to reverse the F5 reads (and associated QVs), preferably trimming heavily at the 3' end to minimise potential problems from the reversed error profile (though how much of an issue is this?). This creates all sorts of issues when leveraging into GATK (particularly as BWA does not include colour-space data in the BAM, a prerequisite for GATK recalibration).
4. BFAST
Am still exploring this option - however, it would appear to accept paired-end colour-space data natively and can be leveraged into GATK pipelines, but the interface is a bit challenging, the documentation a bit opaque, and parts of the process can be very slow.
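On the BWA workaround in option 3 (reversing the F5 reads and their QVs): a handy property of colour space is that the colour of a dinucleotide is unchanged by both reversal and complementing, so the colour string of a reverse-complemented sequence is just the original colour string reversed. The Python sketch below illustrates this. It assumes the leading primer base, and the first colour that depends on it, have already been stripped, and that QVs are a plain list with one value per colour; the function names are illustrative only.

```python
# 2-bit base codes: the colour of a dinucleotide is the XOR of its two codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def encode_colours(bases):
    """Encode a base-space sequence as a string of colours (0-3)."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(bases, bases[1:]))


def reverse_complement(bases):
    """Reverse-complement a base-space sequence."""
    return bases.translate(COMPLEMENT)[::-1]


def reverse_read(colours, quals):
    """Reverse a colour string together with its per-colour quality values."""
    if len(colours) != len(quals):
        raise ValueError("need exactly one quality value per colour")
    return colours[::-1], quals[::-1]


seq = "ACGGT"
colours = encode_colours(seq)
reversed_colours, reversed_quals = reverse_read(colours, [30, 25, 20, 10])

# Reversing the colours gives the encoding of the reverse complement.
assert reversed_colours == encode_colours(reverse_complement(seq))
print(colours, reversed_colours, reversed_quals)
```

Note that reversing the QVs also reverses the error profile, so the low-quality end of the read now comes first; this is why heavy 3' trimming before reversal is suggested above.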
Is there anything I've missed? What is people's preferred strategy, particularly if they want to leverage into GATK for recalibration purposes? Are we all going to be using ECC-generated base-space data, so that development of colour-space-compatible tools will dry up? Any thoughts on this subject would be most welcome!