Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • superligang
    replied
    I don't run experiments in biology either
    I downloaded this DNA data, and I guess probably RNA-seq is different from DNA-seq. When I aligned some RNA-seq data, the two ends of a pair of read frequently overlap with each other.
    soapsnp uses 400 as the default minimal insert size, and so the overlapping part is not a problem in their DNA data.
    Thank you for your clarification.
    Last edited by superligang; 01-07-2011, 01:27 PM.

    Leave a comment:


  • wilhelml
    replied
    oops... I didn't mean 400Kb fragments, I meant just 400nt.

    And I guess I can assume your read length is 35.

    Leave a comment:


  • wilhelml
    replied
    I don't think that is correct actually, regarding insert size.

    The DNA is fragmented and then adapters are ligated. The size of the fragment to which the adapters are ligated is your total size. So if you cut 400Kb fragments from a gel and attach adapters to that, then sequence 80mers, you'll have an actual insert size of 400-(80*2)=240. Though, Soap wants the insert size as the total size so you'd use the 400 number.

    So the question becomes: To what size of DNA fragment were the adapters ligated? And.. What is the length that was sequenced?

    Also, what technology was used for sequencing?

    This article may shed light...


    Disclaimer: I've never actually made libraries, I work entirely on the software side of things, just trying to help a little.

    Leave a comment:


  • superligang
    replied
    Originally posted by wilhelml View Post
    Your situation with a 35bp insert and read lengths of apparently > 35bp is not something I've ever confronted before. Is this really the case? I don't quite understand why a library would be constructed like this.
    Larry W.
    I don't know if I understand the insert size correctly, but I use insert size as the distance between the two farthest bases of the two ends of a pair of reads.
    For read of 35bp long, the minimal insert size could be 35 when the two reads of a pair overlap completely. The insert size is larger than 35bp if there is a gap between the two reads.
    Thanks.

    Leave a comment:


  • wilhelml
    replied
    'Scaffold' doesn't mean too much in this context actually. It just happens to be how the sequences to which I aligned my reads were named.

    After reading your original post more carefully I realize I'm not really addressing your issue as this was all done with single end reads. I suspect it should still work though.

    Your situation with a 35bp insert and read lengths of apparently > 35bp is not something I've ever confronted before. Is this really the case? I don't quite understand why a library would be constructed like this.

    Larry W.

    Leave a comment:


  • superligang
    replied
    Interesting, but I don't quite understand what scaffold means in this context.
    I hope that soapsnp handles the overlapping part of the two ends of a pair of read reasonably.
    Thank you.
    Last edited by superligang; 01-07-2011, 11:36 AM.

    Leave a comment:


  • wilhelml
    replied
    Hi,

    I can't help you regarding your first and third question, however this is how we handle question 2...

    Split the soap output file into one file for each scaffold like so:

    awk '{ print $0 >> $8".soap" }' alignment

    (in this example the soap alignment file is titled 'alignment' and you'll wind up with a file for each scaffold such as scaffold_1.soap).

    Then you could run this shell script (requires tcsh I believe)

    foreach file (`ls -1 *.soap`)
    sort -k 9 -n $file > $file.sorted
    end

    Or this perl script...
    (no promises that this is the simplest most concisely written piece of code in the world but I know it works)

    ----------------------------
    #!/usr/bin/perl
    use strict;

    # Sort SOAP alignment output files by position - 9th column
    # lines like this:
    # 4_100_10086_4044\1 TAACGGTAATTTTTTTTAAAAAACAAAATCATATTTTCCT ffddcffff`dfffffefffffffffffafdeffcdffff 1 a 40 - scaffold_9 1593868 0 40M 40

    print STDERR "start at: ".localtime()."\n";

    my $h = {};
    my $f = $ARGV[0];
    open(F,$f) or die "can't open input file: $f $!\n";
    while (<F>){
    my @l = split(/\t/,$_);
    $h->{$l[0]}->{'pos'} = $l[8];
    $h->{$l[0]}->{'line'} = $_;;
    }
    close F;
    print STDERR scalar(keys %{$h})." lines read from $f\n";

    foreach my $id (sort { $h->{$a}->{'pos'} <=> $h->{$b}->{'pos'} } keys %{$h}){
    print $h->{$id}->{'line'};
    }

    print STDERR "sort_soap.pl completed sorting $f successfully at ".localtime()."\n";

    ---------------------------

    Larry Wilhelm
    Bioinformatics
    Oregon State University

    Leave a comment:


  • superligang
    started a topic Questions about soapaligner and soapsnp

    Questions about soapaligner and soapsnp

    I am using soapaligner+soapsnp to find SNVs in our pair-end DNA data. I read the online manual, and the tools have been running smoothly.

    However, there are three things I am not sure about.

    First, how should I request soapaligner to report repetitive results for SNV discovery? Should it report none, random or all best-hit of the repetitive alignments of a certain read?

    Second, I understand the input for soapsnp should be sorted according to the chromosomes and coordinates. The results from soapaligner is organized in the way that each line corresponds to one end of a pair of reads. After the soapaligner output is sorted, the lines of the two ends of the same read would appear far away from each other, with the alignments of many other reads between them. Is this the correct method to sort the soapaligner result?

    Third, the minimal insert size of soapaligner is 400 while the insert size of our data is only 35bp, and so the two ends of a pair of reads could overlap each other. If there is a SNV within the overlapping part of the two ends
    of a pair-end read, and the two ends may or may not have the same base, quality score on the SNV, how would the program handle this situation? I read the paper of soapsnp, but I couldn't get a clue.

    Thank you for your help.

    Update,

    The read length is 35bp in our data.
    Last edited by superligang; 01-07-2011, 11:56 AM.

Latest Articles

Collapse

  • seqadmin
    Advancing Precision Medicine for Rare Diseases in Children
    by seqadmin




    Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
    12-16-2024, 07:57 AM
  • seqadmin
    Recent Advances in Sequencing Technologies
    by seqadmin



    Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

    Long-Read Sequencing
    Long-read sequencing has seen remarkable advancements,...
    12-02-2024, 01:49 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 12-17-2024, 10:28 AM
0 responses
27 views
0 likes
Last Post seqadmin  
Started by seqadmin, 12-13-2024, 08:24 AM
0 responses
43 views
0 likes
Last Post seqadmin  
Started by seqadmin, 12-12-2024, 07:41 AM
0 responses
29 views
0 likes
Last Post seqadmin  
Started by seqadmin, 12-11-2024, 07:45 AM
0 responses
42 views
0 likes
Last Post seqadmin  
Working...
X