quick memo for genome assembly workshop
Daniel Zerbino now talking about file formats, specifically for cancer genomes and the VCF format to report SV. #assemblathon #gaw
2011-03-16 03:21:29A good representation of variation is particularly important for cancer research. #gaw #assemblathon
2011-03-16 03:21:34Significant work has been done to parallelize Velvet. #gaw #assemblathon
2011-03-16 03:22:23Next #gaw talk is Yingrui Li, from BGI-Shenzhen: “NGS de novo assembly: Progresses and challenges”
2011-03-16 03:22:44Up next is our very own Yingrui Li with the final talk of the morning: "NGS de novo assembly: Progresses and challenges". #assemblathon #gaw
2011-03-16 03:24:46Last talk before lunch: Yingrui Li from BGI to discuss NGS assembly progress and challenges. #gaw
2011-03-16 03:25:15Yay Yingrui! RT @assemblathon: Next #gaw talk is Yingrui Li, from BGI-Shenzhen: “NGS de novo assembly: Progresses and challenges”
2011-03-16 03:25:40Yingrui Li, BGI: SOAPdenovo has a workflow similar to other assemblers; the differences are in the details. #gaw #assemblathon
2011-03-16 03:27:05Another plug of BGI's SOAPdenovo tool by Yingrui: http://bit.ly/hLJytc #assemblathon #gaw (& @genomeresearch http://bit.ly/gACJWq)
2011-03-16 03:27:44Velvet is currently one of the most-read Genome Research articles http://bit.ly/gknwLy #gaw
2011-03-16 03:28:39Eliminating errors in the original raw reads makes the graph much cleaner, uses less RAM, reduces load in the graph-reduction step, and improves reliability. #gaw
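Memo note: the effect of read errors on the graph can be illustrated with a k-mer frequency filter. This is a minimal sketch, not SOAPdenovo's actual error-correction method; the function names, the toy reads and the `min_count` threshold are all assumptions made for illustration. K-mers seen fewer than `min_count` times are treated as likely errors and kept out of the graph.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(counts, min_count=2):
    """Keep k-mers seen at least min_count times (hypothetical threshold)."""
    return {kmer for kmer, n in counts.items() if n >= min_count}

# Toy data: three identical reads plus one read with a single-base error.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
solid = solid_kmers(counts, min_count=2)
print(len(counts), "distinct k-mers before filtering,", len(solid), "after")
```

In this toy case the single erroneous base contributes four extra low-frequency k-mers, each of which would otherwise become a spurious node or edge in the graph.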
2011-03-16 03:30:57Up now our BGI colleague Yingrui Li discussing reducing graph complexity in his talk on "NGS de novo assembly: progress & challenges" #gaw
2011-03-16 03:31:19YL: using 27mers, trying to improve error-correction and use insert sizes that allow reads to overlap. for SOAPdenovo #gaw
2011-03-16 03:31:41SOAPdenovo can use k-mer sizes up to 27 with reasonable RAM usage. #gaw
2011-03-16 03:32:00Larger k-mers, up to 127 bp, for contigging to accommodate PE and long reads coming from new technology. #gaw
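Memo note: why a larger k-mer helps contigging can be sketched with a toy de Bruijn graph. The code below is an illustration under simplifying assumptions (a single error-free sequence, no reverse complements); `branching_nodes` and the example sequence are hypothetical and not taken from any assembler. It counts (k-1)-mer nodes that have more than one distinct successor, which is where contigs have to stop.

```python
from collections import defaultdict

def branching_nodes(seq, k):
    """Count (k-1)-mer nodes with more than one distinct outgoing edge."""
    successors = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Node is the (k-1)-mer prefix; the edge is labelled by the last base.
        successors[kmer[:-1]].add(kmer[-1])
    return sum(1 for bases in successors.values() if len(bases) > 1)

# The 4 bp repeat ACGG occurs twice, followed by different bases each time.
seq = "TTACGGAACGGCC"
for k in (3, 5, 6):
    print(k, branching_nodes(seq, k))
```

Once k exceeds the repeat length plus one, every node has a single successor and the sequence forms one unbranched path; the price is more distinct k-mers to hold in memory.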
2011-03-16 03:34:16YL showing some error correction examples from (unpublished) oyster genome data amongst other examples. #assemblathon #gaw
2011-03-16 03:35:13Scaffolding performance is hypersensitive to parameter settings. #gaw #assemblathon
2011-03-16 03:36:16'scaftigs' intra-scaffold gaps between contigs. #gaw
2011-03-16 03:37:58Gap closure: based on conservatively constructed scaffolds to make scaftigs. #gaw #assemblathon
2011-03-16 03:38:46Gap closure: add unique regions that did not pass the stringent filter before, then add back repeats. This step has a high risk of generating errors. #gaw
2011-03-16 03:41:24PE reads fill small gaps; characteristics of repeats, such as repeat sizes, are considered. #gaw #assemblathon
2011-03-16 03:42:58New word of the day: #scaftigs RT @assemblathon 'scaftigs' intra-scaffold gaps between contigs. #gaw
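Memo note: a minimal sketch of how scaftigs can be derived, assuming the common usage in which a scaftig is a contiguous stretch of a scaffold obtained by splitting it at intra-scaffold gaps (runs of N). The function name and the `min_gap` parameter are hypothetical, not part of SOAPdenovo.

```python
import re

def scaffold_to_scaftigs(scaffold, min_gap=1):
    """Split a scaffold at runs of at least min_gap N characters."""
    pattern = "[Nn]{%d,}" % min_gap
    # Drop empty pieces produced by leading or trailing gaps.
    return [piece for piece in re.split(pattern, scaffold) if piece]

scaffold = "ACGTNNNNGGCCNTTAA"
print(scaffold_to_scaftigs(scaffold))             # split at every N run
print(scaffold_to_scaftigs(scaffold, min_gap=2))  # single Ns stay in place
```

Raising `min_gap` keeps isolated ambiguous bases inside a scaftig and splits only at real gaps.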
2011-03-16 03:45:39smaller kmer size should be used for highly heterogeneous genomes. #gaw #assemblathon
2011-03-16 03:46:05One round of assembly now takes only 1 day for a human genome on a 256 GB memory node. #gaw
2011-03-16 03:47:11YL: BGI is developing a cloud-based assembler (dev. code: Hecate), cutting the memory footprint to <32G. #gaw
2011-03-16 03:47:24