When the weight vector of a linear classifier is in dense format, you can access weights in O(1).
2010-11-13 17:37:04 When it's in sparse format, there's the overhead of using a hash table or B-tree.
2010-11-13 17:37:29 So sparse-dense dot products are more efficient than sparse-sparse dot products.
2010-11-13 17:37:45 Thus even for L1-induced sparse solutions, I'd go for a dense weight vector. Even w/ hundreds of millions of weights, it can fit in memory.
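The sparse-dense dot product being argued for can be sketched as follows, a minimal Python/NumPy illustration (the dimensions and values are made up, not from the thread): the weight vector is a dense array with O(1) access, and the cost is O(nnz) in the sparse feature vector.

```python
import numpy as np

# Dense weight vector: O(1) access per index, even with very many weights.
w = np.zeros(10)
w[[1, 3, 7]] = [0.5, -2.0, 1.5]

def sparse_dense_dot(x, w):
    """Dot product of a sparse feature vector x, given as a list of
    (index, value) pairs, with a dense weight vector w.
    Cost is O(nnz(x)): one dense array lookup per non-zero feature."""
    return sum(v * w[i] for i, v in x)

x = [(1, 2.0), (3, 1.0), (5, 4.0)]
print(sparse_dense_dot(x, w))  # 2*0.5 + 1*(-2.0) + 4*0.0 = -1.0
```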
2010-11-13 17:38:10 @mdreid The trick for frequent scalar-vector products is really smart. I think Leon Bottou got it from the Pegasos paper.
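Assuming the trick referred to here is the one popularized by Pegasos, storing w implicitly as scale * v so that multiplying the whole vector by a scalar (e.g. the L2 shrinkage step of SGD) is O(1) instead of O(dim), a rough sketch (class and method names are my own, purely illustrative):

```python
import numpy as np

class ScaledVector:
    """Represent w = scale * v so that scaling w is O(1):
    only the scalar changes, the array is untouched."""
    def __init__(self, dim):
        self.scale = 1.0
        self.v = np.zeros(dim)

    def times_equals(self, c):
        # w *= c in O(1).
        self.scale *= c

    def add_sparse(self, x, lr):
        # w += lr * x for sparse x: divide out the scale per entry.
        for i, val in x:
            self.v[i] += lr * val / self.scale

    def dot_sparse(self, x):
        # <w, x> = scale * <v, x>.
        return self.scale * sum(val * self.v[i] for i, val in x)
```

The point is that the frequent whole-vector scaling becomes a single multiplication, while sparse updates and dot products stay proportional to the number of non-zeros.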
2010-11-13 17:43:44 Bob Carpenter briefly mentions the problem of sparse weight vectors in this post http://bit.ly/amLB5j
2010-11-13 17:45:35 @mathieuen Not really. You can do a sparse-sparse dot product faster using the merge algorithm from mergesort. It's faster than a dense dp
2010-11-13 19:35:16 @mathieuen Say your data is a list of (index, weight) pairs. Get two pointers, one for each vector; if the indices are equal, add to the product, else
2010-11-13 19:36:21 @mathieuen increment the pointer with the smaller index and repeat until done. At most you will increment the pointers N times.
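The two-pointer merge just described can be sketched like this (a minimal illustration, assuming both vectors are stored as index-sorted lists of (index, value) pairs):

```python
def sparse_sparse_dot(a, b):
    """Dot product of two sparse vectors given as index-sorted lists
    of (index, value) pairs, using the merge step from mergesort.
    Runs in O(nnz(a) + nnz(b)): each pointer only moves forward."""
    i = j = 0
    result = 0.0
    while i < len(a) and j < len(b):
        ia, va = a[i]
        jb, vb = b[j]
        if ia == jb:
            # Index non-zero in both vectors: contributes to the product.
            result += va * vb
            i += 1
            j += 1
        elif ia < jb:
            i += 1  # advance the pointer with the smaller index
        else:
            j += 1
    return result
```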
2010-11-13 19:37:04 @mathieuen And after doing the dot product, another similar pass computes the SGD update. You can even speed this up with bsearch
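The two-pass idea, one pass for the dot product, one for the update, could look like this dict-backed sketch (the loss and the function name are illustrative choices of mine; the thread does not specify them, and a dict stands in for the sparse weight structure):

```python
def sgd_step(wts, x, y, lr):
    """One SGD step for squared loss on a sparse example x
    (a list of (index, value) pairs) against sparse weights wts (a dict).
    The dot product and the update are each a single pass over x."""
    pred = sum(v * wts.get(i, 0.0) for i, v in x)
    g = pred - y  # gradient of 0.5*(pred - y)**2 w.r.t. pred
    for i, v in x:
        wts[i] = wts.get(i, 0.0) - lr * g * v
    return pred
```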
2010-11-13 19:44:26 @mathieuen (binary search or use a tree to get the next index if one vector is quite dense so you can do this in even less time)
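Assuming the binary-search variant means probing each index of the much smaller vector in the larger vector's sorted index array, rather than merging, it might look like this sketch, with cost O(nnz(small) * log nnz(big)):

```python
from bisect import bisect_left

def sparse_sparse_dot_bsearch(small, big_idx, big_val):
    """Dot product when one sparse vector is much denser than the other.
    small: index-sorted list of (index, value) pairs.
    big_idx, big_val: parallel sorted arrays for the denser vector.
    Each lookup is a binary search instead of a linear merge step."""
    result = 0.0
    for i, v in small:
        pos = bisect_left(big_idx, i)
        if pos < len(big_idx) and big_idx[pos] == i:
            result += v * big_val[pos]
    return result
```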
2010-11-13 19:45:14 @atpassos_ml So the idea is to use the merge algorithm to find the dimensions which are non-zero in both vectors?
2010-11-13 20:07:35 @atpassos_ml In the sparse-dense case, it can be a waste if many of the dims that are non-zero in the feature vector are zero in the weight vector
2010-11-13 20:10:35 @atpassos_ml In the sparse-dense dot product, you iterate over the non-zero dims in your feature vector and retrieve the corresponding dims in the weight vector
2010-11-13 20:15:25 @mathieuen Of course, in sparse-dense you just have to look at the sparse vector's entries. I mean in the sparse-sparse case.
2010-11-13 21:28:51 @mathieuen My point is that with a clever sparse-sparse product (and the usual dumb sparse-dense product you mentioned) it's way faster
2010-11-13 21:31:29 @mathieuen to use a sparse representation for both the feature vector and the training data.