Monday, July 25, 2016

Week 6 and 7

What was done?
1) Improved timings by identifying the main source of runtime bottlenecks: the matrix multiplication in the Range Finder (a sketch of that step is given after this list).
Current timings (100 components, 100 partitions, 100-node cluster):
10000 x 2000 - 2 min 51 sec
50000 x 10000 - 3 min 21 sec
100000 x 10000 - 6 min 

2) Experimented with sparse matrix multiplication in blocked format using the Eigen library (a minimal sketch is given after this list).
Observations:
i) Using CSC format for storage reduces the cost of distributing blocks.
ii) After local multiplication, blocks are no longer sparse. This becomes a bottleneck: we get no benefit from sparse addition (axpy), yet bringing sparse blocks in and out of C++ is still time consuming.
Thus, this approach is not viable in its current state.
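
For context, the Range Finder referred to in 1) is presumably the randomized-SVD range finder (Halko et al.), where the dominant cost is the product of the input matrix with a random test matrix. Below is a minimal single-node Eigen sketch of that step; the function name, the oversampling parameter, and the uniform random test matrix are illustrative assumptions, not the actual distributed implementation.

#include <Eigen/Dense>

// Sketch of the range-finder step: Y = A * Omega, then orthonormalise Y.
// The product A * Omega is the matrix multiplication that dominates runtime.
Eigen::MatrixXd rangeFinder(const Eigen::MatrixXd &A, int k, int oversample = 10)
{
    const int l = k + oversample;                                  // sketch width
    Eigen::MatrixXd Omega = Eigen::MatrixXd::Random(A.cols(), l);  // random test matrix (uniform here; Gaussian in the full algorithm)
    Eigen::MatrixXd Y = A * Omega;                                 // the expensive multiplication
    Eigen::HouseholderQR<Eigen::MatrixXd> qr(Y);                   // thin QR of the sample matrix
    Eigen::MatrixXd Q = qr.householderQ() * Eigen::MatrixXd::Identity(A.rows(), l);
    return Q;                                                      // Q approximately spans the range of A
}

int main()
{
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(1000, 200);        // toy input matrix
    Eigen::MatrixXd Q = rangeFinder(A, 100);                       // 100 components, as in the timings above
    return Q.cols() == 110 ? 0 : 1;
}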
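The sparse experiment in 2) looks roughly like the single-node sketch below; the block sizes and nonzeros are made up, and the distributed plumbing (shipping blocks across the cluster and in and out of the C++ plugin) is omitted. It only illustrates the CSC block storage and why the accumulation ends up dense.

#include <Eigen/Sparse>
#include <Eigen/Dense>

int main()
{
    // One block of A (m x k) and one block of B (k x n); Eigen's SparseMatrix
    // is column-major by default, i.e. CSC, which keeps blocks cheap to distribute.
    Eigen::SparseMatrix<double> Ablk(1000, 500), Bblk(500, 800);
    Ablk.insert(3, 7) = 1.5;     // a few nonzeros for illustration
    Bblk.insert(7, 2) = 2.0;
    Ablk.makeCompressed();
    Bblk.makeCompressed();

    // Local block product. On the real LSA matrices the product blocks come out
    // essentially dense (observation ii above), so the partial result is
    // accumulated densely and the block addition (axpy) gains nothing from sparsity.
    Eigen::SparseMatrix<double> Csparse = Ablk * Bblk;
    Eigen::MatrixXd Cblk = Eigen::MatrixXd(Csparse);              // dense copy of the block product
    Eigen::MatrixXd Cpartial = Eigen::MatrixXd::Zero(1000, 800);
    Cpartial += Cblk;                                             // dense block addition
    return 0;
}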

What needs to be done?
1) Continue working on sparse matrix multiplication and experiment with other approaches.

Saturday, July 9, 2016

Week 4 and 5

What was done?
1) I have written four different tests for my LSA implementation, each based on one of its applications. These confirm that the implementation works correctly and produces appropriate results on real-world tasks.
The tasks performed in these tests are:
 i) Document classification by SVM
 ii) Document Classification by KNN using cosine distance measure
 iii) Document clustering using Kmeans
 iv) Document Retrieval
For each of these tasks, I have used the BBC News dataset with five classes, containing 2225 documents, each with 9654 features. Using LSA, I reduced the features to 50 (a minimal sketch of the retrieval test is given after this list).
Time taken for LSA is approximately 1 min 15 sec on this matrix.

2) I have diagnosed and rectified the problem with the current implementation of LibSVM so that it now works with multiclass classification.
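
For reference, the retrieval test in 1) follows the usual LSA recipe: take a truncated SVD of the term-document matrix, fold the query into the reduced space, and rank documents by cosine similarity. Below is a minimal single-node Eigen sketch on a toy matrix; the sizes, random data, and names are illustrative only, and this is not the HPCC/ECL implementation or the actual test code.

#include <Eigen/Dense>
#include <iostream>

int main()
{
    const int terms = 200, docs = 30, k = 5;      // toy sizes (the BBC run was 9654 x 2225 with k = 50)
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(terms, docs).cwiseAbs();   // stand-in term-document matrix

    // Truncated SVD: A ~= U_k * S_k * V_k^T.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::MatrixXd Uk = svd.matrixU().leftCols(k);
    Eigen::VectorXd Sk = svd.singularValues().head(k);
    Eigen::MatrixXd docVecs = svd.matrixV().leftCols(k);   // one k-dimensional row per document

    // Fold a query (here simply the first document's term vector) into the LSA space:
    // q_hat = S_k^{-1} * U_k^T * q.
    Eigen::VectorXd q = A.col(0);
    Eigen::VectorXd qHat = Sk.cwiseInverse().asDiagonal() * (Uk.transpose() * q);

    // Rank documents by cosine similarity to the query in the reduced space.
    for (int d = 0; d < docs; ++d) {
        Eigen::VectorXd v = docVecs.row(d).transpose();
        double cosSim = qHat.dot(v) / (qHat.norm() * v.norm());
        std::cout << "doc " << d << " cosine " << cosSim << "\n";
    }
    return 0;
}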

What will be done from now on?
1) As such, LSA is now completely implemented. For the remaining time, I will focus on improving timings and researching other methods for SVD. Since not many implementations are available for shared-nothing architectures like the HPCC platform, this will take some experimentation and trial and error.