Saturday, July 9, 2016

Week 4 and 5

What was done?
1) I have written four different tests for my LSA implementation, each based on one of its applications. They check that the implementation works correctly and gives appropriate results on real-world tasks.
The tasks performed in these tests are -
 i) Document classification by SVM
 ii) Document classification by KNN using the cosine distance measure
 iii) Document clustering using K-means
 iv) Document retrieval
For each of these tasks, I have used the BBC news dataset, which has five classes and contains 2225 documents, each with 9654 features. Using LSA, I have reduced the number of features to 50.
LSA takes approximately 1 minute 15 seconds on this matrix.
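
To make the reduction and the retrieval test concrete, here is a minimal sketch in Python with NumPy. It is not the HPCC implementation described in this post: the function names, the tiny 6 x 5 count matrix and k = 2 are all assumptions chosen for illustration (the real run used a 2225 x 9654 matrix reduced to 50 features). It shows LSA as a truncated SVD, followed by cosine-similarity retrieval in the reduced space.

```python
import numpy as np

def lsa_fit(X, k):
    """Truncated SVD of a documents-by-terms matrix X (hypothetical helper)."""
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    doc_vectors = U[:, :k] * s[:k]   # documents in the k-dimensional latent space
    components = Vt[:k]              # k x n_terms basis used to project new vectors
    return doc_vectors, components

def cosine_similarity(a, B):
    """Cosine similarity between vector a and every row of B."""
    norms = np.linalg.norm(a) * np.linalg.norm(B, axis=1)
    norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return (B @ a) / norms

def retrieve(query_counts, doc_vectors, components, top_n=3):
    """Indices of the top_n documents most similar to the query."""
    q = components @ query_counts    # fold the query into the latent space
    sims = cosine_similarity(q, doc_vectors)
    return np.argsort(-sims)[:top_n]

# Toy data (assumed): rows are documents, columns are term counts.
# Documents 0-2 mostly use terms 0-1, documents 3-5 mostly use terms 3-4.
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [2, 2, 0, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 1, 1, 2],
    [0, 0, 0, 2, 2],
], dtype=float)

doc_vectors, components = lsa_fit(X, k=2)
query = np.array([1, 1, 0, 0, 0], dtype=float)
top = retrieve(query, doc_vectors, components, top_n=3)
print(sorted(top.tolist()))
```

The same reduced document vectors can be passed to a KNN or SVM classifier, or to K-means, in place of the original term counts, which is the idea behind the other three tests.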

2) I have diagnosed and fixed the problem with the current implementation of LibSVM so that it now works with multiclass classification.

What will be done from now on?
1) LSA is now completely implemented. For the remaining time, I will focus on improving the timings and researching other methods for SVD. Since not many implementations are available for shared-nothing architectures like the HPCC platform, this will take some experimentation and trial and error.
