Chartyn Blog

The code here estimates the optimal number of clusters (K) using the gap statistic, so it can be run before implementing K-means clustering.
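For readers new to the gap statistic, the sketch below shows the idea as defined by Tibshirani, Walther and Hastie: compare log W_k (the pooled within-cluster sum of squares) with its average over uniform reference datasets, then pick the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}. It is a NumPy-only illustration, not the code from the post; drawing the references uniformly over the data's bounding box, k-means++ seeding, and the values of `n_refs` and `n_init` are assumptions.

```python
import numpy as np


def _assign(X, centroids):
    # Index of the nearest centroid for every row of X (squared Euclidean distance)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def _kmeans_pp_init(X, k, rng):
    # k-means++ seeding: each new centre is sampled proportionally to squared distance
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def within_dispersion(X, k, rng, n_init=5, n_iter=100):
    """W_k: smallest pooled within-cluster sum of squares over n_init k-means runs."""
    best = np.inf
    for _restart in range(n_init):
        centroids = _kmeans_pp_init(X, k, rng)
        for _step in range(n_iter):  # Lloyd's algorithm
            labels = _assign(X, centroids)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        labels = _assign(X, centroids)
        w = sum(((X[labels == j] - centroids[j]) ** 2).sum() for j in range(k))
        best = min(best, w)
    return best


def gap_statistic(X, k_max=8, n_refs=10, seed=0):
    """Return (suggested K, list of Gap(k) for k = 1..k_max)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, k, rng))
        # Reference datasets: uniform samples over the data's bounding box (assumption)
        ref = np.array([np.log(within_dispersion(rng.uniform(lo, hi, size=X.shape), k, rng))
                        for _ in range(n_refs)])
        gaps.append(ref.mean() - log_w)
        s.append(ref.std() * np.sqrt(1 + 1 / n_refs))
    # Smallest k such that Gap(k) >= Gap(k+1) - s_{k+1}
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps
```

On three well-separated blobs, `gap_statistic(X, k_max=6)` should suggest K = 3; that value can then be passed to whichever K-means implementation you use.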
SVMs tend to slow down when trained on large datasets, because the training time of kernel SVMs grows faster than linearly with the number of samples. Solutions to this problem involve reducing the size of the training dataset without jeopardising the performance of the SVM.
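The post does not say here which reduction technique it uses, so the sketch below shows one hypothetical heuristic only: an SVM's decision boundary depends only on its support vectors, which lie near the class boundary, so each class keeps just the fraction of points closest to any point of another class. The function name `reduce_near_boundary` and the `keep_frac` parameter are illustrative, not from the original code.

```python
import numpy as np


def reduce_near_boundary(X, y, keep_frac=0.2):
    """Hypothetical heuristic: keep the keep_frac of each class nearest to another class.

    Points far from every opposite-class point are unlikely to become support
    vectors, so they are dropped before the SVM is trained.
    """
    keep = []
    for label in np.unique(y):
        own = np.where(y == label)[0]
        other = X[y != label]
        # Distance from each point of this class to its nearest opposite-class point
        d = np.sqrt(((X[own, None, :] - other[None, :, :]) ** 2).sum(axis=2)).min(axis=1)
        n_keep = max(1, int(round(keep_frac * len(own))))
        keep.extend(own[np.argsort(d)[:n_keep]])
    keep = np.sort(np.array(keep))
    return X[keep], y[keep]
```

The reduced `X, y` would then be passed to the SVM (for example scikit-learn's `SVC`). Note that the pairwise-distance step is itself quadratic in the number of samples; on genuinely large data it would need a spatial index or a random pre-sample to pay off.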
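The beauty of this algorithm lies in its simplicity. Derived from a famous 1997 paper by Vasileios Hatzivassiloglou and Kathleen R. McKeown, "Predicting the Semantic Orientation of Adjectives", it infers the polarity of adjectives from the conjunctions that join them, and can be used to build a context-specific sentiment lexicon.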
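The core intuition can be sketched in a few lines: adjectives joined by "and" usually share a polarity, while "but" usually signals opposite polarities. The simplified version below, which is not the post's code, links adjectives that appear as "X and Y" or "X but Y" and propagates +1/-1 labels from a few seed words by breadth-first search. The seed-propagation step is an assumption; the paper itself scores links with a statistical model and clusters the adjectives into two groups.

```python
import re
from collections import defaultdict, deque


def conjunction_links(sentences, adjectives):
    """Collect (adj1, adj2, same_orientation) triples from 'X and Y' / 'X but Y'."""
    links = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for a, conj, b in zip(tokens, tokens[1:], tokens[2:]):
            if conj in ("and", "but") and a in adjectives and b in adjectives:
                # 'and' usually joins same-polarity adjectives, 'but' opposite ones
                links.append((a, b, conj == "and"))
    return links


def propagate_polarity(links, seeds):
    """Spread +1/-1 labels from seed words across the conjunction graph (BFS)."""
    graph = defaultdict(list)
    for a, b, same in links:
        sign = 1 if same else -1
        graph[a].append((b, sign))
        graph[b].append((a, sign))
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, sign in graph[word]:
            if neighbour not in polarity:  # first label to reach a word wins
                polarity[neighbour] = polarity[word] * sign
                queue.append(neighbour)
    return polarity
```

Because the lexicon is learned from the corpus you supply, the same adjective can end up with different polarities in different domains, which is what makes it context-specific. This sketch does not resolve conflicting evidence; the first label to reach a word wins.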
Creating unique dictionaries for things (be they people, businesses, places, products, or otherwise) opens up quite a few opportunities for text analytics. This code takes texts that describe two "things", converts each into an item-specific dictionary, and uses those dictionaries to generate a similarity score.
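As a rough illustration of the idea, and assuming an "item-specific dictionary" is a word-frequency count of each text with common stopwords removed, the sketch below scores two items with cosine similarity. The stopword list and the choice of cosine similarity are assumptions; the post may build its dictionaries and score them differently.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (assumption); a real one would be longer
STOPWORDS = {"the", "a", "an", "and", "or", "of", "is", "are", "to", "in", "with", "for"}


def item_dictionary(text):
    """Word-frequency dictionary for one item's description, minus stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine_similarity(d1, d2):
    """Cosine of the angle between two frequency dictionaries (0 = disjoint, 1 = identical mix)."""
    dot = sum(d1[w] * d2[w] for w in set(d1) & set(d2))
    norm1 = math.sqrt(sum(v * v for v in d1.values()))
    norm2 = math.sqrt(sum(v * v for v in d2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)
```

For example, "A quiet, cosy cafe with great coffee" and "A cosy cafe serving great pastries" share three of their five content words each, giving a score of about 0.6.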
This code may be useful for classification problems where the dependent variable is imbalanced. It oversamples the minority class using either SMOTE or simple replication, and returns whichever resampled dataset performs best. If the original dataset outperforms both techniques, the code returns the original dataset instead.
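A minimal NumPy sketch of the two oversampling options and the "keep the best, fall back to the original" rule follows. Replication duplicates minority rows at random; basic SMOTE creates synthetic rows on the line segment between a minority row and one of its k nearest minority neighbours. Balancing to the size of the largest class, `k=5`, and the user-supplied `score` callable are assumptions, and all function names are hypothetical.

```python
import numpy as np


def _n_to_add(y, minority):
    # Number of minority rows needed to match the largest class
    _, counts = np.unique(y, return_counts=True)
    return int(counts.max() - (y == minority).sum())


def replicate_oversample(X, y, minority, rng):
    """Random oversampling: duplicate minority rows (with replacement)."""
    idx = np.where(y == minority)[0]
    extra = rng.choice(idx, size=_n_to_add(y, minority), replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])


def smote_oversample(X, y, minority, rng, k=5):
    """Basic SMOTE: interpolate between minority rows and their k nearest minority neighbours.

    Needs at least two minority rows.
    """
    Xm = X[y == minority].astype(float)
    n_new = _n_to_add(y, minority)
    d2 = ((Xm[:, None, :] - Xm[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    k = min(k, len(Xm) - 1)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(len(Xm), size=n_new)
    partner = neighbours[base, rng.integers(k, size=n_new)]
    step = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    synthetic = Xm[base] + step * (Xm[partner] - Xm[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])


def choose_dataset(X, y, minority, score, seed=0):
    """Return (name, (X, y)) of the candidate with the highest score; the original wins ties.

    `score` is a hypothetical user-supplied callable, e.g. the F1 of a model
    trained on the candidate and evaluated on an untouched validation set.
    """
    rng = np.random.default_rng(seed)
    candidates = {
        "original": (X, y),
        "replication": replicate_oversample(X, y, minority, rng),
        "smote": smote_oversample(X, y, minority, rng),
    }
    best = max(candidates, key=lambda name: score(*candidates[name]))
    return best, candidates[best]
```

The validation data used inside `score` should never be oversampled itself; otherwise duplicated or interpolated rows leak into the evaluation and the resampled datasets look better than they are.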