Implementation CMI

Title | CMI: An Information-Theoretic Contrast Measure for Enhancing Subspace Cluster and Outlier Detection
Authors | Hoang Vu Nguyen, Emmanuel Müller, Jilles Vreeken, Fabian Keller, Klemens Böhm

This page describes how to run the implementation of our CMI algorithm.

To run CMI, pass the following parameters on the command line:
Parameter | Meaning
-FILE_INPUT | name of input file
-FILE_SUB_OUTPUT | name of output file for subspaces
-FILE_OSCORES_OUTPUT | name of output file for outlier scores
-NUM_ROWS | number of records
-NUM_MEASURE_COLS | number of columns
-FIELD_DELIMITER | field delimiter of the input file
-NUM_NEIGHBORS | number of nearest neighbors
-USE_DUSO_SEED | set to 'true' to use CMIC
-MAX_NUM_SUBSPACES | top subspaces used
-ALPHA | subsample's size
-NUM_SUBSAMPLING | number of subsamples
-NUM_SEEDS | number of clusters
-CANDIDATE_CUTOFF | size of beam
-MIN_PTS | used for clustering
-EPSILON | used for clustering
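As a rough illustration of how the parameters in the table might be combined, the sketch below assembles an argument list in Python. It assumes that the tool is started with `java -jar cmi.jar` and that each flag is followed by its value as a separate token; neither assumption is confirmed on this page. The helper name `build_cmi_command` is hypothetical, and the file names and values in the example are placeholders.

```python
# Hypothetical sketch: assemble a CMI command line from the documented flags.
# Assumptions (not confirmed by this page): the jar is launched with
# "java -jar cmi.jar" and each flag is followed by its value as its own token.

# Flag names taken from the parameter table, in the order listed there.
KNOWN_FLAGS = (
    "FILE_INPUT", "FILE_SUB_OUTPUT", "FILE_OSCORES_OUTPUT",
    "NUM_ROWS", "NUM_MEASURE_COLS", "FIELD_DELIMITER",
    "NUM_NEIGHBORS", "USE_DUSO_SEED", "MAX_NUM_SUBSPACES",
    "ALPHA", "NUM_SUBSAMPLING", "NUM_SEEDS",
    "CANDIDATE_CUTOFF", "MIN_PTS", "EPSILON",
)


def build_cmi_command(params, jar="cmi.jar"):
    """Return an argument list for the given {flag name: value} mapping."""
    unknown = sorted(set(params) - set(KNOWN_FLAGS))
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(unknown))
    command = ["java", "-jar", jar]
    for flag in KNOWN_FLAGS:
        if flag in params:
            value = params[flag]
            # The table spells booleans in lowercase ('true').
            text = str(value).lower() if isinstance(value, bool) else str(value)
            command += ["-" + flag, text]
    return command


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = build_cmi_command({
        "FILE_INPUT": "data.csv",
        "FILE_SUB_OUTPUT": "subspaces.txt",
        "FILE_OSCORES_OUTPUT": "scores.txt",
        "NUM_ROWS": 1000,
        "NUM_MEASURE_COLS": 20,
        "FIELD_DELIMITER": ",",
    })
    print(example)
```

The resulting list could be handed to `subprocess.run` once the actual invocation syntax has been checked against the jar itself.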
Default Parameter Settings
- CMI:
- Beam size M = 400 (set M = 32 for data sets with fewer than 10 dimensions)
- Number of clusters Q = 10
- Expected subsample size ε = 0.1
- Number of subspaces = 100
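The defaults above can be written down as a small helper. This is only a sketch: the mapping from the listed settings to command-line flags (beam size to -CANDIDATE_CUTOFF, number of clusters to -NUM_SEEDS, subsample size to -ALPHA, number of subspaces to -MAX_NUM_SUBSPACES) is inferred from the parameter descriptions on this page, and the function name `default_cmi_params` is hypothetical.

```python
# Hypothetical sketch: the default CMI settings as a dictionary keyed by flag.
# The setting-to-flag mapping is an assumption inferred from the parameter
# descriptions; it is not stated explicitly on this page.

def default_cmi_params(num_dims):
    """Return default settings for a data set with num_dims dimensions."""
    # Beam size M: 400 by default, 32 for data sets with fewer than 10 dimensions.
    beam_size = 32 if num_dims < 10 else 400
    return {
        "CANDIDATE_CUTOFF": beam_size,  # size of beam (M)
        "NUM_SEEDS": 10,                # number of clusters (Q)
        "ALPHA": 0.1,                   # expected subsample size
        "MAX_NUM_SUBSPACES": 100,       # number of subspaces
    }


if __name__ == "__main__":
    print(default_cmi_params(8))
    print(default_cmi_params(50))
```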
Implementation and Code of CMI

We provide the executables and source code of our method in a single file: cmi.jar
To access the source files, rename the file so that it has a .zip extension and unzip it.
To evaluate outlier results as in our paper, we provide an additional executable that assists in calculating AUC measures: run.jar
(two input parameters: input file name and output file name)
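For readers who want to check AUC values independently, the following is a minimal sketch of the standard pairwise definition of ROC AUC: the fraction of (outlier, inlier) pairs in which the outlier receives the higher score, with ties counted as one half. It is not the code inside run.jar, whose input and output formats are not described here; the function name `roc_auc` and the in-memory lists of scores and labels are assumptions made for illustration.

```python
# Minimal sketch of ROC AUC from outlier scores and ground-truth labels.
# This is NOT the implementation shipped in run.jar; it only illustrates
# the standard pairwise definition of the measure.

def roc_auc(scores, is_outlier):
    """Fraction of (outlier, inlier) pairs ranked correctly; ties count 0.5."""
    outliers = [s for s, label in zip(scores, is_outlier) if label]
    inliers = [s for s, label in zip(scores, is_outlier) if not label]
    if not outliers or not inliers:
        raise ValueError("need at least one outlier and one inlier")
    correct = 0.0
    for o in outliers:
        for i in inliers:
            if o > i:
                correct += 1.0
            elif o == i:
                correct += 0.5
    return correct / (len(outliers) * len(inliers))


if __name__ == "__main__":
    # Toy data: higher score means more outlying.
    scores = [0.9, 0.8, 0.3, 0.2, 0.1]
    labels = [True, False, True, False, False]
    print(roc_auc(scores, labels))
```

The double loop is quadratic in the number of records, which is fine for small sanity checks; a sort-based rank computation would be preferable for large files.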