Information-Theoretic Outlier Detection for Large-Scale Categorical Data

Entropy, total correlation and holoentropy for outlier detection

 

People

Shu Wu
Shengrui Wang

 

Overview

Outlier detection is usually treated as a pre-processing step that locates the objects in a data set that do not conform to well-defined notions of expected behavior. It is important in data mining for discovering novel or rare events, anomalies, malicious actions, exceptional phenomena, etc. We are investigating outlier detection for categorical data sets. This problem is especially challenging because it is difficult to define a meaningful similarity measure for categorical data. In this paper, we propose a formal definition of outliers and an optimization model of outlier detection, based on a new concept, holoentropy, that takes both entropy and total correlation into consideration. From this model, we derive an outlier factor for each object that is determined solely by the object itself and can be updated efficiently. We propose two practical one-parameter outlier detection methods, named ITB-SS and ITB-SP, which require no user-defined parameters for deciding whether an object is an outlier; users need only provide the number of outliers they want to detect. Experimental results show that ITB-SS and ITB-SP are more effective and efficient than mainstream methods and can handle both large and high-dimensional data sets on which existing algorithms fail.
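To make the quantities named in the overview concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard definition of total correlation (the sum of the per-attribute entropies minus the joint entropy) and takes holoentropy to be the unweighted sum of joint entropy and total correlation. The function names (entropy, joint_entropy, total_correlation, holoentropy, top_outliers) are hypothetical, and the brute-force ranking in top_outliers is only an illustration; it does not reproduce ITB-SS, ITB-SP, or the efficiently updatable outlier factor from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint_entropy(rows):
    """Entropy of whole objects: each row is treated as one joint outcome."""
    return entropy([tuple(row) for row in rows])


def total_correlation(rows):
    """Sum of per-attribute entropies minus the joint entropy."""
    columns = list(zip(*rows))
    return sum(entropy(col) for col in columns) - joint_entropy(rows)


def holoentropy(rows):
    """Unweighted holoentropy (assumption): joint entropy plus total correlation."""
    return joint_entropy(rows) + total_correlation(rows)


def top_outliers(rows, k):
    """Naive illustration: rank objects by how much removing each one
    lowers the holoentropy of the remaining data, and return the indices
    of the k largest decreases. Recomputes everything per object, so it
    is only suitable for small examples."""
    base = holoentropy(rows)
    gains = []
    for i in range(len(rows)):
        rest = rows[:i] + rows[i + 1:]
        gains.append((base - holoentropy(rest), i))
    gains.sort(key=lambda g: (-g[0], g[1]))
    return [i for _, i in gains[:k]]


if __name__ == "__main__":
    # Toy data (made up): five identical objects and one that differs
    # in both attributes.
    data = [("a", "x")] * 5 + [("b", "y")]
    print(holoentropy(data))
    print(top_outliers(data, 1))
```

Under these definitions the joint-entropy terms cancel, so the unweighted holoentropy equals the sum of the per-attribute entropies. On the toy data, removing the object at index 5 leaves a perfectly uniform data set, so it is ranked first. As in the overview, the only input besides the data is the number of outliers k.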

 

Paper

Information-Theoretic Outlier Detection for Large-Scale Categorical Data

Shu Wu, Shengrui Wang

IEEE Transactions on Knowledge and Data Engineering (TKDE 2013)

[PDF]

 

Experimental Results

Results of efficiency test

 

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which helped improve the quality of this paper. The work has been funded by the Discovery and Discovery Accelerator Supplements programs of the Natural Sciences and Engineering Research Council of Canada (NSERC), granted to Prof. Shengrui Wang. Shu Wu has been partly supported by the PhD Scholarship program of the China Scholarship Council.
