Data Sets

[Aggregation] Aristides Gionis. Heikki Mannila. Panayiotis Tsaparas. “Clustering Aggregation”. 4:1–4:30. ACM Transactions on Knowledge Discovery from Data (TKDD). 1. 1. 2007.

[Compound] C. T. Zahn. “Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters”. 68–86. IEEE Transactions on Computers. 20. 1. 1971.

[CoIL Challenge 2000] Peter van der Putten. Maarten van Someren. CoIL Challenge 2000: The Insurance Company Case. 2000. Sentient Machine Research. Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report 2000-09.

[Concrete] I-Cheng Yeh. “Modeling of strength of high performance concrete using artificial neural networks”. 1797–1808. Cement and Concrete Research. 28. 12. 1998.

[Detrano et al.] R. Detrano. A. Jánosi. W. Steinbrunn. M. Pfisterer. J. Schmid. S. Sandhu. K. Guppy. S. Lee. V. Froelicher. “International application of a new probability algorithm for the diagnosis of coronary artery disease”. 304–310. American Journal of Cardiology. 64. 5. 1989.

[Extended Bakery] Alex Dekhtyar. Jacob Verburg. Extended Bakery Dataset. 2009.

[Flame] Limin Fu. Enzo Medico. “FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data”. BMC Bioinformatics. 8. 3. 2007.

[Jain] Anil K. Jain. Martin H.C. Law. “Data Clustering: A User's Dilemma”. 1–10. Lecture Notes in Computer Science. 3776. Springer. 2005.

[Maximum Variance] C. J. Veenman. M. J. T. Reinders. E. Backer. “A Maximum Variance Cluster Algorithm”. 1273–1280. IEEE Transactions on Pattern Analysis and Machine Intelligence. 24. 9. 2002.

[SIPU Datasets] Clustering datasets. Speech and Image Processing Unit, School of Computing, University of Eastern Finland.

[StatLib] Mike Meyer. Pantelis Vlachos. StatLib—Datasets Archive. 2005.

[Titanic] Robert J. MacG. Dawson. “The Unusual Episode Data Revisited”. Journal of Statistics Education. 3. 3. 1995.

[Two Spirals] Kevin J. Lang. Michael J. Witbrock. “Learning to Tell Two Spirals Apart”. 52–59. David Touretzky. Geoffrey Hinton. Terrence Sejnowski. Proceedings of the 1988 Connectionist Models Summer School. Morgan Kaufmann. 1988.

[UCI MLR] K. Bache. M. Lichman. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. 2013.


[DMBOOK] Pang-Ning Tan. Michael Steinbach. Vipin Kumar. Introduction to Data Mining. Addison-Wesley. 2005.

[LIBSVM] Chih-Chung Chang. Chih-Jen Lin. “LIBSVM: A Library for Support Vector Machines”. 27:1–27:27. ACM Transactions on Intelligent Systems and Technology. 2. 3. 2011. Software available at

[Neural Network FAQ] Warren S. Sarle. Neural Network FAQ, part 3 of 7: Generalization. 1997. Periodic posting to the Usenet newsgroup

[RapidMiner Manual] RapidMiner 5.0 Manual. Rapid-I GmbH. 2010.

[SAS Enterprise Miner] Getting Started with SAS® Enterprise Miner. SAS Institute Inc. 2011.

[SAS Enterprise Miner Ref] SAS® Enterprise Miner™: Reference Help. SAS Institute Inc. 2011.