Part II. RapidMiner

Table of Contents

3. Data Sources
Importing data from a CSV file
Importing data from an Excel file
Creating an AML file for reading a data file
Importing data from an XML file
Importing data from a database
4. Pre-processing
Managing data with issues - Missing, inconsistent, and duplicate values
Sampling and aggregation
Creating and filtering attributes
Discretizing and weighting attributes
5. Classification Methods 1
Classification using a decision tree
Under- and overfitting of a classification with a decision tree
Evaluation of performance for classification by decision tree
Evaluation of performance for classification by decision tree 2
Comparison of decision tree classifiers
6. Classification Methods 2
Using a rule-based classifier (1)
Using a rule-based classifier (2)
Transforming a decision tree to an equivalent rule set
7. Classification Methods 3
Linear regression
Classification by linear regression
Evaluation of performance for classification by regression model
Evaluation of performance for classification by regression model 2
8. Classification Methods 4
Using a perceptron for solving a linearly separable binary classification problem
Using a feed-forward neural network for solving a classification problem
The influence of the number of hidden neurons on the performance of the feed-forward neural network
Using a linear SVM for solving a linearly separable binary classification problem
The influence of the parameter C on the performance of the linear SVM (1)
The influence of the parameter C on the performance of the linear SVM (2)
The influence of the parameter C on the performance of the linear SVM (3)
The influence of the number of training examples on the performance of the linear SVM
Solving the two spirals problem by a nonlinear SVM
The influence of the kernel width parameter on the performance of the RBF kernel SVM
Search for optimal parameter values of the RBF kernel SVM
Using an SVM for solving a multi-class classification problem
Using an SVM for solving a regression problem
9. Classification Methods 5
Introducing ensemble methods: the bagging algorithm
The influence of the number of base classifiers on the performance of bagging
The influence of the number of base classifiers on the performance of the AdaBoost method
The influence of the number of base classifiers on the performance of the random forest
10. Association rules
Extraction of association rules
Extraction of association rules from a non-transactional dataset
Evaluation of performance for association rules
Performance of association rules - Simpson's paradox
11. Clustering 1
K-means method
K-medoids method
The DBSCAN method
Agglomerative methods
Divisive methods
12. Clustering 2
Support vector clustering
Choosing parameters in clustering
Cluster evaluation
Centroid method
Text clustering
13. Anomaly detection
Searching for outliers
Unsupervised search for outliers
Unsupervised statistics-based anomaly detection