Chapter 3. Data Sources

Table of Contents

Importing data from a CSV file
Importing data from an Excel file
Creating an AML file for reading a data file
Importing data from an XML file
Importing data from a database

Importing data from a CSV file

Description

This process demonstrates how to import data from CSV files using the Read CSV and Open File operators. The experiment uses a real-time earthquake data feed provided by USGS in CSV format. First, we download the feed so that it can be imported into RapidMiner with the Import Configuration Wizard of the Read CSV operator. The wizard guides the user step by step through the import process and helps to set the parameters of the operator correctly. Once the local copy of the feed has been imported successfully, we can switch to the live feed by adding the Open File operator to the process.
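The two-stage approach described above (configure the import against a local copy, then point the same import at the live source) can be illustrated outside RapidMiner. The following is a minimal Python sketch, not the implementation of the Read CSV or Open File operators. The function names and the column names in the sample data are hypothetical, and the sketch assumes the feed is UTF-8 text with a header row.

```python
import csv
import io
from urllib.request import urlopen


def read_csv_rows(text_stream):
    """Parse CSV text that starts with a header row into a list of dicts."""
    return list(csv.DictReader(text_stream))


def read_local_copy(path):
    # Stage 1: parse a previously downloaded copy of the feed.
    with open(path, newline="", encoding="utf-8") as handle:
        return read_csv_rows(handle)


def read_live_feed(url):
    # Stage 2: the same parser, but the data now comes from the web.
    # The caller supplies the URL of the CSV feed itself.
    with urlopen(url) as response:
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return read_csv_rows(text)


# Small in-memory stand-in for a feed; the column names are made up.
sample = io.StringIO(
    "time,latitude,longitude,mag\n"
    "2013-01-01T00:00:00Z,10.5,-20.25,1.3\n"
)
rows = read_csv_rows(sample)
print(rows[0]["mag"])
```

Keeping the parsing step independent of where the text comes from mirrors the structure of the process: only the source of the data changes when switching from the local file to the live feed.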

Input

The United States Geological Survey (USGS) provides real-time earthquake data feeds on the Earthquake Hazards Program website. The data is available in various formats, including CSV. The experiment uses the CSV feed of magnitude 1+ earthquakes in the past 30 days, available from the URL http://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php. The feed is updated every 15 minutes.

Output

An ExampleSet that contains data imported from the CSV feed.
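The metadata of an imported data set records, among other things, a type for each attribute. As a rough illustration of how such types might be guessed from raw CSV strings, here is a small Python sketch. It is an assumption-laden example, not the algorithm used by the Import Configuration Wizard: the function names are hypothetical, the type labels are borrowed loosely from RapidMiner terminology, and the sample column names are made up.

```python
def guess_type(values):
    """Return 'integer', 'real', or 'nominal' for a column of strings."""
    # Empty strings are treated as missing values and ignored.
    non_missing = [v for v in values if v.strip() != ""]
    if not non_missing:
        return "nominal"
    # Try the narrowest numeric type first, then the wider one.
    for label, caster in (("integer", int), ("real", float)):
        try:
            for v in non_missing:
                caster(v)
            return label
        except ValueError:
            continue
    return "nominal"


def guess_metadata(rows):
    """Map each column name to a guessed type, given a list of dict rows."""
    if not rows:
        return {}
    return {
        name: guess_type([(row.get(name) or "") for row in rows])
        for name in rows[0]
    }


# Hypothetical rows, shaped like the output of csv.DictReader.
example_rows = [
    {"mag": "1.3", "depth": "10", "place": "somewhere"},
    {"mag": "2", "depth": "", "place": "elsewhere"},
]
print(guess_metadata(example_rows))
```

A column is only reported as numeric if every non-missing value parses; a single stray text value makes the whole column nominal, which is a conservative choice for a sketch like this.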

Figure 3.1. Metadata of the resulting ExampleSet.

Figure 3.2. A small excerpt of the resulting ExampleSet.

Interpretation of the results

Because the process reads the live feed, each run imports the data currently published on the web, so the resulting ExampleSet may differ from one run to the next.

Workflow

import_exp1.rmp

Keywords

importing data
CSV

Operators

Open File
Read CSV