rapid analytics 1.0.000 manual

22
RapidAnalytics 1.0 User and Installation Manual Simon Fischer, Rapid-I GmbH December 16, 2010 Contents 1 Installation 3 1.1 RapidAnalytics/JBoss Bundle ......................... 3 1.1.1 Prerequisites .............................. 3 1.1.2 Extracting the RapidAnalytics Archive ............... 3 1.1.3 Creating a Database and User .................... 3 1.1.4 Additional Configuration ....................... 4 1.2 Manual Installation ............................... 4 1.2.1 Prerequisites .............................. 4 1.2.2 Creating a Database and User .................... 5 1.2.3 Copy Additional Files ......................... 5 1.2.4 Configuring a Security Domain .................... 5 1.2.5 Additional Configuration ....................... 5 2 Launching RapidAnalytics 5 3 Initial Web-based Configuration 6 3.1 Setting up Database Connections ....................... 8 3.2 Creating a User ................................. 9 4 Connecting RapidMiner to RapidAnalytics 9 1

Upload: roshan-daniel

Post on 16-Oct-2014

256 views

Category:

Documents


2 download

TRANSCRIPT

RapidAnalytics 1.0User and Installation Manual

Simon Fischer, Rapid-I GmbH December 16, 2010

Contents

1 Installation 31.1 RapidAnalytics/JBoss Bundle . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.1.2 Extracting the RapidAnalytics Archive . . . . . . . . . . . . . . . 31.1.3 Creating a Database and User . . . . . . . . . . . . . . . . . . . . 31.1.4 Additional Configuration . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Manual Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2.2 Creating a Database and User . . . . . . . . . . . . . . . . . . . . 51.2.3 Copy Additional Files . . . . . . . . . . . . . . . . . . . . . . . . . 51.2.4 Configuring a Security Domain . . . . . . . . . . . . . . . . . . . . 51.2.5 Additional Configuration . . . . . . . . . . . . . . . . . . . . . . . 5

2 Launching RapidAnalytics 5

3 Initial Web-based Configuration 63.1 Setting up Database Connections . . . . . . . . . . . . . . . . . . . . . . . 83.2 Creating a User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

4 Connecting RapidMiner to RapidAnalytics 9

1

5 Working with RapidAnalytics and RapidMiner 115.1 Using the Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

5.1.1 Storing and Accessing Data in the Repository . . . . . . . . . . . . 115.1.2 Managing Access Rights . . . . . . . . . . . . . . . . . . . . . . . . 135.1.3 Accessing Data in Processes . . . . . . . . . . . . . . . . . . . . . . 15

5.2 Remote Process Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . 175.2.1 Running a Process Remotely . . . . . . . . . . . . . . . . . . . . . 175.2.2 Scheduled Process Execution . . . . . . . . . . . . . . . . . . . . . 185.2.3 Monitoring Job Execution . . . . . . . . . . . . . . . . . . . . . . . 18

5.3 Accessing Processes as Services . . . . . . . . . . . . . . . . . . . . . . . . 19

2

1 Installation

There are two ways to download and install RapidAnalytics: The JBoss bundle and themanual installation. The bundled installation is the easiest and recommended for mostusers. You basically need to unzip an archive, create a database and user, and edit oneconfiguration file in RapidAnalytics to specify this database. The manual installation isonly for experienced users that know how to configure an application server.

1.1 RapidAnalytics/JBoss Bundle

This is the recommended installation routine and requires minimal configuration.

1.1.1 Prerequisites

Before you proceed with installing RapidAnalytics, make sure you downloaded and in-stalled the following:

• Download and install Java 1.6, e.g. from www.java.com.

• Download RapidAnalytics-1.0-JBoss-bundle.zip.

• Install any SQL database. RapidAnalytics will store data and andministrationinformation in there. The download bundle already contains JDBC drivers forMySQL, Ingres, Postgres, and Microsoft SQL Server. If you are using a differentdatabase make sure you also download an appropriate JDBC driver (jar file) forthis data base.

• Optionally: Install RapidMiner, version 5.1 or above. Get it from www.rapid-i.

com. (Alternatively, use the Web start version of RapidMiner.)

In case you have another JBoss instance installed on the same host, please make sureit does not conflict with the RapidAnalytics installation. To avoid such a conflict, makesure the environment variable JBOSS HOME is not set.

1.1.2 Extracting the RapidAnalytics Archive

First unzip rapidanalytics-bundle-1.0.zip to a place of your liking. Make sure noblank character appears in the parent path. (Beware of the Program Files directory!)If you are using any of the databases listed in in Section 1.1.1 above for which wealready provide a JDBC driver, you are done. If not, place the JDBC driver jar file torapidanalytics/server/default/lib.

1.1.3 Creating a Database and User

In your SQL database server, create a new database and call it rapidanalytics. Alsocreate a user rapidanalytics for this database and assign it a password.

3

Change to the directory rapidanalytics/server/default/deploy/ and search forfiles with names of the form rapidanalytics-XXX -ds.xml.template. Choose the onewhere XXX matches your database name, copy it to rapidanalytics-ds.xml, and editit. Search for the string XXX PASSWORD and replace it with the database password youjust assigned. If you like, you can as well change user names and database name in caseyou did not name them rapidanalytics as recommended. In case your database serverdoes not run on the same host as RapidAnalytics, you must also change the hostnamefrom localhost to the appropriate host name.

If you run MySQL, Oracle, or Microsoft SQL Server, RapidAnalytics will create thenecessary database schema for you. Otherwise, you must create some tables manually.Go to rapidanalytics/config/quartz/ and select the file tables XXX where XXXmatches your database system name, and run this script in your database manually tocreate the necessary tables.

1.1.4 Additional Configuration

The default installation of RapidAnalytics is configured to use 1024 MB of main memory.To change this, edit bin/run.conf or bin/run.bat.conf, search for -Xmx1024m andreplace this number by the desired one. For further configuration options, contact Rapid-I’s support.

Please continue reading in Section 2.

1.2 Manual Installation

This is only recommended for experienced users and developers that know how to con-figure an application server according to your needs, including defining a data sourceand security domain.

1.2.1 Prerequisites

• Download and install Java 1.6, e.g. from java.sun.com.

• Install a modern application server. RapidAnalytics is tested with JBoss 6 M2 andGlassfish v3. Follow the vendor’s installation instruction.

• Download RapidAnalytics-1.0.zip.

• Install any SQL database. The download bundle already contains JDBC drivers forMySQL, Ingres, Postgres, and Microsoft SQL Server. If you are using a differentdatabase make sure you also download an appropriate JDBC driver (jar file) forthis data base.

• Install RapidMiner, version 5.1 or above. Get it from www.rapid-i.com.

4

1.2.2 Creating a Database and User

Follow the steps described in Section 1.1.3. In addition, copy the JDBC driver jar filefor your database to a place where the application server can find them.

1.2.3 Copy Additional Files

Create a folder for extensions and temporary files at a place of your choice. Preferrednames are plugins and tmp.

Go to the folder into which RapidMiner was installed. Copy all jar files from lib

and lib/freehep except rapidminer.jar and launcher.jar to a place where yourapplication server can find them.

If you want to enable Web Start, copy the same files including rapidminer.jar andlauncher.jar to a place that is served by your application server under the contextroot webstart. Sign the jars with jarsigner, or edit your Web browser’s Java plugin’ssecurity settings to accept unsigned classes.

To make your server redirect you directly to RapidAnalytics, place a file index.xhtmlso your application server serves it in the root directory:

<html>

<head>

<meta HTTP-EQUIV="REFRESH" content="0; url=RA/faces/restricted/index.xhtml">

</head>

<body></body>

</html>

1.2.4 Configuring a Security Domain

In your application server, define a security domain RapidAnalyticsDomain that readscredential data from the data source jdbc/RapidAnalyticsDS. The user data are intable ra ent user, columns userName and passwd. User names are mapped to groupsby table ra rel user group, columns userName and groupName.

1.2.5 Additional Configuration

Configure your application server according to your needs: Set memory consumption,port numbers, etc.

2 Launching RapidAnalytics

Change to the folder rapidanalytics/bin/ and run run.sh on Unix-like systems andrun.bat on Windows. This will launch RapidAnalytics listening on port 8080 on thelocal host. If you want it to listen on other network interfaces as well, call run.sh -b

0.0.0.0 or run.bat -b 0.0.0.0 instead.

5

You now see a lot of messages. Check whether anything unusual appears in the mes-sages. The error message “WARNING [com.sun.xml.bind.v2.runtime.reflect.opt.Injector](HDScanner) duplicate class definition bug occured? Please report this. . . ” can be ig-nored. Please don’t report it.

3 Initial Web-based Configuration

Point your favourite Web browser to http://localhost:8080 (assuming your applica-tion server listens on port 8080 which it does for the bundled download). You will bepresented with a login screen (Figure 1). The initial user and password are admin andchangeit.

Figure 1: RapidAnalytics’ login screen. The initial user name is “admin”, password“changeit”.

After logging in, you will be presented with a setup screen (Figure 2). Check whetherRapidAnalytics detects your database system correctly so it can create the necessarytables. If this is not the case, create tables manually as described in Section 1.1.3, andcheck again.

Specify an absolute directory on your file system where RapidAnalytics searches forextensions and a directory in which RapidAnalytics places temporary files (“Uploaddirectory”). If you chose the bundled installation, these are the directories

• rapidanalytics/rapidanalytics/plugins and

• rapidanalytics/rapidanalytics/tmp.

For the manual installation, these are the folders you created in Section 1.2.3.

6

Figure 2: The RapidAnalytics installation procedure.

Click Start installation now, and check potential error messages. If everythinglooks as in Figure 3, you can click on Complete installation. You should then seethe RapidAnalytics welcome screen. That’s it.

Figure 3: The RapidAnalytics installation procedure is compelte.

You now see the Web interface of RapidAnalytics. In most views, there is a navigationbar on the left and a box with possible actions and online help on the right. The firstthing you should do is go to Administration, Preferences and change your administratorpassword (Figure 4).

7

Figure 4: Changing the password is one of the first things you should do.

3.1 Setting up Database Connections

One of the first things you probably want to configure in RapidAnalytics are yourdatabase connections. To do so, click on Administration, Database Connections in themenu on the left. You will see the screen depicted in Figure 5. Now, choose Createnew connection entry from the box on the right and enter the data for your databaseconnection as seen in Figure 6: database system, host, port, username, password, and aname under which it will be known in RapidMiner and RapidAnalytics. Press Submitand then Test in the box on the right hand side. You should see “Ping succeeded” as inthe figure. Otherwise, check your settings and network connection.

Figure 5: Creating a database connection in RapidAnalytics.

8

Figure 6: Entering connection details. The database connection test succeeded as indi-vated by the message box.

3.2 Creating a User

In everyday work, you should not work in RapidAnalytics as administrator. Instead ofthat, create a regular user by going to Administration, User management and selectingCreate new user from the box on the right hand side (see Figure 7).

You can as well create user groups in a very similar fashion. A list of users andgroups is available from the main User management view.

4 Connecting RapidMiner to RapidAnalytics

From the RapidAnalytics Web interface, you can launch RapidMiner via Web Start usingthe “Launch RapidMiner” button. If you do this, RapidMiner will be automaticallyconnected to RapidAnalytics: In your “Repositories” view, you will automatically seea repository named “Home”. This repository is actually your RapidAnalytics instance.Furthermore, database connections etc. will automatically be shared with RapidMiner.

If for some reason you do not like the Web Start solution, you can configure the con-nection to RapidAnalytics manually. Start RapidMiner and open the Repositories view.Click the Add Repository button (first button in the toolbar of the Repositories view), se-lect Remote repository, and enter the URL to your server, e.g. http://localhost:8080.Also fill in the username and password of the RapidAnalytics user you created.

Note: A common mistake becoming apparent at this stage is that the host that runs RapidAnalytics does

not know its own name. To check this, go to http://localhost:8080/RAWS/RepositoryService?wsdl.

Scroll to the bottom and search for something like this:

9

Figure 7: Managing users with RapidAnalytics.

Figure 8: Connecting RapidMiner to RapidAnalytics.

10

<port binding="tns:RepositoryServiceBinding" name="RepositoryServicePort">

<soap:address location="http://HOSTNAME:8080/RAWS/RepositoryService"/>

</port>

Check whether the host name is actually a host name under which the host is known in the local

network. If it is not, you will get weird error messages when connecting to it. This is mainly because

ISPs nowadays tend to redirect HTTP requests for unknown hosts to a search engine when they can’t

resolve a DNS entry rather than letting the request fail.

Once you are connected to RapidAnalytics, settings made in RapidAnalytics likedatabase connections etc. are also shared with RapidMiner. To check this, go to Tools,Manage database connections and check whether the database connection you defined inSection 3.1 has been published to RapidMiner.

5 Working with RapidAnalytics and RapidMiner

5.1 Using the Repository

Using RapidAnalytics as a server repository is straightforward if you know how to userepositories in RapidMiner: In the Repositories view you see a tree of folders, data, andRapidMiner processes. Using RapidMiner without RapidAnalytics, each of these entriesis stored on the local file system. With RapidAnalytics, the behaviour of RapidMinerstays the same, but the entries are stored on the server and can be accessed with a groupof people.

5.1.1 Storing and Accessing Data in the Repository

We will walk you through some common steps in everyday work with RapidAnalytics.We assume that you connected RapidMiner to RapidAnalytics as described in Section 4and that you assigned the name “RapidAnalytics”. For using the repository, we firstcreate a few folders and copy some data.

As a first step, locate the Repositories view in RapidMiner. If that view is cur-rently not showing on screen, go to View, Show View, Repositories. The top level ele-ments of the Repositories view shows the defined repositories. You should at least seea “Samples” repository and your “RapidAnalytics” repository. Now, open your homefolder in the “RapidAnalytics” repository. RapidAnalytics automatically created a folder/home/username where username is replaced by your user name.

First create two folders named data and processes: Right-click the folder corre-sponding to your user name, select Create Folder, and enter data. Repeat the same forcreating the processes folder. Now, also open the Samples repository, and navigate tothe data folder. Right-click the entry Labor-negotiations to open the context menuand select Copy. Now, right-click the data folder you just created and select Paste inthe same way.

Finally, create a new process. In RapidMiner, you typically specify the place atwhich a process is saved even before you create the process. Although this behaviour

11

may seem uncommon, you will soon see why saving the process first is a useful practice.Click File, New process (or use the first button in the tool bar), select the processes

folder you created, and enter Cleanse Data as a file name. Your (yet empty) processwill then be saved at this location. Your repository should now look as in Figure 9. Forthe figure, I have also copied an additional data set.

Figure 9: The repository populated with demo data.

You can use the repository just as any other local repository in RapidMiner. However,you can inspect it also using the RapidAnlytics Web interface. To that end, either go tothe Web interface and select Repository, Browse Repository from the navigation bar, orright-click a repository entry in RapidMiner’s repository tree and select Browse.

Each type of repository entry has an individual representation in the Web interface,but all have certain common parts:

• Actions available for an entry are at the top of the box at the right: Here, you can,e.g., rename and delete entries.

• Access rights can be defined for each entry. See Section 5.1.2 for details.

• Entries can be downloaded in a format appropriate for the type of entry.

• Entries can be navigated using the breadcrump at the top.

Folders. Folders can contain other items (including sub-folders). Figure 10 shows anexample of our folders. Folders can be downloaded as a zip dump. In the box on theright, you can create subfolders or upload new entries.

Data and Tables. This subsumes all kinds of objects that RapidMiner understands,including, e.g., example sets (i.e., tables), models, etc. Figure 11 shows the “Labor-Negotiations” data set we just copied to RapidAnalytics. The preview in the Web

12

Figure 10: A folder stored in the RapidAnalytics repository.

interface displays the meta data of the table, i.e., the types and possible values of thecolumns. Also, you can download the table in various formats, e.g., as an HTML tableor as an Excel spreadsheet.

If you click on Dependencies you will see which processes read or generate this dataset.

Processes. RapidMiner processes can also be stored on the server. They can be down-loaded as an XML file. Furthermore, in the Dependencies panel, RapidAnalytics showsthe input and output files of the processes, so you can navigate between linked objectsby a click.

Other Objects (Blobs). Finally, you can store objects like images and HTML files onthe server, in case you want to use them for reporting or other functionality. RapidAn-alytics does not interpret them, and just provides them for download exactly as theywere uploaded.

5.1.2 Managing Access Rights

You can define individual access rights on a per-entry basis. For an example, look at thebox on the right in Figure 12. In the Permissions panel you see a list of three groups forwhich we have assigned access rights for this folder: The groups “users”, “simon”, and“rapid-i”. The group “users” contains all users that are created. The group “simon”contains only the user “simon”, and this cannot be changed. It is the user’s so-calledsingleton group. Finally, the group “rapid-i” is a custom group I made that containsthe user “simon”, among others. To edit the access rights for this entry, first click the

13

Figure 11: A table stored in the RapidAnalytics repository.

Figure 12: A RapidMiner process stored in the RapidAnalytics repository.

14

small edit link. For each user you can grant (green check mark) or revoke (red cross) therights to read, write, and execute, respectively. You can remove the specifications fora particular group entirely by pressing the delete button in the rightmost column, andyou can insert a new group to the list of permission specifications by selecting a groupfrom the menu and pressing the plus sign.

In this case, the user group “simon” has full access, whereas the group “users” isrejected. The group “rapid-i” has only the right to read from this folder. All otherpermissions are inherited from the parent folder.

5.1.3 Accessing Data in Processes

Now that we know how to access entries in our repository, let’s get back to designing ourfirst process in RapidMiner. You should still have the empty process named Cleanse

Data opened. The first thing you probably want to do in almost every process is to loadsome data. To that end, you have two choices:

• Drag the data set Labor Negotiations from the repository tree right into theprocess. RapidMiner will create a Retrieve operator and set the appropriateparameter referencing this entry.

• Drag the data set onto the input port in the upper left corner of the process.RapidMiner will contect it in the so-called process context. To show the processcontext, select View, Show view, Context. Using this option has two advantages overthe Retrieve operator: First, it can save space in the process view, since processestypically start with data loading operators. Second, the entries referenced in theprocess context are those that are displayed by the Web interface as links, asoutlined in Section 5.1.1.

Note that RapidMiner automatically inserted ../data/Labor-Negotiations as therepository entry parameter of the Retrieve operator or into the process context. Thisis a relative addressing of the repository entry: The sequence .. navigates one folder up(from the processes folder). This is a practice you should always use for two reasons:

• You can move around folders without destroying functionality.

• RapidAnalytics can resolve them properly. Do not use the absolute repositoryname in the repository location (e.g. as //RapidAnalytics/home/data) becauseRapidAnalytics is an alias that only exists on your client. RapidAnalytics doesnot know that you are referencing it under this name in RapidMiner (you could,e.g., have several RapidAnalytics instances connected), and hence cannot resolvethis name. You can, however, use absolute locations without the leading repositoryreference //RapidAnalytics, i.e., only the part /home/simon/data/Labor-nego-tiations.

Before we execute our first RapidMiner process, we first add a bit of functionality.You may have noticed while looking at the Labor-Negotiations data set, that it con-tains a lot of missing values, indicated as question marks in RapidMiner. We replace

15

these values with more useful ones. Since we do not know what the correct values are, wejust replace them with the average of the respective attribute (column). This is exactlywhat the Replace Missing Values operator does. In the Operators view, open DataTransformation, Data Cleansing, and drag the Replace Missing Values operator into theprocess. Connect its input port to the process input port on the left of the process view.

We must tell RapidMiner to store this result. Likewise retriving data from theRapidAnalytics repository, we have two choices:

• Choose the Store operator from the Repository Access group in the Operatorsview. Drag it into the process, and connect its input to the topmost output ofthe Replace Missing Values operator. Enter ../data/Cleansed Data as therepository entry parameter or select it using the repository location chooseravailable from the folder button next to this parameter. Again, RapidMiner willresolve the relative location for you.

• Instead, we can again use the process context as above: Just connect the topmostoutput of the Replace Missing Values operator to the process output port onthe right and enter ../data/Cleansed Data as the first entry in the output portlist in the Context view. Here, too, you can use the folder button to bring up arepository location chooser dialog.

With the same argument as above, it is recommended to use the process contextrather than using the Store operator. Your process should now look like depicted inFigure 13.

Figure 13: Your first RapidAnalytics process.

16

5.2 Remote Process Execution

You could now run your process locally on your desktop as usual in RapidMiner, pressingthe blue Play button. With RapidAnlaytics, you have a more powerful solution: Youcan run the RapidMiner process on the server, consuming no resources on the desktop,or run multiple processes simultaneously.

5.2.1 Running a Process Remotely

Open the Remote Processes view using View, Show View. This view shows one top-level entry for each RapidAnalytics installation you are connected to. To execute yourprocess on the RapidAnalytics instance rather than on your local client machine, usethe first button in the toolbar at the top of the Remote Processes view. This item is alsoavailable directly from the Process menu. RapidMiner will show a the dialog presentedin Figure 14.

Figure 14: Executing a process on a remote RapidAnalytics instance.

For now, leave all options unchanged and press Ok. After a few seconds, you will seethat you can open the RapidAnalytics node in the Remote Processes view. You willsee an entry for the process you just started, together with information about when itstarted, when it completed, etc. If the process was still running, you would see at whichstage it was, but for such a small process this is improbable to happen. Furthermore,you should see the output produced by the process: The Cleansed Data table. The factthat this output is listed here is another advantage of using the process context. Youcan open this data in RapidMiner by selecting it and clicking the open folder icon in thetoolbar of the Remote Processes view.

17

5.2.2 Scheduled Process Execution

In case you have long-running processes that you do not want to execute immediately,the remote execution dialog shown in Figure 14 provides the option to run a processonce, but later. In that case, you can choose a date and time using a date picker. Apartfrom that, the behaviour is equivalent.

For regular execution, you choose the option to schedule the process as a so-calledcron expression. Cron expressions are a compact yet powerful way to describe repeatingevents. In general, they take the form:

seconds minutes hours dayOfMonth month dayOfWeek [year]

For each entry you can specify a number or an asterisk (*), meaning “any”, or a questionmark, meaning “don’t care”. Use SUN-SAT for dayOfView and JAN-DEC for month. E.g.,the expression

0 0 1 * * ? *

means, everyday, at 1:00 am, on everyday of the month, no matter what day of week wehave. Note that you can use the asterisk only for dayOfMonth or dayOfWeek. Use thequestion mark for the other.

For dayOfMonth, you can use L to specify the last day of the month, or use MON#2

for dayOfWeek to specify the second Monday in a month, and FRI#L to specify the lastFriday. Furthermore, k/n means every n units of this interval, starting with k, so 5/20 inthe minutes field means every twenty minutes, starting at 5, so at 5, 25, and 45 minutesafter the hour. The complete cron expression would then be “0 5/20 * * ? *”.

5.2.3 Monitoring Job Execution

As mentioned in Section 5.2.1 you can monitor the running and completed processes inthe Remote Processes view of RapidMiner. You can filter the displayed list of processesby showing only the ones executed in this RapidMiner session (make sure you use thesame system time as the server does), showing only the processes of today, or by showingall processes (with a cap on the number of displayed processes).

You can as well monitor the running and scheduled processes in the Web interface.To that end, select Processes, Process Scheduler from the navigation bar. You will see ascreen similar to Figure 15.

On the bottom, you can see a list of running and completed processes, together witherror messages, in case they aborted abnormally. E.g., the first process in the figure wasaborted because the user entered a wrong name for the input data (the dash is missing).

If a process is complete, you can directly click on the process’ output to navigate tothe corresponding repository entry and browse the data. Using the icon in the rightmostcolumn of the table, you can also access the log file. The log file is also accessible fromthe Remote Processes view of RapidMiner.

Long-running processes display there current state in this view, similar to the familiarRapidMiner status bar. The process can also be stopped here.

18

Figure 15: The Web interface to the process scheduler.

The list of processes scheduled for future execution is at the top of the page. You seea list of processes together with their last and next execution time. Each entry can beremoved using the leftmost icon or temporarily disabled using the icon next to it. Theentire scheduler can be paused using the link in the box on the right. This can be usefulfor system maintenance or before a system restart.

5.3 Accessing Processes as Services

One of the strengths of RapidAnalytics is the fact that you can access processes (orrather, their results), from outside, even without RapidMiner. To that end, we haveintroduced the concept of so-called services: You can simply expose RapidAnalyticsprocesses as Web services and easily define input parameters and output format. Tounderstand this, you must first understand the concept of macros. In RapidMiner, aprocess can use macros in place of any operator parameter. You can think of macros asvariables that take on different values.

To understand this concept, we re-use the process designed earlier. Recall that theReplace Missing Values operator by default replaces missing values by the respectiveaverage of the attribute. For the sake of simplicity, let us assume that we want tospecify the replacement value explicitly, but we want to make this particular value aconfigurable number. First, tell RapidMiner that the value replacement should onlybe applied to numerical attributes: Select the Replace Missing Values operator, setthe parameter attribute filter type to value type, value type to numeric, anddefault to value. For the actual replacement value we can now specify the parameterreplenishment value. This is a regular parameter and we could fill in a regular numberhere, but we use a macro: simply enter %{replacementValue}. If you would run theprocess now, RapidMiner would complain since the macro is not yet defined. Besides

19

defining input and output, this is the third and last functionality of the Context view:In the bottom third of the Context view, press the Add macro button to add a new(the first) line to the macro table, enter “replacementValue” as the macro name and anumber, say 2, as the value. Your result should look like Figure 16.

Figure 16: A service process configurable through macros.

If you run this process now, you will see that all missing values were replaced bythe number 2. Defining the macro in the process context is convenient in RapidMiner,because we can edit parameters we change frequently in a single place, but it has an ad-ditional advantage. Save the process and open it in the Web interface of RapidAnalytics(Figure 12). In the box on the right, you have an action Export as service. If you clickit, you will see a screen similar to Figure 17.

As you see, RapidAnalytics displays a list of macros defined in the process. In ourcase, there is only one such macro, replacementValue. RapidAnalytics proposes to bindthis macro to a service parameter of the same name. In this view, you can also makesettings that affect the output of the service. We select HTML as the output format.For now, we leave the remaining settings unchanged. Click Submit, and then choose Testfrom the box on the right. You will see the screen in Figure 18.

As you see, you are presented a form into which you can enter a value for thereplacementValue parameter. In our example we have filled in 5. On the bottom yousee the output of the service: The example set in HTML format, where all missing valueswere replaced by 5.

Despite the somehow artificial toy example, this shows that RapidAnalytics servicesare an extremely powerful tool to embed your processes into other IT environments: InFigure 18 you also see that there is a direct link. You can use this link to embed theprocess into any other page, simply supplying the process macro replacementValue as

20

Figure 17: Exporting a process as a service in the RapidAnalytics Web interface.

Figure 18: Applyning a service process in the RapidAnalytics Web interface.

21

a query parameter. In addition to the representation as an HTML table, you can as wellgenerate Flash charts, images, or, machine readable formats like XML files or JSONfiles.

22