HowTo setup a Pentaho repository

From Schaffter

Jump to: navigation, search

Contents

Introduction

There are (at least) two ways of maintaining the software that is locally developed using the Pentaho suite of software components.

  • Using external XML formatted files to save and deploy the locally developed software solutions.
  • Using a repository in a database to store the locally developed software solutions.


They both have their pros and cons, and in a real scenario it's very likely that the combination of using the repository for development, combined with XML files for deployment and version control be chosen.


The setup of a repository database isn't 100% intuitive for a new Pentaho user. An experienced user will most likely think that it's trivial, but the new user is already somewhat intimidated by the share volume of new information to take in.

I failed to find a good step-by-step documentation for the repository setup procedure on the official Pentaho sites. Since I went through this procedure myself, I thought it would be good to put down my findings in a document like this, so that it can be easily shared with other Pentaho users. The idea being that the new Pentaho user should get up and running quickly, concentrating on using Pentaho for productive development instead on figuring out how to setup the local infrastructure to make it happen.


In this HowTo, it is assumed that:

  • There is a need (or wish) to use a repository database.
  • The repository will be created in a MySQL database.
  • The repository is needed to do ETL development, using the Pentaho spoon component.
  • We are using version 3.1.0 of spoon. (It is likely that this document be valid for the whole 3.1.x series and potentially even for the whole 3.x series, but that's still to be seen.)
  • If you follow the same procedures as I did, and if you brake something or if it doesn't work, I'm not responsible. It's your fault, having followed my procedures.


Main objectives

  • Create a repository database
  • Create a database connection to the repository
  • Initialize the repository
  • Use the repository for development activities


Create a repository database

The creation of a repository database is not done through the Pentaho software. It must be done with any means you have to create a new database and to maintain it. Due to the numerous types of databases available, and supported by the Pentaho software, the setup procedures of the repository database is totally dependent on what database engine you have available.

In this example, it is assumed that the repository be setup in a MySQL version 5.x system. If you have a different RDBMS, then adjust accordingly.


Think hard of a good name for your repository. You may want to add more repositories in the future. Find a naming standard. Do you want a repository per project team? Do you want a common repository for all your projects or do you want a repository per project? How to isolate development activities from test activities and, even more so, from production? Etc.


Important!

Let it be clear, that the Pentaho repository database should be used exclusively for this purpose. Do not go down the route of sharing the same database for the repository function with any other usage. Do not put your repository information in your data warehouse, operational data store or some such.

Such sharing is a quick recipe for headaches. Keep them separate and you will require less aspirin during database maintenance.


The steps for creating a repository in MySQL V5.x :

  • Login to the MySQL server as root.
  • Create the MySQL database. For example:
CREATE DATABASE PENTAHO_REPO_DEV;

(Or some smarter name that fits into your current and future organization.)

  • Grant access to a new repository user to the new database. The new user will need all but grant rights. At the same time, set the password to the repository for the new user. For example:
GRANT ALL ON PENTAHO_REPO_DEV.* to 'pentaho'@'localhost' identified by 'secretpassword';

Remember the password. You will need it.

By the way, in this example we name the user pentaho, but you could and should choose a name that corresponds to your existing naming standards. If you have no such standards, you may either define a standard or just go for the name as used here.

Also, notice that we here grant access to the user pentaho, but only for access from the localhost computer. If you need to work from a workstation different from the computer where MySQL is installed, then you should give your workstation name instead of "localhost".

Please, read up on how to create new users and grant access in the MySQL manuals. Or, if you use any other database system, read up on how you create users in your database system.

Your Pentaho repository database is now created, but it's still empty. You will fill it with its required database objects (tables, views, etc.) with help of the spoon application.

Create a connection to the repository

So, you have created an empty repository database. Now you must make sure that you're able to connect to that database from spoon.


Start by launching the spoon component. If you're considering setting up a repository, then it's likely that you have already launched spoon at least once and used it without a repository. Anyway, you should recognize the Welcome screen that will look something like the image to the right.


If you have already once unchecked the "Present this dialog at startup" check box, then this welcome window will not be shown. In such case, you must open the "Options" dialog of spoon and check the "Show welcome page at startup?" check box and restart spoon. From there, the welcome window should be visible again.


In this window, there should be no surprise that you will click the "New" button to create a new repository. Or even more precise, to connect to and initialize a new repository.


Repository Information

At this point, you will be presented with the "Repository Information" window.


Image:PentahoSpoonRepositoryInformation.png

You must here start by creating a database connection. It's quite unlikely that any of your existing database connections will do the work for you, since you just created the database in the previous step.


So, click the "New" button to create a new database connection for this repository. You should now see the window for setting up a database connection.


Database Connection

As already stated, we use MySQL for this example, so start by selecting MySQL as Connection type.

You should also at a minimum fill in the fields:

  • Connection Name (Give a meaningful name to this connection, something that makes sure that you won't accidentally use this database connection for some other purpose later on, potentially screwing up your repository.)
  • Host Name ("localhost" if using MySQL on the local computer. If not, give the relevant computer name.)
  • Database Name (This is the MySQL database we just created with the "CREATE DATABASE" statement;
  • User Name (We used "pentaho" as a demo user name, so we fill in the same name here.)
  • Password (You did remember the password, didn't you?)


When all is duly filled in, verify the information you provided by clicking on the "Test" button. If spoon successfully connects to the database, then everything is fine and you can go forward.

If the connection fails, you must amend the information and make any changes so that your database connection can connect to the database. Failing this test, your repository will not be accessible at all and cannot be neither initialized, nor used. Don't skip this test. You will regret it.


When your connection is functional, click the "OK" button and you will be back to the "Repository Information" window.

In the drop down listbox, chose the database connection you just created, "Pentaho Repository Meaningful Name". The connection will let spoon connect to the underlying repository database.


Repository Naming

We have here created just one repository database, but you could work with several repositories. To identify the repository among other repositories, give a meaningful "Name" to the repository, optionally followed by a more verbal "Description".


Bare in mind that the "Name" and "Description" of the repository will be stored in your local configuration file in your home directory. It will not be shared with other users, unless they set it up with the same name. If you intend to share the repository database with other users, consider having a common vocabulary to denominate your repository, especially if you use more than one of them.

Initialize the repository

When you have selected the right database connection in the "Repository Information" window, you can click the Create or Upgrade button to initialize the database with the repository tables and other required database objects. Be careful to chose the right database connection!

When done, click the OK button to confirm your actions. The "Repository Information" window will close.


Connecting to the repository

Back at the welcome window, you should now be able to connect to the repository for the first time.


From the dropdown listbox, select the repository that you just created and fill in the "Login" and the "Password". This is when you start to wonder what login and password you should use.


Well, it is not the "pentaho" user that we used in the database connection to the repository. You must here provide a login and password that the Pentaho components will recognize, which will be recognized within the repository.


What users are recognized within the repository?

When you clicked the "Create or Upgrade" button to initialize the repository, a default user was automatically created for you in the new repository. You will now use that default user to login to the repository and you will thereafter use this default user to create at least one other user, which is you. You should from that point only use your own login when you work in the repository. The default user can also be used for other administrative tasks, like creating a repository login for your colleague, or for your friend. These "users" are not authenticated by MySQL (or whatever database you use), but they are authenticated by the Pentaho components. In this case, they are authenticated by spoon itself, as soon as spoon has been able to log itself in to the MySQL database.

So, the question remains, what is the login and password to use for this administrative user?


The answer is simple:

  • Login: admin
  • Password: admin

Fill in the information and click OK to login to your repository, with administrative rights.


Setting up repository users

Carefully study this image. It shows several steps you should follow.

  • Once spoon is launched, you know that the connection to the repository is functional.
  • From the application menu, select Repository and then Explore repository. This can also be achieved with Ctrl-E.
  • The "Repository explorer" window will open up. (In this example, the screen-shot comes from a different repository of mine, but yours should show the name of the repository that you just created.)
  • Click on the small arrow to the left of Users. The arrow will tilt down and show a list of recognized repository users. In your case, with your newly initialized repository, you will find the users "admin" and "guest". (I already have an additional user "gus" that I use for my development in this repository.)
  • To create a new repository user, double-click on Users. (Not very intuitive.) The "User information" window will open up from where you can create your own repository user. Or create additional repository users, if you will share the repository with your "Dear Colleague".
  • Fill in the user details and choose the relevant profile from the drop-down listbox. It's unlikely that you will create an additional "Administrator". You will most likely create your own user with the profile of User. You could also create users with a Read-only profile, but that is something left for your local needs.
  • Click OK to create the new user. The "User information" window will close.
  • In the "Repository explorer" window, click Commit changes to save you administrative work to the repository database.

Your repository is now setup and functional. You have created at least one "ordinary" user that you will use for all development in spoon.

  • Log out from spoon and log back in, but this time, log in using the new repository user you created.
  • If everything works so far, you're done. Happy development!

Using the repository

This is the easiest part.

You create a new transformation or a new job in spoon. You save it with either Ctrl-S or by clicking the Save button. The object(s) are saved in the current repository.

You open an existing object (transformation or job) from the Open menu alternative. No surprises here.

If you want to, or have to, you may export the object to an external XML file. Thereafter you may import the XML file into another repository.

Pretty straight forward.


Disclaimer

If you screw up things by following the above procedures, it's not my fault. It's your fault, and you assume all direct and indirect consequences yourself. Don't blame me.

This article was valid in March 2009. Things have a tendency to change over time.

Personal tools