Monday 21 September 2009

OBI EE : Design To Address Performance

The topic I am going to discuss in this post is so commonplace that at least 9 out of every 10 OBI EE bloggers have written something on it. As a matter of fact, it is highly probable that you will find nothing new here. I will try to explain the A to X [Y and Z are still under research] of the performance engineering that governs the readiness of an overall OBI EE application. I have kept the thoughts as generic as possible so that they also address similar issues in other reporting tools.

Let me start with a simple question –

What are the factors you think have an impact on the report performance?

The solution to most of these problems lies in the answer to this one question.

I am putting down my own answers; just cross-check yours against mine. [And do append anything that I missed.]

1> Physical Specification of Servers - Both Database and BI Servers

2> Configuration of the Database. In the case of an Oracle DB - PGA Size, SGA Size, etc.

3> Configuration of BI Servers

4> Structure of Database Objects

5> Physical Layer and BMM Layer Modelling

6> Reporting Requirement Considerations

7> Caching

The first two points are not in the direct scope of an OBI EE 10g discussion, so I will keep them out of this post. But you can always check the Ask Tom website for posts related to these two areas. Just remember that the performance of OBI EE is limited by the physical resources (CPU, memory, I/O and network bandwidth) you have provided to the application: the more resources you have, the faster the performance you will get.

Step 3> Configuration of BI Servers
Performance is not the last but the first step of any successful BI implementation. The faster you deliver information, the better the response you get from the business. This is the foundation of every BI reporting application, so if it is not met and preserved throughout the lifecycle of the application, it is very possible that you will end up with a dead solution.

We need to define the physical architecture of the implementation. For a production system, clustering is a must-have. It not only boosts report performance but also increases the availability of the overall application, removing the single point of failure on the platform, which is very important for any application with a 24x7 availability commitment.

A simple OBI EE architecture can be like this –

2 Servers : Presentation Service, Java Host and OAS instance for J2EE Plug-In

2 Servers : Oracle BI Server and Scheduler

You can host the cluster controllers on any of these boxes, but the Oracle BI Server nodes are my personal preference. Added nodes increase the load the platform can absorb at peak hours, but there is a limit of a maximum of 16 nodes in a single OBI Server cluster. Read the OBI EE documentation on how to configure a cluster for a better understanding. There is another thing to keep in mind when you are setting up a system: on a 32-bit server, OBI can use only 3 GB of memory per process.

Step 4> Structure of Database Objects
OBI, as a reporting tool, takes the logical query and sends the corresponding physical queries to the database layer. The database layer executes the query and fetches the data rows, which are then sent back to the OBI server for report preparation. So if your physical query is not returning the data in time, you will most probably hit a performance issue in your reports. The best way to address this type of issue is to create aggregate tables wherever possible. For example, if you have a base table with day, sales quantity and sales price, and your OBI report is monthly, then an aggregate table holding month, sales quantity and sales price will give you better performance than the base table (a sketch follows below). But this does not mean you have to create an aggregate table for every OBI report. Just remember that every aggregate table you create puts the burden on you to populate (or refresh) it during the data load, so your data load window will grow with the number of aggregate tables.
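As a minimal sketch of such an aggregate (the table and column names here are made up for illustration), a monthly roll-up built from a daily base table could be created like this:

CREATE TABLE f_sales_month AS
SELECT TRUNC(sales_day, 'MM') AS sales_month,  -- collapse each day to its month
       SUM(sales_qty)         AS sales_qty,
       SUM(sales_price)       AS sales_amount
FROM   f_sales_day
GROUP  BY TRUNC(sales_day, 'MM');

During every load you would then refresh this table (truncate and reload, or merge only the changed months) before the reports run.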

Tables can also be partitioned on a date or month column to give better query response. Note that a partitioned table in Oracle means different underlying physical tables (each a separate segment) accessed through the same logical name.
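For illustration only (hypothetical names again, and the ranges would follow your own data), a daily fact range-partitioned by month might be declared like this:

CREATE TABLE f_sales_day (
  sales_day   DATE,
  sales_qty   NUMBER,
  sales_price NUMBER
)
PARTITION BY RANGE (sales_day) (
  PARTITION p_2009_08 VALUES LESS THAN (DATE '2009-09-01'),
  PARTITION p_2009_09 VALUES LESS THAN (DATE '2009-10-01'),
  PARTITION p_max     VALUES LESS THAN (MAXVALUE)
);

A query that filters on sales_day can then prune down to the relevant partitions instead of scanning the whole table.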

Once you have classified the data for aggregate tables and partitioning, you can build a set of indexes to make the queries even faster. When you create your indexes, make sure they are built on the columns that are used in the WHERE clauses (and join conditions) of the OBI-generated queries.
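For instance, continuing with the hypothetical fact above (and assuming it also carries a customer_key foreign key), generated SQL that filters on the day and joins to the customer dimension would benefit from something like:

CREATE INDEX f_sales_day_dt_ix ON f_sales_day (sales_day);
-- on a star schema, bitmap indexes suit the low-cardinality foreign keys;
-- LOCAL keeps the bitmap index aligned with the table partitions
CREATE BITMAP INDEX f_sales_day_cust_bix ON f_sales_day (customer_key) LOCAL;

Check the actual WHERE clauses in the query log before deciding which columns to index.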

You can also use Oracle OLAP to build the aggregate layer and improve physical query response time. You can find the implementation details on the Oracle OLAP website.

Remember, data modelling is 25% science and 75% art, so do not always go by the rule book to reach the best solution.

Step 5> Physical Layer and BMM Layer Modelling
OBI internally uses a set of rules to translate the logical query into an executable physical query, and it is the model you have designed in the repository that governs the decisions OBI makes at run time. So building a proper repository is very important for a successful BI application. Step-by-step documentation on how to build physical or logical objects can be found in the OBI EE documentation, so I will not discuss that here. Rather, I will focus on some points that you have to remember when building the repository.

When you are building your physical layer, try to keep these things in mind.

1) Use the connection pools properly. Specify the proper driver type (native call interfaces where available). For example, if you are connecting to an Oracle database, then an OCI 10g/11g connection will be more suitable than an ODBC connection. Specify the pool size: if you have 100 concurrent users in the system, then 10% to 20% of that number (10 to 20 connections) is a reasonable pool size. You can increase this value, but consult your database administrators before doing so; your database should be able to handle that many concurrent open connections. Moreover, every connection needs some memory in the application space, so adding more connections will increase the memory footprint of your application and require more time during start-up. Make sure connection pooling is enabled.

2) Use physical layer aliases of the physical tables. This gives you better flexibility in designing the physical model. If you have a role-playing dimension (like date), this approach is very helpful, and it also helps you model scenarios like circular joins. Moreover, it helps you keep the number of database views under control and easily manageable.

3) Do not forget to specify the primary keys of the tables.

4) Nullability is another thing that has to be defined properly for every column. It has some strange impacts. First, if you select a column that contains null values but is not declared as nullable, the OBI server will convert the value to a double and show 0. This may be acceptable in some cases, but if the column takes part in joins, this type of conversion causes great problems. Let me explain with an example. In your physical layer you have a dimension D and a fact F; the dimension key is D_PK and the fact foreign key is F_FK. Now suppose that by mistake you have checked the Nullable option for the F_FK column. All the joins between your fact and dimension will then take a form like NVL(D.D_PK, 88.00) = NVL(F.F_FK, 88.00) AND NVL(D.D_PK, 99.00) = NVL(F.F_FK, 99.00) instead of D.D_PK = F.F_FK. As you can understand, this hurts join performance heavily. Marking the column nullable when it really is not still gives you the correct answer, only slower than the straight join because the indexes cannot be used. Marking the column not null when it really is nullable gives you the wrong answer, because the equi-join removes the null rows from the record set completely (see the small SQL sketch after this list).

5) Do not use complex joins in the physical layer. A complex join is only required when you have a poor database design, and it affects query performance badly.

6) OBI EE allows you to import tables from different databases and join them in the physical layer, but it is strongly discouraged. This type of join has to be resolved inside the OBI server and makes the whole application slow.
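Here is the small sketch promised in point 4, using hypothetical table names, showing why the nullable flag matters for correctness as well as speed:

-- join generated when both key columns are flagged NOT NULL
SELECT d.d_name, f.sales_qty
FROM   d_dim d
JOIN   f_fact f ON d.d_pk = f.f_fk;
-- any f_fact row where f_fk IS NULL silently disappears from this result,
-- which is the wrong answer if the column really can hold nulls

-- join generated when a key column is flagged nullable
SELECT d.d_name, f.sales_qty
FROM   d_dim d
JOIN   f_fact f
ON     NVL(d.d_pk, 88.00) = NVL(f.f_fk, 88.00)
AND    NVL(d.d_pk, 99.00) = NVL(f.f_fk, 99.00);
-- correct in the presence of nulls, but the NVL() around the key columns
-- blocks normal index use, so it runs slower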

If the physical layer is the backbone of the application, then the business model and mapping layer (henceforth BMM) can be seen as the heart and mind of an OBI application. This is the place where all the mystery is resolved, and the area where an OBI EE developer has to show his magic. I will start my discussion with a simple diagram of pyramids.


[Diagram: dimension and fact drawn as pyramids, each with three layers - base data, intermediate aggregate and final aggregate - with width proportional to data volume]

In this diagram I picture the dimension and the fact as pyramids, with the width proportional to the data volume. In both cases (dimension and fact) there are three layers: the base data layer, the intermediate aggregate layer and the final aggregate layer. As we know, the volume of data decreases as the level of summarization increases. We traverse less data to produce our output if we fetch the information from the final aggregate layer, and more if we fetch it from the base data layer. The more data traversed, the more time is spent on the database side and the slower the overall report becomes. This is the main conceptual building block to keep in mind when you are modelling your business model and mapping layer: the aim is for the physical SQL generated from the logical SQL to hit the proper aggregation layer. The basic question is how to do that. Many of you already know the answer, but I will explain it for those who do not know the trick.

Fact data is normally grouped (or sectioned) by the values present in the dimensions, and in most cases, if you notice, these are the dimensional levels or hierarchies. So it is very important to model the dimensional hierarchies properly in the BMM. For every dimension, create a dimensional hierarchy if possible and specify the level keys properly. For logical dimension tables, create more than one logical table source (henceforth LTS) and map them to the different dimensional aggregation layers. An LTS is not automatically routed to the best-performing physical source: you have to specify the proper dimensional level information in the content section of each LTS object so that OBI can generate physical SQL that is optimal in nature. If you have dimensional values placed in more than one physical table based on some value split (like A to M in one physical table and N to Z in another), then put the proper fragmentation information in the content tab of each LTS (a rough example follows below). The same rules apply to logical fact tables. If you are hitting performance problems due to improper physical SQL, always remember to check your BMM layer for improper mappings and content selection.
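As a rough illustration of fragmentation content (the business model, table and column names below are hypothetical, and the exact expressions should be written and validated in the Administration Tool), the two customer sources split by name might carry content filters along these lines:

LTS_CUSTOMER_A_M fragmentation content:
  "Sales"."Dim - Customer"."Customer Name" < 'N'

LTS_CUSTOMER_N_Z fragmentation content:
  "Sales"."Dim - Customer"."Customer Name" >= 'N'

With the option to combine this source with other sources at the same level ticked where appropriate, OBI can route a query that filters on customer name to only the fragment it actually needs.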

Step 6> Reporting Requirement Consideration

From my past experience, I have seen end users come with complaints that the reports are not very fast, especially if you have provided OBI Answers access and the end users are building their own analytical reports. The issue in most of the cases is how the reports are built. OBI EE provides a wide range of report view options and all of them have some trade-offs. In most cases, if you remember these few points, you will be able to come out of many of the issues.

1) Pivot views are normally slower than table views, as they need some extra calculation to produce the pivot functionality. So if you are considering performance, a table view is always better than a pivot view.

2) Try to implement the business rules inside the database layer as far as possible. If that is not possible, try to do it in the OBI repository. Answers requests should be your last option when you are implementing any business rule. This gives you not only a performing design but also a reusable one.

3) Try to limit the number of dashboard prompts to 6 to 10 if possible. Use multi-select instead of drop-down lists; this will provide better performance for the dashboard pages. Avoid constrained prompts and always put some default values on the prompts. Defaults reduce the initial search space by adding conditions to the SQL and thus provide better performance.

4) In the same spirit, try to avoid populating dashboard prompts with in-line (SQL results) queries.

5) Try to avoid conditional filters based on another request. In that case the first request has to execute before the actual request, which may cause a slower response.

6) Make sure you have sufficient space in the temporary folder. If possible, have more than one disk drive and configure more than one temporary location; this will provide a faster response. The rule of thumb is one temporary location per physical disk. You can also use in-memory locations as temporary locations.

7) The presentation server (SAW) has its own caching mechanism; try to use that cache for as long as possible. Bypass the Oracle BI Presentation Services cache only if it is advised by the administrator.

Moreover, OBI reports give better performance in Mozilla Firefox 3 or higher. This is just an observation from my past experience; I do not know why it is so. So when you are drawing any conclusion on performance, make sure you have run the reports in all the available internet browsers (like IE and Firefox).

Step 7> Caching

A caching mechanism designed in a proper way can boost the end user's report-running experience, but it should not be treated as the solution for every performance-related issue. Moreover, caching that is not designed properly may produce stale-data problems in the reports. Defining a caching strategy is very important.

1) If the underlying physical tables refresh more often than every 15 minutes or so, it is better not to cache those tables, because the overhead of maintaining the cache will cost more than its actual benefits.

2) If the tables refresh every few hours and encounter similar types of queries from different user sets, then you can consider caching with event polling tables. This will maintain the cache automatically once the table data changes (see the first sketch after this list).

3) If the table refreshes only once or twice a day, then you can use programmatic cache purging and cache building. In this case, after the ETL job loads the data into the tables, purge the OBI cache and then run a set of logical queries to rebuild it (see the second sketch after this list). Oracle normally suggests using iBot jobs to rebuild the SAS (BI Server) cache, but iBots have some limitations. In particular, if you have a clustered environment, the cache that iBots build for a dashboard is not propagated to the cluster even if you have a cluster-aware cache. We encountered this issue in our production environment and Oracle Support suggested we use the SASeedQuery command to populate the cache instead of iBots.
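Two rough sketches for points 2 and 3 above. First, the event polling approach: at the end of the load, the ETL inserts one row per changed physical table into the event table. The table and column names below are assumptions based on the sample event polling table layout; verify them against the script shipped with your OBI EE installation.

-- hypothetical values; one row per physical table that the load touched
INSERT INTO s_nq_ept
  (update_type, update_ts, database_name, catalog_name, schema_name, table_name, other_reserved)
VALUES
  (1, SYSDATE, 'ORCL', NULL, 'SALES_DW', 'F_SALES_DAY', NULL);
COMMIT;

The BI server polls this table at the configured interval, purges the cache entries that reference the listed physical tables and then removes the processed rows.

Second, the programmatic purge-and-seed approach: the ETL can run a small logical SQL script through nqcmd (for example nqcmd -d AnalyticsWeb -u Administrator -p <password> -s seed_cache.sql, where the DSN and credentials are specific to your environment). The script itself need contain nothing more than a purge call followed by logical queries that match the popular reports (subject area and column names here are, again, only placeholders):

Call SAPurgeAllCache();

SELECT "Dim - Date"."Month Name", "Fact - Sales"."Sales Qty" FROM "Sales";

The closer the seed queries are to the logical SQL of the real reports, the higher the cache hit ratio will be.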

The effectiveness of the cache can be analyzed from the query log. If you have usage tracking enabled in your environment, then you can measure the efficiency from the tracking table. This information is normally helpful if you want to make changes to your caching strategy.
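For example, against the usage tracking table (I am assuming the S_NQ_ACCT table with its CACHE_IND_FLG, TOTAL_TIME_SEC and START_TS columns; check the column names in your own tracking schema), a quick cache-hit summary might look like this:

SELECT cache_ind_flg                  AS cache_hit,   -- 'Y' means served from cache
       COUNT(*)                       AS query_count,
       ROUND(AVG(total_time_sec), 1)  AS avg_seconds
FROM   s_nq_acct
WHERE  start_ts >= SYSDATE - 7                        -- last seven days
GROUP  BY cache_ind_flg;

A low share of 'Y' rows, or long average times on queries you expected to be cached, is a hint that the caching strategy needs revisiting.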
