Oracle Web Cache Monitor User Guide

 

Oct 2009

Issue 2.6.2

 

 

Table of Contents

1      Introduction

1.1       The Problem

1.2       Thresholds, Notifications and Automatic Fix

1.3       Running a Fix Script

1.4       Startup and Unattended Operation

1.5       Termination on Unauthorized Response – "FATAL" email

1.6       Status Report Email – Example

2      Configuration Parameters

Appendix A       Java Runtime Requirements

 

1       Introduction

 

Linxcel Oracle Web Cache Monitor (LxWebCacheMon) is a Java command line utility for monitoring activity in Oracle Application Server (OAS) Web Cache in real time. As such, it can identify performance issues immediately, enabling them to be addressed long before OAS's own self-healing capabilities do the same.


Please contact consulting@linxcel.co.uk for more information on Linxcel Oracle Web Cache Monitor.

 

The utility samples the instantaneous number of "stalled" sessions in Web Cache at a configurable interval. These sessions correspond to user requests that are awaiting completion from tiers below the Web Cache. A high number of stalled sessions with long response times can indicate processing issues that directly affect end-user response times, especially when normal behaviour shows short-lived requests and a relatively small number of stalled sessions at any time.

LxWebCacheMon can therefore quickly identify unusual behaviour associated with poor performance. Root cause analysis still requires further investigation. Due to the number of interconnected components and services in a typical multi-tier application, the root cause can have many possible sources, including web servers, application servers, databases, and combinations of those. The first step to resolving an issue is to identify abnormal behaviour as soon as it occurs.

 

1.1       The Problem

 

Consider a database locking issue (due to badly written business logic code in the application tier) that causes contention between database sessions for a particular record, causing one or more sessions to wait for the locked row until a commit or rollback takes place in the blocking session. The delays propagate to end-user sessions that requested the resource (typically identified by a JSESSIONID cookie) connecting through Oracle Web Cache. If enough end-users require the same resource and their sessions become blocked, eventually the request queues from Web Cache into Apache become full, resulting in a site outage:

 

 

One way to address the issue in this case is to restart Apache, which terminates the database client sessions involved in the locking issue. Terminating Apache causes all application server sessions into the database to end, and Oracle clears the locks held by the terminated sessions, which resolves the issue. OAS provides a built-in self-checking capability that restarts Apache in this scenario, but it typically takes a matter of minutes to detect an issue, by which time a site outage has already occurred. LxWebCacheMon can do the same using an easily configurable check that identifies issues more quickly, so that they can be resolved before a site outage occurs, minimizing the impact on end-users. Of course, the root cause in the code still needs to be fixed to provide a permanent solution. However, in real-world applications where it’s not possible to predict all behaviours in advance, it’s essential to have a proactive support process in place to minimize the effects of issues.

 

1.2       Thresholds, Notifications and Automatic Fix

 

You can set warning and critical thresholds for the number of stalled sessions at any instant in time. When a threshold is breached for a configurable number of consecutive intervals, the following actions are taken:

·         for warning threshold breaches, send an email

·         for critical threshold breaches, send an email and optionally run a "fix" script, for example to restart an OAS component
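
The consecutive-interval rule above can be sketched as follows. This is a minimal illustration in Python, not LxWebCacheMon's own code: the class name BreachTracker, the "at or above" comparison, the choice to count a critical sample towards the warning run as well, and the reset of both counters after an action are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BreachTracker:
    """Tracks runs of consecutive breaching samples (illustrative only)."""
    warning: int   # stalled-session count treated as a warning breach (assumed >=)
    critical: int  # stalled-session count treated as a critical breach (assumed >=)
    trigger: int   # consecutive breaching samples needed before acting
    warning_run: int = 0
    critical_run: int = 0

    def observe(self, stalled: int) -> str:
        """Record one sample and return 'CRITICAL', 'WARNING' or 'OK'."""
        # Assumption: a critical-level sample also extends the warning run.
        self.warning_run = self.warning_run + 1 if stalled >= self.warning else 0
        self.critical_run = self.critical_run + 1 if stalled >= self.critical else 0
        if self.critical_run >= self.trigger:
            # Assumption: counters restart after an action is taken.
            self.warning_run = 0
            self.critical_run = 0
            return "CRITICAL"
        if self.warning_run >= self.trigger:
            self.warning_run = 0
            return "WARNING"
        return "OK"
```

With warning=10, critical=20 and trigger=3, a single sample below 10 resets the run, so only three breaching samples in a row produce a notification.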

 

This is an example of the email subject for a WARNING email:

 

 

Similarly, a CRITICAL email subject looks like this:

 

 

There are no rules for what constitutes the warning and critical threshold levels. These depend on the behaviour of your application under normal operation, and you should set them based on observations of “good” performance.

An SMTP mail host is required in order to send email notifications. Email configuration is mandatory in order to use a fix script, to ensure a record of the operation has taken place.
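
To show how the mail settings fit together, the following Python sketch builds and sends a notification. It is an illustration under stated assumptions, not the utility's implementation: the function names are invented here, and the subject format is hypothetical because the actual subject text is not reproduced in this guide. The mail_from, mail_to_csv and mail_host arguments stand in for the -mailfrom, -mailto and -mailhost settings, and user/password for muser/mauth.

```python
import smtplib
from email.message import EmailMessage


def build_notification(level, stalled, mail_from, mail_to_csv, body=""):
    """Build a notification message; the subject format is hypothetical."""
    msg = EmailMessage()
    msg["From"] = mail_from
    # -mailto is a comma-separated list; tidy it into a standard To header.
    recipients = [a.strip() for a in mail_to_csv.split(",") if a.strip()]
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"[{level}] LxWebCacheMon: {stalled} stalled sessions"
    msg.set_content(body or f"{level}: {stalled} stalled sessions observed.")
    return msg


def send(msg, mail_host, user=None, password=None):
    """Send via the SMTP host, logging in only if credentials are given."""
    with smtplib.SMTP(mail_host) as smtp:
        if user is not None:
            smtp.login(user, password)
        smtp.send_message(msg)
```

Keeping message construction separate from sending makes it possible to check the recipients and subject without contacting a mail server.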

You can run LxWebCacheMon from any Java client with access to the Web Cache diagnostics URL and port. The OAS Administrator username and password must be provided. For security, the password can be supplied in a properties file rather than the command line.

 

1.3       Running a Fix Script

 

To run a fix script, you must run LxWebCacheMon on the same server as the monitored Web Cache instance, and the fix script must be accessible from the same server. The usual CRITICAL email is sent, followed by two INFO status emails that identify the script about to be run and the completion of the script. The "completed fix" email contains the command(s) executed and any output from them in the body of the email.
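
As an illustration of how a fix command's output could be captured for the "completed fix" email, here is a minimal Python sketch. The function names, the timeout, and the inclusion of an exit code in the body are assumptions for the example; the guide only states that the commands and their output appear in the email.

```python
import subprocess
import sys


def run_fix(command, timeout=300):
    """Run the fix command; return (command_text, exit_code, combined_output)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error output into the same text
        text=True,
        timeout=timeout,
    )
    return " ".join(command), result.returncode, result.stdout


def completed_fix_body(command_text, exit_code, output):
    """Format an email body (hypothetical layout) from the captured result."""
    return f"Command: {command_text}\nExit code: {exit_code}\nOutput:\n{output}"


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; a real fix script would
    # call opmnctl as described in this section.
    cmd, code, out = run_fix([sys.executable, "-c", "print('restarted')"])
    print(completed_fix_body(cmd, code, out))
```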

 

           

 

You choose the contents of the fix script depending on which component (or components) you wish to restart. Possible commands include the following, which restart Apache Web server and Web Cache respectively:

 

D:\Oracle\OAS_Tools\opmn\bin\opmnctl restartproc ias-component=HTTP_Server

D:\Oracle\OAS_Tools\opmn\bin\opmnctl restartproc ias-component=WebCache

 

1.4       Startup and Unattended Operation

 

Upon startup, LxWebCacheMon sends an INFO email (example as follows) containing the configuration settings used:

 

 

LxWebCacheMon is intended to run unattended, for example as a service on Microsoft Windows. It continues to run if, for example, Web Cache or the Web Cache server is shut down. It also produces detailed terminal output and logs, which are useful to review if a performance issue is ongoing. For each sample, the output shows the instantaneous number of stalled sessions, their state, and the time spent in that state. An example is shown in the following screen shot:

 

 

1.5       Termination on Unauthorized Response – "FATAL" email

 

LxWebCacheMon will terminate itself if an "Unauthorized" response is received from Web Cache. This means that an invalid OAS username/password combination was provided. In this case, LxWebCacheMon terminates (and sends a "FATAL" email) to prevent the OAS Administrator account from being locked. This could occur for example if the ias_admin password was modified without changing the LxWebCacheMon startup configuration to match. The email subject looks like this:
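
The termination rule described in this section can be sketched as a small decision function in Python. This is an assumption-laden illustration: it treats "Unauthorized" as HTTP status 401, uses None to represent an unreachable Web Cache, and the handling of other status codes is a guess, since the guide does not describe it.

```python
from typing import Optional


def next_action(http_status: Optional[int]) -> str:
    """Decide what the monitor does after one diagnostics request."""
    if http_status is None:
        # Web Cache (or its server) is down: keep running and try again.
        return "retry"
    if http_status == 401:
        # Bad credentials: stop at once so repeated failures cannot
        # lock the OAS Administrator account, and send the FATAL email.
        return "terminate"
    if 200 <= http_status < 300:
        return "sample"
    # Assumption: any other response is treated as a transient problem.
    return "retry"
```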

 

 

1.6       Status Report Email – Example

 

An optional Summary Report (email) can be sent at configurable intervals (for example at 09:00 and 17:00 daily). This contains the number of requests, warning and critical breaches and maximum stalled sessions since the last report.
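
A sketch of how the report totals might be accumulated and reset between reports is shown below in Python. The names are hypothetical, and two points are assumptions: "requests" is taken to mean the samples requested from Web Cache, and each breaching sample is counted individually.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReportCounters:
    """Totals accumulated since the last status report (illustrative only)."""
    requests: int = 0
    warning_breaches: int = 0
    critical_breaches: int = 0
    max_stalled: int = 0

    def record(self, stalled: int, warning: int, critical: int) -> None:
        """Fold one sample into the running totals."""
        self.requests += 1
        if stalled >= critical:
            self.critical_breaches += 1
        elif stalled >= warning:
            self.warning_breaches += 1
        self.max_stalled = max(self.max_stalled, stalled)

    def report_and_reset(self) -> dict:
        """Return the totals for the report and start a new period."""
        summary = dict(vars(self))
        self.requests = self.warning_breaches = 0
        self.critical_breaches = self.max_stalled = 0
        return summary


def report_due(now: datetime, status_hours, last_sent) -> bool:
    """True if the current hour is a report hour not yet reported today."""
    return now.hour in status_hours and (now.date(), now.hour) != last_sent
```

Here last_sent is a (date, hour) pair recorded when a report goes out, so a report is sent once per configured hour per day.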

 


 

2       Configuration Parameters

 

The following table provides details of command line parameters for LxWebCacheMon.

To use an SMTP server requiring authentication, you must provide the muser and mauth values in the properties file.

The optional properties file is LxWebCacheMon.properties.
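
For illustration, the Python sketch below parses a properties file of the kind described here. The sample values are placeholders, and the key name "password" is an assumption (the guide shows the key=value form only for statusat, muser and mauth). The parser handles simple key=value lines only, not the full Java properties syntax.

```python
SAMPLE = """\
# LxWebCacheMon.properties (placeholder values)
password=changeit
statusat=8,18
muser=monitor@example.com
mauth=smtp-secret
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def status_hours(props):
    """Turn the statusat value into a list of hours, e.g. '8,18' -> [8, 18]."""
    return [int(h) for h in props.get("statusat", "").split(",") if h.strip()]
```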

 

Parameter name                Command Line  Properties File  Mandatory?  Usage
----------------------------  ------------  ---------------  ----------  -----
-url <string>                 Yes           No               Yes         OAS URL for diagnostics
-mode stall                   Yes           No               Yes         run mode: stall
-user <string>                Yes           No               Yes         OAS admin user
-password <string>            Yes           Yes              Yes         OAS admin password
-fix <scriptpath>             Yes           No               No          action script triggered by critical violations; monitoring and script must run on Web Cache host. SMTP email settings required
-interval <seconds>           Yes           No               No          interval between samples (default 20)
-trigger <count>              Yes           No               No          consecutive threshold violations which trigger an action
-warning <count>              Yes           No               No          stalled session count: warning threshold
-critical <count>             Yes           No               No          stalled session count: critical threshold
-mailfrom <string>            Yes           No               No          SMTP mail-from address
-mailhost <string>            Yes           No               No          SMTP mailhost
-mailto <string>              Yes           No               No          SMTP mail-to address list (csv)
statusat=<csv list of hours>  No            Yes              No          Send status report at hourly intervals e.g. 8,18
muser=<string>                No            Yes              No          Username for email server authentication
mauth=<string>                No            Yes              No          Password for email server authentication
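
The -interval and -trigger settings together determine how quickly a breach leads to an action. The Python sketch below works through the arithmetic under two assumptions: the action fires on the sample that completes the run of consecutive breaches, and time is measured from the moment the stalled-session count first exceeds the threshold. The trigger value of 3 is hypothetical, as no default is documented.

```python
def worst_case_detection_seconds(interval_seconds=20, trigger=3):
    """Breach begins just after a sample: wait a full interval for the
    first breaching sample, then (trigger - 1) more intervals."""
    return interval_seconds * trigger


def best_case_detection_seconds(interval_seconds=20, trigger=3):
    """Breach begins just before a sample: the first breaching sample is
    immediate, then (trigger - 1) more intervals are needed."""
    return interval_seconds * (trigger - 1)
```

With the default interval of 20 seconds and a trigger of 3, an action would follow between 40 and 60 seconds after the threshold is first exceeded. Lowering either value shortens detection time at the cost of more notifications for short-lived spikes.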

 

 

 

Appendix A  Java Runtime Requirements

 

LxWebCacheMon requires JRE 1.5 or higher and the following jar files:

 

activation.jar

commons-cli-1.1.jar

commons-httpclient-3.1.jar

commons-logging-1.1.1.jar

htmllexer.jar

htmlparser.jar

mail.jar

ns-codec-1.3.jar