How to Analyze Long-Running UBEs

Source: Oracle UBE Performance Webcast

Example Issue

UBE runs for three days and never finishes.

Need a valid profile to answer the question

WHERE IS THE CODE/SYSTEM SPENDING ITS TIME?

The problem question:

  • How to get that profile?
  • How to “get your arms around the process”?

Analysis of Long-Running UBEs

In JDE.INI, under [DEBUG], set UBEDebugLevel=0 and DumpLPDS=0.

These settings:

  • Reduce log file size; the debug log grows more slowly
  • Give a more accurate profile by stripping out entries that do not help with performance analysis

UBEDebugLevel=0 eliminates UBE ER entries, which carry no time stamps and add no performance-relevant data. DumpLPDS=0 stops BSFN data structures from being dumped twice for each call.
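
Taken together, the two keys look like this in the [DEBUG] section of the jde.ini on the server where the UBE runs (leave the rest of the section unchanged):

    [DEBUG]
    UBEDebugLevel=0
    DumpLPDS=0
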
Approaches

  1. Run the unmodified production use case with debug on
  2. Run the UBE with Reduced Data Selection
  3. Run for 30-60 minutes, then terminate job
  4. Run for a long period, collecting log samples at intervals
  5. Capture runtime call stacks over an extended interval

Run the unmodified production use case with debug on
Pros:

  • This is the ideal case – the entire run is captured beginning to end
  • No compromise to data sample.

Cons:

  • For UBEs longer than two hours (absolute maximum), this method is almost certainly not viable.
  • Logs get too large too fast: several GB per hour
  • Some customers have 2GB file size limit on the operating system.
    • Note that this is usually modifiable
  • Debug logging adds a factor of 2-3 to the runtime (see the rough arithmetic below)
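
Putting the last two cons together: a job that runs 10 hours without logging could take roughly 20-30 hours with debug on and, at several GB of log per hour, produce tens of gigabytes of debug log, which is why about two hours is the practical ceiling for this approach.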

Reduced Data Selection

Pros:

  • Shorter, manageable run
  • Smaller log size
  • Job can finish, so a complete start-to-finish picture results
  • No need to terminate the job

Cons:

  • Under-sampling
  • The short time frame skews the profile
  • Fixed-cost / one-time hits at startup will be exaggerated
  • A 10-minute run of a 10-hour job will NOT give a reliable profile
  • Avoidance of the problems you are trying to observe
  • If the problem is related to a specific data range, this may be missed
  • If the job is stuck in an infinite loop, the reduced run will not finish either

Run for 30-60 minutes, then terminate job

Pros:

  • Less risk from under-sampling
  • A reasonably sized log, in the few-GB range

Cons:

  • UBE may have multiple sections, and the real problems may occur in a section that is never reached in the 30-60 minutes
    • Get a listing of the UBE’s ER to help get a clearer picture
    • Look at the UBE in the RDA tool
  • What the UBE is doing in the first hour may NOT be the same as what it’s doing in the third hour
    • Remember: a one-hour run with debugging on is probably 20-30 minutes of runtime without debug logging on
  • If the job is running the same code for a long period – but slowing down gradually – this will also be missed by just capturing the first hour
  • Killing the job prematurely means not seeing the complete picture
  • The graceful end of a UBE contains important indications of cache memory leaks:
    • Such as a jdeCacheInit() that is not matched with a jdeCacheTerminateAll()
    • Failure to call jdeCacheTerminateAll() results in jdeCacheDestroyAllUserCaches() being called
      • Counting and matching every jdeCacheInit() call by hand is difficult, since caches may be initialized and closed in many different BSFNs (a rough log-scan sketch follows this cons list)
  • Multiple sections in UBEs
    • When UBE processing contains multiple sections which process in serial fashion, a one-hour sample at the start of the run may never even capture a sample of the serious problem.
  • Problem data ranges in UBEs
    • Even if there is only a single section which does all the processing, specific data ranges later in the process may trigger slower throughput.
    • Other factors may cause a precipitous drop in throughput later in the process, such as memory consumption reaching thresholds.
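
As a rough check for missing cache terminates in a log that does reach a graceful end, the init/terminate counts can be tallied with a small script. The sketch below is a hypothetical Python scan, not an Oracle-supplied tool: the API names are searched as plain text, and the exact strings that appear in your jdedebug.log may differ by release, so treat the patterns as assumptions to adjust.

    # Minimal sketch: tally cache init vs. terminate references in a debug log.
    # Assumption: the log text contains the literal API names listed below;
    # adjust the patterns to whatever your jdedebug.log actually prints.
    import sys
    from collections import Counter

    # jdeCacheTerminateAll is checked before jdeCacheTerminate so the shorter
    # name does not also match the longer one.
    APIS = ("jdeCacheInit", "jdeCacheTerminateAll",
            "jdeCacheTerminate", "jdeCacheDestroyAllUserCaches")

    def scan(path):
        counts = Counter()
        with open(path, "r", errors="replace") as log:
            for line in log:
                for api in APIS:
                    if api in line:
                        counts[api] += 1
                        break  # count each log line once
        return counts

    if __name__ == "__main__":
        # Usage with a hypothetical file name: python cache_scan.py jdedebug.log
        for api, n in scan(sys.argv[1]).items():
            print(f"{api}: {n}")

A large surplus of jdeCacheInit() over the terminate calls, in a run that finished gracefully, is a hint that caches are being leaked.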

Run for a long period, collecting log samples at intervals

Pros:

  • Obtain profiles for a much longer period
  • Perhaps collect a 30-minute log every 2-4 hours (a scheduling sketch follows the cons below)

Cons:

  • The run needs monitoring, babysitting
  • Process is a bit of a “kludge”
  • Takes a long time, requires machine resources during that time
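
If your environment gives you a way to switch debug logging on and off for the running job, the cadence can be automated. The sketch below is a minimal, hypothetical Python scheduler: enable_debug(), disable_debug(), and job_is_running() are placeholders for whatever toggle and monitoring mechanism your tools release actually provides, and the log path is an assumption.

    # Minimal scheduling sketch for "collect a 30-minute log every few hours".
    # Assumptions: the three callables wrap whatever mechanism your environment
    # uses to toggle E1 debug logging and to check the job; the debug log path
    # is hypothetical.
    import shutil
    import time

    SAMPLE_MINUTES = 30          # length of each debug sample
    GAP_HOURS = 3                # quiet period between samples
    DEBUG_LOG = "jdedebug.log"   # hypothetical path to the job's debug log

    def collect_samples(enable_debug, disable_debug, job_is_running):
        sample = 0
        while job_is_running():
            enable_debug()
            time.sleep(SAMPLE_MINUTES * 60)
            disable_debug()
            sample += 1
            # Keep a copy of the log as it stands at the end of this window;
            # rotate or rename the live log here if your setup allows it, so
            # each sample does not also contain the previous ones.
            shutil.copy(DEBUG_LOG, f"jdedebug.sample{sample:02d}.log")
            time.sleep(GAP_HOURS * 3600)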

Capture runtime call stacks over an extended interval

DO NOT enable EOne debug logging

  • Capture a set of snapshots at different intervals over a long run (a stack-capture sketch follows the cons below)

Pros:

  • Can combine this method with other monitoring
  • Poor man’s “manual” sampling…can be effective
  • Use existing operating system commands
  • Debug code NOT required
  • Can help to spot infinite looping behavior

Cons:

  • Raw call stacks are a bit obscure, not as intuitive to read
  • The run needs monitoring, babysitting
  • Takes a long time, requires machine resources during that time
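
On a Linux or Solaris enterprise server, one way to do this manual sampling is to loop over the process ID of the job with pstack (procstack on AIX) and save each snapshot with a timestamp. The sketch below is a hypothetical helper, not an Oracle-supplied tool; the command name and the assumption that you already know the job's PID are points to adapt to your platform.

    # Minimal sketch: capture periodic call stack snapshots of a running UBE.
    # Assumptions: pstack is available on this host (use procstack on AIX) and
    # the PID of the job's process is already known.
    import subprocess
    import time
    from datetime import datetime

    def capture_stacks(pid, interval_minutes=15, count=8):
        """Take `count` stack snapshots, one every `interval_minutes`."""
        for _ in range(count):
            stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            result = subprocess.run(["pstack", str(pid)],
                                    capture_output=True, text=True)
            with open(f"stack_{pid}_{stamp}.txt", "w") as out:
                out.write(result.stdout)
            time.sleep(interval_minutes * 60)

    # Example with a hypothetical PID: capture_stacks(12345)

If the same frames keep appearing at the top of every snapshot, that code path is where the job is spending its time, and identical stacks across many snapshots can also point to the infinite-loop case.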

Conclusion

  • If the use case is less than two hours:
    • Run the unmodified production use case with debug on
  • In general, avoid reducing data selection
    • But this MAY help in identifying cache leaks – since the UBE finishes gracefully
    • Can view the end of the log file for missing cache terminates
  • Perhaps a reduced data selection case could be run in addition to one of the longer use cases
    • This would allow the end of the job to be captured
  • Try a one hour terminated run first
    • In general one single long-running SELECT should NOT drive the analysis
  • Next – try log samples throughout the run
  • Finally, try call stack samples

1 thought on “How to Analyze Long-Running UBEs”

  1. Sudhakar Motade

    We have a strange issue. We have a sales historical report that has been running in production for six months. Last week we made minor code changes that do not involve any Data Selection.
    Since these minor changes, the UBE processes one loop of the Do Section and then just freezes when it returns to the top of the Do Section. We get ‘JDB_SetSelectionX was invoked with an invalid 0 number of selection criteria. Selection criteria need to be positive’ in the jde.log.
    We cannot process the report successfully and cannot figure out what is going on with this one.
    Any hints will be gladly welcome.
