How to Analyze Long-Running UBEs

Source: Oracle UBE Performance Webcast

Example Issue

UBE runs for three days and never finishes.

Need a valid profile to answer the question

WHERE IS THE CODE/SYSTEM SPENDING ITS TIME?

The problem question:

  • How to get that profile?
  • How to “get your arms around the process”?

Analysis of Long-Running UBEs

In JDE.INI, under [DEBUG], set UBEDebugLevel=0 and DumpLPDS=0.

These settings:

  • Reduce log file size; the debug log grows more slowly
  • Give a more accurate profile by stripping out entries that do not help with performance analysis

UBEDebugLevel=0 eliminates UBE ER entries, which carry no time stamps and add no performance-relevant data. DumpLPDS=0 stops BSFN data structures from being dumped twice for each call.
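
Taken together, the two keys look like this in the [DEBUG] section of the jde.ini on the server where the UBE runs (leave the rest of the section unchanged):

    [DEBUG]
    UBEDebugLevel=0
    DumpLPDS=0
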
Approaches

  1. Run the unmodified production use case with debug on
  2. Run the UBE with Reduced Data Selection
  3. Run for 30-60 minutes, then terminate job
  4. Run for a long period, collecting log samples at intervals
  5. Capture runtime call stacks over an extended interval

Run the unmodified production use case with debug on
Pros:

  • This is the ideal case – the entire run is captured beginning to end
  • No compromise to data sample.

Cons:

  • For UBEs longer than two hours (absolute maximum), this method is almost certainly not viable.
  • Logs get too large too fast: several GB per hour
  • Some customers have 2GB file size limit on the operating system.
    • Note that this is usually modifiable
  • Debug logging adds a factor of 2-3 to the runtime (see the rough arithmetic below)
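
Putting the last two cons together: a job that runs 10 hours without logging could take roughly 20-30 hours with debug on and, at several GB of log per hour, produce tens of gigabytes of debug log, which is why about two hours is the practical ceiling for this approach.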

Reduced Data Selection

Pros:

  • Shorter, manageable run
  • Smaller log size
  • Job can finish, so a complete start-to-finish picture results
  • No need to terminate the job

Cons:

  • Under-sampling
  • The short time frame skews the profile
  • Fixed-cost / one-time hits at startup will be exaggerated
  • A 10-minute run of a 10-hour job will NOT give a reliable profile
  • Avoidance of the problems you are trying to observe
  • If the problem is related to a specific data range, this may be missed
  • If the job is stuck in an infinite loop, the reduced run will not finish either

Run for 30-60 minutes, then terminate job

Pros:

  • Less risk from under-sampling
  • A reasonably sized log, in the few-GB range

Cons:

  • UBE may have multiple sections, and the real problems may occur in a section that is never reached in the 30-60 minutes
    • Get a listing of the UBE’s ER to help get a clearer picture
    • Look at the UBE in the RDA tool
  • What the UBE is doing in the first hour may NOT be the same as what it’s doing in the third hour
    • Remember: a one-hour run with debugging on is probably 20-30 minutes of runtime without debug logging on
  • If the job is running the same code for a long period – but slowing down gradually – this will also be missed by just capturing the first hour
  • Killing the job prematurely means not seeing the complete picture
  • The graceful end of a UBE contains important indications of cache memory leaks:
    • Such as a jdeCacheInit() that is not matched with a jdeCacheTerminateAll()
    • Failure to call jdeCacheTerminateAll() results in jdeCacheDestroyAllUserCaches() being called
      • Counting and matching every jdeCacheInit() call by hand is difficult, since caches may be initialized and closed in many different BSFNs (a rough log-scan sketch follows this cons list)
  • Multiple sections in UBEs
    • When UBE processing contains multiple sections which process in serial fashion, a one-hour sample at the start of the run may never even capture a sample of the serious problem.
  • Problem data ranges in UBEs
    • Even if there is only a single section which does all the processing, specific data ranges later in the process may trigger slower throughput.
    • Other factors may cause a precipitous drop in throughput later in the process, such as memory consumption reaching thresholds.
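
As a rough check for missing cache terminates in a log that does reach a graceful end, the init/terminate counts can be tallied with a small script. The sketch below is a hypothetical Python scan, not an Oracle-supplied tool: the API names are searched as plain text, and the exact strings that appear in your jdedebug.log may differ by release, so treat the patterns as assumptions to adjust.

    # Minimal sketch: tally cache init vs. terminate references in a debug log.
    # Assumption: the log text contains the literal API names listed below;
    # adjust the patterns to whatever your jdedebug.log actually prints.
    import sys
    from collections import Counter

    # jdeCacheTerminateAll is checked before jdeCacheTerminate so the shorter
    # name does not also match the longer one.
    APIS = ("jdeCacheInit", "jdeCacheTerminateAll",
            "jdeCacheTerminate", "jdeCacheDestroyAllUserCaches")

    def scan(path):
        counts = Counter()
        with open(path, "r", errors="replace") as log:
            for line in log:
                for api in APIS:
                    if api in line:
                        counts[api] += 1
                        break  # count each log line once
        return counts

    if __name__ == "__main__":
        # Usage with a hypothetical file name: python cache_scan.py jdedebug.log
        for api, n in scan(sys.argv[1]).items():
            print(f"{api}: {n}")

A large surplus of jdeCacheInit() over the terminate calls, in a run that finished gracefully, is a hint that caches are being leaked.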

Run for a long period, collecting log samples at intervals

Pros:

  • Obtain profiles for a much longer period
  • Perhaps collect a 30-minute log every 2-4 hours (a scheduling sketch follows the cons below)

Cons:

  • The run needs monitoring, babysitting
  • Process is a bit of a “kludge”
  • Takes a long time, requires machine resources during that time
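
If your environment gives you a way to switch debug logging on and off for the running job, the cadence can be automated. The sketch below is a minimal, hypothetical Python scheduler: enable_debug(), disable_debug(), and job_is_running() are placeholders for whatever toggle and monitoring mechanism your tools release actually provides, and the log path is an assumption.

    # Minimal scheduling sketch for "collect a 30-minute log every few hours".
    # Assumptions: the three callables wrap whatever mechanism your environment
    # uses to toggle E1 debug logging and to check the job; the debug log path
    # is hypothetical.
    import shutil
    import time

    SAMPLE_MINUTES = 30          # length of each debug sample
    GAP_HOURS = 3                # quiet period between samples
    DEBUG_LOG = "jdedebug.log"   # hypothetical path to the job's debug log

    def collect_samples(enable_debug, disable_debug, job_is_running):
        sample = 0
        while job_is_running():
            enable_debug()
            time.sleep(SAMPLE_MINUTES * 60)
            disable_debug()
            sample += 1
            # Keep a copy of the log as it stands at the end of this window;
            # rotate or rename the live log here if your setup allows it, so
            # each sample does not also contain the previous ones.
            shutil.copy(DEBUG_LOG, f"jdedebug.sample{sample:02d}.log")
            time.sleep(GAP_HOURS * 3600)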

Capture runtime call stacks over an extended interval

DO NOT enable EOne debug logging

  • Capture a set of snapshots at different intervals over a long run (a stack-capture sketch follows the cons below)

Pros:

  • Can combine this method with other monitoring
  • Poor man’s “manual” sampling…can be effective
  • Use existing operating system commands
  • Debug code NOT required
  • Can help to spot infinite looping behavior

Cons:

  • Raw call stacks are a bit obscure, not as intuitive to read
  • The run needs monitoring, babysitting
  • Takes a long time, requires machine resources during that time
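
On a Linux or Solaris enterprise server, one way to do this manual sampling is to loop over the process ID of the job with pstack (procstack on AIX) and save each snapshot with a timestamp. The sketch below is a hypothetical helper, not an Oracle-supplied tool; the command name and the assumption that you already know the job's PID are points to adapt to your platform.

    # Minimal sketch: capture periodic call stack snapshots of a running UBE.
    # Assumptions: pstack is available on this host (use procstack on AIX) and
    # the PID of the job's process is already known.
    import subprocess
    import time
    from datetime import datetime

    def capture_stacks(pid, interval_minutes=15, count=8):
        """Take `count` stack snapshots, one every `interval_minutes`."""
        for _ in range(count):
            stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            result = subprocess.run(["pstack", str(pid)],
                                    capture_output=True, text=True)
            with open(f"stack_{pid}_{stamp}.txt", "w") as out:
                out.write(result.stdout)
            time.sleep(interval_minutes * 60)

    # Example with a hypothetical PID: capture_stacks(12345)

If the same frames keep appearing at the top of every snapshot, that code path is where the job is spending its time, and identical stacks across many snapshots can also point to the infinite-loop case.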

Conclusion

  • If the use case is less than two hours:
    • Run the unmodified production use case with debug on
  • In general, avoid reducing data selection
    • But this MAY help in identifying cache leaks – since the UBE finishes gracefully
    • Can view the end of the log file for missing cache terminates
  • Perhaps a reduced data selection case could be run in addition to one of the longer use cases
    • This would allow the end of the job to be captured
  • Try a one hour terminated run first
    • In general one single long-running SELECT should NOT drive the analysis
  • Next – try log samples throughout the run
  • Finally, try call stack samples

1 thought on “How to Analyze Long-Running UBEs”

  1. Sudhakar Motade

    We have a strange issue. We have a sales historical report that has been running in production for six months. Last week we made minor code changes that do not involve any Data Selection.
    Since these minor changes, the UBE processes one loop of the Do Section and then just freezes when it returns to the top of the Do Section. We get ‘JDB_SetSelectionX was invoked with an invalid 0 number of selection criteria. Selection criteria need to be positive’ in the jde.log.
    We cannot process the report successfully and cannot figure out what is going on with this one.
    Any hints will be gladly welcome.
