Hi Folks,
This post is intended as a quick reference for manual recovery of
faults within SOA. It aims to present some of the valuable
information regarding manual recovery in one place.
Introduction:
Integration flows can fail at run-time with a variety of errors. The
cause of these failures can be either business errors or system errors.
When synchronous integration flows fail, they are restarted from the
beginning. Asynchronous integration flows, on the other hand, can
potentially be resubmitted/recovered from designated/pre-configured
milestones within the flow when they fail. These milestones can be
persistence points such as queues, topics or database tables, where the
state of the flow was last persisted. Recovery is the mechanism whereby
a faulted asynchronous flow can be rerun from such a persistence
milestone.
BPEL Message Recovery:
To understand BPEL Message Recovery, let us briefly look at how the
BPEL Service Engine performs asynchronous processing. Asynchronous BPEL
processes use an intermediate Delivery Store in the SOA Infrastructure
Database to store the incoming request. The message is then picked up,
and further BPEL processing happens in an Invoke Thread. The Invoke
Thread is one of the free threads from the ‘Invoke Thread Pool’
configured for the BPEL Service Engine. The processing of the message
from the Delivery Store onwards, until the next dehydration point in
the BPEL process or the next commit point in the flow, constitutes a
transaction. The figure below shows, at a high level, how the BPEL
Invoke Thread handles an asynchronous request. Any unhandled error
during this processing causes the message to roll back to the Delivery
Store.
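The flow described above can be pictured with a small in-memory simulation. This is only a sketch: the `DeliveryStore` class, the `invoke` and `run` functions and the state names are hypothetical stand-ins, not the engine's real classes, and a plain dictionary takes the place of the SOA Infrastructure database.

```python
from concurrent.futures import ThreadPoolExecutor


class DeliveryStore:
    """Hypothetical in-memory stand-in for the delivery store."""

    def __init__(self):
        self.messages = {}  # message id -> [payload, state]

    def save(self, msg_id, payload):
        # The incoming request is persisted before any processing starts.
        self.messages[msg_id] = [payload, "UNDELIVERED"]

    def mark_handled(self, msg_id):
        self.messages[msg_id][1] = "HANDLED"


def invoke(store, msg_id, process):
    """Work done by one invoke thread, treated as a single transaction."""
    payload = store.messages[msg_id][0]
    try:
        process(payload)
    except Exception:
        # Unhandled error: "roll back", the message stays UNDELIVERED.
        return False
    store.mark_handled(msg_id)  # "commit"
    return True


def run(store, process, pool_size=2):
    """Hand every undelivered message to a thread from the pool."""
    ids = [i for i, (_, state) in store.messages.items()
           if state == "UNDELIVERED"]
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        outcomes = pool.map(lambda i: invoke(store, i, process), ids)
        return dict(zip(ids, outcomes))
```

A message whose processing raises an error is still sitting in the store as undelivered afterwards, which is exactly what makes it a candidate for recovery.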
During Recovery of these messages, the
end user cannot make any modifications to the original payload. The messages
marked recoverable can either be recovered or aborted. In the former case, the
original message is simply redelivered for processing again.
The BPEL Configuration property
‘MaxRecoverAttempt’ determines the number of times a message can be recovered
manually or automatically. Messages go to the exhausted state after reaching
the MaxRecoverAttempt. They can be selected and ‘Reset’ back to make them
available for manual/automatic recovery again.
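The life cycle of a recoverable message (recover, exhausted, reset) can be sketched as a tiny state machine. The `Message` class, the state strings and the `recover`/`reset` functions below are hypothetical names used for illustration only; the limit of 2 is simply an example value for MaxRecoverAttempt.

```python
from dataclasses import dataclass

MAX_RECOVER_ATTEMPT = 2  # example value, assumed for this sketch


@dataclass
class Message:
    payload: str
    attempts: int = 0
    state: str = "recoverable"


def recover(msg, process):
    """Redeliver the original payload unchanged and count the attempt."""
    if msg.state != "recoverable":
        return msg.state  # exhausted or delivered messages are skipped
    msg.attempts += 1
    try:
        process(msg.payload)
        msg.state = "delivered"
    except Exception:
        if msg.attempts >= MAX_RECOVER_ATTEMPT:
            msg.state = "exhausted"
    return msg.state


def reset(msg):
    """Make an exhausted message available for recovery again."""
    if msg.state == "exhausted":
        msg.attempts = 0
        msg.state = "recoverable"
```

Note that the payload is never modified along the way: recovery only redelivers what was originally stored.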
A Fault Policy with configurable Actions can be bound to a SOA
component. Fault policies can be attached at the Composite, Component
or Reference level. The configured Actions are executed when the
invocation fails. The available Actions include retry, abort, human
intervention, custom Java callout, etc. When the Action applied is
human intervention, the faults become available for Manual Recovery
from the Oracle Enterprise Manager Fusion Middleware Control [EM FMWC
Console]. They show up as recoverable instances in the faults tab of
‘SOA->faults and rejected messages’.
Automatic recovery program for pending
BPEL call back messages:
The BPEL engine stores all async call-back messages in a database table
called dlv_message. You can see all such messages in the call-back
manual recovery area of the BPEL console. The query used by the BPEL
console joins the dlv_message and work_item tables. It simply picks up
all call-back messages that are undelivered and have not been modified
within a certain threshold time.
Call-back messages are processed in the following steps:
· The BPEL engine assigns the call-back message to the delivery service
· The delivery service saves the message into the dlv_message table
with state 0 (UNDELIVERED)
· The delivery service schedules a dispatcher thread to process the
message asynchronously
· The dispatcher thread enqueues the message into a JMS queue
· The message is picked up by an MDB
· The MDB delivers the message to the actual BPEL process waiting for
the call-back and changes the state to 2 (HANDLED)
Given the steps above, it is always possible for a message to be
present in the dlv_message table while the MDB has failed to deliver it
to the BPEL process, which leaves the message stuck in state 0
(undelivered).
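The selection rule for stuck call-back messages (undelivered and untouched for longer than a threshold) can be sketched over plain in-memory rows. The dictionary keys `state` and `modified`, the function name and the 10-minute default are assumptions made for this example; they are not the real column names or the console's actual query.

```python
from datetime import datetime, timedelta

UNDELIVERED = 0  # state written when the delivery service saves the message
HANDLED = 2      # state written once the MDB has delivered the message


def find_stuck_callbacks(rows, now, threshold=timedelta(minutes=10)):
    """Return rows still undelivered and not modified within the threshold."""
    return [
        row for row in rows
        if row["state"] == UNDELIVERED and now - row["modified"] >= threshold
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    rows = [
        {"id": "cb-1", "state": UNDELIVERED, "modified": now - timedelta(minutes=30)},
        {"id": "cb-2", "state": UNDELIVERED, "modified": now - timedelta(minutes=2)},
        {"id": "cb-3", "state": HANDLED, "modified": now - timedelta(hours=1)},
    ]
    for row in find_stuck_callbacks(rows, now):
        print(row["id"])
```

Only the first row qualifies here: the second was modified too recently and may still be in flight, and the third has already been handled.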
Recovering the instances from recovery:
The instances in the recovery queue can
be recovered manually to continue the processing.
Below are some of the reasons why instances go to manual recovery:
1. There are not enough threads or memory to process the message.
2. The server shuts down or crashes before it finishes processing the
BPEL message.
3. The engine could not finish processing the message before reaching
the time-out dictated by the transaction-timeout configuration.
BPEL Process Manager has a nice UI for looking at and managing these
messages, but what if we need to be alerted when a process goes into
one of these states? BPEL PM does not have that capability. We can
either write custom code for it or manually go and reinitiate the
recoverable instances in EM.
Recovering the BPEL instances:-
1. Log in to the EM console.
2. Right-click on soa-infra and click on Service Engine --> BPEL.
3. Click on the Recovery tab.
4. Change the Type accordingly (Invoke, Activity, Callback), set the
Message State to “Undelivered”, and click on Search.
5. All the recoverable messages that match the criteria will be
displayed.
6. Select the required messages and click on the Recover button.
Auto Recovery feature in BPEL
Configuration:
‘Auto Recovery’ is configured by setting a few MBean properties in the
EM console. To configure it, navigate to soa-infra -> SOA
Administration -> BPEL Properties -> More BPEL Configuration
Properties -> RecoveryConfig.
This brings up a screen showing the default parameters. BPEL auto
recovery is enabled by default. The properties startWindowTime and
stopWindowTime specify the period during which auto recovery is
active. By default, auto recovery is active from 12 AM to 4 AM every
day (remember that this is SOA server time), as shown in the
screenshot. These settings can be changed by updating the time values
in 24-hour format and clicking Apply.
The property maxMessageRaiseSize specifies the number of messages to be
submitted in each recovery attempt; in effect, it is the batch size.
The property subsequentTriggerDelay specifies the interval between
consecutive auto recovery attempts; the default is 300 seconds.
The property threshHoldTimeInMinutes is the time the BPEL engine waits
after a recoverable fault occurs before marking the instance eligible
for auto recovery; the default is 10 minutes.
If we look closely, none of these properties says anything about the
number of recovery attempts to be made; that is a separate MBean
property altogether. To set it, navigate to soa-infra -> SOA
Administration -> BPEL Properties -> More BPEL Configuration
Properties -> MaxRecoverAttempt. The default value is 2.
To disable ‘Auto Recovery’, set the
maxMessageRaiseSize property value to 0 as shown above.
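To see how these properties interact, here is a rough sketch of how one recovery run might pick its batch. The `RecoveryConfig` dataclass, its field names and both functions are hypothetical. The batch size default of 50 and the handling of a window that crosses midnight are assumptions of this example, not documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class RecoveryConfig:
    start_window: time = time(0, 0)        # startWindowTime
    stop_window: time = time(4, 0)         # stopWindowTime
    max_message_raise_size: int = 50       # batch size, assumed default
    subsequent_trigger_delay_s: int = 300  # subsequentTriggerDelay
    threshold_minutes: int = 10            # threshHoldTimeInMinutes


def in_window(cfg, now):
    """True if server time `now` falls inside the recovery window."""
    t = now.time()
    if cfg.start_window <= cfg.stop_window:
        return cfg.start_window <= t <= cfg.stop_window
    # Assumption: a window such as 22:00-02:00 wraps around midnight.
    return t >= cfg.start_window or t <= cfg.stop_window


def select_batch(cfg, fault_times, now):
    """Pick the fault timestamps to recover in this run, oldest first."""
    if cfg.max_message_raise_size == 0 or not in_window(cfg, now):
        return []  # a batch size of 0 disables auto recovery
    cutoff = now - timedelta(minutes=cfg.threshold_minutes)
    eligible = sorted(t for t in fault_times if t <= cutoff)
    return eligible[:cfg.max_message_raise_size]
```

With the defaults, a fault raised at 00:30 is picked up by a run at 01:00, while one raised at 00:55 has not yet passed the 10-minute threshold and waits for a later run.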
Auto Recovery Behavior:
Whenever a recoverable fault occurs during BPEL processing (the term is
rather abstract; I verified this behavior with Remote, Binding and User
Defined Faults), it becomes visible in the Recovery console. If Auto
Recovery is enabled, the BPEL runtime tries to auto recover the
instance after threshHoldTimeInMinutes. If that is not successful,
further recovery attempts are made, up to the number given by
MaxRecoverAttempt, at an interval given by subsequentTriggerDelay. If
the instance still fails after the maximum number of recovery attempts,
it is marked as exhausted (it can be queried on the recovery console
using the message state ‘Exhausted’). The ‘Reset’ button makes these
instances eligible for Auto Recovery again.
Note that this behavior is observed only when the fault is thrown back
to the BPEL runtime or is not caught within the BPEL process.
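As a worked example of the timing, the sketch below computes when the attempts would fall for a single faulted instance. The function `attempt_schedule` is hypothetical. It assumes that the first attempt counts toward MaxRecoverAttempt and it ignores the recovery window, so treat the result as an approximation of the timeline rather than the engine's exact schedule.

```python
from datetime import datetime, timedelta


def attempt_schedule(fault_time, threshold_minutes=10,
                     trigger_delay_s=300, max_recover_attempt=2):
    """Return the times at which auto recovery would be attempted."""
    # The first attempt happens once the threshold has elapsed.
    first = fault_time + timedelta(minutes=threshold_minutes)
    # Later attempts follow at subsequentTriggerDelay intervals.
    return [first + timedelta(seconds=trigger_delay_s * i)
            for i in range(max_recover_attempt)]


if __name__ == "__main__":
    for when in attempt_schedule(datetime(2024, 1, 1, 1, 0)):
        print(when.strftime("%H:%M"))
```

With the default values, an instance that faults at 01:00 would be retried at 01:10 and 01:15, and would be marked exhausted if both attempts fail.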
SOA 11.1.1.6 New Features for BPEL Message Recovery
SOA 11.1.1.6 added an important feature in the area of Message
Recovery: more proactive alerts for stuck BPEL messages. A small part
of this feature had been available since SOA 11.1.1.5, but only in the
Composite Flow Trace, so you had to know which flow trace might have
problems. Now, on clicking soa-infra inside EM, we see a global alert
that there are messages needing recovery:
The same alert also appears when clicking a composite that has messages
pending recovery, and it disappears when we move to a composite that
has none.
By clicking Show
Details, we can see how many messages are pending recovery, grouped by type:
And clicking Go to BPEL
Recovery Console will redirect you to the BPEL recovery console where you can
recover or cancel the message:
Well, that’s it. This new
feature is simple yet very powerful, helping SOA administrators to get alerts
when messages need recovery and be more pro-active when administering the SOA
environment.
Happy Learning...!!!!!!!!!!!! Fun
Sharing.........!!!!!!!!!