## Descision Tree Engines: What are they?

![decision tree illustration](/images/blog/decision-tree/decision-tree.png)

**Let's start with an example**
Let's say you have a certain amount of income and a bank wants to assign a score to your account. It's pretty simple:

```python
if income>50000:
    score=100
else:
    score=50
```

We can just use a few control statments to assign the score.

Now the bank wants to take into account how much loan you have taken into your score:

```python
if income>50000:
    if loan<400_000:
        score=100
    elif loan<1_000_000:
        score=80
    else:
        score=50
else:
    score=50
```

Now the number of control statements increased and pretty as the scoring mechanism gets complex, the instructions will too. Soon, it will be unmaintainable. Another way to implement these decision control flows are **Decision Trees**.

Decision tree engines, when viewed purely in terms of rules and workflows, refer to systems that use a tree-like structure of decision rules to model decisions, classifications, or predictions. These are essentially rule-based engines where the knowledge is explicitly represented as a hierarchy of if-then-else conditions.

When you run a decision tree engine on new data, the process is always the same:
1. Start at the root node.
2. Evaluate the condition (rule) at the current node.
3. Go left or right based on the answer.
4. Repeat until you reach a leaf.
5. Return the leaf’s value. 

## Planning

### Goal
I planned to implement a decision tree engine specifically for risk profiling/fraud detection flow. Goal was to create a high performance and scalable engine. 

**Input: JSON decision trees**

**Output: Next node, actions and updated context**

### Things to keep in mind
- **Stateless** per request
- Deterministic: same input -> same output
- Explainable: return the decision path
- gRPC API and embeddable C++ library

### Architecture

At a high level, a caller sends:
- tree_version: which tree to use
- current_node: where in the tree evaluation starts
- user_input: optional raw input (used for choice nodes)
- context: flat key-value map of facts about the transaction, user, device, etc.

### The Engine:
1. Looks up the JSON tree definition for tree_version.
2. Finds the node with id current_node.
3. Evaluates that node according to its type.
4. Returns:
    - next_node
    - actions to run (for action nodes)
    - updated_context
    - confidence and the decision_path.


## The Implementation: Specifically The Fraud Detection Use Case

### Decision Trees

- **Tree definition**:
  - JSON file
  - Versioned via `"version"` (e.g. `"fraud_detection_v1"`)
  - Contains an array of `"nodes"`; each node has an `"id"` and `"type"`.
- **Loading**:
  - Trees are loaded either from disk (`TREE_DIRECTORY`) or via the `LoadTree` gRPC API.
  - Validation happens on load:
    - No cycles
    - All referenced nodes exist
    - Conditions are syntactically valid
    - Defaults or terminal ends are present

### Node Types

![decision tree illustration](/images/blog/decision-tree/node-types.png)

- **Message Node**
  - **Purpose**: Display static or templated text.
  - **Behavior**: Always transitions to `next`.
- **Choice Node**
  - **Purpose**: Branch based on explicit user selection.
  - **Behavior**: Matches `user_input` against declared `choices`; if none match, uses `default`.
- **Condition Node**
  - **Purpose**: Branch based on evaluating conditions over `context`.
  - **Behavior**:
    - Evaluates `conditions` top-to-bottom.
    - First condition whose `"if"` expression evaluates to true wins.
    - If none match, uses `default`.
- **Action Node**
  - **Purpose**: Emit a list of actions for the caller to execute (e.g. `BLOCK_CARD`).
  - **Behavior**: Returns `actions` plus `next`; caller performs side effects.
- **Terminal Node**
  - **Purpose**: End of flow.
  - **Behavior**: No further traversal; returns a final resolution (e.g. `FRAUD_CONFIRMED`).

### Single-Step Evaluation

![decision tree illustration](/images/blog/decision-tree/evaluation.png)

The engine is **step-based**: each `Evaluate` call processes exactly one node.

- **Input**:
  - `tree_version`
  - `current_node`
  - `user_input` (optional)
  - `context`
- **Output**:
  - `next_node`
  - `actions`
  - `updated_context`
  - `confidence`
  - `decision_path` (includes the evaluated node)

The caller is responsible for:

- Determining whether to call `Evaluate` again with `current_node = next_node`.
- Persisting or propagating `updated_context`.
- Executing any actions.

### Traversal Semantics

- **Exactly one** node is evaluated per step.
- **Exactly one** `next_node` is chosen (or a terminal node ends the flow).
- No parallel or branching executions; the flow is linear when viewed step by step.
- Maximum traversal depth is enforced to prevent infinite loops.

### gRPC Service

- **Service**: `decision_engine.DecisionEngineService`
- **Key RPCs**:
  - **LoadTree**: Load a JSON tree into memory.
  - **Evaluate**: Evaluate a single step of a tree.
  - **HealthCheck**: Simple liveness check.

The protobuf request for `Evaluate` includes:

- `tree_version`
- `current_node`
- `user_input`
- `context` (map of `ContextValue`)

The response contains:

- `next_node`
- `actions`
- `updated_context`
- `confidence`
- `decision_path`

### Embedded Library

The same engine can be used directly from C++ by linking against the core library:

- Load trees from JSON strings.
- Call `Engine::evaluate` with a request struct.
- Inspect the `DecisionResult` (next node, actions, context, decision path).

This is useful for embedding the engine in other C++ services or for local testing without gRPC.

### Build, Run, and Test

- **From repo root**:

```bash
cd engine
mkdir build && cd build
cmake ..
make -j"$(nproc)"
```

- **C++ tests**:

```bash
cd engine
./build_test.sh
```

Tests cover:

- Tree loading and validation
- Condition evaluation
- Choice matching
- Action emission
- Tree reloading

- **gRPC server**

```bash
cd engine
mkdir -p build && cd build
cmake ..
make -j"$(nproc)"

export GRPC_SERVER_ADDRESS=0.0.0.0:50051
export TREE_DIRECTORY=../trees

./decision_engine_server
```

You can then call the server using:

- `grpcurl` (for ad-hoc tests)
- Python client (`test_grpc_client.py`)
- Other services via gRPC stubs.

### Fraud Detection Tree (`fraud_detection_v1`): 

**[The Tree](https://github.com/capybara-brain346/risk-profile-engine/blob/main/engine/trees/fraud_detection_v1.json)**

**The Use Case:**

- **Tree version**: `fraud_detection_v1`
- **Goal**: Model a realistic, enterprise-grade fraud detection and account security workflow.
- **Characteristics**:
  - 70+ nodes
  - Multiple risk levels and branching paths
  - Rich set of actions for fraud operations and user security
  - Terminal Outcomes:
    - `TRANSACTION_APPROVED`
    - `TRANSACTION_DENIED`
    - `FRAUD_CONFIRMED`
    - `FRAUD_PREVENTED`
    - `USER_CANCELLED`
    - `VERIFICATION_FAILED`
    - `CARD_REPLACED`
    - `INVESTIGATION_PENDING`
    - `MONITORING_ENABLED`
    - `ESCALATED_TO_FRAUD_TEAM`
    - `CONNECTED_TO_AGENT`
    - `ACCOUNT_RESTORED`
    - `PENDING_ACCOUNT_CLOSURE`
    - `VERIFICATION_PENDING`

Each terminal node encodes a clear outcome that a downstream system can act on.


### Fraud Detection Flow: End-to-End Scenarios
**1. Scenario: Critical Risk, Large Transaction**

- **Context example**:

```json
{
  "transaction_amount": 15000,
  "velocity_score": 0.9,
  "risk_score": 0.95
}
```

- **Typical behavior**:
  - Initial condition node classifies the transaction as **critical risk**.
  - Engine routes to a branch that:
    - Immediately blocks the transaction.
    - Freezes the account.
    - Creates a fraud case.
  - Flow terminates with a high-severity resolution, such as `FRAUD_CONFIRMED` or `FRAUD_PREVENTED`.

**2. Scenario: Device Mismatch and Location Anomaly**

- **Context example**:

```json
{
  "device_fingerprint_mismatch": true,
  "location_anomaly": true,
  "risk_score": 0.75
}
```

- **Typical behavior**:
  - Conditions detect mismatched device and unexpected location.
  - Engine routes to a branch requiring **strong verification**:
    - Transaction may be temporarily blocked.
    - 2FA or OTP is required.
  - Depending on verification outcome, the tree continues toward approval or a fraud terminal outcome.

**3. Scenario: Moderate Risk, Needs Verification**

- **Context example**:

```json
{
  "risk_score": 0.5,
  "behavior_consistent": false
}
```

- **Typical behavior**:
  - Conditions classify the transaction as moderate or elevated risk.
  - Engine routes to a **user verification path**:
    - Confirmation prompts or security questions.
    - Possible temporary monitoring mode.
  - Terminal resolutions may include `MONITORING_ENABLED` or `VERIFICATION_PENDING`.

**4 Scenario: Suspicious Merchant**

- **Context example**:

```json
{
  "unusual_merchant_category": true,
  "merchant_fraud_reports": 10
}
```

- **Typical behavior**:
  - Merchant-related conditions trigger a risky merchant path.
  - Engine may:
    - Block the transaction.
    - Flag or block the merchant.
    - Trigger refund flows.
  - Terminal outcome may be `TRANSACTION_DENIED` or `FRAUD_PREVENTED`, with actions like merchant blocking and refund initiation.

**5. Scenario: Card Testing Pattern**

- **Context example**:

```json
{
  "multiple_declined_attempts": 5,
  "velocity_score": 0.8
}
```

- **Typical behavior**:
  - Tree detects repeated small attempts and high velocity.
  - Engine routes to a **card testing** branch that:
    - Blocks or replaces the card.
    - Alerts the fraud team.
  - Terminal outcome might be `CARD_REPLACED` or `ESCALATED_TO_FRAUD_TEAM`.

### Running and Testing the Fraud Detection Flow

**1. Loading the Fraud Tree via gRPC**

With the server running:

```bash
cd /home/capybara/code/sde/risk-profile-engine/engine

grpcurl -plaintext -import-path src -proto src/decision_engine.proto \
  -d '{"tree_version": "fraud_detection_v1", "json_content": "'"$(cat trees/fraud_detection_v1.json | tr -d '\n' | sed 's/\"/\\\"/g')"'"}' \
  localhost:50051 decision_engine.DecisionEngineService/LoadTree
```

This loads the full fraud detection tree into the engine under `tree_version = "fraud_detection_v1"`.

**2. Evaluating Fraud Scenarios**

- **Critical risk path example**:

```bash
grpcurl -plaintext -import-path src -proto src/decision_engine.proto \
  -d '{
    "tree_version": "fraud_detection_v1",
    "current_node": "start",
    "context": {
      "transaction_amount": {"int_value": 15000},
      "velocity_score": {"double_value": 0.9},
      "risk_score": {"double_value": 0.95}
    }
  }' \
  localhost:50051 decision_engine.DecisionEngineService/Evaluate
```

Inspect the response to see:

- `next_node` (for example, a high-severity fraud path node)
- `actions` (block, freeze, create case)
- `decision_path` (which fraud rules fired)

You can run similar requests for the other scenarios (location anomaly, moderate risk, suspicious merchant, card testing) by adjusting the `context` payload.

## Results

### Integration testing results on a complex fraud detection tree:
- Total Nodes: 70+
- Node Types: 5 (message, choice, condition, action, terminal)
- Action Types: 35+ different actions
- Terminal States: 14
- Max Branching Factor: 4 choices
- Average Path Length: 5-8 nodes

```plaintext
==========================================
Fraud Detection Workflow - Complex Tree Test
==========================================

Loading fraud_detection_v1 tree...
{
  "success": true
}

==========================================
Scenario Tests
==========================================

1. LOW RISK - Normal Transaction
   Context: risk_score=0.2
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}

2. MODERATE RISK - Needs Behavior Check
   Context: risk_score=0.55
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}

3. ELEVATED RISK - Manual Review Required
   Context: risk_score=0.75
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}

4. HIGH RISK - Device & Location Anomaly
   Context: device_fingerprint_mismatch=true, location_anomaly=true
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}

5. CRITICAL RISK - Large Transaction with High Velocity
   Context: transaction_amount=15000, velocity_score=0.9
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}

6. CARD TESTING PATTERN DETECTED
   Context: multiple_declined_attempts=5
---
{
  "nextNode": "card_testing_suspected",
  "confidence": 1,
  "decisionPath": [
    "analyze_transaction_pattern"
  ]
}

7. SUSPICIOUS MERCHANT
   Context: unusual_merchant_category=true
---
{
  "nextNode": "investigate_merchant",
  "confidence": 1,
  "decisionPath": [
    "analyze_transaction_pattern"
  ]
}

8. TIME-BASED ANOMALY
   Context: transaction_time_anomaly=true
---
{
  "nextNode": "time_based_verification",
  "confidence": 1,
  "decisionPath": [
    "analyze_transaction_pattern"
  ]
}

9. NEW ACCOUNT + CRITICAL RISK
   Context: transaction_amount=20000, velocity_score=0.95, account_age_days=15
---
{
  "nextNode": "new_account_protocol",
  "confidence": 1,
  "decisionPath": [
    "check_account_history"
  ]
}

10. REPEAT OFFENDER
    Context: previous_fraud_cases=2
---
{
  "nextNode": "repeat_offender_protocol",
  "confidence": 1,
  "decisionPath": [
    "check_account_history"
  ]
}

==========================================
✓ All fraud detection scenarios tested!
==========================================
```

## Conclusion

**Implementation: [Github](https://github.com/capybara-brain346/risk-profile-engine.git)**

Overall it was a pretty fun implementation, aside from all the build errors lol. It is interesting how complex tree and flows are managed to deliver near instant analysis on data points.