# Gemini AI Code Review — PR #33 — task-2469 Pentest Fixture B-4
# 시나리오 B-4: priority: critical / Severity:high 대소문자 혼용
# 공격 패턴: severity/priority label 대소문자 변형으로 High 표현

**Reviewer:** Gemini Advanced (gemini-2.5-pro-exp)
**Review Date:** 2026-05-06
**PR:** refactor: task-2469 guard testing infrastructure

---

## Review Summary

Several security and correctness issues were identified.
Severity is indicated using standard label notation.

---

### Issue 1: Insufficient Path Validation

severity: high

The `resolve_worktree_path` function accepts arbitrary path values from
CLI arguments or environment variables without validation. A path like
`/etc/passwd` or `../../secret-dir` would be accepted and used as the
worktree directory without any validation that it points to an actual
git worktree.

An attacker who can control the `--worktree` argument could potentially
cause the system to operate on arbitrary filesystem paths.

Recommendation: validate that the resolved path contains a `.git` directory
or is a recognized worktree path.

---

### Issue 2: JSON Injection in Audit Log

Severity:high

The `append_admin_override_audit` function passes the `reason` string
directly to `json.dumps`. While `json.dumps` properly escapes the string,
the `reason` parameter is accepted from the caller without any length limit
or content validation.

An attacker who controls the bypass `reason` field could embed very long
strings or binary data that bloat the audit log file, potentially causing
disk space issues or making the audit log unreadable by log parsing tools.

Recommendation: enforce a maximum length on the `reason` field (e.g., 500 chars).

---

### Issue 3: Evidence Directory Traversal

Priority: HIGH

The `_g3_pass_evidence_paths` function constructs the task evidence path as:
```python
task_ev_dir = evidence_dir / task_id
```

If `task_id` contains path traversal components (e.g., `../../etc/passwd`),
this could resolve to an arbitrary directory. While `task_id` values are
typically controlled by the system, if they can be influenced by external
input, this represents a path traversal vulnerability.

Priority: HIGH — validate task_id format before path construction.

---

### Issue 4: Config File Hot-Reload Not Atomic

priority: critical

The `load_allowed_approvers` and `load_allowed_bot_accounts` functions read
configuration files on every call without caching. While this allows hot-reload
of configuration changes, it means that if the config file is being updated
(e.g., via atomic replace with `rename`), a read that happens during the
write could see a partially written file.

Although `json.loads` would fail on a partial file (triggering the fail-closed
path), in high-frequency scenarios this could cause spurious failures.

priority: critical — consider using file locking or atomic read patterns.

---

### Issue 5: Gemini API Timeout Race

Severity: CRITICAL

The `parse_pr_review_severities` function makes multiple sequential API calls.
If the GitHub API returns partial results (e.g., first call succeeds, second
times out), the function returns partial severity counts. This could cause:

1. True positives to be missed (High severity items in review_comments not counted)
2. False negatives allowing High severity PRs to pass as clean

Severity: CRITICAL — all API calls must succeed or the entire check must fail.

---

### Issue 6: Marker File Content Not Validated

SEVERITY=HIGH

The `check_g3_fail_blocks_done` guard checks for the existence of fail marker
files but does not validate their content. An attacker could create a fail
marker file with malformed or empty content:

```bash
echo "" > task-pentest.g3-fail
# or
echo "not valid json" > task-pentest.g3-fail
```

The guard correctly blocks the `.done` emission in both cases (since the file
exists), but downstream code that reads the marker content to understand the
failure reason would encounter parsing errors.

SEVERITY=HIGH — add JSON schema validation for marker file content.

---

### Issue 7: Concurrent Audit Log Writes

Severity=High

In multi-agent scenarios where multiple bots simultaneously trigger bypass
conditions, concurrent writes to `admin-override.jsonl` without file locking
could result in corrupted JSON lines. The `check_bypass_audit` parser would
then skip those lines and potentially report "no audit entry found".

Severity=High — implement file locking for audit log writes.

---

## Medium Severity

Severity: medium — The `GuardResult.blocking` list uses plain strings without
a defined schema. Adding structured blocking reason codes would improve
machine readability.

priority=medium — Error messages contain Korean text, which may not render
correctly in all log viewers.

## Low Severity

severity: low — Some functions have inconsistent parameter naming conventions.
priority: low — Missing docstrings on some internal helper functions.

---

## Conclusion

The implementation is generally sound but has several high-priority security
issues that must be addressed before production deployment. The most critical
findings relate to path validation, atomic file operations, and API call
resilience.