LangSmith

LangSmith: A Comprehensive Technical Guide

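What Is LangSmith
Overview of Key Features
Compliance and Security Certifications
Architecture
Tracing
Evaluation
Dataset Management
Monitoring and Dashboards
Prompt Management
Annotation and Human Feedback
Online Evaluation
SDKs and APIs
Integrations
Deployment Options
Security and Pricing
Environment Variable Reference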

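1. What Is LangSmith

1.1 Overview

LangSmith is an integrated platform developed by LangChain, Inc. for managing the entire lifecycle of LLM (large language model) applications. It provides tooling for each phase of that lifecycle: development, testing, deployment, and monitoring.

LangSmith has the following main purposes:

Observability: Traces the internal behavior of an LLM application in detail and visualizes the inputs, outputs, latency, and token usage of each step
Evaluation: Systematically evaluates the quality of LLM outputs and detects regressions
Prompt Management: Centralizes the versioning, sharing, and testing of prompts
Monitoring: Monitors the performance of LLM applications in production in real time
Dataset Management: Manages the creation, versioning, and sharing of evaluation datasets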

1.2 Position in the LangChain Ecosystem

The LangChain ecosystem consists of the following components:

+-------------------------------------------------------------------+
|                        LangChain Ecosystem                        |
|                                                                   |
|  +--------------------+  +--------------------+  +--------------+ |
|  |    LangChain       |  |    LangGraph       |  |  LangServe   | |
|  |  (framework)       |  |  (agent            |  | (deployment) | |
|  |  - Chains          |  |   orchestration)   |  |  - REST API  | |
|  |  - Prompts         |  |  - State machines  |  |  - Streaming | |
|  |  - Retrievers      |  |  - Multi-agent     |  |              | |
|  |  - Memory          |  |                    |  |              | |
|  |  - Tools           |  |                    |  |              | |
|  +--------+-----------+  +--------+-----------+  +------+-------+ |
|           |                       |                     |         |
|           +----------+------------+---------------------+         |
|                      |                                            |
|                      v                                            |
|  +----------------------------------------------------------+     |
|  |                    LangSmith                             |     |
|  |  (development platform / DevOps for LLM)                 |     |
|  |                                                          |     |
|  |  [Tracing] [Evaluation] [Monitoring] [Prompt management] |     |
|  |  [Datasets] [Annotation] [Online evaluation]             |     |
|  +----------------------------------------------------------+     |
+-------------------------------------------------------------------+

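LangSmith is deeply integrated with the LangChain framework, but it does not depend on LangChain and can be used on its own. It can also be used directly from the OpenAI SDK, the Anthropic SDK, and other frameworks.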

1.3 The LLM Application Development Lifecycle

LangSmith covers the following lifecycle:

+----------+     +---------+     +----------+     +------------+
| Proto-   | --> | Test &  | --> | Deploy   | --> | Monitoring |
| typing   |     | evaluate|     |          |     |            |
+----------+     +---------+     +----------+     +------------+
     |                |               |                  |
     |   Playground   |  Datasets     |  Prompt Hub      |  Dashboards
     |   Tracing      |  Evaluators   |  LangServe       |  Alerts
     |   @traceable   |  Experiments  |                  |  Online Eval
     |                |               |                  |
     +----------------+---------------+------------------+
                       LangSmith

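Prototyping: Try out prompts in the Playground and inspect internal behavior with tracing
Testing and evaluation: Run systematic, dataset-based evaluations and compare experiments
Deployment: Promote prompts to production through Prompt Hub
Monitoring: Monitor in real time with dashboards and safeguard quality with online evaluation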

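2. Overview of Key Features

The key features that LangSmith provides are summarized below:

Category	Feature	Description
Observability	Tracing	Records and visualizes every step of an LLM call
Observability	Monitoring	Real-time metrics and dashboards
Observability	Online evaluation	Automated quality evaluation of production traces
Testing	Evaluation	Systematic, dataset-based quality testing
Testing	Experiment management	Comparing and tracking evaluation results
Testing	Annotation	Quality assessment through human review
Management	Prompt management	Versioning and sharing of prompts
Management	Dataset management	Creating and managing test data
Deployment	Prompt Hub	Managing the rollout of prompts to production
Collaboration	Workspaces	Cross-team sharing and access control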

2.1 How the Features Work Together

LangSmith's features are closely connected:

                    +-------------------+
                    | Production traffic|
                    +--------+----------+
                             |
                    +--------v----------+
                    |      Tracing      |
                    +--------+----------+
                             |
              +--------------+--------------+
              |              |              |
    +---------v----+ +------v-------+ +----v----------+
    | Monitoring   | | Online       | | Add to        |
    | dashboards   | | evaluation   | | datasets      |
    +---------+----+ +------+-------+ +----+----------+
              |              |              |
              |              |     +--------v----------+
              |              |     |    Evaluation     |
              |              |     |                   |
              |              |     +--------+----------+
              |              |              |
              +--------------+--------------+
                             |
                    +--------v----------+
                    | Improvement cycle |
                    | Prompt revisions  |
                    | Model changes     |
                    +-------------------+

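3. Compliance and Security Certifications

LangSmith provides enterprise-grade security and compliance:

3.1 Certifications

Certification / standard	Status	Description
SOC 2 Type II	Attained	Audit report covering security, availability, and processing integrity
HIPAA	Supported	A BAA (Business Associate Agreement) can be signed on the Enterprise Plus plan
GDPR	Compliant	Data processing in line with the EU General Data Protection Regulation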

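3.2 Data Protection

Encryption in transit: Traffic is encrypted with TLS 1.2 or later
Encryption at rest: Stored data is encrypted with AES-256
Data region: US and EU regions are available
Data retention: Customizable data retention policies (Short-lived / Extended / Base data retention TTL settings)
Self-hosting: On-premises deployment for cases that require full data sovereignty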

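3.3 Example Compliance-Related Configuration

# Set the data retention policy via the API.
# NOTE: the endpoint path and field names are illustrative; confirm them
# against the LangSmith API reference before relying on this call.
import os

import requests

headers = {
    # Read the key from the environment instead of hard-coding it
    "x-api-key": os.environ["LANGSMITH_API_KEY"],  # e.g. lsv2_pt_xxxxxxxx
    "Content-Type": "application/json",
}

# Workspace-level data retention settings
response = requests.patch(
    "https://api.smith.langchain.com/api/v1/workspaces/current",
    headers=headers,
    json={
        "data_retention_days": 90,  # keep data for 90 days
        "extended_data_retention_days": 400,  # extended retention period
    },
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors instead of ignoring them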

3.4 Security Architecture Overview

+----------------------------------------------------------+
|                       Client side                        |
|  +--------------------------------------------------+    |
|  |  Authentication with API key (lsv2_pt_xxx)       |    |
|  |  Encrypted transport over TLS 1.2+               |    |
|  +--------------------------------------------------+    |
+----------------------------------------------------------+
                          |
                          v
+----------------------------------------------------------+
|                    LangSmith Platform                    |
|  +--------------------------------------------------+    |
|  |  Authentication / authorization layer            |    |
|  |  - API key validation                            |    |
|  |  - OAuth 2.0 / OIDC (SSO)                        |    |
|  |  - RBAC (role-based access control)              |    |
|  |  - ABAC (attribute-based access control)         |    |
|  +--------------------------------------------------+    |
|  +--------------------------------------------------+    |
|  |  Data layer                                      |    |
|  |  - AES-256 encryption at rest                    |    |
|  |  - Data region isolation                         |    |
|  |  - Audit logs                                    |    |
|  +--------------------------------------------------+    |
+----------------------------------------------------------+

4. Architecture

4.1 Cloud Architecture

The cloud version of LangSmith consists of the following components:

+-----------------------------------------------------------------------+
|                        LangSmith Cloud Architecture                    |
|                                                                        |
|  +-------------------+       +-------------------+                     |
|  |   Web Frontend    |       |   SDK Clients     |                     |
|  |   (React SPA)     |       |   (Python / TS)   |                     |
|  +--------+----------+       +--------+----------+                     |
|           |                           |                                |
|           |  HTTPS                    |  HTTPS (REST API)              |
|           v                           v                                |
|  +----------------------------------------------------------+         |
|  |              API Gateway / Load Balancer                   |        |
|  +----------------------------------------------------------+         |
|           |                           |                                |
|           v                           v                                |
|  +-------------------+       +-------------------+                     |
|  |  Backend API      |       |  Ingestion API    |                     |
|  |  (FastAPI)        |       |  (high throughput)|                     |
|  |                   |       |  - Batching       |                     |
|  |  - CRUD operations|       |  - Async writes   |                     |
|  |  - AuthN / AuthZ  |       |  - Compression    |                     |
|  |  - Query handling |       |                   |                     |
|  +--------+----------+       +--------+----------+                     |
|           |                           |                                |
|           v                           v                                |
|  +----------------------------------------------------------+         |
|  |                      Queue Service                       |         |
|  |               (asynchronous message queue)               |         |
|  |  - Buffers trace data                                    |         |
|  |  - Manages background jobs                               |         |
|  |  - Triggers online evaluation                            |         |
|  +----------------------------------------------------------+         |
|           |              |              |              |               |
|           v              v              v              v               |
|  +------------+  +------------+  +----------+  +---------------+      |
|  | ClickHouse |  | PostgreSQL |  |  Redis   |  | Blob Storage  |      |
|  |            |  |            |  |          |  | (S3/GCS)      |      |
|  | Traces     |  | Metadata   |  | Cache    |  | Attachments   |      |
|  | Metrics    |  | Users      |  | Sessions |  | Large payloads|      |
|  | Run logs   |  | Settings   |  | Rate     |  |               |      |
|  |            |  | Datasets   |  | limiting |  |               |      |
|  +------------+  +------------+  +----------+  +---------------+      |
+-----------------------------------------------------------------------+

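4.2 Role of Each Component

4.2.1 Frontend

Technology stack: React SPA (Single Page Application)
Features: Trace viewer, dashboards, dataset management UI, prompt editor, experiment comparison view
Communication: REST API calls to the Backend API

4.2.2 Backend API

Technology stack: Python / FastAPI
Features:
- User authentication and authorization
- Querying and searching trace data
- Dataset CRUD operations
- Evaluation job management
- Prompt management
- Workspace management

4.2.3 Ingestion API

Purpose: High-throughput ingestion of trace data
Characteristics:
- Batch processing (multiple traces sent in a single request)
- gzip compression support
- Low latency through asynchronous writes
- Backpressure control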
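
The batching and gzip points above can be illustrated with a small sketch that uses only the Python standard library. It is not the LangSmith SDK's wire format: the {"runs": [...]} envelope and the helper names chunk_runs, encode_batch, and decode_batch are assumptions made for illustration.

```python
import gzip
import json
from typing import Any, Dict, Iterator, List

Run = Dict[str, Any]


def chunk_runs(runs: List[Run], batch_size: int) -> Iterator[List[Run]]:
    """Yield successive batches of at most batch_size runs."""
    for start in range(0, len(runs), batch_size):
        yield runs[start:start + batch_size]


def encode_batch(batch: List[Run]) -> bytes:
    """Serialize one batch to JSON and gzip-compress it.

    The {"runs": [...]} envelope is hypothetical, not the real payload shape.
    """
    raw = json.dumps({"runs": batch}).encode("utf-8")
    return gzip.compress(raw)


def decode_batch(payload: bytes) -> List[Run]:
    """Inverse of encode_batch: what a receiving service would do."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))["runs"]


# 250 small run records are sent as 3 compressed requests instead of 250.
runs = [
    {"id": f"run-{i}", "name": "llm_call", "inputs": {"q": "What is LangSmith?"}}
    for i in range(250)
]
payloads = [encode_batch(batch) for batch in chunk_runs(runs, batch_size=100)]
```

Because run records in one batch tend to repeat the same keys and similar values, compressing per batch rather than per run is where most of the size reduction comes from.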

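4.2.4 Queue Service

Features:
- Asynchronous processing pipeline for trace data
- Triggering and scheduling of online evaluation
- Background jobs (data aggregation, alert evaluation, and so on)

4.2.5 Storage Layer

Storage	Purpose	Characteristics
ClickHouse	Trace data, metrics, run logs	Fast analytical queries on columnar storage
PostgreSQL	Metadata, user information, settings, datasets	ACID-compliant relational data
Redis	Caching, session management, rate limiting	Low-latency in-memory store
Blob Storage	Attachments, large input/output payloads	S3/GCS-compatible object storage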

4.3 Self-Hosted Architecture

The self-hosted version provides the same functionality as the cloud version, on premises or in a private cloud:

+-----------------------------------------------------------------------+
|                    Self-Hosted Architecture                             |
|                                                                        |
|  +------------------------------+  +------------------------------+   |
|  |    langsmith-frontend        |  |    langsmith-backend         |   |
|  |    (Nginx + React)           |  |    (FastAPI Application)     |   |
|  |    Port: 1980                |  |    Port: 1984                |   |
|  +------------------------------+  +------------------------------+   |
|                                                                        |
|  +------------------------------+  +------------------------------+   |
|  |    langsmith-queue           |  |    langsmith-playground      |   |
|  |    (Background Workers)      |  |    (Interactive Testing)     |   |
|  |    Port: N/A (internal)      |  |    Port: 3001                |   |
|  +------------------------------+  +------------------------------+   |
|                                                                        |
|  +------------------------------+  +------------------------------+   |
|  |    langsmith-hub-backend     |  |    langsmith-platform-backend|   |
|  |    (Prompt Hub Service)      |  |    (Platform APIs)           |   |
|  |    Port: 1985                |  |    Port: 1986                |   |
|  +------------------------------+  +------------------------------+   |
|                                                                        |
|  =========================  Storage Layer  =========================   |
|                                                                        |
|  +----------------+  +-------------+  +--------+  +--------------+    |
|  |  ClickHouse    |  | PostgreSQL  |  | Redis  |  | MinIO/S3     |    |
|  |  Port: 8123    |  | Port: 5432  |  | :6379  |  | Port: 9000   |    |
|  |  (HTTP)        |  |             |  |        |  |              |    |
|  |  Port: 9000    |  |             |  |        |  |              |    |
|  |  (Native)      |  |             |  |        |  |              |    |
|  +----------------+  +-------------+  +--------+  +--------------+    |
+-----------------------------------------------------------------------+

4.4 Example Self-Hosted Setup with Docker Compose

# docker-compose.yml (LangSmith Self-Hosted)
version: "3.8"

services:
  langsmith-frontend:
    image: docker.io/langchain/langsmith-frontend:latest
    ports:
      - "1980:1980"
    environment:
      - VITE_BACKEND_AUTH_TYPE=none  # or "oauth"
    depends_on:
      - langsmith-backend

  langsmith-backend:
    image: docker.io/langchain/langsmith-backend:latest
    ports:
      - "1984:1984"
    environment:
      - LANGSMITH_LICENSE_KEY=${LANGSMITH_LICENSE_KEY}
      - POSTGRES_DATABASE_URI=postgres://langsmith:langsmith@postgres:5432/langsmith
      - REDIS_DATABASE_URI=redis://redis:6379
      - CLICKHOUSE_HOST=clickhouse
      - CLICKHOUSE_PORT=8123
      - CLICKHOUSE_NATIVE_PORT=9000
      - CLICKHOUSE_DB=default
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=password
      - BLOB_STORAGE_BUCKET_NAME=langsmith-blobs
      - BLOB_STORAGE_API_URL=http://minio:9000
      - BLOB_STORAGE_ACCESS_KEY=minioadmin
      - BLOB_STORAGE_ACCESS_KEY_SECRET=minioadmin
      - LOG_LEVEL=info
      - AUTH_TYPE=none
      - API_KEY_SALT=super-secret-salt
      - INITIAL_ORG_ADMIN_EMAIL=admin@example.com
      - INITIAL_ORG_ADMIN_PASSWORD=admin-password
    depends_on:
      - postgres
      - clickhouse
      - redis
      - minio

  langsmith-queue:
    image: docker.io/langchain/langsmith-backend:latest
    command: ["saq", "app.workers.queues.single_queue_worker.settings"]
    environment:
      - LANGSMITH_LICENSE_KEY=${LANGSMITH_LICENSE_KEY}
      - POSTGRES_DATABASE_URI=postgres://langsmith:langsmith@postgres:5432/langsmith
      - REDIS_DATABASE_URI=redis://redis:6379
      - CLICKHOUSE_HOST=clickhouse
      - CLICKHOUSE_PORT=8123
      - CLICKHOUSE_NATIVE_PORT=9000
      - BLOB_STORAGE_BUCKET_NAME=langsmith-blobs
      - BLOB_STORAGE_API_URL=http://minio:9000
      - BLOB_STORAGE_ACCESS_KEY=minioadmin
      - BLOB_STORAGE_ACCESS_KEY_SECRET=minioadmin
    depends_on:
      - postgres
      - clickhouse
      - redis
      - minio

  langsmith-playground:
    image: docker.io/langchain/langsmith-playground:latest
    ports:
      - "3001:3001"

  langsmith-hub-backend:
    image: docker.io/langchain/langsmith-hub-backend:latest
    ports:
      - "1985:1985"
    environment:
      - POSTGRES_DATABASE_URI=postgres://langsmith:langsmith@postgres:5432/langsmith
      - REDIS_DATABASE_URI=redis://redis:6379
    depends_on:
      - postgres
      - redis

  # ==================== Storage ====================
  postgres:
    image: postgres:16
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=langsmith
      - POSTGRES_PASSWORD=langsmith
      - POSTGRES_DB=langsmith
    volumes:
      - postgres-data:/var/lib/postgresql/data

  clickhouse:
    image: clickhouse/clickhouse-server:24.2
    ports:
      - "8123:8123"
      - "9000:9000"
    environment:
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=password
      - CLICKHOUSE_DB=default
    volumes:
      - clickhouse-data:/var/lib/clickhouse

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

  minio:
    image: minio/minio:latest
    ports:
      - "9090:9000"
      - "9091:9001"
    environment:
      - MINIO_ROOT_USER=minioadmin
      - MINIO_ROOT_PASSWORD=minioadmin
    command: server /data --console-address ":9001"
    volumes:
      - minio-data:/data

volumes:
  postgres-data:
  clickhouse-data:
  redis-data:
  minio-data:

4.5 Data Flow

Lifecycle of trace data:

SDK Client                    LangSmith Platform
+----------+                  +------------------------------------------+
|          |   1. POST /runs  |                                          |
|  Python  | ---------------> |  Ingestion API                           |
|  or TS   |   (batched,      |    |                                     |
|  SDK     |    gzip)         |    | 2. Enqueue                          |
|          |                  |    v                                     |
|          |                  |  Queue Service                           |
|          |                  |    |                                     |
|          |                  |    | 3. Asynchronous processing          |
|          |                  |    |                                     |
|          |                  |    +---> ClickHouse (trace data)         |
|          |                  |    +---> PostgreSQL (metadata)           |
|          |                  |    +---> Blob Storage (large payloads)   |
|          |                  |    |                                     |
|          |                  |    | 4. Trigger online evaluation        |
|          |                  |    v                                     |
|          |   5. GET /runs   |  Evaluation Workers                      |
|          | <--------------- |    |                                     |
|          |   (query results)|    +--> Store feedback results           |
+----------+                  +------------------------------------------+

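5. Tracing

Tracing is the most fundamental and important feature of LangSmith. It records each step of an LLM application in detail, which enables debugging, performance analysis, and cost tracking.

5.1 Configuration via Environment Variables

The minimal environment variables needed to enable tracing: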

# Required: enable tracing
export LANGSMITH_TRACING=true

# Required: API key
export LANGSMITH_API_KEY="lsv2_pt_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Optional: endpoint (defaults to the SaaS endpoint)
export LANGSMITH_ENDPOINT="https://api.smith.langchain.com"

# Optional: project name (defaults to "default")
export LANGSMITH_PROJECT="my-llm-project"

# For self-hosted deployments
# export LANGSMITH_ENDPOINT="https://langsmith.your-domain.com"
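
An application may want to validate these variables once at startup rather than discover a missing key later. The following is a minimal sketch of such a check; TracingConfig and load_tracing_config are hypothetical helpers, not part of the LangSmith SDK, and the defaults simply mirror the comments above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TracingConfig:
    enabled: bool
    api_key: Optional[str]
    endpoint: str
    project: str


def load_tracing_config(env: Mapping[str, str] = os.environ) -> TracingConfig:
    """Read the variables listed above and apply the stated defaults."""
    enabled = env.get("LANGSMITH_TRACING", "").strip().lower() == "true"
    api_key = env.get("LANGSMITH_API_KEY") or None
    if enabled and api_key is None:
        # Fail fast: tracing was requested but nothing can be authenticated.
        raise ValueError("LANGSMITH_TRACING is true but LANGSMITH_API_KEY is not set")
    return TracingConfig(
        enabled=enabled,
        api_key=api_key,
        endpoint=env.get("LANGSMITH_ENDPOINT", "https://api.smith.langchain.com"),
        project=env.get("LANGSMITH_PROJECT", "default"),
    )


# Passing an explicit mapping keeps the check easy to test.
example_config = load_tracing_config(
    {"LANGSMITH_TRACING": "true", "LANGSMITH_API_KEY": "lsv2_pt_example"}
)
```

Calling load_tracing_config() with no argument reads the real process environment.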

5.2 Trace Structure

A LangSmith trace is organized hierarchically:

+-------------------------------------------------------------+
|  Project: "my-chatbot-app"                                   |
|                                                               |
|  +----------------------------------------------------------+|
|  |  Trace (trace_id: abc-123)                                ||
|  |  Thread ID: thread-456                                    ||
|  |                                                            ||
|  |  +-----------------------------------------------------+ ||
|  |  | Run: "ChatBot" (type: chain)                         | ||
|  |  | run_id: run-001                                      | ||
|  |  | start_time: 2024-01-15T10:00:00Z                     | ||
|  |  | total_time: 2.5s                                     | ||
|  |  |                                                       | ||
|  |  |  +------------------------------------------------+  | ||
|  |  |  | Child Run: "retrieve_docs" (type: retriever)    |  | ||
|  |  |  | run_id: run-002                                 |  | ||
|  |  |  | time: 0.3s                                      |  | ||
|  |  |  | input: "What is LangSmith?"                     |  | ||
|  |  |  | output: [doc1, doc2, doc3]                      |  | ||
|  |  |  +------------------------------------------------+  | ||
|  |  |                                                       | ||
|  |  |  +------------------------------------------------+  | ||
|  |  |  | Child Run: "format_prompt" (type: prompt)       |  | ||
|  |  |  | run_id: run-003                                 |  | ||
|  |  |  | time: 0.01s                                     |  | ||
|  |  |  +------------------------------------------------+  | ||
|  |  |                                                       | ||
|  |  |  +------------------------------------------------+  | ||
|  |  |  | Child Run: "gpt-4" (type: llm)                 |  | ||
|  |  |  | run_id: run-004                                 |  | ||
|  |  |  | time: 2.1s                                      |  | ||
|  |  |  | tokens: {prompt: 1500, completion: 300}         |  | ||
|  |  |  | cost: $0.024                                    |  | ||
|  |  |  | model: "gpt-4-turbo"                            |  | ||
|  |  |  +------------------------------------------------+  | ||
|  |  |                                                       | ||
|  |  +-----------------------------------------------------+ ||
|  +----------------------------------------------------------+||
+-------------------------------------------------------------+

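Key concepts

Concept	Description
Project	The unit for grouping traces. Typically split by environment (dev/staging/prod) or by application
Trace	The complete execution record of a single request. Contains the root run and all of its child runs
Run	An individual step within a trace. Run types include chain, llm, tool, retriever, and prompt
Thread	A grouping of conversation turns. Traces that share the same thread_id are collected into one conversation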
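
To make the hierarchy concrete, the sketch below models a trace as a tree of runs and walks it to aggregate token counts and find the slowest step. The Run dataclass here is a simplified stand-in defined for this example, not the SDK's own run schema; the names and numbers are taken from the diagram above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Run:
    name: str
    run_type: str
    seconds: float
    total_tokens: int = 0
    children: List["Run"] = field(default_factory=list)

    def walk(self) -> Iterator["Run"]:
        """Yield this run and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# One trace: a root run of type "chain" with three child runs.
trace = Run("ChatBot", "chain", 2.5, children=[
    Run("retrieve_docs", "retriever", 0.3),
    Run("format_prompt", "prompt", 0.01),
    Run("gpt-4", "llm", 2.1, total_tokens=1800),
])

llm_runs = [run for run in trace.walk() if run.run_type == "llm"]
total_tokens = sum(run.total_tokens for run in trace.walk())
slowest_child = max(trace.children, key=lambda run: run.seconds)
```

Here the LLM call accounts for most of the root run's 2.5 seconds, which is the kind of observation the trace view is meant to surface.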

5.3 Automatic Tracing with LangChain

When you use LangChain, setting the environment variables is enough to enable tracing automatically:

import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "lsv2_pt_xxxxxxxx"
os.environ["LANGSMITH_PROJECT"] = "my-langchain-app"

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# LangChain components are traced automatically
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{question}")
])

model = ChatOpenAI(model="gpt-4-turbo", temperature=0)
parser = StrOutputParser()

# Each step of the LCEL chain is recorded automatically
chain = prompt | model | parser

# This call is traced automatically and sent to LangSmith
result = chain.invoke({"question": "What is LangSmith?"})

5.4 The @traceable Decorator

Use this when you are not using LangChain, or when you want to trace custom functions:

from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable(
    name="generate_response",       # name shown in the trace
    run_type="chain",               # run type: chain, llm, tool, retriever
    project_name="my-custom-app",   # project name (overrides the environment variable)
    tags=["production", "v2"],      # tags for filtering
    metadata={"version": "2.0"}     # metadata
)
def generate_response(question: str, context: str = "") -> str:
    """Example of tracing a custom function."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": question}
        ],
        temperature=0.7
    )
    return response.choices[0].message.content

@traceable(name="rag_pipeline", run_type="chain")
def rag_pipeline(question: str) -> str:
    """Example RAG pipeline."""
    # Each traced function called here is recorded as a child run
    docs = retrieve_documents(question)
    context = format_context(docs)
    response = generate_response(question, context)
    return response

@traceable(name="retrieve_documents", run_type="retriever")
def retrieve_documents(query: str) -> list:
    """Document retrieval."""
    # Placeholder for the vector store lookup
    return ["doc1", "doc2", "doc3"]

@traceable(name="format_context", run_type="tool")
def format_context(docs: list) -> str:
    """Format the retrieved documents into a context string."""
    return "\n".join(docs)

# Run the pipeline
result = rag_pipeline("What is LangSmith?")

5.5 Customizing @traceable Inputs and Outputs

import asyncio

from langsmith import traceable

# Filter inputs and outputs (exclude sensitive information)
@traceable(
    name="process_user_data",
    # Input filter: only the listed keys are recorded
    process_inputs=lambda inputs: {
        "user_id": inputs.get("user_id"),
        "query": inputs.get("query"),
        # password is left out
    },
    # Output filter
    process_outputs=lambda output: {
        "status": output.get("status"),
        # sensitive data is left out
    }
)
def process_user_data(user_id: str, query: str, password: str) -> dict:
    return {"status": "success", "data": "sensitive_data"}

# Tracing an async function
@traceable(name="async_generate", run_type="llm")
async def async_generate(prompt: str) -> str:
    await asyncio.sleep(1)  # stands in for a real asynchronous LLM call
    return "response"
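
The lambdas above use an allowlist: only the named keys are recorded. A complementary approach is a denylist that masks known sensitive keys and keeps everything else. The sketch below is one way to write such a function with the standard library; redact and SENSITIVE_KEYS are hypothetical names, and the key list is only an example.

```python
from typing import Any, Dict, Iterable

# Example key names only; adjust to the fields your application handles.
SENSITIVE_KEYS = frozenset({"password", "api_key", "token", "secret"})


def redact(inputs: Dict[str, Any], sensitive: Iterable[str] = SENSITIVE_KEYS) -> Dict[str, Any]:
    """Return a copy of inputs with sensitive values masked.

    Nested dictionaries are handled recursively; the original is not mutated.
    """
    blocked = {key.lower() for key in sensitive}
    cleaned: Dict[str, Any] = {}
    for key, value in inputs.items():
        if key.lower() in blocked:
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, dict):
            cleaned[key] = redact(value, blocked)
        else:
            cleaned[key] = value
    return cleaned


recorded = redact({
    "user_id": "user-123",
    "password": "hunter2",
    "profile": {"name": "Ada", "token": "abc"},
})
```

Since it takes a dict and returns a dict, a function of this shape could be passed as process_inputs in place of the lambda. An allowlist remains the safer default when new, unknown fields may appear.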

5.6 RunTree API (Low-Level API)

When finer-grained control is needed, use the RunTree API directly:

from langsmith.run_trees import RunTree

# Create the root run
root_run = RunTree(
    name="my_pipeline",
    run_type="chain",
    inputs={"question": "What is LangSmith?"},
    project_name="my-project",
    tags=["manual-trace"],
    extra={
        "metadata": {
            "user_id": "user-123",
            "session_id": "session-456"
        }
    }
)

# Create a child run
child_run = root_run.create_child(
    name="llm_call",
    run_type="llm",
    inputs={
        "messages": [
            {"role": "user", "content": "What is LangSmith?"}
        ]
    }
)

# Finish the child run. end() is called with outputs only here; token
# counts are reported inside the outputs under "usage_metadata"
# (the same shape used in section 5.8).
child_run.end(
    outputs={
        "message": {"role": "assistant", "content": "LangSmith is..."},
        "usage_metadata": {
            "input_tokens": 100,
            "output_tokens": 50,
            "total_tokens": 150
        }
    }
)
child_run.post()

# Finish the root run
root_run.end(outputs={"answer": "LangSmith is..."})
root_run.post()

5.7 Distributed Tracing

In a microservice architecture, traces that span multiple services can be linked together:

from langsmith.run_helpers import get_current_run_tree, traceable, tracing_context
import httpx

# Service A: obtain the trace headers and forward them
@traceable(name="service_a_handler")
def service_a_handler(request_data: dict) -> dict:
    run_tree = get_current_run_tree()

    # Get the headers used for distributed tracing
    headers = run_tree.to_headers() if run_tree else {}
    # headers contains:
    # {
    #     "langsmith-trace": "<serialized run tree>",
    #     "baggage": "...",  # W3C Baggage header
    # }

    # Call service B
    response = httpx.post(
        "http://service-b/api/process",
        json=request_data,
        headers=headers
    )
    return response.json()

# Service B: receive the trace headers and continue the trace
@traceable(name="service_b_processing", run_type="chain")
def service_b_processing(payload: dict) -> dict:
    # process_data is a placeholder for service B's own logic
    return {"result": process_data(payload)}

def service_b_handler(request):
    # Restore the trace context from the request headers. Runs started
    # inside this block are recorded as children of service A's run.
    with tracing_context(parent=dict(request.headers)):
        return service_b_processing(request.json())

5.8 Token and Cost Tracking

from langsmith import traceable

# Token usage for LLM calls is tracked automatically
# for major providers such as OpenAI and Anthropic

# Custom cost tracking
@traceable(
    name="custom_llm_call",
    run_type="llm",
    metadata={
        "ls_model_name": "gpt-4-turbo",  # model name (used for cost calculation)
        "ls_provider": "openai",           # provider name
        "ls_model_type": "chat",           # model type
    }
)
def custom_llm_call(prompt: str) -> dict:
    # call_my_llm is a placeholder for your own LLM client call
    result = call_my_llm(prompt)

    # Return the token usage alongside the output
    return {
        "output": result["text"],
        "usage_metadata": {
            "input_tokens": result["prompt_tokens"],
            "output_tokens": result["completion_tokens"],
            "total_tokens": result["total_tokens"]
        }
    }
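
Cost is derived from the token counts and a per-model price. The arithmetic can be sketched as below; the price table is an assumption for illustration (USD per one million tokens) and should be replaced with your provider's current rates, and estimate_cost is a hypothetical helper rather than an SDK function.

```python
from typing import Dict, Tuple

# ASSUMPTION: illustrative prices in USD per 1M tokens, as (input, output).
PRICES_PER_MILLION: Dict[str, Tuple[float, float]] = {
    "gpt-4-turbo": (10.0, 30.0),
}


def estimate_cost(model: str, usage: Dict[str, int]) -> float:
    """Estimate the cost of one call from a usage_metadata-shaped dict."""
    input_price, output_price = PRICES_PER_MILLION[model]
    cost = (
        usage["input_tokens"] * input_price
        + usage["output_tokens"] * output_price
    ) / 1_000_000
    return round(cost, 6)


usage_metadata = {"input_tokens": 1500, "output_tokens": 300, "total_tokens": 1800}
cost = estimate_cost("gpt-4-turbo", usage_metadata)
# (1500 * 10 + 300 * 30) / 1,000,000 = 0.024 USD under the assumed prices
```

Input and output tokens are priced separately, so a call with a long prompt and a short completion can cost less than the reverse even at the same total token count.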

5.9 Tracing Best Practices

import os
from langsmith import traceable, Client
from langsmith.run_helpers import get_current_run_tree

# 1. Set a sampling rate (for production); set it before any traced call runs
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"  # trace only 10% of requests

# 2. Use tags and metadata to make traces easier to search
@traceable(
    name="chat_endpoint",
    tags=["production", "chat-v2"],
    metadata={
        "user_tier": "premium",
        "feature_flag": "new_model_enabled"
    }
)
def chat_endpoint(user_id: str, message: str) -> str:
    pass  # handler body omitted

# 3. Use a thread ID to follow a conversation
@traceable(name="conversation_turn")
def conversation_turn(session_id: str, message: str) -> str:
    run_tree = get_current_run_tree()
    if run_tree:
        # Setting thread_id groups the turns of one conversation
        run_tree.add_metadata({"thread_id": session_id})
    return generate_response(message)  # defined in section 5.4

# 4. Error handling and tracing
@traceable(name="safe_generate")
def safe_generate(prompt: str) -> str:
    try:
        return call_llm(prompt)  # call_llm is a placeholder for your LLM call
    except Exception:
        # Re-raise so the error is recorded on the run and reaches the caller
        raise

# 5. Batch tuning (confirm that your SDK version honors these variable names)
os.environ["LANGSMITH_BATCH_SIZE"] = "100"     # batch size
os.environ["LANGSMITH_BATCH_TIMEOUT"] = "5"     # timeout (seconds)
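
A sampling rate of 0.1 means each request is traced with probability 0.1, independently of the others. The sketch below shows that decision in isolation; should_trace is a hypothetical function written for this example, not the SDK's internal implementation.

```python
import random


def should_trace(sampling_rate: float, rng: random.Random) -> bool:
    """Decide, per request, whether to record a trace."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    # rng.random() is uniform on [0.0, 1.0), so this is True with
    # probability sampling_rate.
    return rng.random() < sampling_rate


rng = random.Random(42)  # seeded so the sketch is reproducible
decisions = [should_trace(0.1, rng) for _ in range(10_000)]
traced_fraction = sum(decisions) / len(decisions)
```

Over many requests the traced fraction approaches the configured rate, but any single rare failure has only a 10% chance of being captured, which is worth weighing when choosing the rate.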

6. Evaluation

6.1 Evaluation Workflow

LangSmith's evaluation system provides a framework for systematically testing the quality of LLM applications.

+---------------------------------------------------------------------+
|                         Evaluation workflow                         |
|                                                                     |
|  1. Prepare the dataset                                             |
|  +------------------+                                               |
|  | Dataset          |                                               |
|  | - Examples       |                                               |
|  |   (input/output) |                                               |
|  +--------+---------+                                               |
|           |                                                         |
|  2. Run the target function                                         |
|           |                                                         |
|  +--------v---------+                                               |
|  | Target function  |  Runs on every example in the dataset         |
|  | (under test)     |                                               |
|  +--------+---------+                                               |
|           |                                                         |
|  3. Score with evaluators                                           |
|           |                                                         |
|  +--------v---------+  +------------------+  +------------------+   |
|  | Heuristic        |  | LLM-as-judge     |  | Human            |   |
|  | evaluators       |  | evaluators       |  | evaluators       |   |
|  | (rule-based)     |  | (LLM verdict)    |  | (human review)   |   |
|  +--------+---------+  +--------+---------+  +--------+---------+   |
|           |                      |                      |           |
|  4. Record the results                                              |
|           +----------------------+----------------------+           |
|           |                                                         |
|  +--------v---------+                                               |
|  | Experiment       |                                               |
|  | results          |                                               |
|  | - Scores         |                                               |
|  | - Feedback       |                                               |
|  +------------------+                                               |
+---------------------------------------------------------------------+
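
The four steps in the diagram can be sketched as a plain loop. This is a simplified, standard-library stand-in for what an evaluation run does, not the LangSmith evaluate API: the target function, the two evaluators, run_experiment, and the three-example dataset are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List

Example = Dict[str, Dict[str, str]]  # {"inputs": {...}, "outputs": {...}}
Evaluator = Callable[[Dict[str, str], Example], Dict[str, object]]


def target(inputs: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for the application under test."""
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return {"output": canned.get(inputs["question"], "I don't know")}


def exact_match(outputs: Dict[str, str], example: Example) -> Dict[str, object]:
    matched = outputs["output"].strip() == example["outputs"]["answer"].strip()
    return {"key": "exact_match", "score": 1.0 if matched else 0.0}


def non_empty(outputs: Dict[str, str], example: Example) -> Dict[str, object]:
    return {"key": "non_empty", "score": 1.0 if outputs["output"].strip() else 0.0}


def run_experiment(
    dataset: List[Example],
    target_fn: Callable[[Dict[str, str]], Dict[str, str]],
    evaluators: List[Evaluator],
) -> Dict[str, float]:
    """Run the target on every example, score it, and average per metric."""
    scores: Dict[str, List[float]] = {}
    for example in dataset:                       # step 1: dataset examples
        outputs = target_fn(example["inputs"])    # step 2: run the target
        for evaluator in evaluators:              # step 3: score the output
            result = evaluator(outputs, example)
            scores.setdefault(str(result["key"]), []).append(float(result["score"]))
    return {key: mean(values) for key, values in scores.items()}  # step 4


dataset: List[Example] = [
    {"inputs": {"question": "What is 2 + 2?"}, "outputs": {"answer": "4"}},
    {"inputs": {"question": "Capital of France?"}, "outputs": {"answer": "Paris"}},
    {"inputs": {"question": "Capital of Japan?"}, "outputs": {"answer": "Tokyo"}},
]
summary = run_experiment(dataset, target, [exact_match, non_empty])
```

With this data the target answers two of three questions correctly, so the exact_match average is two thirds while non_empty stays at 1.0. Keeping one score list per metric key is what lets experiments be compared metric by metric.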

6.2 Three Evaluation Types

6.2.1 Heuristic Evaluation (Rule-Based)

import json

from langsmith.evaluation import evaluate

# Exact string match
def exact_match(run, example) -> dict:
    """Check whether the output exactly matches the expected answer."""
    predicted = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("answer", "")
    return {
        "key": "exact_match",
        "score": 1.0 if predicted.strip() == expected.strip() else 0.0
    }

# Keyword containment check
def contains_keyword(run, example) -> dict:
    """Measure how many of the required keywords appear in the output."""
    predicted = (run.outputs or {}).get("output", "")
    keywords = (example.outputs or {}).get("required_keywords", [])
    found = sum(1 for kw in keywords if kw.lower() in predicted.lower())
    return {
        "key": "keyword_coverage",
        "score": found / len(keywords) if keywords else 1.0
    }

# JSON structure validation
def valid_json_output(run, example) -> dict:
    """Check whether the output is valid JSON."""
    try:
        json.loads((run.outputs or {}).get("output", ""))
        return {"key": "valid_json", "score": 1.0}
    except (json.JSONDecodeError, TypeError):
        return {"key": "valid_json", "score": 0.0}

# Latency check
def latency_check(run, example) -> dict:
    """Check whether the response time is within the threshold."""
    latency = (run.end_time - run.start_time).total_seconds()
    threshold = 5.0  # seconds
    return {
        "key": "latency_ok",
        "score": 1.0 if latency <= threshold else 0.0,
        "comment": f"Latency: {latency:.2f}s (threshold: {threshold}s)"
    }
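
Because a heuristic evaluator is a plain function of `run` and `example`, it can be exercised locally before it is attached to an experiment. The sketch below is one way to do that, assuming the evaluator reads only the `.outputs` attribute of both arguments. `SimpleNamespace` objects stand in for the real LangSmith objects, and `keyword_coverage` repeats the logic of `contains_keyword` above so that the block runs on its own.

```python
from types import SimpleNamespace


def keyword_coverage(run, example) -> dict:
    """Fraction of required keywords found in the output (case-insensitive)."""
    predicted = (run.outputs or {}).get("output", "")
    keywords = (example.outputs or {}).get("required_keywords", [])
    found = sum(1 for kw in keywords if kw.lower() in predicted.lower())
    return {
        "key": "keyword_coverage",
        "score": found / len(keywords) if keywords else 1.0,
    }


# Stand-ins for the LangSmith run and example objects (assumption:
# the evaluator only reads the `.outputs` attribute).
fake_run = SimpleNamespace(outputs={"output": "Paris is the capital of France."})
fake_example = SimpleNamespace(
    outputs={"required_keywords": ["paris", "France", "Seine"]}
)

coverage = keyword_coverage(fake_run, fake_example)
print(coverage)  # two of the three keywords are present
```

Checking evaluators this way catches key-name mismatches (for example `"output"` versus `"answer"`) before an experiment is run.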

6.2.2 LLM-as-Judge Evaluation

from langchain_openai import ChatOpenAI
from langsmith.evaluation import evaluate, LangChainStringEvaluator

# Built-in LangChain LLM-as-Judge evaluator
correctness_evaluator = LangChainStringEvaluator(
    "labeled_criteria",
    config={
        "criteria": {
            "correctness": (
                "Is the submission correct, accurate, and factual? "
                "Compare with the reference answer."
            )
        },
        "llm": ChatOpenAI(model="gpt-4-turbo", temperature=0),
    },
    prepare_data=lambda run, example: {
        "prediction": run.outputs["output"],
        "reference": example.outputs["answer"],
        "input": example.inputs["question"],
    },
)

# Custom LLM-as-Judge evaluator
from openai import OpenAI

judge_client = OpenAI()

def llm_judge_helpfulness(run, example) -> dict:
    """Rate the helpfulness of the answer with an LLM."""
    response = judge_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "system",
                "content": """You are an expert evaluator. Rate the helpfulness 
                of the AI response on a scale of 0.0 to 1.0.
                
                Criteria:
                - 1.0: Extremely helpful, addresses the question completely
                - 0.7: Helpful, addresses most aspects
                - 0.4: Partially helpful, misses key points
                - 0.0: Not helpful at all
                
                Respond with JSON: {"score": <float>, "reasoning": "<string>"}"""
            },
            {
                "role": "user",
                "content": f"""Question: {example.inputs['question']}
                
Expected Answer: {example.outputs.get('answer', 'N/A')}

AI Response: {run.outputs['output']}"""
            }
        ],
        temperature=0,
        response_format={"type": "json_object"}
    )

    import json
    result = json.loads(response.choices[0].message.content)
    return {
        "key": "helpfulness",
        "score": result["score"],
        "comment": result["reasoning"]
    }
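
A judge model does not always return well-formed JSON or a score inside the requested range, so the parsing step benefits from being defensive. The following is a minimal sketch, not part of the LangSmith SDK: `parse_judge_reply` is a hypothetical helper that clamps the score to the range 0.0 to 1.0 and reports unparseable replies with a score of `None`.

```python
import json


def parse_judge_reply(raw: str) -> dict:
    """Turn a judge model's JSON reply into a feedback dict.

    Scores are clamped to [0.0, 1.0]; malformed replies yield a score of
    None so that a bad judge response stays visible instead of being scored.
    """
    try:
        data = json.loads(raw)
        score = float(data["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return {
            "key": "helpfulness",
            "score": None,
            "comment": "judge reply could not be parsed",
        }
    score = min(1.0, max(0.0, score))
    return {
        "key": "helpfulness",
        "score": score,
        "comment": str(data.get("reasoning", "")),
    }


ok = parse_judge_reply('{"score": 1.3, "reasoning": "complete answer"}')
bad = parse_judge_reply("not json")
print(ok)
print(bad)
```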

6.2.3 Human Evaluation

Human evaluation is carried out through the annotation queue feature in the LangSmith UI (see Section 10 for details).

6.3 The evaluate() Function

from langsmith.evaluation import evaluate
from langsmith import Client

client = Client()

# Target function (the application under test)
def my_app(inputs: dict) -> dict:
    """Application under test."""
    question = inputs["question"]
    # Call the LLM or run the pipeline; generate_answer is a placeholder
    # for your own application code.
    answer = generate_answer(question)
    return {"output": answer}

# Run the evaluation
results = evaluate(
    my_app,                           # target function
    data="my-evaluation-dataset",     # dataset name or ID
    evaluators=[                      # list of evaluators
        exact_match,
        contains_keyword,
        llm_judge_helpfulness,
        correctness_evaluator,
    ],
    experiment_prefix="v2-gpt4-turbo",  # prefix for the experiment name
    max_concurrency=4,                   # number of parallel workers
    metadata={                           # experiment metadata
        "model": "gpt-4-turbo",
        "temperature": 0.7,
        "prompt_version": "v2"
    },
    num_repetitions=1,                  # repetitions per Example
)

# Inspect the results
print(f"Experiment: {results.experiment_name}")
for result in results:
    print(f"  Example: {result['example'].inputs}")
    print(f"  Output: {result['run'].outputs}")
    for feedback in result['evaluation_results']['results']:
        print(f"  {feedback.key}: {feedback.score}")
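
Per-example scores are usually reduced to one number per evaluator before experiments are compared. The sketch below shows one way to do that locally. It assumes rows shaped like the items iterated over in the loop above (a dict whose `evaluation_results` entry holds a `results` list of objects with `key` and `score`); the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from types import SimpleNamespace


def mean_scores(rows) -> dict:
    """Average each feedback key over all rows, skipping missing scores."""
    by_key = defaultdict(list)
    for row in rows:
        for feedback in row["evaluation_results"]["results"]:
            if feedback.score is not None:
                by_key[feedback.key].append(feedback.score)
    return {key: mean(scores) for key, scores in by_key.items()}


# Hypothetical rows standing in for real experiment results.
rows = [
    {"evaluation_results": {"results": [
        SimpleNamespace(key="exact_match", score=1.0),
        SimpleNamespace(key="helpfulness", score=0.7),
    ]}},
    {"evaluation_results": {"results": [
        SimpleNamespace(key="exact_match", score=0.0),
        SimpleNamespace(key="helpfulness", score=None),
    ]}},
]

summary = mean_scores(rows)
print(summary)
```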

6.4 Asynchronous Evaluation

import asyncio
from langsmith.evaluation import aevaluate

# Asynchronous target function
async def async_app(inputs: dict) -> dict:
    """Asynchronous application under test."""
    question = inputs["question"]
    # async_generate_answer is a placeholder for your own async code
    answer = await async_generate_answer(question)
    return {"output": answer}

# Asynchronous evaluator
async def async_evaluator(run, example) -> dict:
    """Asynchronous evaluator."""
    # Asynchronous LLM call or similar; async_judge is a placeholder
    score = await async_judge(run.outputs, example.outputs)
    return {"key": "async_quality", "score": score}

# Run the asynchronous evaluation
async def run_evaluation():
    results = await aevaluate(
        async_app,
        data="my-evaluation-dataset",
        evaluators=[async_evaluator],
        experiment_prefix="async-eval",
        max_concurrency=8,
    )
    return results

# Execute
results = asyncio.run(run_evaluation())
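
The `max_concurrency` argument caps how many target calls are in flight at once. The sketch below illustrates that idea with an `asyncio.Semaphore`. It is not how `aevaluate` is implemented; `run_bounded` and `fake_target` are hypothetical names used only for this illustration.

```python
import asyncio


async def run_bounded(items, worker, max_concurrency: int):
    """Apply `worker` to every item with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def guarded(item):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await worker(item)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(item) for item in items))
    return results, peak


async def fake_target(question: str) -> dict:
    """Hypothetical stand-in for an asynchronous application call."""
    await asyncio.sleep(0.01)
    return {"output": question.upper()}


outputs, peak_concurrency = asyncio.run(
    run_bounded([f"q{i}" for i in range(20)], fake_target, max_concurrency=8)
)
print(len(outputs), peak_concurrency)
```

A higher limit shortens wall-clock time but raises the chance of hitting provider rate limits, so the value is usually tuned per model.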

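6.5 Comparing Experiments

from langsmith import Client

client = Client()

# Each experiment is stored as a project linked to its reference dataset,
# so the experiments for a dataset can be listed by filtering projects.
# Method and attribute names vary between SDK releases; verify them
# against the installed langsmith version.
experiments = list(client.list_projects(
    reference_dataset_name="my-evaluation-dataset"
))

# Side-by-side comparison tables are available in the LangSmith UI.

# Programmatic comparison
for experiment in experiments:
    print(f"Experiment: {experiment.name}")
    print(f"  Started: {experiment.start_time}")

    # Aggregated feedback statistics for each experiment
    stats = client.read_project(project_id=experiment.id, include_stats=True)
    for metric_name, metric_value in (stats.feedback_stats or {}).items():
        print(f"  {metric_name}: {metric_value}")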
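
Once aggregate metrics for two experiments are available, a regression check reduces to comparing two dictionaries. The sketch below is a minimal, SDK-independent version. `find_regressions`, the tolerance of 0.02, and the sample scores are all assumptions made for illustration.

```python
def find_regressions(baseline: dict, candidate: dict, tolerance: float = 0.02) -> dict:
    """Return metrics whose candidate score dropped by more than `tolerance`."""
    regressions = {}
    for metric, base_score in baseline.items():
        new_score = candidate.get(metric)
        if new_score is None:
            continue  # metric was not measured in the candidate experiment
        drop = base_score - new_score
        if drop > tolerance:
            regressions[metric] = round(drop, 4)
    return regressions


# Hypothetical mean scores for two experiments on the same dataset.
baseline = {"exact_match": 0.62, "helpfulness": 0.81, "valid_json": 1.00}
candidate = {"exact_match": 0.66, "helpfulness": 0.74, "valid_json": 0.99}

regressions = find_regressions(baseline, candidate)
print(regressions)
```

The tolerance absorbs run-to-run noise from non-deterministic models; a suitable value depends on dataset size and on how many repetitions were run.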

6.6 pytest Integration

# test_my_app.py
import pytest
from langsmith import unit

# Tests using the @unit decorator (my_app is the application under test)
@unit
def test_answer_quality():
    """A LangSmith @unit test."""
    result = my_app({"question": "What is the capital of France?"})
    assert "Paris" in result["output"]
    # This test is recorded in LangSmith as an experiment

@unit(
    output_keys=["output"],  # specify the output keys
)
def test_json_output():
    """Test for JSON output."""
    import json
    result = my_app({"question": "List 3 countries in JSON"})
    parsed = json.loads(result["output"])
    assert isinstance(parsed, list)
    assert len(parsed) == 3

# Running pytest
# LANGSMITH_TEST_SUITE="my-test-suite" pytest test_my_app.py -v

7. Dataset Management

7.1 Creating Datasets

from langsmith import Client

client = Client()

# Method 1: create programmatically
dataset = client.create_dataset(
    dataset_name="qa-evaluation-v1",
    description="Q&A evaluation dataset for chatbot v1",
    data_type="kv",  # "kv" (key-value) or "llm" or "chat"
)

# Add a single Example
client.create_example(
    inputs={"question": "What is LangSmith?"},
    outputs={"answer": "LangSmith is a platform for LLM application lifecycle management."},
    dataset_id=dataset.id,
    metadata={"difficulty": "easy", "category": "general"},
)

# Add Examples in bulk
examples = [
    {
        "inputs": {"question": "How does tracing work in LangSmith?"},
        "outputs": {"answer": "Tracing records each step of LLM application execution..."},
        "metadata": {"difficulty": "medium", "category": "tracing"},
    },
    {
        "inputs": {"question": "What evaluation types does LangSmith support?"},
        "outputs": {"answer": "LangSmith supports heuristic, LLM-as-Judge, and human evaluation."},
        "metadata": {"difficulty": "medium", "category": "evaluation"},
    },
]

client.create_examples(
    inputs=[e["inputs"] for e in examples],
    outputs=[e["outputs"] for e in examples],
    metadata=[e["metadata"] for e in examples],
    dataset_id=dataset.id,
)

# Method 2: upload from CSV
client.upload_csv(
    csv_file="evaluation_data.csv",
    input_keys=["question"],
    output_keys=["answer"],
    name="qa-from-csv",
    description="Imported from CSV",
)

# Method 3: add from a trace to a dataset
# In the LangSmith UI, select a trace and use "Add to Dataset",
# or do the same through the API:
client.create_example(
    inputs={"question": "specific question from production"},
    outputs={"answer": "expected answer"},
    dataset_id=dataset.id,
    source_run_id="run-id-from-trace",  # link to the trace (placeholder for the run's UUID)
)

# Method 4: from a pandas DataFrame
import pandas as pd

df = pd.DataFrame({
    "question": ["Q1", "Q2", "Q3"],
    "answer": ["A1", "A2", "A3"],
    "category": ["cat1", "cat2", "cat1"],
})

client.upload_dataframe(
    df=df,
    input_keys=["question"],
    output_keys=["answer"],
    name="qa-from-dataframe",
)
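
The CSV and DataFrame uploads both come down to splitting each row into an inputs dict and an outputs dict according to `input_keys` and `output_keys`. The sketch below shows that mapping with the standard library only. `split_rows` and the sample CSV text are hypothetical, and the result has the shape that `create_examples` accepts for its `inputs` and `outputs` arguments.

```python
import csv
import io


def split_rows(csv_text: str, input_keys, output_keys):
    """Split CSV rows into parallel lists of input and output dicts."""
    inputs, outputs = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        inputs.append({key: row[key] for key in input_keys})
        outputs.append({key: row[key] for key in output_keys})
    return inputs, outputs


# Hypothetical file contents; the extra "category" column is ignored.
sample_csv = "question,answer,category\nQ1,A1,cat1\nQ2,A2,cat2\n"

inputs, outputs = split_rows(sample_csv, ["question"], ["answer"])
print(inputs)
print(outputs)
```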

7.2 Versioning and Tags

from datetime import datetime, timezone

from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()
dataset = client.read_dataset(dataset_name="qa-evaluation-v1")

# Manage dataset versions with tags.
# A tag names a snapshot of the dataset at a specific point in time.

# Tag the current state
client.update_dataset_tag(
    dataset_name="qa-evaluation-v1",
    as_of=datetime.now(timezone.utc),
    tag="v1.0-baseline",
)

# After adding or editing Examples, apply a new tag
client.create_example(
    inputs={"question": "New question added after v1.0"},
    outputs={"answer": "Answer for the new question"},
    dataset_id=dataset.id,
)

client.update_dataset_tag(
    dataset_name="qa-evaluation-v1",
    as_of=datetime.now(timezone.utc),
    tag="v1.1-expanded",
)

# Evaluate against a specific tagged version of the dataset
results = evaluate(
    my_app,
    data=client.list_examples(
        dataset_name="qa-evaluation-v1",
        as_of="v1.0-baseline",  # pin to a specific version
    ),
    evaluators=[exact_match],
    experiment_prefix="eval-v1.0",
)

7.3 Splits (Data Partitioning)

from langsmith import Client

client = Client()
dataset = client.read_dataset(dataset_name="qa-evaluation-v1")

# Assign a split to each Example (train/test/validation, etc.)
client.create_example(
    inputs={"question": "Training question 1"},
    outputs={"answer": "Training answer 1"},
    dataset_id=dataset.id,
    split="train",
)

client.create_example(
    inputs={"question": "Test question 1"},
    outputs={"answer": "Test answer 1"},
    dataset_id=dataset.id,
    split="test",
)

# Assign splits in bulk
client.create_examples(
    inputs=[{"question": f"Q{i}"} for i in range(10)],
    outputs=[{"answer": f"A{i}"} for i in range(10)],
    dataset_id=dataset.id,
    splits=["train"] * 7 + ["test"] * 3,  # 70/30 split
)

# Evaluate on a specific split only
results = evaluate(
    my_app,
    data=client.list_examples(
        dataset_name="qa-evaluation-v1",
        splits=["test"],  # use only Examples in the test split
    ),
    evaluators=[exact_match],
    experiment_prefix="eval-test-split",
)
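
Assigning splits by position, as in the 70/30 example above, changes an example's split whenever the list is reordered. One alternative is to derive the split from a hash of a stable key. This is a sketch of that technique rather than anything LangSmith does itself; `assign_split` is a hypothetical helper and the question text serves as the key.

```python
import hashlib


def assign_split(example_key: str, test_fraction: float = 0.3) -> str:
    """Map a stable example key to "train" or "test" deterministically."""
    digest = hashlib.sha256(example_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


questions = [f"Q{i}" for i in range(1000)]
splits = [assign_split(question) for question in questions]
test_share = splits.count("test") / len(splits)
print(f"test share: {test_share:.2f}")
```

The resulting list can be passed as the `splits` argument of `create_examples`. The observed share only approximates `test_fraction`, and the approximation improves as the dataset grows.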

7.4 Metadata Filtering

from langsmith import Client

client = Client()
dataset = client.read_dataset(dataset_name="qa-evaluation-v1")

# Example with metadata
client.create_example(
    inputs={"question": "Complex math question"},
    outputs={"answer": "42"},
    dataset_id=dataset.id,
    metadata={
        "difficulty": "hard",
        "category": "math",
        "source": "textbook",
        "language": "en",
    },
)

# Retrieve Examples filtered by metadata
examples = list(client.list_examples(
    dataset_name="qa-evaluation-v1",
    metadata={"difficulty": "hard"},  # hard questions only
))

# Filter by category
math_examples = list(client.list_examples(
    dataset_name="qa-evaluation-v1",
    metadata={"category": "math"},
))

print(f"Hard examples: {len(examples)}")
print(f"Math examples: {len(math_examples)}")

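8. Monitoring and Dashboards

8.1 Prebuilt Metrics

LangSmith provides the following prebuilt metrics:

Metric	Description	Use
Trace Count	Number of traces over time	Understanding traffic volume
Latency (P50/P90/P99)	Response-time percentiles	Performance monitoring
Token Usage	Tokens consumed (input/output/total)	Cost management
Error Rate	Rate of errors	Reliability monitoring
Cost	API cost (per model)	Budget management
Feedback Scores	Aggregated evaluation scores	Quality monitoring
First Token Latency	Time to the first token	User-experience monitoring
Streaming Duration	Time to complete streaming	Streaming performance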
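
The P50/P90/P99 figures in the table are percentiles of the per-trace latency distribution. The sketch below computes them with the nearest-rank method as a worked example. The latencies are made up, and LangSmith may use a different percentile definition (for instance interpolation), so small differences from dashboard values are to be expected.

```python
import math


def percentile(values, pct: float) -> float:
    """Nearest-rank percentile: the smallest value with pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical per-trace latencies in seconds.
latencies = [0.8, 1.1, 0.9, 1.4, 6.2, 1.0, 1.2, 0.7, 2.5, 1.3]

p50 = percentile(latencies, 50)
p90 = percentile(latencies, 90)
p99 = percentile(latencies, 99)
print(p50, p90, p99)
```

In this sample a single slow trace (6.2 s) leaves P50 untouched but determines P99, which is why tail percentiles are tracked separately from the median.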

8.2 Custom Dashboards

from datetime import datetime, timezone

from langsmith import Client

client = Client()

# Creating dashboards through the UI is recommended,
# but metric data can also be retrieved through the API.

# Retrieve aggregate statistics for a project.
# Parameter names follow recent SDK releases; check the installed version.
stats = client.get_run_stats(
    project_names=["my-production-app"],
    run_type="llm",
    start_time="2024-01-01T00:00:00Z",
    end_time="2024-01-31T23:59:59Z",
)

# Search and filter traces (list_runs expects datetime objects)
runs = list(client.list_runs(
    project_name="my-production-app",
    filter='and(eq(status, "error"), gt(latency, 5))',  # errors slower than 5 s
    start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
    limit=100,
))

# Tag-based filtering
production_runs = list(client.list_runs(
    project_name="my-production-app",
    filter='has(tags, "production")',
    limit=50,
))

# Metadata-based filtering
premium_runs = list(client.list_runs(
    project_name="my-production-app",
    filter='and(eq(metadata_key, "user_tier"), eq(metadata_value, "premium"))',
    limit=50,
))
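
Filter expressions are plain strings, so quoting mistakes are easy to make when values come from variables. The sketch below builds such strings from small helper functions. The helpers (`eq`, `gt`, `and_`, `metadata_equals`) are hypothetical and not part of the SDK; they only produce the string that would be passed as `filter=`.

```python
import json


def eq(field: str, value) -> str:
    """Build an eq(...) clause; strings are quoted, numbers are not."""
    return f"eq({field}, {json.dumps(value)})"


def gt(field: str, value) -> str:
    """Build a gt(...) clause."""
    return f"gt({field}, {json.dumps(value)})"


def and_(*clauses: str) -> str:
    """Combine clauses with and(...); a single clause is returned unchanged."""
    if not clauses:
        raise ValueError("at least one clause is required")
    return clauses[0] if len(clauses) == 1 else f"and({', '.join(clauses)})"


def metadata_equals(key: str, value: str) -> str:
    """Match runs whose metadata contains the given key/value pair."""
    return and_(eq("metadata_key", key), eq("metadata_value", value))


slow_errors = and_(eq("status", "error"), gt("latency", 5))
premium = metadata_equals("user_tier", "premium")
print(slow_errors)
print(premium)
```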

8.3 Cost Tracking

+------------------------------------------------------------------+
|                    Cost Tracking Dashboard                         |
|                                                                    |
|  Period: 2024-01                                                   |
|  +--------------------------------------------------------------+ |
|  |  Total cost: $1,234.56                                        | |
|  |                                                                | |
|  |  Breakdown by model:                                          | |
|  |  +--------------------------+----------+-------+----------+   | |
|  |  | Model                    | Tokens   | Share | Cost     |   | |
|  |  +--------------------------+----------+-------+----------+   | |
|  |  | gpt-4-turbo              | 2.5M     |   65% | $802.46  |   | |
|  |  | gpt-3.5-turbo            | 8.1M     |   20% | $246.91  |   | |
|  |  | claude-3-sonnet          | 1.2M     |   12% | $148.15  |   | |
|  |  | text-embedding-ada-002   | 15.3M    |    3% | $37.04   |   | |
|  |  +--------------------------+----------+-------+----------+   | |
|  |                                                                | |
|  |  Daily trend:                                                 | |
|  |  $80 |    *                                                    | |
|  |  $60 |   * *  *                                                | |
|  |  $40 |  *   ** **  *                                           | |
|  |  $20 | *        * ** ***                                       | |
|  |   $0 +---+---+---+---+---+                                    | |
|  |       1   7   14  21  28                                       | |
|  +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
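
The per-model breakdown in the dashboard follows from token counts and per-token prices. The sketch below works through that arithmetic. The model names and prices are hypothetical placeholders, and real pricing usually distinguishes input from output tokens, which this simplified version does not.

```python
# Hypothetical prices in USD per 1M tokens; real prices vary by provider and date.
PRICE_PER_MILLION = {"model-a": 10.00, "model-b": 0.50}


def cost_breakdown(token_usage: dict) -> dict:
    """Return per-model cost and share of the total for the given token counts."""
    costs = {
        model: tokens / 1_000_000 * PRICE_PER_MILLION[model]
        for model, tokens in token_usage.items()
    }
    total = sum(costs.values())
    return {
        model: {
            "cost": round(cost, 2),
            "share": round(cost / total, 4) if total else 0.0,
        }
        for model, cost in costs.items()
    }


usage = {"model-a": 2_500_000, "model-b": 8_000_000}
breakdown = cost_breakdown(usage)
print(breakdown)
```

As in the dashboard above, the model with the most tokens is not necessarily the one that dominates cost.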

8.4 Insights Agent

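LangSmith includes an AI-powered analysis feature (Insights Agent) that answers natural-language questions about metrics, for example:

"What did the error rate look like last week?"
"Which trace patterns are the most expensive?"
"What caused the latency regression?"

8.5 Alert Configuration

# Alerts are configured mainly in the LangSmith UI,
# and webhooks allow integration with external systems.

# Example webhook receiver (FastAPI).
# The payload field names below are illustrative; check the actual
# webhook payload before relying on them.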
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/langsmith-webhook")
async def handle_langsmith_alert(request: Request):
    """Webhook handler for LangSmith alerts."""
    payload = await request.json()

    alert_type = payload.get("type")
    project = payload.get("project_name")
    metric = payload.get("metric")
    value = payload.get("value")
    threshold = payload.get("threshold")

    # Slack notification (send_slack_notification is a placeholder helper)
    await send_slack_notification(
        f"LangSmith Alert: {alert_type}\n"
        f"Project: {project}\n"
        f"Metric: {metric} = {value} (threshold: {threshold})"
    )

    return {"status": "ok"}

9. Prompt Management

9.1 Prompt Hub

Prompt Hub is a central repository for versioning, sharing, and deploying prompts.

+------------------------------------------------------------------+
|                        Prompt Hub                                  |
|                                                                    |
|  +------------------------------------------------------------+  |
|  |  Organization: my-org                                       |  |
|  |                                                              |  |
|  |  +------------------+  +------------------+                  |  |
|  |  | Prompt: qa-bot   |  | Prompt: summary  |  ...            |  |
|  |  | Tags:            |  | Tags:            |                  |  |
|  |  |  - latest        |  |  - latest        |                  |  |
|  |  |  - production    |  |  - production    |                  |  |
|  |  |  - v1.0          |  |  - staging       |                  |  |
|  |  |  - v2.0          |  |  - v1.0          |                  |  |
|  |  |                  |  |                  |                  |  |
|  |  | Commits:         |  | Commits:         |                  |  |
|  |  |  abc123 (latest) |  |  def456 (latest) |                  |  |
|  |  |  xyz789 (v1.0)   |  |  ghi012 (v1.0)   |                  |  |
|  |  +------------------+  +------------------+                  |  |
|  +------------------------------------------------------------+  |
+------------------------------------------------------------------+

9.2 Prompt Operations via the SDK

from langsmith import Client
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

client = Client()

# Create a prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant specialized in {domain}. "
               "Always respond in {language}."),
    ("human", "{question}")
])

# Push to Prompt Hub
client.push_prompt(
    prompt_identifier="my-org/qa-assistant",  # org/prompt-name format
    object=prompt,
    description="General Q&A assistant prompt",
    tags=["production", "v1.0"],
    is_public=False,  # private prompt
)

# Pull the prompt (latest version)
prompt = client.pull_prompt("my-org/qa-assistant")

# Pull a specific version (commit hash)
prompt_v1 = client.pull_prompt("my-org/qa-assistant:abc123")

# Pull a specific tag
prompt_prod = client.pull_prompt("my-org/qa-assistant:production")

# Use the prompt
chain = prompt | ChatOpenAI(model="gpt-4-turbo") | StrOutputParser()
result = chain.invoke({
    "domain": "machine learning",
    "language": "Japanese",
    "question": "What is a transformer?"
})

# Update the prompt
updated_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert assistant in {domain}. "
               "Respond in {language}. Be concise and accurate."),
    ("human", "{question}")
])

client.push_prompt(
    prompt_identifier="my-org/qa-assistant",
    object=updated_prompt,
    description="Updated: added conciseness instruction",
    tags=["staging"],  # start with the staging tag
)

# List prompts (the response object exposes the matches as .repos)
prompts = client.list_prompts(
    is_public=False,
    query="qa",  # filter by name
).repos

for p in prompts:
    print(f"{p.repo_handle}: {p.description}")
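
The identifiers used above follow an `owner/name:version` pattern, where the part after the colon is a commit hash or a tag. The sketch below splits such an identifier into its parts. `parse_prompt_identifier` and `PromptRef` are hypothetical names for this illustration and do not mirror how the SDK parses identifiers internally.

```python
from typing import NamedTuple, Optional


class PromptRef(NamedTuple):
    owner: Optional[str]
    name: str
    version: Optional[str]  # commit hash or tag; None means the latest version


def parse_prompt_identifier(identifier: str) -> PromptRef:
    """Split "owner/name:version" into parts; owner and version are optional."""
    path, _, version = identifier.partition(":")
    owner, separator, name = path.partition("/")
    if not separator:  # no owner prefix was given
        owner, name = None, path
    if not name:
        raise ValueError(f"invalid prompt identifier: {identifier!r}")
    return PromptRef(owner, name, version or None)


latest = parse_prompt_identifier("my-org/qa-assistant")
pinned = parse_prompt_identifier("my-org/qa-assistant:abc123")
print(latest)
print(pinned)
```

Pinning a commit hash in production code keeps behavior stable when the prompt is edited, whereas a tag such as `production` moves with each promotion.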

9.3 Playground

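Playground is a feature of the LangSmith UI for testing prompts interactively:

Real-time testing: try prompt changes on the spot
Multi-model comparison: run the same prompt against different models at once
Parameter tuning: adjust temperature, max_tokens, and other parameters in real time
Trace integration: test results are recorded automatically as traces

9.4 Polly (Prompt Optimization)

Polly is LangSmith's prompt optimization assistant:

Automatic prompt improvement suggestions
Prompt optimization based on A/B test results
Automatic selection of few-shot Examples
Prompt performance analysis

10. Annotation and Human Feedback

10.1 Annotation Queue Types

Queue type	Description	Use cases
Single-Run Queue	Evaluate individual traces one at a time	Quality checks, error analysis
Pairwise Queue	Compare two outputs side by side	Model comparison, A/B testing

10.2 Annotation Workflow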

+------------------------------------------------------------------+
|                        Annotation Workflow                         |
|                                                                    |
|  1. Create the queue                                               |
|  +------------------------+                                        |
|  | Annotation Queue       |                                        |
|  | - Name: "qa-review"    |                                        |
|  | - Type: Single-Run     |                                        |
|  | - Criteria:            |                                        |
|  |   * Correctness (1-5)  |                                        |
|  |   * Helpfulness (1-5)  |                                        |
|  |   * Safety (pass/fail) |                                        |
|  +------------------------+                                        |
|           |                                                        |
|  2. Add Runs                                                       |
|  +------------------------+                                        |
|  | Filter conditions:     |                                        |
|  | - Project: production  |                                        |
|  | - Date: last 7 days    |                                        |
|  | - Sampling: 10%        |                                        |
|  +------------------------+                                        |
|           |                                                        |
|  3. Assign reviewers                                               |
|  +------------------------+                                        |
|  | Reviewers:             |                                        |
|  | - alice@example.com    |                                        |
|  | - bob@example.com      |                                        |
|  | Assignment: Round-robin|                                        |
|  +------------------------+                                        |
|           |                                                        |
|  4. Conduct reviews                                                |
|  +------------------------+                                        |
|  | Reviewers rate each    |                                        |
|  | Run in the UI, adding  |                                        |
|  | scores and comments    |                                        |
|  +------------------------+                                        |
|           |                                                        |
|  5. Aggregate feedback                                             |
|  +------------------------+                                        |
|  | Use the dashboard to   |                                        |
|  | check score spread and |                                        |
|  | spot problem patterns  |                                        |
|  +------------------------+                                        |
+------------------------------------------------------------------+
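
Steps 2 and 3 of the workflow (sampling a share of runs and distributing them round-robin) can be sketched in a few lines. This is only an illustration of the selection logic, not of how LangSmith schedules reviews. `plan_review` is a hypothetical helper, and the run IDs are made up.

```python
import random
from itertools import cycle


def plan_review(run_ids, reviewers, sample_rate: float, seed: int = 0) -> dict:
    """Sample a fraction of runs and hand them out to reviewers in turn."""
    rng = random.Random(seed)  # a fixed seed keeps the sample reproducible
    run_ids = list(run_ids)
    sample_size = min(len(run_ids), max(1, round(len(run_ids) * sample_rate)))
    sampled = rng.sample(run_ids, sample_size)
    assignments = {reviewer: [] for reviewer in reviewers}
    for run_id, reviewer in zip(sampled, cycle(reviewers)):
        assignments[reviewer].append(run_id)
    return assignments


# Hypothetical run IDs and the two reviewers shown in the diagram.
run_ids = [f"run-{i}" for i in range(50)]
plan = plan_review(
    run_ids, ["alice@example.com", "bob@example.com"], sample_rate=0.10
)
print({reviewer: len(ids) for reviewer, ids in plan.items()})
```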

10.3 Managing Annotations via the API

from datetime import datetime, timezone

from langsmith import Client

client = Client()

# Create an annotation queue
queue = client.create_annotation_queue(
    name="qa-quality-review",
    description="Weekly quality review for QA bot",
)

# Add Runs to the queue (list_runs expects datetime objects)
runs = client.list_runs(
    project_name="qa-bot-production",
    filter='and(eq(run_type, "chain"), gt(latency, 3))',
    start_time=datetime(2024, 1, 8, tzinfo=timezone.utc),
    end_time=datetime(2024, 1, 15, tzinfo=timezone.utc),
    limit=50,
)

client.add_runs_to_annotation_queue(
    queue_id=queue.id,
    run_ids=[run.id for run in runs],
)

# Submit feedback programmatically
# (the run IDs below are placeholders for real run UUIDs)
client.create_feedback(
    run_id="run-id-123",
    key="correctness",
    score=0.8,
    comment="Mostly correct but missed one edge case",
)

# Submit several feedback items
feedbacks = [
    {"run_id": "run-1", "key": "quality", "score": 0.9},
    {"run_id": "run-2", "key": "quality", "score": 0.3},
    {"run_id": "run-3", "key": "quality", "score": 0.7},
]

for fb in feedbacks:
    client.create_feedback(**fb)

# Retrieve feedback
feedbacks = list(client.list_feedback(
    run_ids=["run-id-123"],
))

for fb in feedbacks:
    print(f"{fb.key}: {fb.score} - {fb.comment}")

11. Online Evaluation

11.1 Setup

Online evaluation runs automatic evaluations against production traces in real time.

+------------------------------------------------------------------+
|                    Online Evaluation Flow                          |
|                                                                    |
|  Production traces                                                 |
|  +----------+                                                      |
|  | Run A    | ----+                                                |
|  +----------+     |                                                |
|  | Run B    | ----+----> Rule engine                               |
|  +----------+     |      (filtering)                               |
|  | Run C    | ----+          |                                     |
|  +----------+               |                                     |
|                    +--------v---------+                            |
|                    | Evaluate only    |                            |
|                    | matching Runs    |                            |
|                    +--------+---------+                            |
|                             |                                     |
|              +--------------+--------------+                      |
|              |              |              |                      |
|     +--------v---+  +------v-------+  +---v---------+            |
|     | Evaluator 1|  | Evaluator 2 |  | Evaluator 3 |            |
|     | (heuristic)|  | (LLM-judge) |  | (custom)    |            |
|     +--------+---+  +------+-------+  +---+---------+            |
|              |              |              |                      |
|              +--------------+--------------+                      |
|                             |                                     |
|                    +--------v---------+                            |
|                    | Saved as         |                            |
|                    | feedback         |                            |
|                    +------------------+                            |
+------------------------------------------------------------------+

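11.2 Types of Online Evaluation

from langsmith import Client

client = Client()

# 1. LLM-as-Judge online evaluation
# Configuring this in the UI is recommended, but it can also be set up through the API

# 2. Heuristic evaluation (code-based)
# Definitions of online evaluators
def online_sentiment_check(run) -> dict:
    """Rough sentiment check on the output based on negative keywords."""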
    output = run.outputs.get("output", "")
    negative_words = ["error", "fail", "sorry", "cannot", "unable"]
    negative_count = sum(1 for w in negative_words if w in output.lower())
    return {
        "key": "sentiment_positive",
        "score": max(0, 1.0 - negative_count * 0.2),
    }

def online_length_check(run) -> dict:
    """Check the length of the output."""
    output = run.outputs.get("output", "")
    word_count = len(output.split())
    # 50 to 500 words is treated as the appropriate range
    if 50 <= word_count <= 500:
        score = 1.0
    elif word_count < 50:
        score = word_count / 50
    else:
        score = max(0, 1.0 - (word_count - 500) / 500)
    return {
        "key": "appropriate_length",
        "score": score,
        "comment": f"Word count: {word_count}"
    }

11.3 Automation Rules

from datetime import datetime, timezone

from langsmith import Client

client = Client()

# Example automation rules
# Note: the rules below illustrate the concept. Actual configuration is done mainly in the UI

# Rule 1: alert when the error rate is high
# - Condition: error_rate > 5% (15-minute moving average)
# - Action: Slack notification + add to an annotation queue

# Rule 2: quality score has dropped
# - Condition: online_eval_score < 0.5
# - Action: add the affected Run to a dataset

# Rule 3: cost anomaly detection
# - Condition: single_run_cost > $1.00
# - Action: alert notification

# Conditional feedback applied programmatically
runs = client.list_runs(
    project_name="production-app",
    filter='and(eq(status, "success"), gt(latency, 10))',
    start_time=datetime(2024, 1, 15, tzinfo=timezone.utc),
)

for run in runs:
    # Flag high-latency Runs
    client.create_feedback(
        run_id=run.id,
        key="high_latency",
        score=0.0,
        comment=f"Latency: {(run.end_time - run.start_time).total_seconds():.1f}s",
    )
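
Rule 1 above depends on an error rate computed over a 15-minute moving window. The sketch below shows one way to maintain such a window in code. It illustrates the rule's condition only and is not how LangSmith evaluates rules; `ErrorRateWindow` and the simulated traffic are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


class ErrorRateWindow:
    """Track the error rate over a sliding time window."""

    def __init__(self, window: timedelta, threshold: float):
        self.window = window
        self.threshold = threshold
        self._events = deque()  # (timestamp, is_error) pairs, oldest first

    def record(self, timestamp: datetime, is_error: bool) -> bool:
        """Add one run; return True if the windowed error rate exceeds the threshold."""
        self._events.append((timestamp, is_error))
        cutoff = timestamp - self.window
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        errors = sum(1 for _, failed in self._events if failed)
        return errors / len(self._events) > self.threshold


monitor = ErrorRateWindow(window=timedelta(minutes=15), threshold=0.05)
start = datetime(2024, 1, 15, 10, 0)

# Simulated traffic: one run per minute, and only the run at minute 20 fails.
alerts = [
    monitor.record(start + timedelta(minutes=minute), is_error=(minute == 20))
    for minute in range(30)
]
print(alerts.index(True))
```

With low traffic a single failure is enough to cross a 5% threshold, so production rules often add a minimum run count before alerting.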

12. SDK and API

12.1 Python SDK

from datetime import datetime, timezone

from langsmith import Client

# Initialize the client
client = Client(
    api_key="lsv2_pt_xxxxxxxx",          # API key
    api_url="https://api.smith.langchain.com",  # endpoint
)

# === Reference of key methods ===

# --- Project management ---
project = client.create_project("my-project", description="My project")
projects = list(client.list_projects())
client.delete_project(project_name="old-project")

# --- Run operations ---
# List Runs (start_time and end_time are datetime objects)
runs = list(client.list_runs(
    project_name="my-project",
    run_type="llm",           # "chain", "llm", "tool", "retriever"
    filter='eq(status, "success")',
    start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
    end_time=datetime(2024, 1, 31, 23, 59, 59, tzinfo=timezone.utc),
    limit=100,
    select=["id", "name", "inputs", "outputs", "latency", "total_tokens"],
))

# Retrieve a specific Run
run = client.read_run(run_id="run-uuid-here")

# Share a Run (generate a public link)
share_url = client.share_run(run_id="run-uuid-here")

# --- Dataset operations ---
dataset = client.create_dataset("my-dataset")
client.create_example(
    inputs={"input": "test"},
    outputs={"output": "expected"},
    dataset_id=dataset.id,
)
examples = list(client.list_examples(dataset_name="my-dataset"))
client.delete_dataset(dataset_id=dataset.id)

# --- Feedback operations ---
client.create_feedback(
    run_id="run-id",
    key="quality",
    score=0.9,
    comment="Excellent response"
)
feedbacks = list(client.list_feedback(run_ids=["run-id"]))

# --- Prompt operations (prompt_template is a prompt object defined elsewhere) ---
client.push_prompt("my-org/prompt-name", object=prompt_template)
prompt = client.pull_prompt("my-org/prompt-name")
prompts = client.list_prompts().repos

12.2 TypeScript SDK

import { Client } from "langsmith";
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import OpenAI from "openai";

// Initialize the client
const client = new Client({
  apiKey: "lsv2_pt_xxxxxxxx",
  apiUrl: "https://api.smith.langchain.com",
});

// Automatic tracing via the OpenAI wrapper
const openai = wrapOpenAI(new OpenAI());

// traceable wrapper (TypeScript counterpart of the Python decorator)
const generateResponse = traceable(
  async (question: string): Promise<string> => {
    const response = await openai.chat.completions.create({
      model: "gpt-4-turbo",
      messages: [{ role: "user", content: question }],
    });
    return response.choices[0].message.content ?? "";
  },
  { name: "generate_response", run_type: "chain" }
);

// Dataset operations
const dataset = await client.createDataset("my-dataset-ts");
await client.createExample({
  inputs: { question: "What is TypeScript?" },
  outputs: { answer: "TypeScript is a typed superset of JavaScript." },
  datasetId: dataset.id,
});

// Retrieve Runs
const runs = client.listRuns({
  projectName: "my-project",
  filter: 'eq(status, "success")',
  limit: 50,
});

for await (const run of runs) {
  console.log(`${run.name}: ${run.status}`);
}

// Send feedback (runId is the ID of an existing run)
await client.createFeedback(runId, "quality", {
  score: 0.9,
  comment: "Good response",
});

12.3 REST API

# Authentication with an API key
# Header: x-api-key: lsv2_pt_xxxxxxxx

# --- Retrieve Runs ---
curl -X GET "https://api.smith.langchain.com/api/v1/runs?project_name=my-project&limit=10" \
  -H "x-api-key: lsv2_pt_xxxxxxxx" \
  -H "Content-Type: application/json"

# --- Create a Run (send a trace) ---
# The REST API identifies the target project through the session_name field
curl -X POST "https://api.smith.langchain.com/api/v1/runs" \
  -H "x-api-key: lsv2_pt_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-run",
    "run_type": "chain",
    "inputs": {"question": "What is LangSmith?"},
    "outputs": {"answer": "LangSmith is..."},
    "start_time": "2024-01-15T10:00:00Z",
    "end_time": "2024-01-15T10:00:02Z",
    "session_name": "my-project"
  }'

# --- Send Runs in a batch ---
curl -X POST "https://api.smith.langchain.com/api/v1/runs/batch" \
  -H "x-api-key: lsv2_pt_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -H "Content-Encoding: gzip" \
  --data-binary @runs_batch.json.gz

# --- Create a dataset ---
curl -X POST "https://api.smith.langchain.com/api/v1/datasets" \
  -H "x-api-key: lsv2_pt_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-dataset",
    "description": "Evaluation dataset",
    "data_type": "kv"
  }'

# --- Send feedback ---
curl -X POST "https://api.smith.langchain.com/api/v1/feedback" \
  -H "x-api-key: lsv2_pt_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "run_id": "run-uuid",
    "key": "quality",
    "score": 0.9,
    "comment": "Good quality"
  }'
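
The batch request above uploads a gzip-compressed JSON file. The sketch below shows how such a file could be produced with the Python standard library. The payload layout (a `post` list for new runs and a `patch` list for updates) and the run fields are assumptions for illustration and should be checked against the API reference before use.

```python
import gzip
import json

# Placeholder payload: the schema expected by the batch endpoint is an
# assumption here and must be verified against the API reference.
batch = {
    "post": [
        {
            "name": "my-run",
            "run_type": "chain",
            "inputs": {"question": "What is LangSmith?"},
            "start_time": "2024-01-15T10:00:00Z",
        }
    ],
    "patch": [],
}

raw = json.dumps(batch).encode("utf-8")
compressed = gzip.compress(raw)

# Write the file referenced by --data-binary in the curl command.
with open("runs_batch.json.gz", "wb") as handle:
    handle.write(compressed)

# Round-trip check: decompressing yields the original JSON document.
restored = json.loads(gzip.decompress(compressed))
print(len(raw), len(compressed), restored == batch)
```

For a payload this small, compression does not reduce the size; the benefit appears with batches of many runs that share repeated keys.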

12.4 pytest Plugin

# conftest.py
import pytest

# When the LANGSMITH_API_KEY environment variable is set, tests connect to LangSmith automatically

# pytest.ini or pyproject.toml (the env option requires the pytest-env plugin)
# [tool.pytest.ini_options]
# env = [
#     "LANGSMITH_TRACING=true",
#     "LANGSMITH_API_KEY=lsv2_pt_xxxxxxxx",
#     "LANGSMITH_TEST_SUITE=my-test-suite",
# ]

# Test file
from langsmith import unit, expect

@unit
def test_basic_qa():
    result = my_app({"question": "What is 2+2?"})
    expect(result["output"]).to_contain("4")
    expect(len(result["output"])).to_be_less_than(100)

@unit
def test_with_scoring():
    result = my_app({"question": "Explain quantum computing"})
    # Recorded in LangSmith as a score
    expect.score(
        1.0 if len(result["output"]) > 100 else 0.5,
        key="completeness",
    )

13. インテグレーション

13.1 ネイティブ LangChain / LangGraph 統合

# LangChain: 環境変数設定のみで自動トレーシング
import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "lsv2_pt_xxxxxxxx"

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# すべての LangChain コンポーネントが自動的にトレースされる
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | parser
)

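# LangGraph: state machines are traced automatically as well
from langgraph.graph import StateGraph, START, END
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    next_action: str

def should_continue(state: AgentState) -> str:
    # Call the tool only while the agent asks for it; otherwise finish
    return "tool" if state.get("next_action") == "tool" else END

# agent_node and tool_node are assumed to be defined elsewhere
graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tool", tool_node)
graph.add_edge(START, "agent")  # an entry point is required to compile
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tool", "agent")

app = graph.compile()
# Every LangGraph step is traced automatically
result = app.invoke({"messages": [{"role": "user", "content": "Search for..."}]})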

13.2 OpenAI Wrapper

# Python
from langsmith.wrappers import wrap_openai
from openai import OpenAI

# Wrap the OpenAI client for automatic tracing
client = wrap_openai(OpenAI())

# Every regular OpenAI API call is now traced
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    # LangSmith-specific options (the run name is set with the "name" key)
    langsmith_extra={
        "project_name": "openai-direct",
        "name": "chat-completion",
        "tags": ["production"],
        "metadata": {"user_id": "user-123"},
    }
)

# Streaming is supported as well
stream = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
    langsmith_extra={"name": "streaming-chat"},
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

// TypeScript
import { wrapOpenAI } from "langsmith/wrappers";
import OpenAI from "openai";

const openai = wrapOpenAI(new OpenAI());

const response = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [{ role: "user", content: "Hello!" }],
});
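
To make the idea behind a tracing wrapper concrete, here is a conceptual sketch in plain Python. It is not how wrap_openai is implemented; it only illustrates the general pattern of intercepting a call and recording its inputs, output, error, and latency. The traced decorator, the RECORDED_RUNS list, and fake_completion are all hypothetical names introduced for illustration.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink; a real tracer would send these records to a backend.
RECORDED_RUNS: List[Dict[str, Any]] = []


def traced(name: str) -> Callable:
    """Wrap a function so each call records inputs, output, error, and latency."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: Dict[str, Any] = {
                "name": name,
                "inputs": {"args": args, "kwargs": kwargs},
            }
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                record["output"] = result
                return result
            except Exception as exc:
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call is recorded
                record["latency_s"] = time.perf_counter() - start
                RECORDED_RUNS.append(record)
        return wrapper
    return decorator


@traced("fake-chat-completion")
def fake_completion(prompt: str) -> str:
    # Stand-in for a model call so the sketch runs offline
    return prompt.upper()
```

Calling fake_completion("hello") appends one record to RECORDED_RUNS. A production wrapper would additionally batch records, attach them to a parent trace, and send them asynchronously.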

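13.3 Integration with Agent Frameworks

# --- CrewAI ---
# CrewAI runs can be sent to LangSmith; whether the environment variable alone
# is enough depends on the CrewAI version, so confirm against the current docs
import os
os.environ["LANGSMITH_TRACING"] = "true"

# --- Instructor (structured output with Pydantic) ---
from langsmith.wrappers import wrap_openai
from openai import OpenAI
import instructor

client = wrap_openai(OpenAI())
instructor_client = instructor.from_openai(client)

# Calls made through Instructor are traced as well
# (MyPydanticModel is assumed to be a Pydantic model defined elsewhere)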
response = instructor_client.chat.completions.create(
    model="gpt-4-turbo",
    response_model=MyPydanticModel,
    messages=[{"role": "user", "content": "Extract data..."}],
)

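13.4 OpenTelemetry Integration

# Using LangSmith as an OpenTelemetry backend
import os

# LangSmith accepts OpenTelemetry spans over OTLP
# Configure the OTEL endpoint (set these before the exporter is created)
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://api.smith.langchain.com/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = "x-api-key=lsv2_pt_xxxxxxxx"

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# Requires the opentelemetry-exporter-otlp-proto-http package
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# The exporter reads the endpoint and headers from the variables above
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

# OpenTelemetry spans now appear as traces in LangSmith

# Example: Vercel AI SDK
# The Vercel AI SDK can send traces to LangSmith via OpenTelemetry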
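
The two OTLP settings above can be derived from an API key and a base URL. The sketch below shows one way to build them. otlp_env is a hypothetical helper, and the Langsmith-Project header used to select a project is an assumption to confirm against the current documentation. The /otel suffix matches the cloud endpoint shown above; a self-hosted instance may expose a different path.

```python
from typing import Dict, Optional


def otlp_env(api_key: str, project: Optional[str] = None,
             base_url: str = "https://api.smith.langchain.com") -> Dict[str, str]:
    """Return the OTLP exporter environment variables for a LangSmith endpoint."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    headers = [f"x-api-key={api_key}"]
    if project:
        # Assumption: the project is selected with a Langsmith-Project header
        headers.append(f"Langsmith-Project={project}")
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": base_url.rstrip("/") + "/otel",
        "OTEL_EXPORTER_OTLP_HEADERS": ",".join(headers),
    }
```

The returned dictionary can be applied with os.environ.update(...) before the exporter is created, which keeps the key out of source code when it is read from a secret store.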

13.5 Other Integrations

Integration	Method	Description
Anthropic SDK	`wrap_anthropic()`	Automatic tracing of Anthropic Claude API calls
AWS Bedrock	`@traceable`	Custom tracing of Bedrock calls
Google Vertex AI	`@traceable`	Custom tracing of Vertex AI calls
Hugging Face	`@traceable`	Tracing of Hugging Face models
LlamaIndex	Native	Automatic tracing support for LlamaIndex
Vercel AI SDK	OpenTelemetry	Trace export via OTEL
MLflow	Export	Export of experiment results to MLflow
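
For the rows that list `@traceable`, the pattern is to decorate the function that calls the provider. The following is a minimal sketch under stated assumptions: call_model is a hypothetical stand-in for a Bedrock, Vertex AI, or Hugging Face call, and the fallback decorator exists only so the example runs when the langsmith package is not installed.

```python
try:
    from langsmith import traceable
except ImportError:
    # Fallback so the sketch still runs without the langsmith package installed
    def traceable(*dargs, **dkwargs):
        if len(dargs) == 1 and callable(dargs[0]) and not dkwargs:
            return dargs[0]
        return lambda fn: fn


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a Bedrock / Vertex AI / Hugging Face call."""
    return f"echo: {prompt}"


@traceable(run_type="llm", name="custom-model-call")
def generate(prompt: str) -> str:
    # With tracing enabled, the inputs and output of this function become one run
    return call_model(prompt)
```

Because the decorator wraps an ordinary function, the same approach applies to any provider SDK that has no dedicated wrapper.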

14. Deployment Options

14.1 Comparison of Deployment Models

+----------------------------------------------------------------------+
|                     Deployment Options Compared                      |
|                                                                      |
| +--------------------+ +--------------------+ +--------------------+ |
| | Cloud (SaaS)       | | Self-Hosted        | | Hybrid             | |
| |                    | |                    | |                    | |
| | Fully managed by   | | Run entirely in    | | Data stays in your | |
| | LangChain          | | your environment   | | environment; some  | |
| |                    | |                    | | services run in    | |
| | Pros:              | | Pros:              | | the cloud          | |
| | - No setup         | | - Full data        | |                    | |
| | - Auto-scaling     | |   sovereignty      | | Pros:              | |
| | - Auto-updates     | | - Customizable     | | - Data sovereignty | |
| |                    | | - Network isolation| | - Less management  | |
| | Cons:              | |                    | |                    | |
| | - Data leaves your | | Cons:              | | Cons:              | |
| |   environment      | | - Ops burden       | | - Complex setup    | |
| | - Network-dependent| | - Infra cost       | |                    | |
| +--------------------+ +--------------------+ +--------------------+ |
+----------------------------------------------------------------------+

14.2 Self-Hosting Patterns

14.2.1 Docker Compose (small scale / development)

# Start a self-hosted LangSmith instance
# Prerequisite: docker-compose.yml is already in place (see Section 4.4)

# Set environment variables
export LANGSMITH_LICENSE_KEY="your-license-key"

# Start
docker compose up -d

# Check status
docker compose ps

# Check logs
docker compose logs -f langsmith-backend

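14.2.2 Kubernetes (medium to large scale / production)

# langsmith-helm-values.yaml
# Kubernetes deployment with the Helm chart
# Key names differ between chart versions; compare with `helm show values langsmith/langsmith`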

global:
  licenseKey: "your-license-key"

backend:
  replicas: 3
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilization: 70

frontend:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"

queue:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"

clickhouse:
  # When using an external ClickHouse cluster
  external:
    enabled: true
    host: "clickhouse.internal.example.com"
    port: 8123
    nativePort: 9000
    database: "langsmith"
    user: "langsmith"
    password:
      secretName: "clickhouse-credentials"
      secretKey: "password"

  # When using the bundled ClickHouse
  # internal:
  #   enabled: true
  #   persistence:
  #     size: 100Gi
  #     storageClass: "gp3"

postgres:
  external:
    enabled: true
    host: "postgres.internal.example.com"
    port: 5432
    database: "langsmith"
    user: "langsmith"
    password:
      secretName: "postgres-credentials"
      secretKey: "password"

redis:
  external:
    enabled: true
    host: "redis.internal.example.com"
    port: 6379

blobStorage:
  backend: "s3"
  s3:
    bucket: "langsmith-blobs"
    region: "us-west-2"
    accessKey:
      secretName: "s3-credentials"
      secretKey: "access-key"
    secretKey:
      secretName: "s3-credentials"
      secretKey: "secret-key"

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: langsmith.internal.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: langsmith-tls
      hosts:
        - langsmith.internal.example.com

auth:
  type: "oauth"
  oauth:
    issuerUrl: "https://accounts.google.com"
    clientId: "your-client-id"
    clientSecret:
      secretName: "oauth-credentials"
      secretKey: "client-secret"
    allowedDomains:
      - "example.com"

# Deploy with the Helm chart
helm repo add langsmith https://langchain-ai.github.io/helm/
helm repo update

helm install langsmith langsmith/langsmith \
  -f langsmith-helm-values.yaml \
  -n langsmith \
  --create-namespace

# Upgrade
helm upgrade langsmith langsmith/langsmith \
  -f langsmith-helm-values.yaml \
  -n langsmith

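14.2.3 Standalone Binary

# Single-binary deployment (for testing/evaluation)
# Confirm on the releases page that this asset exists for your version before downloading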
curl -LO https://github.com/langchain-ai/langsmith-sdk/releases/latest/download/langsmith-standalone

chmod +x langsmith-standalone

# Start (embedded SQLite + file storage)
./langsmith-standalone serve \
  --port 1984 \
  --license-key "your-license-key"

14.3 Production Considerations

+------------------------------------------------------------------+
|                Production Self-Hosting Checklist                 |
|                                                                  |
|  [x] Storage                                                     |
|      - ClickHouse: replication configured                        |
|      - PostgreSQL: HA setup (Patroni / RDS Multi-AZ)             |
|      - Redis: Sentinel / Cluster mode                            |
|      - Blob Storage: S3 / GCS (bucket policy configured)         |
|                                                                  |
|  [x] Networking                                                  |
|      - TLS termination configured                                |
|      - Internal traffic encrypted                                |
|      - Network policies configured                               |
|      - Ingress / load balancer configured                        |
|                                                                  |
|  [x] Authentication                                              |
|      - OAuth/OIDC configured (recommended: Google/Okta/Azure AD) |
|      - API key rotation policy                                   |
|                                                                  |
|  [x] Monitoring                                                  |
|      - Prometheus metrics collection                             |
|      - Log aggregation (ELK / CloudWatch / Datadog)              |
|      - Alerting (disk usage, error rate, etc.)                   |
|                                                                  |
|  [x] Backups                                                     |
|      - Scheduled PostgreSQL backups                              |
|      - ClickHouse backups                                        |
|      - Blob Storage versioning                                   |
|                                                                  |
|  [x] Scaling                                                     |
|      - Backend API: HPA (CPU/memory based)                       |
|      - Queue workers: scale on queue size                        |
|      - ClickHouse: sharding design                               |
|                                                                  |
|  [x] Data retention                                              |
|      - TTL policies configured                                   |
|      - Automatic deletion of old trace data                      |
|      - Retention periods driven by compliance requirements       |
+------------------------------------------------------------------+

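15. Security and Pricing

15.1 Authentication Methods

Method	Description	Recommended for
Service Key	API key with the `lsv2_sk_` prefix	SDK / CI/CD pipelines
Personal Access Token	Per-user token with the `lsv2_pt_` prefix	Scripts / individual development
OAuth 2.0 / OIDC	SSO integration (Google, Okta, Azure AD, etc.)	Enterprise web UI
SAML	SAML 2.0-based SSO	Enterprise (requires the Enterprise plan)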

# Using API keys
from langsmith import Client

# Using a Service Key (workspace-scoped)
client = Client(api_key="lsv2_sk_xxxxxxxx")  # Service Key

# Using a Personal Access Token
client = Client(api_key="lsv2_pt_xxxxxxxx")  # Personal Token

# API key scopes
# - lsv2_pt_*: Personal Access Token (can access every project the owning user can access)
# - lsv2_sk_*: Service Key (scoped to a workspace, intended for programmatic access)
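
The two prefixes make it easy to check which kind of key a process was given. The sketch below is an illustration rather than an SDK feature: key_kind and load_api_key are hypothetical helpers, and they rely only on the `lsv2_pt_` and `lsv2_sk_` prefixes described above.

```python
import os


def key_kind(api_key: str) -> str:
    """Classify a LangSmith API key by the prefixes listed above."""
    if api_key.startswith("lsv2_pt_"):
        return "personal_access_token"
    if api_key.startswith("lsv2_sk_"):
        return "service_key"
    return "unknown"


def load_api_key(env=os.environ) -> str:
    """Read the key from LANGSMITH_API_KEY instead of hardcoding it."""
    key = env.get("LANGSMITH_API_KEY", "")
    if key_kind(key) == "unknown":
        raise ValueError("LANGSMITH_API_KEY is missing or has an unexpected prefix")
    return key
```

A deployment script could, for example, refuse to start in production when key_kind reports a personal access token, so that shared services do not depend on one person's account.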

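15.2 RBAC (Role-Based Access Control)

Organization-level roles

Role	Permissions
Organization Admin	Administrative rights over the whole organization: member management, billing management, and access to every workspace
Organization Member	Can be invited to workspaces; cannot change organization-level settings

Workspace-level roles

Role	Permissions
Workspace Admin	Change workspace settings, manage members, and full access to all resources
Editor	Create and edit projects, datasets, and prompts; cannot change workspace settings
Viewer	Read-only access; can view traces and dashboards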

+------------------------------------------------------------------+
|                    RBAC Hierarchy                                  |
|                                                                    |
|  Organization: "My Company"                                        |
|  +--------------------------------------------------------------+ |
|  |  Org Admin: alice@example.com                                 | |
|  |  Org Member: bob@example.com, charlie@example.com             | |
|  |                                                                | |
|  |  +---------------------------+  +---------------------------+ | |
|  |  | Workspace: "Production"   |  | Workspace: "Development"  | | |
|  |  |                           |  |                           | | |
|  |  | Admin: alice              |  | Admin: alice, bob         | | |
|  |  | Editor: bob               |  | Editor: charlie           | | |
|  |  | Viewer: charlie           |  | Viewer: (all org members) | | |
|  |  |                           |  |                           | | |
|  |  | Projects:                 |  | Projects:                 | | |
|  |  |  - chatbot-prod           |  |  - chatbot-dev            | | |
|  |  |  - search-prod            |  |  - search-dev             | | |
|  |  +---------------------------+  +---------------------------+ | |
|  +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
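
The hierarchy above can be modeled as a lookup from workspace membership to a role and from a role to a permission set. This is a conceptual sketch, not LangSmith's implementation. The permission names are hypothetical labels for the rows of the role table, and the "all org members" viewer default in the Development workspace is left out for brevity.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(str, Enum):
    ADMIN = "Workspace Admin"
    EDITOR = "Editor"
    VIEWER = "Viewer"


# Permission names are hypothetical labels for the rows in the role table
PERMISSIONS: Dict[Role, FrozenSet[str]] = {
    Role.VIEWER: frozenset({"read"}),
    Role.EDITOR: frozenset({"read", "write"}),
    Role.ADMIN: frozenset({"read", "write", "manage_members", "change_settings"}),
}

# Workspace memberships copied from the diagram above
MEMBERSHIPS: Dict[str, Dict[str, Role]] = {
    "Production": {"alice": Role.ADMIN, "bob": Role.EDITOR, "charlie": Role.VIEWER},
    "Development": {"alice": Role.ADMIN, "bob": Role.ADMIN, "charlie": Role.EDITOR},
}


def can(user: str, action: str, workspace: str) -> bool:
    """Return True if the user's role in the workspace grants the action."""
    role = MEMBERSHIPS.get(workspace, {}).get(user)
    return role is not None and action in PERMISSIONS[role]
```

With these tables, can("bob", "change_settings", "Production") is False while the same call for "Development" is True, which mirrors how one person can hold different roles in different workspaces.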

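15.3 ABAC (Attribute-Based Access Control)

ABAC provides fine-grained access control based on resource attributes (available on the Enterprise plan):

# Conceptual examples of ABAC
# Access control based on tags and metadata

# Example: traces tagged "confidential" are accessible only to specific roles
# Example: dataset access restricted by specific project tags

# ABAC policies are configured under Organization Settings in the LangSmith UI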
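
The "confidential" tag example can be expressed as a small policy check. The sketch below is purely conceptual: Resource, Policy, and is_allowed are hypothetical names, and the real policies are configured in the UI rather than in code.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Resource:
    name: str
    tags: FrozenSet[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class Policy:
    """Deny access to resources carrying `tag` unless the role is allowed."""
    tag: str
    allowed_roles: FrozenSet[str]

    def permits(self, role: str, resource: Resource) -> bool:
        if self.tag not in resource.tags:
            return True  # the policy does not apply to this resource
        return role in self.allowed_roles


def is_allowed(role: str, resource: Resource, policies: Iterable[Policy]) -> bool:
    # Every applicable policy must permit the access
    return all(p.permits(role, resource) for p in policies)
```

The difference from RBAC is that the decision depends on an attribute of the resource (its tags) as well as on the caller's role, so the same Viewer can read an untagged trace but not a confidential one.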

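15.4 Audit Logs

# Audit logs are available on the Enterprise plan
# The following events are recorded:

# - User authentication events (login/logout)
# - API key creation/deletion/rotation
# - Workspace creation/deletion/settings changes
# - Member addition/removal/role changes
# - Dataset creation/deletion/access
# - Prompt creation/update/deletion
# - Data viewing/download events

# Retrieving audit logs
from langsmith import Client

client = Client()

# Audit logs are viewed under Organization Settings > Audit Log
# They can also be retrieved via the API (Enterprise plan)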

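15.5 Pricing

Plan	Price	Key features
Developer	Free	For individual developers. 5,000 traces/month, 1 user
Plus	$39/user/month	For teams. No fixed trace cap (a base allotment is included, further traces are billed by usage), RBAC
Enterprise	Custom	For large teams. SSO/SAML, ABAC, audit logs, SLA
Enterprise Plus	Custom	HIPAA support, dedicated infrastructure, premium support

Usage-based additional charges

Item	Developer	Plus	Enterprise
Included traces	5,000/month	Base allotment included	Custom
Additional traces	N/A	Pay-as-you-go	Custom
Data retention	14 days	90 days (extendable)	Custom
Extended data retention	N/A	Additional charge	Custom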
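
A monthly bill on the Plus plan is the seat charge plus a pay-as-you-go charge for traces beyond the included allotment. The sketch below shows that arithmetic. Only the $39 seat price comes from the table; the allotment size and the overage rate are not stated there, so included_traces and price_per_1k_extra are placeholders to fill in from the current price list.

```python
def estimate_monthly_cost(seats: int, traces: int, included_traces: int,
                          price_per_1k_extra: float,
                          seat_price: float = 39.0) -> float:
    """Seat charge plus pay-as-you-go charge for traces over the allotment."""
    if min(seats, traces, included_traces) < 0 or price_per_1k_extra < 0:
        raise ValueError("inputs must be non-negative")
    extra = max(0, traces - included_traces)
    return seats * seat_price + (extra / 1000) * price_per_1k_extra
```

For example, with 5 seats, a hypothetical allotment of 10,000 traces, and a hypothetical rate of 0.50 per 1,000 extra traces, 30,000 traces cost 5 × 39 + 20 × 0.50 = 205.00.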

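15.6 Usage Tracking

from datetime import datetime, timezone
from langsmith import Client

client = Client()

# Checking usage (UI: Settings > Usage)
# Retrieving usage data through the API

# Count traces (root runs) per project for January 2024
projects = list(client.list_projects())
for project in projects:
    runs = client.list_runs(
        project_name=project.name,
        is_root=True,  # count one run per trace, not every child run
        start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
        filter='lt(start_time, "2024-02-01T00:00:00Z")',
    )
    count = sum(1 for _ in runs)
    print(f"{project.name}: {count} traces")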

16. Environment Variable Reference

16.1 Required Environment Variables

Variable	Description	Default	Example
`LANGSMITH_API_KEY`	API key	None	`lsv2_pt_xxxxxxxx`
`LANGSMITH_TRACING`	Enables tracing	`false`	`true`

16.2 Optional Environment Variables

Variable	Description	Default	Example
`LANGSMITH_ENDPOINT`	API endpoint	`https://api.smith.langchain.com`	`https://langsmith.your-domain.com`
`LANGSMITH_PROJECT`	Default project name	`default`	`my-project`
`LANGSMITH_TRACING_SAMPLING_RATE`	Sampling rate (0.0-1.0)	`1.0`	`0.1`
`LANGSMITH_BATCH_SIZE`	Batch send size	`100`	`50`
`LANGSMITH_BATCH_TIMEOUT`	Batch send timeout (seconds)	`5`	`10`
`LANGSMITH_TEST_SUITE`	pytest test suite name	None	`my-test-suite`
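
A sampling rate such as 0.1 means that roughly one trace in ten is kept. The sketch below illustrates the idea with an independent random decision per trace. It is a conceptual model, not the SDK's code; should_trace and kept_fraction are hypothetical names.

```python
import random


def should_trace(rate: float, rng: random.Random) -> bool:
    """Keep a trace with probability `rate` (0.0 drops all, 1.0 keeps all)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    return rng.random() < rate


def kept_fraction(rate: float, n: int, seed: int = 0) -> float:
    """Simulate n independent sampling decisions and return the kept share."""
    rng = random.Random(seed)
    return sum(should_trace(rate, rng) for _ in range(n)) / n
```

Because each decision is random, the kept share only approaches the configured rate over many traces. Low-traffic projects can deviate noticeably from it in any given hour.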

16.3 Environment Variables for Self-Hosting

Variable	Description	Example
`LANGSMITH_LICENSE_KEY`	License key	`ls_license_xxxxxxxx`
`POSTGRES_DATABASE_URI`	PostgreSQL connection string	`postgres://user:pass@host:5432/db`
`REDIS_DATABASE_URI`	Redis connection string	`redis://host:6379`
`CLICKHOUSE_HOST`	ClickHouse host	`clickhouse.internal`
`CLICKHOUSE_PORT`	ClickHouse HTTP port	`8123`
`CLICKHOUSE_NATIVE_PORT`	ClickHouse native port	`9000`
`CLICKHOUSE_DB`	ClickHouse database name	`default`
`CLICKHOUSE_USER`	ClickHouse user	`default`
`CLICKHOUSE_PASSWORD`	ClickHouse password	`password`
`BLOB_STORAGE_BUCKET_NAME`	Blob Storage bucket name	`langsmith-blobs`
`BLOB_STORAGE_API_URL`	Blob Storage endpoint	`http://minio:9000`
`BLOB_STORAGE_ACCESS_KEY`	Blob Storage access key	`minioadmin`
`BLOB_STORAGE_ACCESS_KEY_SECRET`	Blob Storage secret key	`minioadmin`
`AUTH_TYPE`	Authentication method	`none`, `oauth`
`API_KEY_SALT`	Salt for API key encryption	`super-secret-salt`
`LOG_LEVEL`	Log level	`info`, `debug`, `warning`
`INITIAL_ORG_ADMIN_EMAIL`	Initial admin email address	`admin@example.com`
`INITIAL_ORG_ADMIN_PASSWORD`	Initial admin password	`admin-password`

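16.4 Legacy Environment Variables

Legacy variable	New variable	Notes
`LANGCHAIN_TRACING_V2`	`LANGSMITH_TRACING`	Still works for backward compatibility
`LANGCHAIN_API_KEY`	`LANGSMITH_API_KEY`	Still works for backward compatibility
`LANGCHAIN_ENDPOINT`	`LANGSMITH_ENDPOINT`	Still works for backward compatibility
`LANGCHAIN_PROJECT`	`LANGSMITH_PROJECT`	Still works for backward compatibility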
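
When both naming schemes may be present, application code that reads these settings itself needs a lookup order. The sketch below prefers the new name and falls back to the legacy one. That precedence is an assumption made for this example, not a statement about how the SDK resolves conflicts, and resolve is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# New name -> legacy name, as listed in the table above
LEGACY_NAMES = {
    "LANGSMITH_TRACING": "LANGCHAIN_TRACING_V2",
    "LANGSMITH_API_KEY": "LANGCHAIN_API_KEY",
    "LANGSMITH_ENDPOINT": "LANGCHAIN_ENDPOINT",
    "LANGSMITH_PROJECT": "LANGCHAIN_PROJECT",
}


def resolve(name: str, env: Mapping[str, str] = os.environ,
            default: Optional[str] = None) -> Optional[str]:
    """Return the new variable if set, else the legacy one, else the default."""
    if name in env:
        return env[name]
    legacy = LEGACY_NAMES.get(name)
    if legacy is not None and legacy in env:
        return env[legacy]
    return default
```

Passing a plain dictionary as env makes the lookup easy to unit-test without touching the real process environment.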

16.5 Example Environment Variable Settings

# === Development ===
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_pt_dev_xxxxxxxx"
export LANGSMITH_PROJECT="my-app-dev"
export LANGSMITH_ENDPOINT="https://api.smith.langchain.com"

# === Staging ===
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_sk_staging_xxxxxxxx"
export LANGSMITH_PROJECT="my-app-staging"
export LANGSMITH_TRACING_SAMPLING_RATE="0.5"

# === Production ===
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_sk_prod_xxxxxxxx"
export LANGSMITH_PROJECT="my-app-production"
export LANGSMITH_TRACING_SAMPLING_RATE="0.1"
export LANGSMITH_BATCH_SIZE="200"
export LANGSMITH_BATCH_TIMEOUT="10"

# === Self-hosted ===
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_pt_xxxxxxxx"
export LANGSMITH_ENDPOINT="https://langsmith.internal.example.com"
export LANGSMITH_PROJECT="my-app"

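Summary

LangSmith is a comprehensive platform that supports the full lifecycle of LLM application development. Its main benefits are:

Better observability: tracing opens up the black box of an LLM application and makes debugging and optimization far more efficient
Systematic quality management: datasets and the evaluation framework let teams manage and improve LLM output quality quantitatively
Stable production operation: monitoring, online evaluation, and alerts keep production quality under continuous watch
Team collaboration: workspaces, RBAC, and annotation queues let a whole team manage LLM application quality together
Flexible deployment: cloud, self-hosted, and hybrid options cover a range of security requirements

In LLM application development, LangSmith has become an essential tool for understanding why something works and why it does not.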