AgentSkillsCN

kusto-analyst

使用 Azure 数据探索器(Kusto)分析 Android 身份验证遥测数据。此技能适用于查询 android_spans、eSTS 关联、延迟调查、错误分析以及遥测故障排除等场景。触发条件包括“查询 Kusto”、“分析遥测数据”、“检查 android_spans”、“eSTS 关联”、“延迟调查”、“错误模式”,或任何涉及遥测数据分析的请求。

SKILL.md
--- frontmatter
name: kusto-analyst
description: Analyze Android authentication telemetry using Azure Data Explorer (Kusto). Use this skill for querying android_spans, eSTS correlation, latency investigation, error analysis, and telemetry troubleshooting. Triggers include "query Kusto", "analyze telemetry", "check android_spans", "eSTS correlation", "latency investigation", "error patterns", or any request involving telemetry data analysis.

Kusto Analyst

Analyze Android authentication telemetry using Azure Data Explorer (Kusto) for error analysis, latency investigation, and cross-cluster correlation.

Available MCP Tools

Always use these tools to execute Kusto queries:

  • mcp_my-mcp-server_execute_query - Execute Kusto queries
  • mcp_my-mcp-server_list_tables - Discover available tables
  • mcp_my-mcp-server_get_table_schema - Explore field schema

Android Telemetry Cluster

Cluster Information

PropertyValue
Cluster URLhttps://idsharedeus2.kusto.windows.net/
Production Databasead-accounts-android-otel
Sandbox Databaseandroid-broker-otel-sandbox

Primary Tables

TablePurposeRetention
android_spansAuthentication telemetry spans30 days
android_metricsAggregated metrics data30 days

Materialized Views

  • 46 pre-aggregated views for faster queries
  • Retention: 90 days (longer than raw tables!)
  • Update frequency: Hourly
  • Discover with: .show materialized-views query
  • Categories: Error Analysis, Silent/Interactive Auth, PRT Operations, Broker & Apps, Devices, Performance

User Intent Translation

User SaysSpan Name
"Interactive request"AcquireTokenInteractive
"Silent request"AcquireTokenSilent
"PRT operation"Various PRT-related spans

android_spans Key Fields

Span Identification

FieldDescription
span_idUnique identifier for the span
parent_span_idParent span ID for hierarchical relationships
trace_idTrace ID linking related spans
correlation_idCorrelation ID for request tracking (use for eSTS correlation)
span_nameOperation name (e.g., "AcquireTokenInteractive")

Error Information

FieldDescription
error_codeError code (e.g., "auth_cancelled_by_sdk")
error_messageDetailed error message
span_statusStatus ("OK", "ERROR")

Broker Information

FieldDescription
active_broker_package_nameCurrently active broker package
current_broker_package_nameCurrent broker package
calling_package_namePackage that initiated the call

Common Broker Packages:

  • com.microsoft.windowsintune.companyportal - Company Portal
  • com.azure.authenticator - Azure Authenticator
  • com.microsoft.appmanager - Microsoft App Manager

Device & Timing

FieldDescription
DeviceInfo_IdUnique device identifier
DeviceInfo_ModelDevice model (e.g., "Pixel 7 Pro")
EventInfo_TimeEvent timestamp (use ago(Xd) for filtering)
elapsed_timeTotal operation duration

Common Query Patterns

Discovery Queries

Find top span names:

kql
android_spans
| where EventInfo_Time >= ago(7d)
| summarize count() by span_name
| order by count_ desc
| take 30

Find common error codes:

kql
android_spans
| where EventInfo_Time >= ago(7d)
| where isnotempty(error_code)
| summarize count() by error_code
| order by count_ desc
| take 20

Error Analysis

Error patterns for specific span:

kql
android_spans
| where EventInfo_Time >= ago(7d)
| where span_name == "AcquireTokenInteractive"
| where isnotempty(error_code)
| summarize error_count = count() by error_code, error_message
| order by error_count desc

Device-level error aggregation:

kql
android_spans
| where EventInfo_Time >= ago(7d)
| summarize 
    total_devices = dcount(DeviceInfo_Id),
    error_count = count()
    by error_code

Company Portal Detection

kql
android_spans
| where EventInfo_Time >= ago(7d)
| extend has_cp = iff(
    active_broker_package_name contains "companyportal" or 
    calling_package_name contains "companyportal", 
    1, 0)
| summarize 
    total = count(),
    with_cp = countif(has_cp == 1)
| extend cp_percentage = round(100.0 * with_cp / total, 2)

Parent-Child Span Relationships

kql
let parentSpans = android_spans
| where EventInfo_Time >= ago(7d)
| where span_name == "AcquireTokenInteractive"
| project parent_span_id = span_id, trace_id;

let childSpans = android_spans
| where EventInfo_Time >= ago(7d)
| where span_name == "ProcessWebCpRedirects"
| project child_span_id = span_id, parent_span_id, trace_id;

parentSpans
| join kind=inner (childSpans) on trace_id

Latency Investigation Workflow

When investigating latency increases (e.g., AcquireTokenSilent), follow these steps:

Step 1: Identify the Increase

kql
android_spans
| where EventInfo_Time >= ago(7d)
| where span_name == "AcquireTokenSilent"
| summarize 
    p50 = percentile(elapsed_time, 50),
    p90 = percentile(elapsed_time, 90),
    p95 = percentile(elapsed_time, 95),
    p99 = percentile(elapsed_time, 99)
    by bin(EventInfo_Time, 1h)
| order by EventInfo_Time desc

Step 2: Find Culprit Dimensions

kql
android_spans
| where EventInfo_Time >= ago(3d)
| where span_name == "AcquireTokenSilent"
| summarize 
    count = count(),
    p90_latency = percentile(elapsed_time, 90)
    by active_broker_package_name, current_broker_package_name
| order by p90_latency desc

Step 3: Check Error Rate Correlation

kql
android_spans
| where EventInfo_Time >= ago(7d)
| where span_name == "AcquireTokenSilent"
| summarize 
    total = count(),
    errors = countif(isnotempty(error_code)),
    avg_latency = avg(elapsed_time)
    by bin(EventInfo_Time, 1h)
| extend error_rate = round(100.0 * errors / total, 2)
| order by EventInfo_Time desc

Step 4: Analyze Elapsed Time Breakdown

kql
android_spans
| where EventInfo_Time >= ago(3d)
| where span_name == "AcquireTokenSilent"
| where isnotempty(elapsed_time_cache_load) or isnotempty(elapsed_time_network_acquire_at)
| summarize 
    avg_cache = avg(elapsed_time_cache_load),
    avg_network = avg(elapsed_time_network_acquire_at),
    avg_total = avg(elapsed_time)
    by bin(EventInfo_Time, 1h)

MATS telemetry

Cluster Information

PropertyValue
Cluster URLhttps://idsharedeus2.kusto.windows.net/
DatabaseMATS_Office
Database IDfaab4ead691e451eb230afc98a28e0f2

eSTS (Token Service) Cluster

Cluster Information

PropertyValue
Cluster URLhttps://estswus2.kusto.windows.net/
DatabaseESTS
Primary TableAllPerRequestTable (cross-cluster union view)

Android-Specific Filtering

⚠️ ALWAYS filter by Android platform:

kql
AllPerRequestTable
| where env_time >= ago(7d)
| where DevicePlatformForUI == "Android"

Key eSTS Fields

CategoryFieldDescription
Request IDCorrelationIdLinks to Android correlation_id
RequestIdUnique eSTS request ID
env_timeRequest timestamp
Request TypeCallAuth call type (e.g., "token")
IsInteractiveUser interaction required
PromptPrompt type ("none", "login")
StatusResult"Success" or "Failure"
ErrorCodeError code if failed
HttpStatusCodeHTTP status
PRTPrtDataPRT-related data (JSON)
DeviceDeviceIdDevice identifier
ApplicationIdClient app ID
UserTenantIdTenant ID
UserPrincipalObjectIDUser's Entra ID object ID
AccountTypeAAD, MSA, etc.

Cross-Cluster Correlation

To trace a complete flow (Android → Broker → eSTS):

Step 1: Get correlation IDs from Android spans

kql
// Run against: https://idsharedeus2.kusto.windows.net/ | ad-accounts-android-otel
android_spans
| where EventInfo_Time >= ago(7d)
| where span_name == "AcquireTokenInteractive"
| where error_code == "some_error"
| project correlation_id, span_id, EventInfo_Time, error_code
| take 100

Step 2: Find corresponding eSTS requests

kql
// Run against: https://estswus2.kusto.windows.net/ | ESTS
AllPerRequestTable
| where env_time >= ago(7d)
| where DevicePlatformForUI == "Android"
| where CorrelationId in ("correlation-id-1", "correlation-id-2")
| project env_time, CorrelationId, Call, Result, ErrorCode, PrtData, ResponseTime

eSTS Query Examples

Find requests by CorrelationId:

kql
AllPerRequestTable
| where env_time >= ago(7d)
| where DevicePlatformForUI == "Android"
| where CorrelationId == "your-correlation-id-here"
| project env_time, CorrelationId, Call, Result, ErrorCode, IsInteractive, PrtData, ResponseTime

Check PRT usage:

kql
AllPerRequestTable
| where env_time >= ago(7d)
| where DevicePlatformForUI == "Android"
| extend HasPRT = isnotempty(PrtData)
| summarize 
    total_requests = count(),
    prt_requests = countif(HasPRT),
    success_rate = round(100.0 * countif(Result == "Success") / count(), 2)
    by HasPRT

Error patterns:

kql
AllPerRequestTable
| where env_time >= ago(7d)
| where DevicePlatformForUI == "Android"
| where Result != "Success"
| summarize error_count = count() by ErrorCode, SubErrorCode, Call
| order by error_count desc
| take 20

Query Optimization Tips

TipExample
Always filter by time first`
Use take for exploration`
Project early`
Use dcount() for unique countsdcount(DeviceInfo_Id)
Check field population`
Break long rangesQueries > 7 days may timeout

Important Notes

  • Sensitive Data: correlation_id may be scrubbed to "Scrubbed" for privacy
  • Field Availability: Not all fields populated in all spans; use isnotempty()
  • Cross-Cluster Joins: Cannot directly join Android + eSTS clusters; correlate via CorrelationId
  • PrtData Parsing: Use parse_json() or extend to extract PRT fields