Automated Time Tracking
0\) Executive Summary

We are building the industry-leading, passive, automated time tracking platform for MSP operations. The system captures technician time automatically, links it to the right work artifacts (tickets, outages, projects, docs, training, RMM sessions), pre-aggregates entries, and presents a single daily review for confirmation. It eliminates manual timers, increases capture rate, improves SLIQ/TruMethods reporting accuracy, and reduces billing leakage, while respecting privacy and minimizing workflow friction.

Core principles
- Passive-first: capture from agent, RMM, chat, calendar, docs, automation, and training. The default is “already captured.”
- Explainable: every auto-link has a rule, confidence, and human-readable reason.
- Noise-resistant: sessionization, sticky gaps, and overlap policies yield meaningful blocks.
- Policy-driven: rounding, minimums, and billable defaults per org.
- Review once/day: one place to confirm, edit, or resolve unlinked gaps.
- Privacy by design: idle detection, opt-in screenshots, redaction, and role-based access.
- Metrics-complete: all SLIQ/TruMethods categories and formulas supported out of the box.

1\) Scope

In scope
- Agent for foreground activity (Windows/macOS initially) with URL/title/process capture.
- Ingestion APIs, link engine, aggregation service, overlap resolution, policy engine.
- Source adapters: RMM remote sessions, chat (ticket channels), calendar/schedule, docs/knowledge, cybersecurity training, workflow automation halos.
- UI/UX: technician daily review, unlinked activity, rule manager + tester, policy settings, overlap review, source integrations, admin analytics.
- Reports: utilization, effective rate, capture rate, SLIQ/TruMethods distribution, outage costing.

Out of scope (v1)
- Mobile agents (capture via mobile browser/app); can be v1.1+.
- AI auto-summarization of notes; optional backlog item.

2\) Personas & Goals

Technician
- Never start/stop timers.
- Confirm a daily sheet in < 2 minutes.
- Resolve the rare “unlinked” items with suggestions.

Incident
Commander / Service Desk Lead
- Reliable, non-overlapping time blocks for incidents and queues; visibility into progress.

Finance/Billing
- Accurate, policy-compliant entries for invoicing and agreements; fewer adjustments.

vCIO/Account Manager
- Clear proactive vs. reactive effort; proof of value; QBR-ready reports.

Security/CISO
- Auditable trails with privacy controls and least-privilege access.

Primary outcomes
- 95% capture rate (captured hours ÷ expected hours).
- < 5% unlinked minutes per tech per day.
- < 2 min median confirmation time per tech per day.
- ±1% billing variance vs. manual baselines.

3\) Definitions & Entities
- Activity (`user_activity_log`): foreground window interval with title/URL/process (+ idle/input metrics).
- Source fact: RMM session, chat activity window, calendar slot, doc reading session, training session, workflow halo.
- Link: mapping of an activity to an entity (ticket/outage/etc.) with explanation (rule, confidence, reason).
- Cluster: merged contiguous activities (gap ≤ sticky gap) with the same user + link → one time entry.
- Overlap group: set of entries whose times intersect; resolved by deterministic policy.
- Policy: org-level rounding, min block, sticky gap, defaults for admin/training billable.

(See DDL canvas for table definitions.)

4\) High-Level Architecture

    [Agent]──batched activity──> [Ingestion API]──> user_activity_log
       │
       ├─> [Source Adapters] (RMM/chat/calendar/docs/training/automation) → activity facts
       │
       ├─> [Linking Engine]
       │      ├─ rules (precedence/confidence)
       │      └─ explanations (audit trail)
       │
       ├─> [Sessionizer/Aggregator]
       │      └─ clusters → time_entry (policy min/round/sticky)
       │
       ├─> [Overlap Resolver]
       │      └─ primary/secondary/split + audit
       │
       └─> [Rating/Billing] (downstream)

    [Web App]
       ├─ Technician Daily Review (confirm/edit/delete)
       ├─ Unlinked Activity (suggestions)
       ├─ Rule Manager + Tester
       ├─ Policy Settings (per org)
       ├─ Overlap Review
       └─ Analytics (SLIQ/TruMethods, utilization, capture rate)

5\) Data Model (Reference)

Use the DDL canvas for authoritative schemas of:
- Updated: `user_activity_log`, `time_entry`
- New: `activity_linking_rule`, `activity_link_explanation`, `time_entry_policy`, `document_interaction_event`

Related existing tables referenced: `device`, `user`, `organization`, `ticket`, `outage` (master/child), `project_task`, `technician_schedule`, `chat_channel`, `agreement`, `work_type`, `invoice_line_item`, `location`, `time_entry_source`, `document_version`.

6\) Source Adapters (Capture)

6.1 Agent (Windows/macOS)

Responsibilities
- Foreground detection: window title, process, and URL (browser extensions or accessibility APIs).
- Idle detection: input event counters + OS idle time.
- Batch completed intervals every 60–90s with backpressure and offline queue.
- Respect privacy policies (no keystroke content; optional screenshots via a separate, opt-in mechanism).

Payload (per interval)

    {
      "device_id": "uuid",
      "user_id": "uuid",
      "user_session_identifier": "string",
      "application_name": "Google Chrome",
      "process_name": "chrome.exe",
      "window_title": "Ticket TCK-12345 – Customer X",
      "url": "https://app/tickets/1e36…",
      "start_time_utc": "2025-08-19T18:12:00Z",
      "end_time_utc": "2025-08-19T18:14:40Z",
      "duration_seconds": 160,
      "input_events": 97,
      "is_idle": false
    }

Reliability
- Local WAL (write-ahead log) with rotating file cap.
- Exponential backoff on HTTP errors; retry queue persists across reboots.
- Idempotency-Key header = hash(device_id + start_time_utc + end_time_utc + process_name + window_title).

6.2 RMM Remote Sessions
- Hook into remote control session events (join/leave, tool, device, user).
- Emit source facts to ingestion with a `time_entry_source` of `remotecontrol`.

6.3 Chat (Ticket Channels)
- For messages sent by a tech in a ticket channel, create activity windows: first to last message, ±2 min grace.

6.4 Calendar / Dispatch
- `technician_schedule` slots create soft claims; used by the linker and to generate fallback entries when no foreground activity exists (onsite/phone).

6.5 Docs/Knowledge
- Web app emits `document_interaction_event` on viewer open/close; may include `?ticketid` context.

6.6 Cybersecurity
Training
- Track session start/end on training content; associate with remedial training assignment/org.

6.7 Workflow Automation Halos
- For human-in-the-loop automation steps, create short “halo” intervals (e.g., 60–120s) to represent supervision/validation.

7\) Ingestion API (CRUD & Contracts)

7.1 Endpoints
- `POST /v1/activity`: batch write of agent intervals.
- `POST /v1/source-facts`: generic facts (RMM, chat, calendar, docs, training, automation); each type has a `source_type` field.
- `POST /v1/docs/interactions`: document interaction events (optional separate endpoint).

Common behaviors
- Validate timestamps (end ≥ start) and min/max duration (e.g., drop < 3s unless from automation).
- Normalize users/devices (reject unknown unless allowlist enabled).
- Upsert by idempotency key; never duplicate.
- Store to `user_activity_log` (agent, chat-derived, docs) or to a side table. For v1 we normalize all sources into `user_activity_log`, with `application_name` + `process_name` set contextually (e.g., “Remote Control Session”).

7.2 DB Write Patterns (Pseudocode)

    -- Upsert activity interval
    INSERT INTO user_activity_log (...)
    ON CONFLICT (/* synthetic unique key if provided */) DO NOTHING;

8\) Linking Engine

Goal: assign `linked_entity_type`/`linked_entity_id` to each unlinked `user_activity_log` row and log why.

8.1 Rule Evaluation Order
1. URL (platform routes): `/tickets/{uuid}`, `/outages/{uuid}`, `/projects/{uuid}`
2. Chat → ticket: resolve channel to ticket
3. Device → ticket: nearest active ticket involving this device/user
4. Calendar slot context
5. WindowTitle patterns (`TCK-\d+`, `INC\d+`)
6. Process heuristics (RMM tool names → device/ticket)

8.2 Algorithm (Pseudocode)

    for each activity where linked_entity_id is null:
        candidates = []
        for rule in rules ordered by (active desc, scope, precedence asc, confidence desc):
            if rule.matches(activity):
                entity_id = resolve(rule.entity_lookup, activity)
                if entity_id:
                    candidates.add({rule, entity_id, rule.confidence, reason})
        top = best(candidates)
        choose = top if top.confidence >= threshold else null
        if choose:
            update activity.linked_entity_type/id
        insert activity_link_explanation for each candidate; mark chosen=true for the winner

Thresholds
- Default acceptance threshold: 70. Below the threshold the activity remains unlinked (shown in review).

Examples
- URL: `/tickets/98f8…` → type ticket, id = extracted.
- Window title: `TCK-12345` → select the latest open ticket with that key.
- Device: activity on device A while ticket B is “In Progress” with device A → link to B.

9\) Sessionization & Aggregation

Goal: convert linked activities into meaningful `time_entry` blocks.

9.1 Clustering
- Group by (`user_id`, `linked_entity_type`, `linked_entity_id`) and merge intervals with gaps ≤ `sticky_gap_seconds` (policy per org).
- Propagate source (agent/remote/chat/calendar/docs/training/automation) from the dominant share of intervals; if mixed, prefer the higher-precedence source per overlap policy.

9.2 Policy Application
- Apply `min_block_seconds` (floor) and `round_to_seconds` (nearest/up/down per org policy; default: round to nearest, tie → up).
- If the context is tagged admin or training, set billable based on policy defaults (overridable by work type/agreements).

9.3 Emission
- Create one `time_entry` per cluster with start/end/duration.
- Mark all clustered `user_activity_log` rows as `processed_for_time_entry = true` and set a generated `activity_cluster_id` on those rows.

10\) Overlap Resolution

Goal: prevent double counting/billing.

10.1 Priority Order
1. Calendar/meeting (onsite/dispatch)
2. Remote control
3. URL-linked desktop activity
4. Chat activity
5. Automation halos
6. Docs/training (lowest by default)

10.2 Policy
- If two entries overlap, keep the higher-priority entry as primary.
- The lower-priority entry becomes secondary (non-billable by default), or is split (ratio, e.g., 70/30) if the org enables splitting.
- Persist `overlap_id`, `overlap_resolution`, and optional `overlap_primary_time_entry_id`.

10.3 UI
- Overlap Review screen (rare use) shows groups; allows override to split/secondary.

11\) SLIQ / TruMethods Mapping & Metrics

Work type taxonomy (examples)
- Reactive – Remote Support (tickets)
- Reactive – Incident Management (child outages →
  client; master outage → MSP internal)
- Proactive – Maintenance/Automation
- Project – Delivery
- Administration (non-billable default)
- Training (billable=false default)
- Travel (separate flag)

Key metrics & formulas
- Utilization % = billable hours ÷ available hours (per role policy) × 100
- Capture rate % = captured hours ÷ expected hours × 100
- Reactive % / Proactive % / Project % / Admin % / Training % = hours in class ÷ total hours × 100
- Effective rate = billable revenue ÷ billable hours
- Auto-link rate % = auto-linked minutes ÷ total captured minutes × 100
- Unlinked minutes (median, p95)
- Confirmation time (median per tech/day)

Reports and dashboards will expose these with filters by org, customer, team, tech, and date.

12\) User Experience (Screens)

12.1 Technician – Daily Review
- Header KPIs: captured, confirmed, unlinked, pending; soft goal meter (e.g., 7.5h target).
- Timeline view: chronological blocks colored by source; icons for confidence, overlap, policy rounding.
- Per-entry actions: confirm ✓, edit (duration/note/work type/billable), reassign (ticket/outage/project), delete (marks as non-work), undo.
- Unlinked bucket: suggested targets ranked by confidence (URL/title/device/chat/calendar hints); one-click assign.
- Nudges: “You still have 12 minutes unlinked; auto-assign to Admin?”

12.2 Rule Manager
- Rule list with scope/precedence/confidence and an enabled toggle.
- Editor with pattern tester: paste an example title/URL; see the extracted entity and confidence.
- Audit view: recent link explanations referencing this rule.

12.3 Policy Settings (per Org)
- Min block, rounding increment, sticky gap.
- Defaults for admin/training billable.
- Overlap handling (secondary vs. split ratio).

12.4 Overlap Review
- Grouped by `overlap_id`; shows primary/secondary/split; allows override.

12.5 Source Integrations
- RMM tools, chat workspace, calendar (M365/Google), docs, training platform; connection status and mapping helpers.

12.6 Admin Analytics
- SLIQ/TruMethods, utilization, capture rate, effective rate; filters and exports.

12.7 Privacy & Audit
- Screenshot controls (off by default); redaction; who can view; retention policy.
- Access logs for time data.

13\) API Surface (Selected)

Activity
- `POST /v1/activity` (batch)
- `GET /v1/activity?user_id=…&from=…&to=…` (admin)

Linking
- `POST /v1/linking/run` (admin – backfill window)
- `GET /v1/linking/explanations?activity_id=…`

Time entries
- `GET /v1/time-entries?user_id&from&to`
- `POST /v1/time-entries/{id}/confirm`
- `PATCH /v1/time-entries/{id}` (edit)
- `DELETE /v1/time-entries/{id}`

Rules
- `GET /v1/linking/rules`
- `POST /v1/linking/rules`
- `PATCH /v1/linking/rules/{id}`
- `POST /v1/linking/rules/test` (pattern tester)

Policy
- `GET /v1/time-entry-policies?organization_id`
- `POST /v1/time-entry-policies`
- `PATCH /v1/time-entry-policies/{id}`

Docs
- `POST /v1/docs/interactions`

All endpoints require auth (JWT) + org scoping; writes are idempotent.

14\) Services & Jobs
- Linking engine worker: streams new `user_activity_log` rows where `linked_entity_id` is null; writes explanations and links.
- Aggregator: periodic (e.g., every 10–15 min) + on demand; clusters by policy; writes `time_entry` rows.
- Overlap resolver: runs after the aggregator; applies priority rules; sets overlap fields.
- Backfill: specify date windows to re-run link + aggregate after rule/policy changes.
- Retention/archival: rotate raw activity after N days (configurable); retain time entries and explanations indefinitely.

Runtime
- Queue-backed (e.g., RabbitMQ/SQS) with at-least-once delivery.
- Idempotent processors (use activity IDs and cluster keys).

15\) Billing & Agreements
- Work type + agreement determine billable coverage and rates.
- Master outage → internal “Incident Management” ticket (MSP org) for costing; child outage → client ticket.
- Calculated rate/bill amount is set by the downstream rating engine; this system provides pristine time blocks.

16\) Security & Privacy
- RBAC: only managers/billing can view others’ detailed time; screenshots (if enabled) require elevated permission.
- PII/data minimization: no keystroke content, only counts; no page content, only URL/title/process.
- Encryption: TLS in
  transit; row-level at rest for sensitive sources.
- Compliance: audit trail via `activity_link_explanation`; access logs for who viewed/edited time.
- Redaction: regex allow-list for URLs; domain blacklist; hash query strings.

17\) Observability
- Metrics: capture rate, auto-link rate, backlog size, job latencies, confirmation median, % overlaps.
- Logs: ingestion errors, rule exceptions, policy application results.
- Tracing: request → rule eval → aggregation → overlap → UI confirm.

18\) Performance Targets
- Ingestion: sustain 50 events/sec/agent burst, 5k agents system-wide (batching).
- Linking: < 50 ms p95 per activity.
- Aggregation: complete a 24h window for 1k users in < 5 min.
- UI: daily review loads in < 2s p95 for a typical day (50 blocks).

19\) Edge Cases & Rules
- Very long single windows (meetings): clamp to schedule end if idle for > X minutes.
- Flapping windows: merge micro-intervals (< 5s) into the adjacent interval.
- Missing end times (crash/reboot): infer the end using the next start or a heartbeat cutoff.
- Multi-customer contexts open: prefer an explicit URL route over heuristics.
- Device shared by two users: rely on the login-bound `user_session_identifier`.

20\) QA Strategy
- Unit: rule matchers, URL parsers, window-title extractors, policy rounding, cluster logic, overlap resolver.
- Integration: agent → ingestion → link → aggregate → overlap → UI confirm.
- Property-based tests: randomized intervals with sticky gaps, overlaps, and mixed sources.
- E2E scenarios: reactive ticket day; outage master/child; onsite calendar block; doc research + remote session; training afternoon; admin tasks.
- Performance tests: synthetic load of 5k agents, 24h.

21\) Deployment & Operations
- Feature flags: per-org enablement; per-source enablement; screenshots off by default.
- Rollout plan: pilot on internal MSP org → 5 friendly customers → GA.
- Backfill: support from date X to Y with safe windowing and progress UI.

22\) Migration Plan
- Deploy new tables (DDL canvas).
- Backfill policy rows for all orgs with sensible defaults.
- Populate seed rules for URL routes and common ticket
  patterns.
- Turn on the linker in dry run (explanations only) to validate match rates.
- Enable the aggregator (policy min/round) for a pilot team.
- Enable the overlap resolver and review UI.
- Expand to all technicians.

23\) Risk Register & Mitigations
- Over-linking (false positives): confidence thresholds, explanations, and daily review to correct; rule tester before deploy.
- Under-capture on macOS URLs: browser extension fallback; process/title heuristics.
- Privacy backlash: opt-in screenshots, clear policies, restricted viewing, redaction.
- Performance regressions: queue isolation per component; horizontal scale; backpressure signalling to agents.

24\) Open Questions
- Split ratios by work type? (E.g., always favor customer-facing work.)
- Automatic “auto-confirm” for high-confidence, policy-compliant entries? (Opt-in per org.)
- SLA around time availability for near-real-time dashboards (e.g., 5–10 minutes?)

25\) Appendices

A) Sample Linking Rules (Seed)

| Scope | Pattern | Entity Type | Precedence | Confidence | Notes |
| --- | --- | --- | --- | --- | --- |
| url | `/tickets/([a-f0-9-]{36})` | ticket | 10 | 98 | Primary route matcher |
| url | `/outages/([a-f0-9-]{36})` | outage | 10 | 98 | Outage route |
| windowtitle | `TCK-(\d{4,7})` | ticket | 40 | 85 | Map to last open ticket with that external key |
| process | `screenconnect\|splashtop\|anydesk` | ticket | 60 | | |
| calendar | | ticket | 70 | 70 | Use schedule context when active |

B) Sessionization Pseudocode

    for each user, each entity:
        sort intervals by start
        current = first
        for next in intervals[1:]:
            if gap(current.end, next.start) <= sticky_gap:
                # max() guards against a later interval that ends before current does
                current.end = max(current.end, next.end)
            else:
                emit cluster(current)
                current = next
        emit cluster(current)

C) Rounding Logic

    rounded = roundToNearest(duration, round_to_seconds)
    if rounded < min_block_seconds:
        rounded = min_block_seconds

D) Overlap Resolution Pseudocode

    groups = find_overlaps(time_entries)
    for g in groups:
        sort g by priority desc
        primary = first
        for e in rest:
            if policy.split_enabled:
                split_ratio = policy.split_ratio
                e.billable = (e.billable and split_ratio.secondary > 0)
                e.overlap_resolution = 'split'
            else:
                e.billable = false
                e.overlap_resolution = 'secondary'
            e.overlap_primary_time_entry_id = primary.id

E) SLIQ/TruMethods Mapping Matrix (Example)

| Work Type | Category | Billable Default |
| --- | --- | --- |
| Reactive – Remote Support | Reactive | true |
| Reactive – Incident Management (Child) | Reactive | true |
| Reactive – Incident Management (Master) | Internal/MSP | false |
| Proactive – Maintenance/Automation | Proactive | true |
| Project – Delivery | Project | true |
| Administration | Admin | false |
| Training | Training | false |
| Travel | Travel | true |

26\) Next Steps
- Engineering review of PRD + DDL.
- Create tickets: agent tasks, ingestion, linker, aggregator, overlap, UI screens, integrations.
- Pilot feature flag plan and initial seed rules.
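
Illustration for Section 6.1 (agent idempotency key). The PRD only says the key is a hash over device id, start time, end time, process name, and window title. The sketch below is one possible reading: SHA-256 and the unit-separator join are assumptions made here, not decisions recorded in this document, and the field names are the snake_case forms of the payload fields.

```python
import hashlib

# Fields named in Section 6.1, in the order given there.
KEY_FIELDS = ("device_id", "start_time_utc", "end_time_utc", "process_name", "window_title")

def idempotency_key(interval: dict) -> str:
    """Return a stable hex digest for one agent interval (assumed: SHA-256)."""
    # Assumption: join with the ASCII unit separator, which is unlikely to occur
    # in these values, so ("ab", "c") and ("a", "bc") hash differently.
    material = "\x1f".join(str(interval[name]) for name in KEY_FIELDS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

sample_interval = {
    "device_id": "device-1",          # hypothetical values for illustration
    "start_time_utc": "2025-08-19T18:12:00Z",
    "end_time_utc": "2025-08-19T18:14:40Z",
    "process_name": "chrome.exe",
    "window_title": "Ticket TCK-12345 – Customer X",
}
sample_key = idempotency_key(sample_interval)
```

Because the key depends only on the interval's own fields, a retried batch after a crash or network error produces the same key, which is what lets the ingestion upsert drop the duplicate.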
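
Illustration for Section 8.2 and Appendix A (rule evaluation). This is a minimal sketch covering only the `url` and `windowtitle` seed rules, with the default acceptance threshold of 70. The `Rule` class, `link_activity`, and the case-insensitive matching are hypothetical choices for the example. A real linker would also resolve the extracted key to an entity id (for example, the latest open ticket for `TCK-12345`), which is omitted here.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    scope: str        # "url" or "windowtitle"
    pattern: str      # regex with one capture group for the entity key
    entity_type: str
    precedence: int   # lower value is evaluated first
    confidence: int   # 0-100

SEED_RULES = [
    Rule("url", r"/tickets/([a-f0-9-]{36})", "ticket", 10, 98),
    Rule("url", r"/outages/([a-f0-9-]{36})", "outage", 10, 98),
    Rule("windowtitle", r"TCK-(\d{4,7})", "ticket", 40, 85),
]
FIELD_FOR_SCOPE = {"url": "url", "windowtitle": "window_title"}

def link_activity(activity: dict, rules=SEED_RULES, threshold: int = 70):
    """Return (chosen, explanations); chosen is None if nothing clears the threshold."""
    explanations = []
    for rule in sorted(rules, key=lambda r: (r.precedence, -r.confidence)):
        text = activity.get(FIELD_FOR_SCOPE[rule.scope]) or ""
        match = re.search(rule.pattern, text, re.IGNORECASE)
        if match:
            explanations.append({
                "entity_type": rule.entity_type,
                "entity_key": match.group(1),
                "confidence": rule.confidence,
                "reason": f"{rule.scope} matched {rule.pattern}",
                "chosen": False,
            })
    # Highest confidence wins; on a tie, max() keeps the earlier (lower precedence value) rule.
    best = max(explanations, key=lambda c: c["confidence"], default=None)
    chosen = best if best and best["confidence"] >= threshold else None
    if chosen:
        chosen["chosen"] = True
    return chosen, explanations

ticket_uuid = "1e36a0c2-5b7d-4f1a-9c3e-0123456789ab"  # made-up id
chosen, explanations = link_activity({
    "url": f"https://app/tickets/{ticket_uuid}",
    "window_title": "Ticket TCK-12345 – Customer X",
})
```

Every candidate is returned, not only the winner, so each one can be written as an explanation row and the daily review can show why a link was or was not made.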
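
Illustration for Section 9 and Appendices B and C (clustering and policy application). The sketch assumes timestamps are integer epoch seconds and that rounding uses the default policy (nearest increment, ties round up). `Interval`, `sessionize`, and `apply_policy` are names made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Interval:
    user_id: str
    entity: tuple   # (linked_entity_type, linked_entity_id)
    start: int      # epoch seconds (assumption)
    end: int

def sessionize(intervals, sticky_gap_seconds: int):
    """Merge intervals per (user, entity) when the gap is <= sticky_gap_seconds."""
    groups = defaultdict(list)
    for iv in intervals:
        groups[(iv.user_id, iv.entity)].append(iv)
    clusters = []
    for (user_id, entity), items in groups.items():
        items.sort(key=lambda iv: iv.start)
        current = Interval(user_id, entity, items[0].start, items[0].end)
        for nxt in items[1:]:
            if nxt.start - current.end <= sticky_gap_seconds:
                # max() covers an interval fully contained in the current cluster
                current.end = max(current.end, nxt.end)
            else:
                clusters.append(current)
                current = Interval(user_id, entity, nxt.start, nxt.end)
        clusters.append(current)
    return clusters

def apply_policy(duration_seconds: int, round_to_seconds: int, min_block_seconds: int) -> int:
    """Round to the nearest increment (ties up), then enforce the minimum block."""
    rounded = ((2 * duration_seconds + round_to_seconds) // (2 * round_to_seconds)) * round_to_seconds
    return max(rounded, min_block_seconds)

T1, T2 = ("ticket", "T1"), ("ticket", "T2")
clusters = sessionize(
    [Interval("u1", T1, 0, 100), Interval("u1", T1, 130, 200),
     Interval("u1", T1, 500, 560), Interval("u1", T2, 50, 80)],
    sticky_gap_seconds=60,
)
```

With a 60-second sticky gap, the first two T1 intervals (30 seconds apart) merge into one cluster, while the third (300 seconds later) starts a new one. Integer arithmetic in `apply_policy` avoids the banker's rounding that Python's built-in `round` would apply to ties.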
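
Illustration for Section 10 and Appendix D (overlap resolution). A minimal sketch for one technician's entries. The source labels in `SOURCE_PRIORITY` (for example, `agent` standing in for URL-linked desktop activity) and the `secondary_ratio` parameter are assumptions for the example. Marking the winner as `"primary"` is also a choice made here; the PRD only names the `secondary` and `split` outcomes explicitly.

```python
from dataclasses import dataclass
from typing import Optional

# Lower number = higher priority, following the order in Section 10.1.
SOURCE_PRIORITY = {"calendar": 1, "remote": 2, "agent": 3, "chat": 4,
                   "automation": 5, "docs": 6, "training": 6}

@dataclass
class TimeEntry:
    id: str
    source: str
    start: int   # epoch seconds (assumption)
    end: int
    billable: bool = True
    overlap_resolution: Optional[str] = None
    overlap_primary_time_entry_id: Optional[str] = None

def find_overlaps(entries):
    """Return groups (size > 1) of entries whose time ranges intersect transitively."""
    groups, current, current_end = [], [], None
    for e in sorted(entries, key=lambda e: e.start):
        if current and e.start < current_end:
            current.append(e)
            current_end = max(current_end, e.end)
        else:
            if len(current) > 1:
                groups.append(current)
            current, current_end = [e], e.end
    if len(current) > 1:
        groups.append(current)
    return groups

def resolve_overlaps(entries, split_enabled: bool = False, secondary_ratio: float = 0.3):
    """Mutate entries in place: one primary per group, the rest secondary or split."""
    for group in find_overlaps(entries):
        ranked = sorted(group, key=lambda e: SOURCE_PRIORITY[e.source])
        primary, rest = ranked[0], ranked[1:]
        primary.overlap_resolution = "primary"
        for e in rest:
            if split_enabled:
                e.billable = e.billable and secondary_ratio > 0
                e.overlap_resolution = "split"
            else:
                e.billable = False
                e.overlap_resolution = "secondary"
            e.overlap_primary_time_entry_id = primary.id
    return entries

remote = TimeEntry("a", "remote", 0, 600)
desktop = TimeEntry("b", "agent", 300, 900)
chat = TimeEntry("c", "chat", 2000, 2100)
resolve_overlaps([remote, desktop, chat])
```

Entries that merely touch (one ends exactly when the next starts) are not treated as overlapping, since the comparison is strict.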
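
Illustration for Section 11 (key metrics and formulas). A small worked example, assuming each confirmed entry carries `hours`, `category`, `billable`, and `revenue` fields; those field names and the `summarize` helper are hypothetical, and the sample numbers are invented for the arithmetic only.

```python
from collections import defaultdict

def pct(numerator: float, denominator: float) -> float:
    """Percentage helper that returns 0.0 instead of dividing by zero."""
    return 100.0 * numerator / denominator if denominator else 0.0

def summarize(entries, expected_hours: float, available_hours: float) -> dict:
    captured = sum(e["hours"] for e in entries)
    billable_hours = sum(e["hours"] for e in entries if e["billable"])
    revenue = sum(e["revenue"] for e in entries if e["billable"])
    hours_by_category = defaultdict(float)
    for e in entries:
        hours_by_category[e["category"]] += e["hours"]
    return {
        # Utilization % = billable hours / available hours x 100
        "utilization_pct": pct(billable_hours, available_hours),
        # Capture rate % = captured hours / expected hours x 100
        "capture_rate_pct": pct(captured, expected_hours),
        # Effective rate = billable revenue / billable hours
        "effective_rate": revenue / billable_hours if billable_hours else 0.0,
        # Category % = hours in class / total hours x 100
        "distribution_pct": {c: pct(h, captured) for c, h in hours_by_category.items()},
    }

day = [
    {"hours": 4.0, "category": "reactive", "billable": True, "revenue": 600.0},
    {"hours": 2.0, "category": "proactive", "billable": True, "revenue": 300.0},
    {"hours": 1.5, "category": "admin", "billable": False, "revenue": 0.0},
]
summary = summarize(day, expected_hours=8.0, available_hours=8.0)
```

For this sample day, 7.5 captured hours against 8 expected gives a 93.75% capture rate, 6 billable hours against 8 available gives 75% utilization, and 900 in billable revenue over 6 billable hours gives an effective rate of 150 per hour.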