Batch-oriented AI inference on distributed household GPUs.
Solvyr is an early-stage technical pilot for running retry-safe, latency-insensitive inference workloads on distributed household GPUs (starting with RTX 3090).
No UI, no SLA, no automation. Hands-on onboarding, bounded scope.
Good fit: Embeddings, classification, tagging, summarization, offline jobs.
Not a fit: Low-latency APIs, interactive chat, long reservations, or anything that “must never fail”.
We optimize for cost-per-result and repeatable execution on unreliable nodes — not latency guarantees.
What Solvyr is
Technical pilot (not a product launch)
Solvyr is a deliberately constrained system built to validate the reliability, retry behavior, and economics of batch inference on distributed GPUs. It is not a public signup, and we keep scope tight.
Work-unit model
Nodes pull small, retryable work units (seconds to a few minutes). Failures are expected; retries and failure visibility are first-class.
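To make the model concrete, here is a minimal sketch of what a node loop could look like under these constraints. The coordinator URL, endpoint paths, payload fields, and token handling are all hypothetical illustrations, not Solvyr's actual API:

```python
import time
import requests  # assumed HTTP client; any would do

API = "https://coordinator.example/v0"            # hypothetical coordinator URL
HEADERS = {"Authorization": "Bearer NODE_TOKEN"}  # hypothetical node token

def run_unit(unit: dict) -> dict:
    """Execute one small, retry-safe work unit (seconds to a few minutes)."""
    # ... load model, run inference on unit["inputs"] ...
    return {"unit_id": unit["id"], "outputs": []}

def main() -> None:
    # Outbound-only: the node polls the coordinator. Nothing listens on a
    # home port; all traffic originates from the node side.
    while True:
        try:
            resp = requests.get(f"{API}/work", headers=HEADERS, timeout=30)
            if resp.status_code == 204:  # no work available right now
                time.sleep(10)
                continue
            unit = resp.json()
            result = run_unit(unit)
            requests.post(f"{API}/results", json=result,
                          headers=HEADERS, timeout=30)
        except Exception as exc:
            # Failures are expected and visible: the coordinator re-queues
            # the unit, so the node just backs off and pulls again.
            print(f"unit failed, will retry: {exc}")
            time.sleep(30)

if __name__ == "__main__":
    main()
```

Because units are retry-safe, a crashed or disconnected node costs a retry, not a lost job.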
Security posture
Outbound-only from nodes. No exposed home ports. Controlled connectivity (VPN/mesh). We prioritize correctness and containment over convenience.
Is your workload a fit?
If you can answer “yes” to most of these, a pilot might make sense.
- Jobs are retry-safe: re-running a unit produces the same (or an equally valid) result.
- Work splits into small units, each running seconds to a few minutes.
- Latency doesn't matter; throughput and cost-per-result do.
- Occasional node failures and retries are acceptable.
- You can tolerate manual, hands-on ops for a bounded pilot.
How pilots work
Quick fit check (15 minutes)
You share one representative job: model, input shape, runtime expectations, retry-safety, and constraints.
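In practice, the fit check boils down to a job description along these lines. Every field name and value below is illustrative, not a required schema:

```python
# Illustrative job description for the fit check; all fields hypothetical.
representative_job = {
    "workload": "embeddings",  # embeddings / classification / tagging / summarization
    "model": "a sentence-embedding model",
    "vram_gb": 16,             # rough requirement; must fit an RTX 3090 (24 GB)
    "input_shape": "batches of ~256 short documents",
    "runtime_per_unit": "30-120 seconds",
    "retry_safe": True,        # re-running a unit yields the same result
    "constraints": "results needed within 24h; no PII in inputs",
}
```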
Minimal integration
One customer ID, one API token, one endpoint, one sample payload. Manual ops. Billing only on successfully completed work.
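The customer-side integration surface is correspondingly small. A hedged sketch, assuming a single HTTPS endpoint; the URL, header names, and payload shape are hypothetical:

```python
import requests

ENDPOINT = "https://api.solvyr.example/v0/jobs"  # hypothetical single endpoint
CUSTOMER_ID = "cust_0001"                        # hypothetical customer ID
API_TOKEN = "sk_pilot_..."                       # one token per pilot

sample_payload = {
    "customer_id": CUSTOMER_ID,
    "workload": "embeddings",
    "inputs": ["first document", "second document"],
}

resp = requests.post(
    ENDPOINT,
    json=sample_payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. a job ID to poll for completed, billable results
```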
Run a bounded pilot
We measure survivability (retries, failure modes, tail latencies), run correctness checks, and compare cost-per-run against your baseline.
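The cost comparison itself is simple arithmetic: since billing counts only successfully completed work, retries inflate spend but not the result count. A sketch with placeholder numbers:

```python
def cost_per_successful_run(total_spend: float, successful_runs: int) -> float:
    """Only successfully completed work counts; retries add spend, not results."""
    return total_spend / successful_runs

# Placeholder numbers for illustration only.
pilot = cost_per_successful_run(total_spend=40.0, successful_runs=10_000)
baseline = cost_per_successful_run(total_spend=65.0, successful_runs=10_000)
print(f"pilot: ${pilot:.4f}/run, baseline: ${baseline:.4f}/run")
```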
Decide, fast
At the end we decide: proceed (expand), pause, or stop. No long tail of “maybe”.
Who we are
A small founder–engineer team running a focused technical pilot.
Jan (founder / operator)
Runs pilot onboarding and ops with a strong bias for scope discipline: small work units, retries-first, and measurable cost-per-result. The goal is to validate reliability and economics quickly—without “platform” theater.
Maksym (founding engineer)
Builds the runtime and reliability foundation: predictable node behavior, failure visibility, and repeatable execution on unreliable machines. Focus is on robustness before features.
What we optimize for
Batch inference where customers care about cost-per-result and retry-safe execution—not low-latency APIs or strict uptime. We keep v0 deliberately narrow to learn fast.
Contact
If you’re evaluating batch inference economics and can tolerate a constrained technical pilot, reach out with:
- Workload type (embeddings / classification / tagging / summarization)
- Typical runtime per job
- Model + VRAM requirement (rough)
- Retry-safety notes