Solvyr
Technical pilot · batch inference on distributed household GPUs

Solvyr is an early-stage technical pilot for running retry-safe, latency-insensitive inference workloads on distributed household GPUs (starting with RTX 3090).

Pilot inquiries
If you include the model, typical runtime, and a rough VRAM estimate, we can respond quickly.
Status: Pilot-only
No UI, no SLA, no automation. Hands-on onboarding, bounded scope.

Best for: Seconds–minutes
Embeddings, classification, tagging, summarization, offline jobs.

Not for: Real-time
Low-latency APIs, interactive chat, long reservations, or “must never fail”.

We optimize for cost-per-result and repeatable execution on unreliable nodes — not latency guarantees.

What Solvyr is

Technical pilot (not a product launch)

Solvyr is a deliberately constrained system built to validate the reliability, retry behavior, and economics of batch inference on distributed GPUs. It is not a public signup, and we keep scope tight.

Work-unit model

Nodes pull small, retryable work units (seconds to a few minutes). Failures are expected; retries and failure visibility are first-class.
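
A minimal sketch of the node-side loop this implies. The coordinator address, endpoint paths, and payload fields are illustrative assumptions, not the actual Solvyr API:

```python
import time

import requests  # assumed HTTP client; the real transport runs over the VPN/mesh

COORDINATOR = "https://coordinator.internal"  # hypothetical address, reached outbound-only

def run_node(token: str) -> None:
    """Pull small, retryable work units; report results or failures."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"

    while True:
        # Outbound poll only: the node never listens on a port.
        resp = session.get(f"{COORDINATOR}/work/next", timeout=30)
        if resp.status_code == 204:  # no work available right now
            time.sleep(10)
            continue
        unit = resp.json()  # e.g. {"id": ..., "payload": ...}

        try:
            result = execute(unit["payload"])  # seconds to a few minutes of inference
            session.post(f"{COORDINATOR}/work/result",
                         json={"id": unit["id"], "ok": True, "result": result},
                         timeout=30)
        except Exception as exc:
            # Failures are expected and stay visible; the coordinator
            # decides whether to requeue the unit or mark it dead.
            session.post(f"{COORDINATOR}/work/result",
                         json={"id": unit["id"], "ok": False, "error": str(exc)},
                         timeout=30)

def execute(payload: dict) -> dict:
    raise NotImplementedError  # model-specific inference goes here
```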

Security posture

Outbound-only from nodes. No exposed home ports. Controlled connectivity (VPN/mesh). We prioritize correctness and containment over convenience.
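
To make the containment point concrete, a hedged sketch of one way a node could bound a single work unit, assuming units run as isolated subprocesses with a hard wall-clock limit (an assumption, not a description of the actual node runtime):

```python
import subprocess

def run_contained(cmd: list[str], timeout_s: int = 300) -> subprocess.CompletedProcess:
    """Run one work unit as an isolated subprocess with a hard wall-clock limit.

    Hypothetical wrapper: a hung or failed unit is killed and reported,
    never retried locally; retry policy lives with the coordinator.
    """
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired:
        # Timeouts surface as ordinary, visible failures.
        raise RuntimeError(f"work unit exceeded {timeout_s}s wall clock")
```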

Is your workload a fit?

If you can answer “yes” to most of these, a pilot might make sense.

  • Batch / offline (not interactive)
  • Latency-insensitive (seconds–minutes is fine)
  • Retry-safe (can be re-run without causing harm; see the sketch after this list)
  • Inputs/outputs can be minimized and audited
  • You can tolerate occasional failures during the pilot
  • You don’t need strict uptime or low-latency responses
Hard gate: if re-running the same job 3–5x is unacceptable, this pilot is not a fit.
Rule of thumb: if it barely fits VRAM or needs complex multi-GPU orchestration, it’s not v0.
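
In practice, “retry-safe” usually means idempotent output writes. A minimal sketch, assuming results are keyed by a deterministic hash of the inputs so a 3–5x re-run overwrites rather than duplicates (the key scheme and result store are illustrative):

```python
import hashlib
import json

def job_key(model: str, payload: dict) -> str:
    """Deterministic key: identical inputs always map to the same result slot."""
    blob = json.dumps({"model": model, "payload": payload}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def store_result(results: dict, model: str, payload: dict, output: dict) -> None:
    # Re-running the same job writes to the same key: last write wins,
    # no duplicate rows, no side effects that accumulate across retries.
    results[job_key(model, payload)] = output
```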

How pilots work

1. Quick fit check (15 minutes)

You share one representative job: model, input shape, runtime expectations, retry-safety, and constraints.
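
For illustration, a representative job might be summarized like this; every field name here is hypothetical, so share the same facts in whatever form you have them:

```python
# Illustrative only, not a Solvyr schema.
representative_job = {
    "workload": "embeddings",            # embeddings / classification / tagging / summarization
    "model": "example-embedding-model",  # hypothetical model name
    "input_shape": "batches of 256 texts, ~512 tokens each",
    "runtime_per_job": "30-90 s on an RTX 3090",
    "vram_estimate_gb": 8,               # rough is fine; must fit a 3090 (24 GB) with headroom
    "retry_safe": True,                  # re-running 3-5x causes no harm
    "constraints": "outputs must be auditable; no PII in inputs",
}
```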

2. Minimal integration

One customer ID, one API token, one endpoint, one sample payload. Manual ops. Billing only on successfully completed work.
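
A sketch of how small that surface stays, with a hypothetical endpoint, token, and payload shape (none of this is the real API):

```python
import requests

API = "https://api.solvyr.example/v0/jobs"  # hypothetical endpoint
TOKEN = "pilot-token-from-onboarding"       # the single API token

def submit(customer_id: str, payload: dict) -> str:
    """Submit one retry-safe job and return its ID.

    Billing counts only jobs that eventually report success.
    """
    resp = requests.post(
        API,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"customer": customer_id, "payload": payload},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]
```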

3. Run a bounded pilot

We test survivability (retries, failure modes, tail latencies), run correctness checks, and measure cost-per-run against your baseline.
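
The cost comparison itself reduces to simple expected-value arithmetic. A sketch, assuming a fixed cost per attempt and an independent per-attempt success rate (both simplifying assumptions):

```python
def expected_cost_per_result(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful result when failed attempts are retried.

    With independent attempts, expected attempts per success is 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate

# e.g. $0.002 per attempt at a 90% per-attempt success rate
print(expected_cost_per_result(0.002, 0.90))  # ~0.00222 per successful result
```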

4. Decide, fast

At the end we decide: proceed (expand), pause, or stop. No long tail of “maybe”.

Who we are

A small founder–engineer team running a focused technical pilot.

Jan (founder / operator)

Runs pilot onboarding and ops with a strong bias for scope discipline: small work units, retries-first, and measurable cost-per-result. The goal is to validate reliability and economics quickly—without “platform” theater.

Maksym (founding engineer)

Builds the runtime and reliability foundation: predictable node behavior, failure visibility, and repeatable execution on unreliable machines. Focus is on robustness before features.

What we optimize for

Batch inference where customers care about cost-per-result and retry-safe execution—not low-latency APIs or strict uptime. We keep v0 deliberately narrow to learn fast.

Contact

If you’re evaluating batch inference economics and can tolerate a constrained technical pilot, reach out.

Email
Pilot discussions only. We reply fastest when you include one representative job.
What to include
  • Workload type (embeddings / classification / tagging / summarization)
  • Typical runtime per job
  • Model + VRAM requirement (rough)
  • Retry-safety notes