Solvyr
Technical pilot · batch inference on distributed household GPUs

Solvyr is an early-stage technical pilot for running retry-safe, latency-insensitive inference workloads on distributed household GPUs (starting with RTX 3090).

Pilot inquiries
If you include the model, typical runtime, and a rough VRAM estimate, we can respond quickly.
Status: Pilot-only
No UI, no SLA, no automation. Hands-on onboarding, bounded scope.

Best for: Seconds–minutes
Embeddings, classification, tagging, summarization, offline jobs.

Not for: Real-time
Low-latency APIs, interactive chat, long reservations, or “must never fail”.

We optimize for cost-per-result and repeatable execution on unreliable nodes — not latency guarantees.

What Solvyr is

Technical pilot (not a product launch)

Solvyr is a deliberately constrained system built to validate the reliability, retry behavior, and economics of batch inference on distributed GPUs. It is not a public signup, and we keep scope tight.

Work-unit model

Nodes pull small, retryable work units (seconds to a few minutes). Failures are expected; retries and failure visibility are first-class.
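
A minimal sketch of the node-side loop this implies. The coordinator address, endpoint paths, and payload fields are illustrative assumptions, not the actual Solvyr API:

```python
import time

import requests  # assumed HTTP client; the real transport runs over the VPN/mesh

COORDINATOR = "https://coordinator.internal"  # hypothetical address, reached outbound-only

def run_node(token: str) -> None:
    """Pull small, retryable work units; report results or failures."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"

    while True:
        # Outbound poll only: the node never listens on a port.
        resp = session.get(f"{COORDINATOR}/work/next", timeout=30)
        if resp.status_code == 204:  # no work available right now
            time.sleep(10)
            continue
        unit = resp.json()  # e.g. {"id": ..., "payload": ...}

        try:
            result = execute(unit["payload"])  # seconds to a few minutes of inference
            session.post(f"{COORDINATOR}/work/result",
                         json={"id": unit["id"], "ok": True, "result": result},
                         timeout=30)
        except Exception as exc:
            # Failures are expected and stay visible; the coordinator
            # decides whether to requeue the unit or mark it dead.
            session.post(f"{COORDINATOR}/work/result",
                         json={"id": unit["id"], "ok": False, "error": str(exc)},
                         timeout=30)

def execute(payload: dict) -> dict:
    raise NotImplementedError  # model-specific inference goes here
```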

Security posture

Outbound-only from nodes. No exposed home ports. Controlled connectivity (VPN/mesh). We prioritize correctness and containment over convenience.
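
To make the containment point concrete, a hedged sketch of one way a node could bound a single work unit, assuming units run as isolated subprocesses with a hard wall-clock limit (an assumption, not a description of the actual node runtime):

```python
import subprocess

def run_contained(cmd: list[str], timeout_s: int = 300) -> subprocess.CompletedProcess:
    """Run one work unit as an isolated subprocess with a hard wall-clock limit.

    Hypothetical wrapper: a hung or failed unit is killed and reported,
    never retried locally; retry policy lives with the coordinator.
    """
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired:
        # Timeouts surface as ordinary, visible failures.
        raise RuntimeError(f"work unit exceeded {timeout_s}s wall clock")
```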

Is your workload a fit?

If you can answer “yes” to most of these, a pilot might make sense.

  • Batch / offline (not interactive)
  • Latency-insensitive (seconds–minutes is fine)
  • Retry-safe (can be re-run without causing harm; see the sketch after this list)
  • Inputs/outputs can be minimized and audited
  • You can tolerate occasional failures during the pilot
  • You don’t need strict uptime or low-latency responses
Hard gate: if re-running the same job 3–5x is unacceptable, this pilot is not a fit.
Rule of thumb: if it barely fits VRAM or needs complex multi-GPU orchestration, it’s not v0.
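
In practice, “retry-safe” usually means idempotent output writes. A minimal sketch, assuming results are keyed by a deterministic hash of the inputs so a 3–5x re-run overwrites rather than duplicates (the key scheme and result store are illustrative):

```python
import hashlib
import json

def job_key(model: str, payload: dict) -> str:
    """Deterministic key: identical inputs always map to the same result slot."""
    blob = json.dumps({"model": model, "payload": payload}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def store_result(results: dict, model: str, payload: dict, output: dict) -> None:
    # Re-running the same job writes to the same key: last write wins,
    # no duplicate rows, no side effects that accumulate across retries.
    results[job_key(model, payload)] = output
```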

How pilots work

1. Quick fit check (15 minutes)

You share one representative job: model, input shape, runtime expectations, retry-safety, and constraints.
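
For illustration, a representative job might be summarized like this; every field name here is hypothetical, so share the same facts in whatever form you have them:

```python
# Illustrative only, not a Solvyr schema.
representative_job = {
    "workload": "embeddings",            # embeddings / classification / tagging / summarization
    "model": "example-embedding-model",  # hypothetical model name
    "input_shape": "batches of 256 texts, ~512 tokens each",
    "runtime_per_job": "30-90 s on an RTX 3090",
    "vram_estimate_gb": 8,               # rough is fine; must fit a 3090 (24 GB) with headroom
    "retry_safe": True,                  # re-running 3-5x causes no harm
    "constraints": "outputs must be auditable; no PII in inputs",
}
```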

2. Minimal integration

One customer ID, one API token, one endpoint, one sample payload. Manual ops. Billing only on successfully completed work.
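
A sketch of how small that surface stays, with a hypothetical endpoint, token, and payload shape (none of this is the real API):

```python
import requests

API = "https://api.solvyr.example/v0/jobs"  # hypothetical endpoint
TOKEN = "pilot-token-from-onboarding"       # the single API token

def submit(customer_id: str, payload: dict) -> str:
    """Submit one retry-safe job and return its ID.

    Billing counts only jobs that eventually report success.
    """
    resp = requests.post(
        API,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"customer": customer_id, "payload": payload},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]
```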

3. Run a bounded pilot

We test survivability (retries, failure modes, tail latencies), run correctness checks, and measure cost-per-run against your baseline.
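
The cost comparison itself reduces to simple expected-value arithmetic. A sketch, assuming a fixed cost per attempt and an independent per-attempt success rate (both simplifying assumptions):

```python
def expected_cost_per_result(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful result when failed attempts are retried.

    With independent attempts, expected attempts per success is 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate

# e.g. $0.002 per attempt at a 90% per-attempt success rate
print(expected_cost_per_result(0.002, 0.90))  # ~0.00222 per successful result
```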

4. Decide, fast

At the end we decide: proceed (expand), pause, or stop. No long tail of “maybe”.

Who we are

A small founder–engineer team running a focused technical pilot.

Jan (founder / operator)

Runs pilot onboarding and ops with a strong bias for scope discipline: small work units, retries-first, and measurable cost-per-result. The goal is to validate reliability and economics quickly—without “platform” theater.

Maksym (founding engineer)

Builds the runtime and reliability foundation: predictable node behavior, failure visibility, and repeatable execution on unreliable machines. Focus is on robustness before features.

What we optimize for

Batch inference where customers care about cost-per-result and retry-safe execution—not low-latency APIs or strict uptime. We keep v0 deliberately narrow to learn fast.

Contact

If you’re evaluating batch inference economics and can tolerate a constrained technical pilot, reach out.

Email
Pilot discussions only. We reply fastest when you include one representative job.
What to include
  • Workload type (embeddings / classification / tagging / summarization)
  • Typical runtime per job
  • Model + VRAM requirement (rough)
  • Retry-safety notes