Hacker News

Updated: 22 min 36 sec ago

The Model Counting Competitions 2021-2023

Tue, 04/22/2025 - 11:51am

Article URL: https://arxiv.org/abs/2504.13842

Comments URL: https://news.ycombinator.com/item?id=43763572

Points: 1

# Comments: 0

Categories: Hacker News

Hyperwood – Open-Source Furniture

Tue, 04/22/2025 - 11:50am

Article URL: https://hyperwood.org/

Comments URL: https://news.ycombinator.com/item?id=43763565

Points: 2

# Comments: 0

Categories: Hacker News

Think of an Elephpant

Tue, 04/22/2025 - 11:47am
Categories: Hacker News

Show HN: Creative Storytelling for Little Ones

Tue, 04/22/2025 - 11:45am

Hi HN,

I built StoryPup to make creating personalized stories for kids easier and more fun. It uses AI to help generate unique tales, and you can even use your own voice for narration to make them extra special.

Built this as a parent who wanted a better way to engage my kids with stories. Would love to hear your thoughts or any feedback!

App Store link: https://apps.apple.com/us/app/storypup/id6744484541

Comments URL: https://news.ycombinator.com/item?id=43763513

Points: 1

# Comments: 0

Categories: Hacker News

Native VisionOS Platform Support

Tue, 04/22/2025 - 11:39am
Categories: Hacker News

Launch HN: Infra.new (YC W23) – DevOps Copilot with Guardrails Built In

Tue, 04/22/2025 - 10:59am

Hey HN, we’re Caleb, Michael, and Josh, the founders of infra.new (https://infra.new/), a DevOps Copilot that can configure and deploy apps on AWS, GCP, and Azure using Terraform and GitHub Actions.

You start by describing your infrastructure needs in detail and optionally attach any source code. The agent will clarify your requirements and either execute the task immediately or generate a plan with step-by-step instructions for you to approve. Once you’re happy with the changes, export everything to GitHub or let the agent provision it in your cloud account. Here’s a quick demo of deploying a new app to GCP / AWS: https://www.loom.com/share/4627b3cd96cc439e9981a38363b7f6f7

Why build a new coding agent when there are good ones already out there? We believe there’s room for an agent built specifically for DevOps tasks, since the risks are much higher: it's easy to roll back AI-related errors in a web app, but fixing a misconfigured database is not nearly as easy. By focusing specifically on cloud infra, we can provide all the visibility and checks you need to feel confident in your configuration changes.

At our previous jobs, we built an internal data / ML platform at Google Life Sciences that involved migrating off of internal Google infrastructure to the public cloud (GCP). We quickly learned how complicated it can be to configure cloud infrastructure well, even for seemingly simple tasks. Configuring an app with CI/CD requires knowledge of multiple infra tools, cloud services, and best practices. Mistakes can be costly and diagnosing issues can send you down a rabbit hole of cloud docs.

Our goal is to help engineers feel confident when making changes in their cloud. We designed the workflow to start with a prompt, a template, or a GitHub repository. After clarifying your requirements, the agent will start generating IaC, CI/CD, and other configurations using the latest docs, public Terraform Registries, and a set of best practices we dynamically load into the context window.

All changes are run through static analysis to detect hallucinations, estimate cost changes, and visualize your infrastructure components as you go. Once you’re happy with the changes, you can export everything to GitHub for review. You also have the option to deploy directly to your cloud from the workspace and let the agent diagnose any deployment issues. The deployment flow is "pseudo-deterministic" in that it follows a checklist of human-guided instructions that help it stay in bounds, but we still recommend only using this feature for dev environments and using GitOps for any changes to production.

The current plan is to continue adding support for more tools (Kubernetes and GitLab are next) and we may add a CLI that lets you bring the agent into your local workspace.

We’d love to hear your feedback and ideas!

Comments URL: https://news.ycombinator.com/item?id=43763026

Points: 1

# Comments: 0

Categories: Hacker News

A5 – A global, equal-area, millimeter-accurate geospatial index

Tue, 04/22/2025 - 10:59am

Article URL: https://a5geo.org/

Comments URL: https://news.ycombinator.com/item?id=43763016

Points: 1

# Comments: 0

Categories: Hacker News

A simple heuristic for agents: human-led vs. human-in-the-loop vs. agent-led

Tue, 04/22/2025 - 10:59am

tl;dr - the more agency your agent has, the simpler your use case needs to be

Most if not all successful production use cases today are either human-led or human-in-the-loop. Agent-led is possible but requires simple use cases.

---

Human-led:

An obvious example is ChatGPT. One input, one output. The model might suggest a follow-up or use a tool but ultimately, you're the master in command.

---

Human-in-the-loop:

The best example of this is Cursor (and other coding tools). Coding tools can do 99% of the coding for you, use dozens of tools, and are incredibly capable. But ultimately the human still gives the requirements, hits "accept" or "reject" AND gives feedback on each interaction turn.

The last point is important as it's a live recalibration.

Sometimes this is not enough though. An example is the rollout of Sonnet 3.7 in Cursor. The mix of feedback loop vs. model agency was off: too much agency, not enough recalibration from the human. So users switched!

---

Agent-led:

This is where the agent leads the task, end-to-end. The user is just a participant. This is difficult because there's less recalibration so your probability of something going wrong increases on each turn… It's cumulative.

P(all good) = pⁿ

p = probability that the agent does the right thing on a single turn (assuming turns are independent)
n = number of turns / interactions

Ok… I'm going to use my product as an example, not to promote, I'm just very familiar with how it works.

It's a chat agent that runs short customer interviews. My customers can configure it based on what they want to learn (e.g. why a customer churned) and send it to their customers.

It's agent-led because

→ as soon as the respondent opens the link, they're guided from there
→ at each turn the agent (not the human) is deciding what to do next

That means deciding the right thing to do over 10 to 30 conversation turns (depending on config). I.e. correctly decide:

→ whether to expand the conversation vs dive deeper
→ reflect on current progress + context
→ traverse a bunch of objectives and ask questions that draw out insight (per current objective)

Let's apply the above formula. Example:

Let's say:

→ n = 20 (i.e. number of conversation turns)
→ p = .99 (i.e. how often the agent does the right thing - 99% of the time)

That equals P(all good) = 0.99²⁰ ≈ 0.82

So if I ran 100 such 20‑turn conversations, I'd expect roughly 82 to complete as per instructions and about 18 to stumble at least once.

Let's change p to 95%...

→ n = 20
→ p = .95

P(all good) = 0.95²⁰ ≈ 0.358

I.e. if I ran 100 such 20‑turn conversations, I’d expect roughly 36 to finish without a hitch and about 64 to go off‑track at least once.
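
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the function names `p_all_good` and `required_p` are made up for illustration, and it assumes every turn succeeds independently with the same probability p, which is a simplification. The second function inverts the formula to ask what per-turn p you would need to hit a target success rate over n turns.

```python
def p_all_good(p: float, n: int) -> float:
    """Chance that all n turns go right, i.e. p ** n.

    Assumes each turn succeeds independently with the same probability p
    (an illustrative simplification, not a measured property of any agent).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


def required_p(target: float, n: int) -> float:
    """Per-turn success rate needed so that p ** n equals target."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return target ** (1.0 / n)


if __name__ == "__main__":
    # The two scenarios from the post: 20 turns at p = .99 and p = .95.
    for p in (0.99, 0.95):
        print(f"p={p:.2f}, n=20 -> P(all good) = {p_all_good(p, 20):.3f}")

    # Inverse question: what p gives 90% clean 20-turn conversations?
    print(f"p needed for 90% over 20 turns: {required_p(0.90, 20):.4f}")
```

Under these assumptions, getting 90 out of 100 conversations through 20 turns without a stumble needs a per-turn p of roughly 99.5%, which is why narrowing the use case matters so much.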

My p score is high. I had to strip out a bunch of tools and simplify but I got there. And for my use case, a failure is just a slightly irrelevant response so it's manageable.

---

Conclusion:

Getting an agent to do the correct thing 99% of the time is not trivial.

You basically can't have a super complicated workflow. Yes, you can mitigate this by introducing other agents to check the work but this then introduces latency.

There's always a tradeoff!

Know which category you're building in and if you're going for agent-led, narrow your use-case as much as possible.

Comments URL: https://news.ycombinator.com/item?id=43763011

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Cursor for Email

Tue, 04/22/2025 - 10:58am

Hey!

For the past few months I've been building an MVP of a Cursor for email, and I'm looking to get my first 10 early users.

The project is still in early development, but I'd love to hear your input on it.

current features: categorization, auto draft, using llm to edit/add text, auto task creation from email.

under development: keyboard shortcuts, Cursor-like tab navigation, lots of bug-fixes

I'd love to hear what feedback you have, features you'd like to have or if you'd use/buy the product.

Cheers,
Doru

Comments URL: https://news.ycombinator.com/item?id=43763002

Points: 1

# Comments: 0

Categories: Hacker News
