Hacker News


Power to the Power Users

Wed, 11/20/2024 - 12:52pm

Show HN: Weave - actually measure engineering productivity

Wed, 11/20/2024 - 12:43pm

Hey HN,

We’re building Weave: an ML-powered tool that measures engineering output by actually understanding it!

Why? Here’s the thing: almost every eng leader already measures output - either openly or behind closed doors. But they rely on metrics like lines of code (correlation with effort: ~0.3), number of PRs, or story points (slightly better at ~0.35). These metrics are, frankly, terrible proxies for productivity.

We’ve developed a custom model that analyzes code and its impact directly, with a far better 0.94 correlation. The result? A standardized engineering output metric that doesn’t reward vanity. Even better, you can benchmark your team’s output against peers while keeping everything private.
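To make those correlation numbers concrete, here is a minimal sketch of how a metric-vs-effort correlation would be computed. The data and the scipy-based comparison are purely illustrative assumptions, not Weave's actual model or figures:

# Illustrative only: comparing candidate output metrics against a
# ground-truth effort score via Pearson correlation. All numbers are
# made up; Weave's model and training data are not shown here.
from scipy.stats import pearsonr

effort = [3, 7, 2, 9, 5, 6, 4, 8]  # e.g., calibrated reviewer ratings
lines_of_code = [1200, 300, 2500, 900, 400, 2200, 150, 1100]  # a crude proxy
model_score = [2.8, 7.1, 2.3, 8.6, 5.2, 5.7, 4.4, 7.9]  # a learned output metric

for name, metric in [("LOC", lines_of_code), ("model", model_score)]:
    r, _ = pearsonr(metric, effort)
    print(f"{name}: r = {r:+.2f}")  # prints each metric's correlation with effort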

Although this one metric is much better than anything else out there, of course it still doesn't tell the whole story. In the future, we’ll build more metrics that go deeper into things like code quality and technical leadership. And we'll build actionable suggestions on top of all of it to help teams improve and track progress.

After testing with several startups, the feedback has been fantastic, so we’re opening it up today. Connect your GitHub and see what WorkWeave can tell you: https://app.workweave.ai/welcome.

I’ll be around all day to chat, answer questions, or take a beating. Fire away!

Comments URL: https://news.ycombinator.com/item?id=42196381

Points: 2

# Comments: 0


AI models to understand legacy code without code generation

Wed, 11/20/2024 - 12:42pm

I'm looking for AI-based tools that can help me understand a system's legacy components. Ideally, I would provide the source code, and the model would generate text or diagrams that explain the code's underlying business rules as much as possible.

Most existing LLM tooling seems to focus specifically on code generation, but I'm not interested in generated code. I love to code, so I'd like these models to help me with the boring part: understanding the legacy code before modernization.
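For context, here is roughly the workflow I have in mind, as a minimal sketch using the OpenAI Python SDK; the model name, prompt, and file path are placeholder assumptions, and I'm hoping something more purpose-built exists:

# A minimal sketch of "explain, don't generate": feed a legacy module to a
# chat model and ask for its business rules as prose plus a diagram.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

source = Path("legacy/billing.cbl").read_text()  # hypothetical legacy module

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model; substitute whatever you have access to
    messages=[
        {"role": "system",
         "content": "You are a software archaeologist. Explain code; "
                    "never generate new code."},
        {"role": "user",
         "content": "Describe the business rules this module implements, "
                    "as prose plus a Mermaid flowchart:\n\n" + source},
    ],
)
print(response.choices[0].message.content)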

The only relevant article I have found so far is https://martinfowler.com/articles/legacy-modernization-gen-ai.html. However, it's based on a private tool by Thoughtworks.

Comments URL: https://news.ycombinator.com/item?id=42196370

Points: 1

# Comments: 0


The End of an Era for You Got This

Wed, 11/20/2024 - 12:35pm

Show HN: Rebuild of Blossom, an open-source social robot

Wed, 11/20/2024 - 12:28pm

From the post:

Blossom is an open-source robot platform for human-robot interaction (HRI) research that I developed during my PhD. I’ve used Blossom for research in design, machine learning, and telepresence; others have made Blossoms for their own research purposes. I have continued working on “rebuilding” the entire platform: I redesigned the inner frame as a model kit, complete with Gunpla-inspired runners and instructions, and refactored the codebase as r0b0, a Python library for communicating between hardware peripherals and software applications. In preparation to present Blossom at Maker Faire Coney Island, I refined the telepresence interface and enabled conversational interaction with a language model. The new repository is available on GitHub and includes documentation for construction.

Comments URL: https://news.ycombinator.com/item?id=42196226

Points: 7

# Comments: 0


Show HN: Agentic Evaluators for Agentic Workflows (Starting with RAG)

Wed, 11/20/2024 - 11:57am

Hey all! Thought this group might find this interesting: a new approach to evaluating RAG pipelines using 'agents as a judge'. We got excited by the findings in this paper (https://arxiv.org/abs/2410.10934) about agents producing evaluations closer to those of human evaluators, especially for multi-step workflows.

Our first use case was RAG pipelines, specifically evaluating whether your agent MISSED pulling any important chunks from the source document. While many RAG evaluators determine whether your model USED a chunk in its output, there's no visibility into whether your model grabbed all the right chunks in the first place. We thought we'd test the 'agent as judge' approach with a new metric, 'potential sources missed', to help evaluate whether your agent is missing any important chunks from the source of truth.
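As a rough illustration of the idea (a sketch of the general technique, not our actual implementation), a 'potential sources missed' check could use a judge model to label each chunk's relevance and then flag relevant chunks the retriever never returned; the model name and prompt below are placeholder assumptions:

# Sketch: flag judged-relevant chunks that the RAG pipeline failed to retrieve.
from openai import OpenAI

client = OpenAI()

def judge_relevant(query: str, chunk: str) -> bool:
    """Ask an LLM judge whether a chunk is needed to answer the query."""
    reply = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model
        messages=[{
            "role": "user",
            "content": f"Question: {query}\n\nChunk: {chunk}\n\n"
                       "Is this chunk needed to answer the question? Reply yes or no.",
        }],
    )
    return reply.choices[0].message.content.strip().lower().startswith("yes")

def potential_sources_missed(query: str, all_chunks: list[str],
                             retrieved: set[str]) -> list[str]:
    """Return judged-relevant chunks the retriever never returned."""
    return [c for c in all_chunks
            if c not in retrieved and judge_relevant(query, c)]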

Curious what you all think!

Comments URL: https://news.ycombinator.com/item?id=42195858

Points: 2

# Comments: 1

