Hacker News

TinyCard Game Maker

Hacker News - Tue, 04/08/2025 - 8:20am

Article URL: http://www.technoblogy.com/show?51KR

Comments URL: https://news.ycombinator.com/item?id=43620846

Points: 3

# Comments: 0

Categories: Hacker News

Show HN: FormReach – LLM form marketing automation for Japan

Hacker News - Tue, 04/08/2025 - 7:38am

I developed an AI tool that automates contact form submissions for marketing. Simply select target companies from a list, and our LLM automatically fills out and submits forms for you. I created this because user acquisition was always the most time-consuming part of my previous product development and sales efforts. While currently limited to the Japanese market, I hope it can help those doing business in Japan save significant time.

FormReach features:

- No initial costs; you're only charged for successful submissions
- AI handles all form completions and submissions automatically
- Continuously updated database of compatible contact forms

If you're interested in the Japanese market or have feedback on this approach, I'd appreciate your thoughts.

Comments URL: https://news.ycombinator.com/item?id=43620509

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Simpleformapp – A lightweight form and table tool for lead capture

Hacker News - Tue, 04/08/2025 - 7:37am

Hi HN,

I built Simpleformapp to replace my Jotform + Airtable workflow. I was paying $58/month just to capture and manage leads — it worked, but felt like overkill.

So I made something simpler:

- Create clean forms
- View/manage submissions in a table
- No bloat or complex pricing

It’s live and I use it daily to run my business. Would love any feedback — UX, performance, feature ideas, or thoughts on positioning.

Thanks!

Comments URL: https://news.ycombinator.com/item?id=43620505

Points: 1

# Comments: 0

Categories: Hacker News

Programmers I Know

Hacker News - Tue, 04/08/2025 - 7:36am

Categories: Hacker News

Show HN: I made an app to save you money on unused subscriptions

Hacker News - Tue, 04/08/2025 - 7:36am

I lost $90 last month on unused subscriptions.

So I’m building a tool to stop that. It tracks subscriptions and warns me before I get charged again.

MVP in progress. Who’s in?

Comments URL: https://news.ycombinator.com/item?id=43620501

Points: 1

# Comments: 0

Categories: Hacker News

Comparing GenAI Inference Engines: TensorRT-LLM, vLLM, HF TGI, and LMDeploy

Hacker News - Tue, 04/08/2025 - 7:32am

Hey everyone, I’ve been diving into the world of generative AI inference engines for quite some time at NLP Cloud, and I wanted to share some insights from a comparison I put together. I looked at four popular options—NVIDIA’s TensorRT-LLM, vLLM, Hugging Face’s Text Generation Inference (TGI), and LMDeploy—and ran some benchmarks to see how they stack up for real-world use cases. Thought this might spark some discussion here since I know a lot of you are working with LLMs or optimizing inference pipelines:

TensorRT-LLM

------------

NVIDIA’s beast for GPU-accelerated inference. Built on TensorRT, it optimizes models with layer fusion, precision tuning (FP16, INT8, even FP8), and custom CUDA kernels.

Pros: Blazing fast on NVIDIA GPUs—think sub-50ms latency for single requests on an A100 and ~700 tokens/sec at 100 concurrent users for LLaMA-3 70B Q4 (per BentoML benchmarks). Dynamic batching and tight integration with Triton Inference Server make it a throughput monster.

Cons: Setup can be complex if you’re not already in the NVIDIA ecosystem. You need to deal with model compilation, and it’s not super flexible for quick prototyping.
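
For a sense of the developer experience, here's a minimal sketch using the high-level LLM API that recent TensorRT-LLM releases ship, which hides most of the engine-build step; the model name and sampling settings are illustrative, not taken from the benchmarks above:

    from tensorrt_llm import LLM, SamplingParams

    # Compiling the TensorRT engine happens under the hood on first load,
    # which is where the setup complexity mentioned above lives.
    llm = LLM(model="meta-llama/Meta-Llama-3-70B-Instruct")

    outputs = llm.generate(
        ["Why does layer fusion reduce inference latency?"],
        SamplingParams(max_tokens=128),
    )
    print(outputs[0].outputs[0].text)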

vLLM

----

Open-source champion for high-throughput inference. Uses PagedAttention to manage KV caches in chunks, cutting memory waste and boosting speed.

Pros: Easy to spin up (pip install, Python-friendly), and it’s flexible—runs on NVIDIA, AMD, even CPU. Throughput is solid (~600-650 tokens/sec at 100 users for LLaMA-3 70B Q4), and dynamic batching keeps it humming. Latency’s decent at 60-80ms solo.

Cons: It’s less optimized for single-request latency, so if you’re building a chatbot with one user at a time, it might not shine as much. Also, it’s still maturing—some edge cases (like exotic model architectures) might not be supported.
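
To show how quick the spin-up is, a minimal offline-inference sketch with vLLM's standard LLM API (the model name and parallelism settings are illustrative):

    from vllm import LLM, SamplingParams

    # PagedAttention and dynamic batching are handled internally;
    # tensor_parallel_size shards a 70B model across several GPUs.
    llm = LLM(model="meta-llama/Meta-Llama-3-70B-Instruct",
              tensor_parallel_size=4)
    params = SamplingParams(temperature=0.7, max_tokens=128)

    outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
    print(outputs[0].outputs[0].text)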

Hugging Face TGI

----------------

Hugging Face’s production-ready inference tool. Ties into their model hub (BERT, GPT, etc.) and uses Rust for speed, with continuous batching to keep GPUs busy.

Pros: Docker setup is quick, and it scales well. Latency’s 50-70ms, throughput matches vLLM (~600-650 tokens/sec at 100 users). Bonus: built-in output filtering for safety. Perfect if you’re already in the HF ecosystem.

Cons: Less raw speed than TensorRT-LLM, and memory can bloat with big batches. Feels a bit restrictive outside HF’s world.
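
Once the TGI container is up, querying it from Python is a few lines. A sketch using huggingface_hub's InferenceClient, assuming the server was started with its Docker image and mapped to the default local port:

    from huggingface_hub import InferenceClient

    # Point the client at the locally running TGI server.
    client = InferenceClient("http://localhost:8080")

    text = client.text_generation(
        "What is continuous batching?",
        max_new_tokens=128,
        temperature=0.7,
    )
    print(text)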

LMDeploy

--------

A toolkit from the MMRazor/MMDeploy crew, focused on fast, efficient LLM deployment. Features TurboMind (a high-performance engine) and a PyTorch fallback, with persistent batching and blocked KV caching for speed.

Pros: Decoding speed is nuts—up to 1.8x more requests/sec than vLLM on an A100. TurboMind pushes 4-bit inference 2.4x faster than FP16, hitting ~700 tokens/sec at 100 users (LLaMA-3 70B Q4). Low latency (40-60ms), easy one-command server setup, and it even handles multi-round chats efficiently by caching history.

Cons: TurboMind’s picky—doesn’t support sliding window attention (e.g., Mistral) yet. Non-NVIDIA users get stuck with the slower PyTorch engine. Still, on NVIDIA GPUs, it’s a performance beast.
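
A minimal sketch of LMDeploy's pipeline API; the model name is illustrative, and the engine choice reflects the fallback behavior described above:

    from lmdeploy import pipeline

    # pipeline() picks the TurboMind engine when the model and GPU
    # support it, otherwise falls back to the slower PyTorch engine.
    pipe = pipeline("meta-llama/Meta-Llama-3-70B-Instruct")

    responses = pipe(["Summarize blocked KV caching in one sentence."])
    print(responses[0].text)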

What’s your experience with these tools? Any hidden issues I missed? Or are there other inference engines that should be mentioned? Would love to hear your thoughts!

Julien

Comments URL: https://news.ycombinator.com/item?id=43620472

Points: 1

# Comments: 1

Categories: Hacker News

Show HN: Badgeify – Add Any App to Your Mac Menu Bar

Hacker News - Tue, 04/08/2025 - 7:32am

Article URL: https://badgeify.app/

Comments URL: https://news.ycombinator.com/item?id=43620471

Points: 1

# Comments: 0

Categories: Hacker News
