Hacker News
Experts dispute claims dire wolf brought back from extinction
Article URL: https://www.bbc.com/news/articles/c4g9ejy3gdvo
Comments URL: https://news.ycombinator.com/item?id=43620853
Points: 2
# Comments: 0
TinyCard Game Maker
Article URL: http://www.technoblogy.com/show?51KR
Comments URL: https://news.ycombinator.com/item?id=43620846
Points: 3
# Comments: 0
The Institution That Engineered a Culture – Kasurian
Article URL: https://kasurian.com/p/institution-engineering-culture
Comments URL: https://news.ycombinator.com/item?id=43620838
Points: 1
# Comments: 0
Trump Order Seeks to Tap Coal Power in Quest to Dominate AI
Article URL: https://www.bloomberg.com/news/articles/2025-04-08/trump-order-seeks-to-tap-coal-power-in-quest-to-dominate-ai
Comments URL: https://news.ycombinator.com/item?id=43620831
Points: 2
# Comments: 1
Show HN: FormReach – LLM form marketing automation for Japan
I developed an AI tool that automates contact form submissions for marketing. Simply select target companies from a list, and our LLM automatically fills out and submits forms for you. I created this because user acquisition was always the most time-consuming part of my previous product development and sales efforts. While currently limited to the Japanese market, I hope it can help those doing business in Japan save significant time.
FormReach features:
- No initial costs; you're only charged for successful submissions
- AI handles all form completions and submissions automatically
- Continuously updated database of compatible contact forms
If you're interested in the Japanese market or have feedback on this approach, I'd appreciate your thoughts.
Comments URL: https://news.ycombinator.com/item?id=43620509
Points: 1
# Comments: 0
Show HN: Simpleformapp – A lightweight form and table tool for lead capture
Hi HN,
I built Simpleformapp to replace my Jotform + Airtable workflow. I was paying $58/month just to capture and manage leads — it worked, but felt like overkill.
So I made something simpler:
- Create clean forms
- View/manage submissions in a table
- No bloat or complex pricing
It’s live and I use it daily to run my business. Would love any feedback — UX, performance, feature ideas, or thoughts on positioning.
Thanks!
Comments URL: https://news.ycombinator.com/item?id=43620505
Points: 1
# Comments: 0
Programmers I Know
Article URL: https://endler.dev/2025/best-programmers/
Comments URL: https://news.ycombinator.com/item?id=43620504
Points: 1
# Comments: 0
Show HN: I made an app to save you money on unused subscriptions
I lost $90 in one month on unused subscriptions.
So I’m building a tool to stop that. It tracks subscriptions and warns me before I get charged again.
MVP in progress. Who’s in?
Comments URL: https://news.ycombinator.com/item?id=43620501
Points: 1
# Comments: 0
Let's Fix OAuth in MCP
Article URL: https://aaronparecki.com/2025/04/03/15/oauth-for-model-context-protocol
Comments URL: https://news.ycombinator.com/item?id=43620496
Points: 1
# Comments: 0
TCP/IP over Amazon Cloudwatch Logs (2019)
Article URL: https://medium.com/clog/tcp-ip-over-amazon-cloudwatch-logs-c1cf08f2296c
Comments URL: https://news.ycombinator.com/item?id=43620493
Points: 1
# Comments: 0
Every programming language needs its killer app to succeed
Article URL: https://www.grilly.com/posts/programming-languages-reason-to-exist/
Comments URL: https://news.ycombinator.com/item?id=43620480
Points: 2
# Comments: 1
First medical X-ray taken in space
Article URL: https://news.mit.edu/2025/3-questions-lonnie-petersen-first-medical-x-ray-taken-in-space-0407
Comments URL: https://news.ycombinator.com/item?id=43620479
Points: 1
# Comments: 0
Comparing GenAI Inference Engines: TensorRT-LLM, vLLM, HF TGI, and LMDeploy
Hey everyone, I’ve been diving into the world of generative AI inference engines for quite some time at NLP Cloud, and I wanted to share some insights from a comparison I put together. I looked at four popular options—NVIDIA’s TensorRT-LLM, vLLM, Hugging Face’s Text Generation Inference (TGI), and LMDeploy—and ran some benchmarks to see how they stack up for real-world use cases. Thought this might spark some discussion here since I know a lot of you are working with LLMs or optimizing inference pipelines:
TensorRT-LLM
------------
NVIDIA’s beast for GPU-accelerated inference. Built on TensorRT, it optimizes models with layer fusion, precision tuning (FP16, INT8, even FP8), and custom CUDA kernels.
Pros: Blazing fast on NVIDIA GPUs—think sub-50ms latency for single requests on an A100 and ~700 tokens/sec at 100 concurrent users for LLaMA-3 70B Q4 (per BentoML benchmarks). Dynamic batching and tight integration with Triton Inference Server make it a throughput monster.
Cons: Setup can be complex if you’re not already in the NVIDIA ecosystem. You need to deal with model compilation, and it’s not super flexible for quick prototyping.
vLLM
----
Open-source champion for high-throughput inference. Uses PagedAttention to manage KV caches in chunks, cutting memory waste and boosting speed.
Pros: Easy to spin up (pip install, Python-friendly), and it’s flexible—runs on NVIDIA, AMD, even CPU. Throughput is solid (~600-650 tokens/sec at 100 users for LLaMA-3 70B Q4), and dynamic batching keeps it humming. Latency’s decent at 60-80ms solo.
Cons: It’s less optimized for single-request latency, so if you’re building a chatbot with one user at a time, it might not shine as much. Also, it’s still maturing—some edge cases (like exotic model architectures) might not be supported.
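For anyone unfamiliar with the paged KV cache idea mentioned above, here is a toy sketch of block-table allocation. It is not vLLM's actual code: the class name `PagedKVCache`, the block size of 16, and the method names are all made up for illustration. It only shows why allocating the cache in fixed-size blocks wastes at most one partial block per sequence.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block-table allocator: illustrative only, not vLLM's implementation."""
    num_blocks: int          # total physical blocks in the pool (hypothetical)
    block_size: int = 16     # tokens per block (hypothetical value)
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> [block ids]
    seq_lens: dict = field(default_factory=dict)      # seq_id -> token count

    def __post_init__(self):
        # Every physical block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id):
        """Reserve room for one more token; allocate a block only when needed."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:  # current block is full (or none yet)
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        # Physical (block id, offset) slot where this token's K/V would live.
        return table[-1], length % self.block_size

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

    def wasted_slots(self):
        """Reserved but unused slots: at most block_size - 1 per sequence."""
        return sum(len(table) * self.block_size - self.seq_lens.get(seq, 0)
                   for seq, table in self.block_tables.items())
```

With `block_size=4`, a 5-token sequence occupies two blocks and leaves 3 slots unused, instead of reserving a full maximum-length buffer up front.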
Hugging Face TGI
----------------
Hugging Face’s production-ready inference tool. Ties into their model hub (BERT, GPT, etc.) and uses Rust for speed, with continuous batching to keep GPUs busy.
Pros: Docker setup is quick, and it scales well. Latency’s 50-70ms, throughput matches vLLM (~600-650 tokens/sec at 100 users). Bonus: built-in output filtering for safety. Perfect if you’re already in the HF ecosystem.
Cons: Less raw speed than TensorRT-LLM, and memory can bloat with big batches. Feels a bit restrictive outside HF’s world.
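To make the continuous batching point concrete, here is a rough simulation that only counts decode steps. It is a simplified model, not TGI's scheduler: the function names are hypothetical, every request is assumed to emit one token per step, and prefill cost is ignored.

```python
from collections import deque


def continuous_batching_steps(output_lens, max_batch):
    """Count decode steps when freed slots are refilled before every step."""
    waiting = deque(output_lens)
    running = []  # tokens still to generate for each in-flight request
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots before the next step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step emits one token per running request;
        # requests that have finished drop out of the batch.
        running = [r - 1 for r in running if r - 1 > 0]
        steps += 1
    return steps


def static_batching_steps(output_lens, max_batch):
    """Count decode steps when each batch runs until its longest request ends."""
    return sum(max(output_lens[i:i + max_batch])
               for i in range(0, len(output_lens), max_batch))
```

For output lengths `[8, 2, 2, 2]` and a batch size of 2, the static version takes 10 steps while the continuous version takes 8, because the short requests ride along next to the long one instead of waiting for it to finish.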
LMDeploy
--------
A toolkit from the MMRazor/MMDeploy crew, focused on fast, efficient LLM deployment. Features TurboMind (a high-performance engine) and a PyTorch fallback, with persistent batching and blocked KV caching for speed.
Pros: Decoding speed is nuts—up to 1.8x more requests/sec than vLLM on an A100. TurboMind pushes 4-bit inference 2.4x faster than FP16, hitting ~700 tokens/sec at 100 users (LLaMA-3 70B Q4). Low latency (40-60ms), easy one-command server setup, and it even handles multi-round chats efficiently by caching history.
Cons: TurboMind’s picky—doesn’t support sliding window attention (e.g., Mistral) yet. Non-NVIDIA users get stuck with the slower PyTorch engine. Still, on NVIDIA GPUs, it’s a performance beast.
What’s your experience with these tools? Any hidden issues I missed? Or are there other inference engines that should be mentioned? Would love to hear your thoughts!
Julien
Comments URL: https://news.ycombinator.com/item?id=43620472
Points: 1
# Comments: 1
Show HN: Badgeify – Add Any App to Your Mac Menu Bar
Article URL: https://badgeify.app/
Comments URL: https://news.ycombinator.com/item?id=43620471
Points: 1
# Comments: 0
Apple Plans to Source More iPhones from India as Potential Tariff Fix
Article URL: https://www.wsj.com/tech/apple-iphone-production-china-tariffs-6cc37f40
Comments URL: https://news.ycombinator.com/item?id=43620458
Points: 1
# Comments: 0
Tuesday Telescope: Does this Milky Way image remind you of Powers of 10?
Article URL: https://arstechnica.com/space/2025/04/tuesday-telescope-the-heart-of-the-galaxy-revealed-in-two-kinds-of-light/
Comments URL: https://news.ycombinator.com/item?id=43620453
Points: 1
# Comments: 0
Meta got caught gaming AI benchmarks
Article URL: https://www.theverge.com/meta/645012/meta-llama-4-maverick-benchmarks-gaming
Comments URL: https://news.ycombinator.com/item?id=43620452
Points: 2
# Comments: 0
Navy SEAL. Harvard Doctor. NASA Astronaut. Don't Tell Mom About This Overachiever
Article URL: https://www.wsj.com/lifestyle/jonny-kim-nasa-astronaut-navy-seal-harvard-doctor-nasa-astronaut-7ad0e523
Comments URL: https://news.ycombinator.com/item?id=43620444
Points: 1
# Comments: 1
Plebiscitary Override in Venezuela: Eroding Democracy Deepening Authoritarianism
Article URL: https://journals.sagepub.com/doi/10.1177/00027162241309709
Comments URL: https://news.ycombinator.com/item?id=43620441
Points: 1
# Comments: 0
Attack of the Quack-Industrial Complex – Paul Krugman
Article URL: https://paulkrugman.substack.com/p/attack-of-the-quack-industrial-complex
Comments URL: https://news.ycombinator.com/item?id=43620437
Points: 1
# Comments: 0