Hacker News

Comparing GenAI Inference Engines: TensorRT-LLM, vLLM, HF TGI, and LMDeploy

Hacker News - Tue, 04/08/2025 - 7:32am

Hey everyone, I’ve been diving into the world of generative AI inference engines for quite some time at NLP Cloud, and I wanted to share some insights from a comparison I put together. I looked at four popular options—NVIDIA’s TensorRT-LLM, vLLM, Hugging Face’s Text Generation Inference (TGI), and LMDeploy—and ran some benchmarks to see how they stack up for real-world use cases. Thought this might spark some discussion here since I know a lot of you are working with LLMs or optimizing inference pipelines:

TensorRT-LLM

------------

NVIDIA’s beast for GPU-accelerated inference. Built on TensorRT, it optimizes models with layer fusion, precision tuning (FP16, INT8, even FP8), and custom CUDA kernels.

Pros: Blazing fast on NVIDIA GPUs—think sub-50ms latency for single requests on an A100 and ~700 tokens/sec at 100 concurrent users for LLaMA-3 70B Q4 (per BentoML benchmarks). Dynamic batching and tight integration with Triton Inference Server make it a throughput monster.

Cons: Setup can be complex if you’re not already in the NVIDIA ecosystem. You need to deal with model compilation, and it’s not super flexible for quick prototyping.
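For context, here's roughly what that looks like with the high-level Python LLM API that recent tensorrt_llm releases ship (a minimal sketch, not a recipe; the model id is a placeholder, and the engine compilation mentioned above happens under the hood on first load):

    # Sketch of TensorRT-LLM's high-level LLM API (recent releases).
    # The TensorRT engine is compiled from the checkpoint on first load.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Meta-Llama-3-70B-Instruct")  # placeholder model id
    params = SamplingParams(temperature=0.8, top_p=0.95)

    outputs = llm.generate(["Explain layer fusion in one paragraph."], params)
    print(outputs[0].outputs[0].text)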

vLLM

----

Open-source champion for high-throughput inference. Uses PagedAttention to manage KV caches in chunks, cutting memory waste and boosting speed.

Pros: Easy to spin up (pip install, Python-friendly), and it’s flexible—runs on NVIDIA, AMD, even CPU. Throughput is solid (~600-650 tokens/sec at 100 users for LLaMA-3 70B Q4), and dynamic batching keeps it humming. Latency’s decent at 60-80ms solo.

Cons: It’s less optimized for single-request latency, so if you’re building a chatbot with one user at a time, it might not shine as much. Also, it’s still maturing—some edge cases (like exotic model architectures) might not be supported.
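If you want to kick the tires, the offline batch API is about as minimal as it gets (sketch; swap in whatever model you're actually testing):

    # vLLM offline inference: PagedAttention and batching happen internally.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Meta-Llama-3-70B-Instruct")  # placeholder model id
    params = SamplingParams(temperature=0.7, max_tokens=128)

    # Prompts passed together are batched automatically for throughput.
    outputs = llm.generate(["What is continuous batching?"], params)
    print(outputs[0].outputs[0].text)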

Hugging Face TGI

----------------

Hugging Face’s production-ready inference tool. It ties into the Hugging Face Hub’s text-generation models (Llama, Mistral, Falcon, etc.) and uses Rust for speed, with continuous batching to keep GPUs busy.

Pros: Docker setup is quick, and it scales well. Latency’s 50-70ms, throughput matches vLLM (~600-650 tokens/sec at 100 users). Bonus: built-in output filtering for safety. Perfect if you’re already in the HF ecosystem.

Cons: Less raw speed than TensorRT-LLM, and memory can bloat with big batches. Feels a bit restrictive outside HF’s world.
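The usual pattern is to start the server with Docker and hit its HTTP API; a rough sketch (image tag, port, and model id are all placeholders):

    # Query a running TGI server over HTTP. Assumes it was started with
    # something like:
    #   docker run --gpus all -p 8080:80 \
    #     ghcr.io/huggingface/text-generation-inference \
    #     --model-id meta-llama/Meta-Llama-3-70B-Instruct
    import requests

    resp = requests.post(
        "http://localhost:8080/generate",
        json={
            "inputs": "What is continuous batching?",
            "parameters": {"max_new_tokens": 128, "temperature": 0.7},
        },
    )
    print(resp.json()["generated_text"])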

LMDeploy

--------

A toolkit from the MMRazor/MMDeploy crew, focused on fast, efficient LLM deployment. It features TurboMind (a high-performance engine) and a PyTorch fallback, with persistent batching and blocked KV caching for speed.

Pros: Decoding speed is nuts—up to 1.8x more requests/sec than vLLM on an A100. TurboMind pushes 4-bit inference 2.4x faster than FP16, hitting ~700 tokens/sec at 100 users (LLaMA-3 70B Q4). Low latency (40-60ms), easy one-command server setup, and it even handles multi-round chats efficiently by caching history.

Cons: TurboMind’s picky—doesn’t support sliding window attention (e.g., Mistral) yet. Non-NVIDIA users get stuck with the slower PyTorch engine. Still, on NVIDIA GPUs, it’s a performance beast.
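The pipeline API is the quickest way to see TurboMind in action (sketch; the model id is a placeholder, and the 4-bit path needs a pre-quantized checkpoint):

    # LMDeploy pipeline: uses the TurboMind engine when the model is
    # supported, otherwise falls back to the PyTorch engine.
    # One-command server alternative: lmdeploy serve api_server <model>
    from lmdeploy import pipeline

    pipe = pipeline("internlm/internlm2_5-7b-chat")  # placeholder model id
    responses = pipe(["What is blocked KV caching?"])
    print(responses[0].text)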

What’s your experience with these tools? Any hidden issues I missed? Or are there other inference engines that should be mentioned? Would love to hear your thoughts!

Julien

Comments URL: https://news.ycombinator.com/item?id=43620472

Points: 1

# Comments: 1

Categories: Hacker News

Show HN: Badgeify – Add Any App to Your Mac Menu Bar

Hacker News - Tue, 04/08/2025 - 7:32am

Article URL: https://badgeify.app/

Comments URL: https://news.ycombinator.com/item?id=43620471

Points: 1

# Comments: 0

Categories: Hacker News

Bug crowd for small startups and vibe coders?

Hacker News - Tue, 04/08/2025 - 7:27am

Article URL: https://picklock.47labs.io/

Comments URL: https://news.ycombinator.com/item?id=43620434

Points: 1

# Comments: 1

Categories: Hacker News

FreeDOS 1.4 Released

Hacker News - Tue, 04/08/2025 - 7:24am
Categories: Hacker News

Tailscale has raised $160M

Hacker News - Tue, 04/08/2025 - 6:36am

Article URL: https://tailscale.com/blog/series-c

Comments URL: https://news.ycombinator.com/item?id=43620141

Points: 1

# Comments: 0

Categories: Hacker News

Go library for generating Anki decks

Hacker News - Tue, 04/08/2025 - 6:34am
Categories: Hacker News
