Hacker News
Apple Removes Another RFE/RL App at Request of Russian Regulator
Article URL: https://www.rferl.org/a/apple-removes-podcasts-independent-russia-rferl-radio-free-europe/33203321.html
Comments URL: https://news.ycombinator.com/item?id=42158775
Points: 2
# Comments: 0
Stress warps fear memories in multiple ways
Article URL: https://www.thetransmitter.org/memory/stress-warps-fear-memories-in-multiple-ways/
Comments URL: https://news.ycombinator.com/item?id=42158770
Points: 1
# Comments: 0
Insight: Borderless Europe fights brain drain as talent heads north
Article URL: https://www.reuters.com/world/europe/borderless-europe-fights-brain-drain-talent-heads-north-2024-11-14/
Comments URL: https://news.ycombinator.com/item?id=42158742
Points: 3
# Comments: 0
Gordon Welchman: The architect of 'Ultra' intelligence
Public Sector AI Playbook [pdf]
Massive Galaxies at High Redshift: we told you so
Article URL: https://tritonstation.com/2024/11/12/massive-galaxies-at-high-redshift-we-told-you-so/
Comments URL: https://news.ycombinator.com/item?id=42158712
Points: 1
# Comments: 0
Science Is in Trouble [video]
Article URL: https://www.youtube.com/watch?v=QtxjatbVb7M
Comments URL: https://news.ycombinator.com/item?id=42158704
Points: 1
# Comments: 0
How to Be a Multidisciplinary Neuroscientist
Article URL: https://www.thetransmitter.org/craft-and-careers/how-to-be-a-multidisciplinary-neuroscientist/
Comments URL: https://news.ycombinator.com/item?id=42158694
Points: 1
# Comments: 0
Half-Life 2 peaks at 50,914 concurrent players, 20 years after its release
Article URL: https://steamdb.info/app/220/charts/
Comments URL: https://news.ycombinator.com/item?id=42158686
Points: 1
# Comments: 0
HtmlRAG: HTML is Better than Plain Text
Article URL: https://arxiv.org/abs/2411.02959
Comments URL: https://news.ycombinator.com/item?id=42158650
Points: 1
# Comments: 0
Chevrolet Corvette Is the 2nd Most Dangerous Car on the Road
Memento Depot Native (See RFC 7089)
Article URL: https://mementoweb.org/depot/native/
Comments URL: https://news.ycombinator.com/item?id=42158638
Points: 1
# Comments: 0
Cloud consumption surge strains even the largest hyperscalers
Article URL: https://www.ciodive.com/news/aws-microsoft-google-cloud-capacity-constraints-market-growth/732062/
Comments URL: https://news.ycombinator.com/item?id=42158629
Points: 1
# Comments: 0
Fiery Tesla Crash Traps and Kills Four After Electric Doors Couldn't Open
Article URL: https://jalopnik.com/fiery-tesla-crash-traps-and-kills-four-after-electric-d-1851697336
Comments URL: https://news.ycombinator.com/item?id=42158628
Points: 16
# Comments: 2
CEO of Saudi Arabia's 100-Mile Skyscraper Out, Allegations of Mass Employee Death
AI-generated poetry is indistinguishable from human-written poetry
Show HN: ColiVara – State of the Art RAG API with Vision Models
We have been working on ColiVara and wanted to show it to the community. ColiVara is an API-first implementation of the ColPali paper, using ColQwen2 as the underlying model. From the end user's standpoint it works exactly like RAG, but it uses vision models instead of chunking and text processing for documents.
Why should anyone working with RAG care?
ColPali makes information retrieval from visual document types, like PDFs, better. ColiVara is a suite of services, built on top of ColPali, that allows you to store, search, and retrieve documents based on their visual embeddings.
(We are not affiliated with the ColPali team in any way, although we are big fans of their work!)
Information retrieval from PDFs is hard because they contain various components: text, images, tables, different headings, captions, complex layouts, etc.
As a result, parsing PDFs currently requires multiple complex steps:
1. OCR
2. Layout recognition
3. Figure captioning
4. Chunking
5. Embedding
Not only are these steps complex and time-consuming, but they are also prone to error.
This is where ColPali comes into play. But what is ColPali?
ColPali combines:
• Col -> the contextualized late interaction mechanism introduced in ColBERT
• Pali -> a Vision Language Model (VLM), in this case PaliGemma
(Note: both we and the ColPali team have since moved from PaliGemma to Qwen models.)
And how does it work?
During indexing, the complex PDF parsing steps are replaced by using "screenshots" of the PDF pages directly. These screenshots are then embedded with the VLM. At inference time, the query is embedded and matched with a late interaction mechanism to retrieve the most similar document pages.
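A minimal sketch of the late interaction (MaxSim) scoring described above, in the style of ColBERT: each query token vector is compared with every patch vector of a page, the best match per query token is kept, and those maxima are summed into the page score. This is an illustration, not ColiVara's or ColPali's code. The function names, the toy 2-dimensional vectors, and the L2 normalization (so each dot product is a cosine similarity) are assumptions of this sketch; real models emit many higher-dimensional vectors per query and per page.

```python
import math

def normalize(v):
    """Scale to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def maxsim_score(query_tokens, page_patches):
    """Late interaction: for each query token keep its best-matching page patch, then sum."""
    q = [normalize(t) for t in query_tokens]
    p = [normalize(t) for t in page_patches]
    return sum(max(sum(a * b for a, b in zip(qt, pt)) for pt in p) for qt in q)

def rank_pages(query_tokens, pages, top_k=3):
    """Score every page against the query and return the top_k (name, score) pairs."""
    scored = [(name, maxsim_score(query_tokens, patches)) for name, patches in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy example: two query token vectors, two pages with two patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[1.0, 0.0], [1.0, 0.0]],
    "page_2": [[1.0, 0.0], [0.0, 1.0]],
}
print(rank_pages(query, pages))  # [('page_2', 2.0), ('page_1', 1.0)]
```

`page_2` wins because it has a close match for both query tokens, while `page_1` only matches the first one. Because scoring happens per token rather than on one pooled vector, a page can rank highly by matching different parts of the query in different regions.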
OK, so what exactly does ColiVara do?
ColiVara is an API (with a Python SDK) that makes this whole process easy and viable for production workloads. With one line of code, you get SOTA retrieval in your RAG system. We optimized how the embeddings are stored (using pgVector and halfvecs) and re-implemented the scoring to run inside Postgres, building on pgVector's cosine-similarity work. All the user has to do is:
1. Upsert a document to ColiVara to index it
2. At query time - perform a search and get the top-k pages
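The two-step workflow above can be sketched with a toy in-memory index. `ToyPageIndex`, `upsert_document`, and `search` are hypothetical names, not the ColiVara SDK or API. To echo the halfvec storage mentioned earlier, the sketch packs each patch vector as IEEE 754 half-precision floats (2 bytes per dimension instead of 4); ColiVara itself does storage and scoring inside Postgres with pgVector, which this sketch does not attempt to reproduce.

```python
import math
import struct

def _unit(v):
    """L2-normalize a vector (an assumption of this sketch)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def _pack_half(v):
    # 2 bytes per dimension, similar in spirit to a halfvec column (vs 4 bytes for float32)
    return struct.pack(f"<{len(v)}e", *v)

def _unpack_half(blob):
    return struct.unpack(f"<{len(blob) // 2}e", blob)

class ToyPageIndex:
    """Hypothetical in-memory stand-in for a page-level retrieval service."""

    def __init__(self):
        self.pages = {}  # (doc_name, page_no) -> list of packed half-precision patch vectors

    def upsert_document(self, doc_name, page_embeddings):
        # Step 1: drop any earlier version of the document, then store one entry per page.
        self.pages = {k: v for k, v in self.pages.items() if k[0] != doc_name}
        for page_no, patches in enumerate(page_embeddings, start=1):
            self.pages[(doc_name, page_no)] = [_pack_half(_unit(p)) for p in patches]

    def search(self, query_embeddings, top_k=3):
        # Step 2: score every page with MaxSim and return the top-k (page, score) pairs.
        query = [_unit(q) for q in query_embeddings]
        results = []
        for key, packed in self.pages.items():
            patches = [_unpack_half(blob) for blob in packed]
            score = sum(max(sum(a * b for a, b in zip(q, p)) for p in patches) for q in query)
            results.append((key, score))
        results.sort(key=lambda r: r[1], reverse=True)
        return results[:top_k]

index = ToyPageIndex()
index.upsert_document("trial.pdf", [[[1.0, 0.0]], [[0.0, 1.0]]])  # two pages, one patch each
print(index.search([[0.0, 2.0]], top_k=1))  # [(('trial.pdf', 2), 1.0)]
```

Upserting the same document name again replaces its pages, which is what makes the operation an upsert rather than a plain insert. Half precision keeps roughly three significant decimal digits and cannot represent magnitudes above 65504, which is acceptable for unit-length embeddings.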
We also support advanced filtering based on arbitrary document and collection metadata, which enables re-ranking use cases and hybrid search.
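Metadata filtering plus hybrid search can be sketched in two stages: drop candidates whose metadata does not contain the required key/value pairs, then blend a vector score with a lexical score. The flat-containment filter (loosely analogous to Postgres JSONB's `@>` operator), the crude term-overlap keyword score standing in for BM25, the precomputed `vector_scores`, and the weight `alpha` are all assumptions for illustration, not how ColiVara implements it.

```python
def matches(metadata, required):
    """True if every key/value pair in `required` also appears in `metadata`."""
    return all(key in metadata and metadata[key] == value for key, value in required.items())

def keyword_score(text, query):
    """Fraction of query terms present in the text (a crude lexical signal)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0

def hybrid_search(docs, query, vector_scores, required, alpha=0.7, top_k=3):
    """Filter on metadata, then rank by alpha * vector score + (1 - alpha) * keyword score."""
    results = []
    for doc in docs:
        if not matches(doc["metadata"], required):
            continue  # metadata filter runs before any scoring
        lexical = keyword_score(doc["text"], query)
        results.append((doc["id"], alpha * vector_scores[doc["id"]] + (1 - alpha) * lexical))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:top_k]

docs = [
    {"id": "a", "text": "phase 3 trial results", "metadata": {"type": "trial", "year": 2023}},
    {"id": "b", "text": "review of phase 2 studies", "metadata": {"type": "review", "year": 2023}},
    {"id": "c", "text": "phase 2 trial design", "metadata": {"type": "trial", "year": 2021}},
]
vector_scores = {"a": 0.4, "b": 0.9, "c": 0.6}  # e.g. from a visual retriever (made-up values)
results = hybrid_search(docs, "phase 2 trial", vector_scores, {"type": "trial"})
print([doc_id for doc_id, _ in results])  # ['c', 'a']
```

Document `b` has the best vector score but is excluded by the metadata filter, and `c` overtakes `a` because it matches every query term.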
State of the art?
We started this whole journey when we tried to do RAG over clinical trials and medical literature. We simply had too many failures: up to 30% of a paper was lost or malformed. This is not just our experience: in the ColPali paper, ColPali outperformed Unstructured + BM25 + captioning by 15+ points on average. ColiVara, with its optimizations, is ahead by 20+ points.
We used nDCG@5, which is similar to recall but more demanding: it measures not just whether the right results are returned, but whether they are returned in the correct order.
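To make the difference from recall concrete, here is a minimal nDCG@k sketch using the common linear-gain form, where the gain at rank i is divided by log2(i + 1). It assumes the relevance list passed in already contains every relevant page for the query, so the ideal ranking can be derived by sorting it; the function names are illustrative, not taken from the eval repo.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=5):
    """DCG of the returned ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([1, 0, 0, 0, 0]))            # 1.0
print(round(ndcg_at_k([0, 1, 0, 0, 0]), 3))  # 0.631
```

With a single relevant page, recall@5 is 1.0 whether that page lands at rank 1 or rank 2, but nDCG@5 drops from 1.0 to about 0.631, which is why it rewards correct ordering.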
You can see our full eval results here: https://github.com/tjmlabs/ColiVara-eval
If this sounds like something you could use, check it out on GitHub: https://github.com/tjmlabs/ColiVara
It’s fair-source with an FSL license (similar to Sentry), and we’d love to hear how you’d use it or any feedback you might have.
Additionally, our eval repo is public and we continuously run it against major releases. You are welcome to run the evals independently: https://github.com/tjmlabs/ColiVara-eval
Comments URL: https://news.ycombinator.com/item?id=42158351
Points: 1
# Comments: 0
FTC Says Spam Calls Down 50% in Recent Years
Article URL: https://gizmodo.com/ftc-says-spam-calls-actually-down-50-in-recent-years-2000525120
Comments URL: https://news.ycombinator.com/item?id=42158337
Points: 1
# Comments: 0
Understanding Common Factor Attacks: An RSA-Cracking Puzzle
Article URL: http://www.loyalty.org/~schoen/rsa/
Comments URL: https://news.ycombinator.com/item?id=42158319
Points: 1
# Comments: 0