Hacker News
Kalash People
Article URL: https://en.wikipedia.org/wiki/Kalash_people
Comments URL: https://news.ycombinator.com/item?id=41544984
Points: 1
# Comments: 0
One in Ten
Article URL: https://eftegarie.com/one-in-ten/
Comments URL: https://news.ycombinator.com/item?id=41544976
Points: 1
# Comments: 0
The UAE got the US to bless its purchase of cutting-edge Nvidia chips
Article URL: https://www.semafor.com/article/09/13/2024/how-the-uae-got-the-us-to-bless-its-ai-ambitions
Comments URL: https://news.ycombinator.com/item?id=41544973
Points: 1
# Comments: 0
Modern Data Analytics in Excel
Article URL: https://www.oreilly.com/library/view/modern-data-analytics/9781098148812/
Comments URL: https://news.ycombinator.com/item?id=41544970
Points: 1
# Comments: 0
Show HN: Wordllama – Things you can do with the token embeddings of an LLM
After working with LLMs for long enough, I found myself wanting a lightweight utility for doing various small tasks to prepare inputs, locate information and create evaluators. This library is two things: a very simple model and utilities that run inference with it (e.g. fuzzy deduplication). The target platform is CPU, and it’s intended to be light, fast and pip installable: a library that lowers the barrier to working with strings semantically. You don’t need to install PyTorch or any other deep learning runtime to use it.
How can this be accomplished? The model is simply token embeddings that are average pooled. To create this model, I extracted token embedding (nn.Embedding) vectors from LLMs, concatenated them along the embedding dimension, added a learnable weight parameter, and projected them to a smaller dimension. Using the sentence transformers framework and datasets, I trained the pooled embedding with multiple negatives ranking loss and matryoshka representation learning so the embeddings can be truncated. After training, the weights and projections are no longer needed, because there are no contextual calculations. I run inference over the entire token vocabulary and save the new token embeddings to be loaded with numpy.
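As a rough illustration of the average-pooling idea described above (not the library's actual code), the sketch below builds a sentence vector from a static token embedding table. The toy vocabulary, the random `EMBEDDINGS` matrix, the whitespace `tokenize` helper and the function names are all assumptions made for the example; a real model would use the LLM's tokenizer and the trained table saved after training.

```python
import numpy as np

# Hypothetical toy vocabulary and embedding table (assumption for this sketch).
# A real model would load a trained (vocab_size, dim) matrix from disk.
rng = np.random.default_rng(0)
VOCAB = {"the": 0, "cat": 1, "sat": 2, "dog": 3, "[UNK]": 4}
EMBEDDINGS = rng.standard_normal((len(VOCAB), 8)).astype(np.float32)


def tokenize(text):
    """Whitespace tokenizer standing in for the LLM's real tokenizer."""
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text.lower().split()]


def embed(text, dim=None):
    """Average-pool static token vectors, optionally keeping only leading dims."""
    ids = tokenize(text)
    if not ids:
        return np.zeros(dim or EMBEDDINGS.shape[1], dtype=np.float32)
    pooled = EMBEDDINGS[ids].mean(axis=0)
    # Truncation is only meaningful if the table was trained matryoshka-style;
    # with this random table it is shown purely mechanically.
    return pooled[:dim] if dim else pooled


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```

Because nothing contextual is computed, the pooled vector depends only on which tokens appear, so `embed("the cat")` and `embed("cat the")` are identical. That is the trade-off against transformer models: lookup plus a mean is cheap on CPU, but word order is ignored.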
The results are not impressive compared to transformer models, but the models perform well on MTEB benchmarks compared to word embedding models (which they are most similar to) while being much smaller (the smallest model, with a 32k vocab and 64 dimensions, is only 4MB).
On the utility side, I’ve been adding some tools that I think it’ll be useful for. In addition to general embedding, there are algorithms for ranking, filtering, clustering, deduplicating and similarity. Some of them have a cython implementation, and I’m continuing to benchmark and improve them as I have time. In addition to “standard” models that use cosine similarity for some algorithms, there are binarized models that use hamming distance. This is a slightly faster similarity algorithm, with significantly less memory per embedding (float32 -> 1 bit per dimension).
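To make the binarized variant concrete, here is a minimal sketch of sign-based binarization and hamming distance in numpy. It assumes a simple "positive means 1" threshold and a byte-level popcount table; the function names are hypothetical and the library's own binarization and cython routines may work differently.

```python
import numpy as np


def binarize(vectors):
    """Keep one bit per dimension (1 if positive) and pack 8 bits per byte."""
    bits = (np.asarray(vectors) > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


# Lookup table: number of set bits in each possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def hamming_distance(a, b):
    """Count differing bits between two packed binary embeddings."""
    return int(POPCOUNT[np.bitwise_xor(a, b)].sum())
```

With this packing, a 64-dimensional float32 embedding (256 bytes) shrinks to 8 bytes, and comparing two embeddings is an XOR followed by a bit count instead of a floating-point dot product.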
Hope you enjoy it, and find it useful. PS I haven’t figured out Windows builds yet, but Linux and Mac are supported.
Comments URL: https://news.ycombinator.com/item?id=41544969
Points: 1
# Comments: 0
New evidence upends contentious Easter Island theory, scientists say
Article URL: https://www.cnn.com/2024/09/14/science/ancient-dna-easter-island-collapse/index.html
Comments URL: https://news.ycombinator.com/item?id=41544959
Points: 1
# Comments: 0
Peaceful Border Negotiation
Article URL: https://benjaminhollon.com/musings/peaceful-border-negotiation/
Comments URL: https://news.ycombinator.com/item?id=41544939
Points: 1
# Comments: 0
What Michael Pollan Learned from Quitting Caffeine for 3 Months [video]
Article URL: https://www.youtube.com/watch?v=mAPG18zNtXk
Comments URL: https://news.ycombinator.com/item?id=41544912
Points: 1
# Comments: 0
Ask HN: Recommendation for a tool for history research (gathering)?
Hello,
I've been trying to find a tool to help me gather and organize historical and biographical data.
I like doing some research on specific interests, but as an amateur I do most of it in text files, sheets and mind maps in a chronological directory structure with images, documents, etc. It is horrendous, but I have not found a tool to help me manage this. Is there something for this use case out there?
I believe something like a mind map with chronological information/visualisation, cross-linking, attachments and references would work, but I'm kinda struggling to find the right tool (looking at TreeSheet and another mind-map tool right now, but it is not feeling right yet).
Thanks
Comments URL: https://news.ycombinator.com/item?id=41544909
Points: 1
# Comments: 1
Growing the Graveyard of "Better Spreadsheets"
Article URL: https://taylor.town/better-spreadsheets
Comments URL: https://news.ycombinator.com/item?id=41544891
Points: 1
# Comments: 0
Bay Area teacher saving half my income – why the doubt? (2019)
Article URL: https://www.bogleheads.org/forum/viewtopic.php?t=290459
Comments URL: https://news.ycombinator.com/item?id=41544888
Points: 1
# Comments: 0
What's Next? The Future with Bill Gates [video]
Article URL: https://www.youtube.com/watch?v=6xxhYr4gbQE
Comments URL: https://news.ycombinator.com/item?id=41544886
Points: 1
# Comments: 0
Apple AirPods 2nd Generation Features
Article URL: https://northamericanpulse.blogspot.com/2024/09/apple-airpods-2nd-generation-features.html
Comments URL: https://news.ycombinator.com/item?id=41544878
Points: 1
# Comments: 1
Google Pixel 9 Pro Fold has serious durability issues [video]
Article URL: https://www.youtube.com/watch?v=NJK_sLBJvsw
Comments URL: https://news.ycombinator.com/item?id=41544877
Points: 1
# Comments: 0
Big Data to Small Data: AI Efficiency's Next Frontier
Article URL: https://worldwirz.blogspot.com/2024/09/from-big-data-to-small-data-next.html
Comments URL: https://news.ycombinator.com/item?id=41544855
Points: 2
# Comments: 1
Spots Around Hyrule (TotK)
Article URL: https://totk-loci.mataroa.blog
Comments URL: https://news.ycombinator.com/item?id=41544851
Points: 1
# Comments: 0
New Simulations Suggest Planet Nine Might Not Be a Planet at All
Article URL: https://www.sciencealert.com/new-simulations-suggest-planet-nine-might-not-be-a-planet-at-all
Comments URL: https://news.ycombinator.com/item?id=41544826
Points: 1
# Comments: 0
Monk Mode
Article URL: https://bitfieldconsulting.com/books/monk-mode
Comments URL: https://news.ycombinator.com/item?id=41544813
Points: 2
# Comments: 0
There are now more electric cars than gas cars on Norway's roads
Article URL: https://electrek.co/2024/09/14/there-are-now-more-electric-cars-than-gas-cars-on-norways-roads/
Comments URL: https://news.ycombinator.com/item?id=41544804
Points: 1
# Comments: 0
OpenAI Discriminates According to ChatGPT
OpenAI has made the unfortunate choice to not allow pre-paid cards for pre-paid credits, despite numerous other online providers happily allowing such cards, even for post-payment on cloud services. I'll let ChatGPT explain why that's discriminatory -- and a breach of OpenAI's ideals.
I think that it's important for organizations like OpenAI not just to talk about ideals, but to practice them.
https://chatgpt.com/share/66e64823-3eb0-8000-ad2e-92a2c33bf5d1
Comments URL: https://news.ycombinator.com/item?id=41544795
Points: 1
# Comments: 1