Alex-Programs 7 hours ago

This is neat. I notice that you do reverse lemmatisation similar to my Wiktionary API wrapper: https://dictionary.nuenki.app/get_definition?language=Finnis...

Feel free to use it if you like, it's practically free, and open source if you'd like to host it yourself: https://github.com/Alex-Programs/nuenki-dictionary

Looking at your code, you're processing the actual word in a Finnish-specific way. If you'd like to generalise it, you can also just check whether a definition contains `x of y`, where y is a link, and automatically dereference to `y`.
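
A rough sketch of that `x of y` check, in case it helps. The data shape here is made up for illustration (each definition is a gloss string plus the list of words linked inside it), and `lemma_from_definition` and `dereference` are hypothetical names, not functions from either project:

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches a trailing "... of y", allowing closing punctuation after y.
FORM_OF = re.compile(r"\bof\s+(\S+?)[\s.,;:)]*$")

# Assumed shape: word -> list of (gloss, words linked inside that gloss).
Entries = Dict[str, List[Tuple[str, List[str]]]]


def lemma_from_definition(gloss: str, links: List[str]) -> Optional[str]:
    """Return y if the gloss ends in 'of y' and y is one of its links."""
    match = FORM_OF.search(gloss)
    if match is None:
        return None
    target = match.group(1)
    return target if target in links else None


def dereference(word: str, entries: Entries, max_hops: int = 3) -> str:
    """Follow form-of glosses until no definition points anywhere else."""
    for _ in range(max_hops):
        for gloss, links in entries.get(word, []):
            lemma = lemma_from_definition(gloss, links)
            if lemma is not None and lemma != word:
                word = lemma
                break
        else:
            # No gloss pointed to another entry, so this is the base form.
            return word
    return word


entries: Entries = {
    "talossa": [("inessive singular of talo", ["talo"])],
    "talo": [("house", ["house"])],
}
print(dereference("talossa", entries))
```

The obvious weakness is that an ordinary gloss such as "a kind of house" also ends in `of y`, so a real version would probably want to check the `x` part against a list of known grammatical labels as well.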

  • hiAndrewQuinn 6 hours ago

    Ha, what an honor to finally be noticed by the Nuenki guy. Big fan of the project.

    I did consider doing something Wiktionary-centric, and in fact have a Wiktionary JSONL scrape lying around courtesy of https://kaikki.org/ (from the same guy who created SSH!). `tsk` does something similar to your dereferencing when it hits a "go deeper" phrase.

    I decided against that approach in favor of the libvoikko spell checker because Finnish lies in this interesting zone of being an agglutinative language with a really, really regularized orthography. People love their neologisms here, and unfortunately most of them aren't catalogued in Wiktionary quite yet. I've found the mechanistic approach covers a lot of those edge cases well.

    Take the word junttihenkiseni - the root form is junttihenkinen, but as of 05/06/2025 https://en.wiktionary.org/wiki/junttihenkinen does not actually exist. So `tsk` will have no data for it, but `finstem` with its mechanical approach works just fine. The word was originally coined to refer to Finland's unique spin on rock music in the 1970s and 80s.

    On a broader level, if I can avoid hitting the network with small personal projects like this, I do try to. For example, `tsk` comes bundled with every Finnish word that has an English dictionary entry in Wiktionary, in a ~25 MB JSONL embed, and that allows us to build the randomly pruning trie that lets us get instantaneous prefix search across such a large space of things. I have met a lot of people who want to move to Finland from places where Internet is a sparse and valuable commodity, and I think their lives are much improved by having a tool they can just download one time and then use any place their laptop can be powered on.
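
    For anyone curious what offline prefix search looks like, here is a minimal sketch of a plain prefix trie. To be clear, this is not `tsk`'s actual code and not the randomly pruning variant mentioned above; `PrefixTrie` and its `limit` parameter are hypothetical, with `limit` standing in as a crude cap on how many matches come back.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class PrefixTrie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, prefix, limit=10):
        """Return up to `limit` stored words starting with `prefix`, in sorted order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so the smallest is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = PrefixTrie(["talo", "talossa", "talvi", "tuli"])
print(trie.search("tal"))
```

    Lookup cost depends on the length of the prefix and the number of results requested rather than on the size of the word list, which is what makes this kind of structure feel instantaneous even with no network involved.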

MrGuts 8 hours ago

I think everyone in the world knows, at least: "kolme, kaksi, yksi, tuli!"