• FaceDeer@fedia.io
    link
    fedilink
    arrow-up
    49
    arrow-down
    4
    ·
    5 months ago

    Bear in mind, though, that the technology for dealing with these things are rapidly advancing.

    I have an enormous amount of digital archives I’ve collected both from myself and from my now-deceased father. For years I just kept them stashed away. But about a year ago I downloaded the Whisper speech-to-text model from OpenAI and transcribed everything with audio into text form. I now have a Qwen3 LLM in the process of churning through all of those transcripts writing summaries of their contents and tagging them based on subject matter. I expect pretty soon I’ll have something with good enough image recognition that I can turn loose on the piles of photographs to get those sorted out by subject matter too. Eventually I’ll be able to tell my computer “give me a brief biography of Uncle Pete” and get something pretty good out of all that.

    Yeah, boo AI, hallucinations, and so forth. This project has given me first-hand experience with what they’re currently capable of and it’s quite a lot. I’d be able to do a ton more if I wasn’t restricting myself to what can run on my local GPU. Give it a few more years.

    • Dave@lemmy.nz
      link
      fedilink
      English
      arrow-up
      7
      ·
      5 months ago

      I agree. I keep loads of shot that I’m hoping one day will just be processed by an AI to pick out the stuff people might want to actually see.

      “People” includes me. I don’t delete anything (when it comes to photos, videos, etc) and just assume at some point technology will make it easy to find whatever.

    • 𝕸𝖔𝖘𝖘@infosec.pub
      link
      fedilink
      English
      arrow-up
      3
      ·
      5 months ago

      You said you released it on your writing. How did you go about doing that? It’s a cool use case, and I’m intrigued.

      • FaceDeer@fedia.io
        link
        fedilink
        arrow-up
        13
        arrow-down
        1
        ·
        5 months ago

        It’s a bit technical, I haven’t found any pre-packaged software to do what I’m doing yet.

        First I installed https://github.com/openai/whisper , the speech-to-text model that OpenAI released back when they were less blinded by dollar signs. I wrote a Python script that used it to go through all of the audio files in the directory tree where I’m storing this stuff and produced a transcript that I stored in a .json file alongside it.

        For the LLM, I installed https://github.com/LostRuins/koboldcpp/releases/ and used the https://huggingface.co/unsloth/Qwen3-30B-A3B-128K-GGUF model, which is just barely small enough to run smoothly on my RTX 4090. I wrote another Python script that methodically goes through those .json files that Whisper produced, takes the raw text of the transcript, and feeds it to the LLM with a couple of prompts explaining what the transcript is and what I’d like the LLM to do with it (write a summary, or write a bullet-point list of subject tags). Those get saved in the .json file too.

        Most recently I’ve been experimenting with creating an index of the transcripts using those LLM results and the Whoosh library in Python, so that I can do local searches of the transcripts based on topics. I’m building towards writing up something where I can literally tell it “Tell me about Uncle Pete” and it’ll first search for the relevant transcripts and then feed those into the LLM with a prompt to extract the relevant information from them.

        If you don’t find the idea of writing scripts for that sort of thing literally fun (like me) then you may need to wait a bit for someone more capable and more focused than I am to create a user-friendly application to do all this. In the meantime, though, hoard that data. Storage is cheap.

        • 𝕸𝖔𝖘𝖘@infosec.pub
          link
          fedilink
          English
          arrow-up
          4
          ·
          5 months ago

          That’s awesome! Thank you!

          If you don’t find the idea of writing scripts for that sort of thing literally fun…

          I absolutely do. What I find as a potential showstopper for me right now, is that I don’t have a nonintegrated GPU, which makes complex LLMs hard to run. Basically, if I can’t push the processing to CPU, I’m looking at around 2-5 seconds per token; it’s rough. But I like your workflow a lot, and I’m going to try to get something similar going with my incredibly old hardware, and see if CPU-only processing of this would be something feasible (though, I’m not super hopeful there).

          And, yes, I, too, am aware of the hallucinations and such that come from the technology. But, honestly, for this non-critical use case, I don’t really care.

          • FaceDeer@fedia.io
            link
            fedilink
            arrow-up
            5
            arrow-down
            1
            ·
            5 months ago

            I only just recently discovered that my installation of Whisper was completely unaware that I had a GPU, and was running entirely on my CPU. So even if you can’t get a good LLM running locally you might still be able to get everything turned into text transcripts for eventual future processing. :)

        • notfromhere@lemmy.ml
          link
          fedilink
          English
          arrow-up
          2
          ·
          5 months ago

          It sounds like something similar to RAG (retrieval augmented generation) or a database lookup. Are you storing the transcripts in a SQL like database or noSQL db or doing semantic similarity on any of it?

          I was thinking of a similar project and building a knowledge graph for each person.

      • Plebcouncilman@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        3
        arrow-down
        1
        ·
        edit-2
        5 months ago

        If you’re interested in “chatting” with your writing there’s a couple of out of the box solutions right now, like Kortex or Reflect Notes. They are AI first note taking apps. I don’t use them out of privacy concerns but if you don’t care that much they might allow you to do what you want. They claim to be E2E encrypted and the AI unable to phone home but these are companies that sprung out of nowhere so I don’t trust they necessarily have done all their homework to actually provide full privacy.

        Alternatively there’s an Obsidian plugin that I believe allows you to do such a thing as well with local LLMs if you wanted to which is the privacy first way to this. I’ve just moved to Obsidian from Capacities so I have yet to try it out as I’m still setting up my vault.