• anton@piefed.blahaj.zone
    English · 21 points · 1 day ago (edited)

    My IDE says: '(', '+', '-', '.', ';', <operator>, '[' or '}' expected, got ';'
    But the rust compiler explains

    error: unknown start of token: \u{37e}
    help: Unicode character ';' (Greek Question Mark) looks like ';' (Semicolon), but it is not

    What a killjoy.
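    A minimal sketch of how a lookalike check like this can work, assuming a small hand-written table. `CONFUSABLES` and `find_confusable` are hypothetical names, and this is not rustc's actual code. It scans the source one character at a time and, on a match, builds a help message shaped like the one quoted above.

```rust
/// Hypothetical, hand-picked subset of lookalike characters:
/// (found character, ASCII lookalike, name of found, name of lookalike).
const CONFUSABLES: &[(char, char, &str, &str)] = &[
    ('\u{37E}', ';', "Greek Question Mark", "Semicolon"),
    ('\u{FF1B}', ';', "Fullwidth Semicolon", "Semicolon"),
    ('\u{2212}', '-', "Minus Sign", "Hyphen-Minus"),
];

/// Returns the byte offset and a help message for the first lookalike in `src`.
fn find_confusable(src: &str) -> Option<(usize, String)> {
    src.char_indices().find_map(|(pos, c)| {
        CONFUSABLES
            .iter()
            .find(|entry| entry.0 == c)
            .map(|&(found, ascii, found_name, ascii_name)| {
                let msg = format!(
                    "Unicode character '{found}' ({found_name}) looks like '{ascii}' ({ascii_name}), but it is not"
                );
                (pos, msg)
            })
    })
}

fn main() {
    // The "5" is followed by U+037E, not by an ASCII semicolon.
    let src = "let x = 5\u{37E}";
    match find_confusable(src) {
        Some((pos, msg)) => println!("byte {pos}: help: {msg}"),
        None => println!("no lookalikes found"),
    }
}
```

    A real compiler needs a much larger table; the three entries here only illustrate the lookup.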
    • Owl@mander.xyz
      15 points · 1 day ago

      But the rust compiler explains

      If this is true, then Rust deserves all the praise it gets.

      • billwashere@lemmy.world
        English · 1 point · 6 hours ago

        This is pretty cool. But my question is: if the compiler knows they’re visually basically the same thing, why doesn’t it treat them the same syntactically and just make them functionally equivalent;
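        One way a language could do this is to fold the lookalike into ASCII before lexing. U+037E has a canonical decomposition to U+003B, so Unicode normalization (NFC or NFD) already turns it into a plain semicolon. A minimal sketch, assuming a hypothetical `fold_greek_question_mark` pre-pass:

```rust
/// Hypothetical "lenient" pre-pass: replace every Greek Question Mark (U+037E)
/// with an ASCII semicolon (U+003B) before the source reaches the lexer.
fn fold_greek_question_mark(src: &str) -> String {
    src.chars()
        .map(|c| if c == '\u{37E}' { ';' } else { c })
        .collect()
}

fn main() {
    let src = "let x = 5\u{37E}";
    let folded = fold_greek_question_mark(src);
    // After folding, the statement ends in an ordinary semicolon.
    assert_eq!(folded, "let x = 5;");
    println!("{folded}");
}
```

        Note that this folds every occurrence, including ones inside string literals, which silently changes program data; a real lexer would have to restrict the mapping to places where punctuation is expected.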

      • anton@piefed.blahaj.zone
        English · 6 points · 1 day ago

        While the language can be hard to get used to, the error messages are mostly great.
        But sometimes you can send it on a wild goose chase with impossible type inference.

  • m_‮f@discuss.online
    English · 59 points · 2 days ago

    ; and ; respectively, in case anyone wants to see how it renders on their machine and is also lazy.

    • anton@lemmy.blahaj.zone
      4 points · 1 day ago

      As if a whitespace-sensitive language protects you from this fuckery.

      • How many thin spaces are one level of indentation?
      • Will anyone notice a hair space?
      • Who can tell the difference between a space and a figure space? They are the same width in a monospaced font.
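      All three of those characters (thin space U+2009, hair space U+200A, figure space U+2007) are in Unicode's White_Space set, so Rust's `char::is_whitespace` returns true for them. A minimal sketch of a check that flags any whitespace other than a plain space, tab, or line break, assuming a hypothetical `unusual_whitespace` helper:

```rust
/// Returns (byte offset, code point) for every whitespace character in `line`
/// that is not a plain space, tab, carriage return, or newline.
fn unusual_whitespace(line: &str) -> Vec<(usize, u32)> {
    line.char_indices()
        .filter(|&(_, c)| c.is_whitespace() && !matches!(c, ' ' | '\t' | '\n' | '\r'))
        .map(|(i, c)| (i, c as u32))
        .collect()
}

fn main() {
    // Two THIN SPACE characters (U+2009) used as "indentation".
    let line = "\u{2009}\u{2009}print(x)";
    for (pos, cp) in unusual_whitespace(line) {
        println!("byte {pos}: U+{cp:04X}");
    }
}
```

      Zero-width characters such as U+200B ZERO WIDTH SPACE are not in the White_Space set, so this check would not catch them.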
  • Infernal_pizza@lemmy.dbzer0.com
    English · 1 point · 1 day ago

    Why do characters like this even exist? I’ve run into this before, when I couldn’t find a file I’d downloaded by searching for it. I remembered which folder it was in and checked that it was still there; after playing around with the name for a bit, I realised the “a” in the file name wasn’t actually an “a”.
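    A quick way to find such a character is to list every non-ASCII character in the name along with its code point. A minimal sketch, assuming a hypothetical `non_ascii_chars` helper and using CYRILLIC SMALL LETTER A (U+0430) as an example lookalike; the post does not say which character it actually was:

```rust
/// Lists every non-ASCII character in `name` as (byte offset, character, code point).
fn non_ascii_chars(name: &str) -> Vec<(usize, char, u32)> {
    name.char_indices()
        .filter(|&(_, c)| !c.is_ascii())
        .map(|(i, c)| (i, c, c as u32))
        .collect()
}

fn main() {
    // Hypothetical file name: looks like "data.csv", but the last "a" is U+0430.
    let name = "dat\u{430}.csv";
    for (pos, c, cp) in non_ascii_chars(name) {
        println!("byte {pos}: {c:?} is U+{cp:04X}, not ASCII");
    }
}
```

    This flags any non-ASCII character, so legitimately accented names (for example with é) are reported too.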

    • Frezik@lemmy.blahaj.zone
      English · 4 points · 1 day ago

      The simple answer is that Unicode was designed by committee in an attempt to make every single human written language work. It’s more complicated than it needs to be, but we also don’t want to redo all the work it would take to replace it with something saner. Especially the CJK languages: trying to get those three to agree on anything is for people who deal with frustration better than I do.

    • Frezik@lemmy.blahaj.zone
      English · 1 point · 1 day ago

      Since LLMs are fundamentally predicting the next token, and there isn’t much training data out there that actually does this, I wouldn’t expect them to start putting in lookalike characters. They only look alike to humans.

      That said, you could probably poison their training datasets this way.

      • lordnikon@lemmy.world
        English · 1 point · 24 hours ago

        Yeah, that was the idea: get the LLMs to start using lookalike characters to poison their outputs.