• 0 Posts
  • 10 Comments
Joined 3 months ago
Cake day: August 2nd, 2024



  • Not necessarily. CVs have complicated formatting. Nobody writes (or should write) blocks of text, and you don’t know how many columns the candidate is using. Is the candidate using a dedicated section to show skills with a star-based rating or a word-based one? So you can still search for individual keywords, but if you copy the whole PDF and paste it into a txt file (which is what gets forwarded to the ATS), it doesn’t make much sense. The structure is too complicated to extract where you studied, what you studied and your grade, what other experience you have, how long you worked there, etc.

    Extracting structured data is a field of science in its own right. There is plenty of recent research on extracting structured data from academic PDFs (I was working on this at a research institute in Germany around 2022). Even when LLMs are used, it can get really complicated, to the point that there are specialized LLMs for just that.

    But ATS systems are cheap, or parsing isn’t a high enough priority, so they don’t even use OCR, let alone LLMs. Unfortunately that means the responsibility of making an easily parsable CV comes down to the candidate.

    Try this next time you look at your CV: copy its text into a txt file, then think about whether you could write a program that reliably extracts your experience, education, interests, etc. It’s going to be super difficult, and even then it won’t generalize to thousands of other CVs.
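
To make that thought experiment concrete, here is a minimal sketch of such a program, assuming the CV text has already been pasted into a plain string. The heading keywords, the `split_sections` helper, and both sample texts are made up for illustration; this is not how any particular ATS works.

```python
import re

# Hypothetical heading keywords; real CVs use many more variants.
SECTION_HEADINGS = {
    "experience": ("experience", "work experience", "employment"),
    "education": ("education", "studies"),
    "interests": ("interests", "hobbies"),
}


def split_sections(cv_text):
    """Group the lines of plain CV text under the last heading seen."""
    sections = {}
    current = None
    for raw_line in cv_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Keep only letters and spaces so "Education:" still matches.
        normalized = re.sub(r"[^a-z ]", "", line.lower()).strip()
        matched = next(
            (name for name, variants in SECTION_HEADINGS.items()
             if normalized in variants),
            None,
        )
        if matched is not None:
            current = matched
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


# A CV whose text comes out in clean reading order.
clean_text = """Education
MSc Computer Science, 2021
Experience
ML Engineer, 2021-2024
Interests
Chess
"""

# The same CV as a two-column layout might come out: headings fused on one line.
scrambled_text = """Education Experience
MSc Computer Science, 2021 ML Engineer, 2021-2024
Interests
Chess
"""

print(split_sections(clean_text))
print(split_sections(scrambled_text))
```

On the clean text the sketch finds all three sections. On the scrambled one it only recovers the interests, because the fused heading line matches nothing and everything under it is dropped. Handling that case means adding rules for one layout, which is the part that does not generalize.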


  • I think OCR is really good nowadays, but I think old ATS systems don’t use it, or at least use old OCR. If you parse a PDF without OCR, a Word-exported PDF preserves the text order much better than a LaTeX one.

    Like, I actually tried some websites and Python libraries to extract the text from my LaTeX PDF, and none of them gave good results; words inside the PDF would come out of order.

    If I use OCR then I get good, coherent text, which is really important for ATS. But I doubt people use OCR cuz it’s kinda expensive, or maybe people just use old ATS systems, etc.
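
One plausible way words end up out of order, sketched under assumptions: a PDF stores text as positioned fragments rather than as a reading-order stream, so an extractor has to guess the order from coordinates. The fragments, coordinates, and the two helper functions below are invented for illustration and do not reproduce the behavior of any specific library.

```python
# Each fragment is (x, y, text), with y measured from the top of the page.
# The coordinates are made up for illustration.
fragments = [
    (50, 100, "Experience"),
    (50, 120, "ML Engineer"),
    (50, 140, "2021-2024"),
    (320, 100, "Education"),
    (320, 120, "MSc Computer Science"),
    (320, 140, "2021"),
]


def naive_order(frags):
    """Read strictly top to bottom, then left to right, ignoring columns."""
    return [text for x, y, text in sorted(frags, key=lambda f: (f[1], f[0]))]


def column_aware_order(frags, column_split_x):
    """Read the whole left column first, then the whole right column."""
    left = sorted((f for f in frags if f[0] < column_split_x), key=lambda f: f[1])
    right = sorted((f for f in frags if f[0] >= column_split_x), key=lambda f: f[1])
    return [text for x, y, text in left + right]


print(naive_order(fragments))
print(column_aware_order(fragments, column_split_x=200))
```

The naive ordering interleaves the two columns, so the experience and education entries get mixed together. The column-aware version only works because the split position is passed in by hand, which an extractor reading an arbitrary CV would have to guess.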



  • just_an_average_joe@lemmy.dbzer0.com to LinkedinLunatics@sh.itjust.works · PDFs
    28 days ago

    Actually this is good advice. Nowadays nobody reads your CV in the first step. Your CV first goes through an automated system (ATS, I think it’s called). It’s designed to filter out as many candidates as possible.

    The problem with PDF is that it’s terrible to parse cuz it’s designed for humans to read, not machines. The only reliable way to parse it is to convert it to images and then run OCR, which is kinda expensive.

    So before you send a PDF, you should first try converting it to txt and see if the content makes enough sense. Or just use Word to make the CV, then export to PDF.
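
A rough way to automate that check, as a sketch only: extract the text, then confirm that the phrases you care about survive intact. It assumes the `pdftotext` command-line tool from Poppler is installed; the `pdf_to_text` and `check_phrases` helpers, the sample strings, and the file name in the usage note are hypothetical.

```python
import subprocess


def pdf_to_text(pdf_path):
    """Extract text with the pdftotext CLI (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def check_phrases(text, expected_phrases):
    """Return the phrases missing from text, ignoring case and whitespace runs."""
    flat = " ".join(text.lower().split())
    return [
        phrase for phrase in expected_phrases
        if " ".join(phrase.lower().split()) not in flat
    ]


# Made-up extraction results: one in reading order, one with words shuffled.
good_text = "Experience\nMachine Learning Engineer at ExampleCorp\n"
bad_text = "Experience Machine\nQuality Learning Assurance Engineer\n"

print(check_phrases(good_text, ["machine learning engineer"]))
print(check_phrases(bad_text, ["machine learning engineer"]))
```

In practice you would call something like `check_phrases(pdf_to_text("cv.pdf"), [...])` with your own job titles and skills. An empty list means every phrase came through in one piece; anything returned is a phrase a text-only parser would probably not see.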

    When I was looking for a job, I remember there was a website that would give you tips on your CV, and they had an ATS report of your CV. I was so shocked to realize that the ATS completely failed to parse the correct info from my LaTeX CV. Like, I have a lot of AI/ML experience and it completely missed it and thought I had quality assurance experience. And I was applying for AI jobs, no wonder I couldn’t get any interviews. Then I changed it to Word, and sent an exported PDF where Word wasn’t accepted. I got many more interviews after that.