Actually, this is good advice. Nowadays nobody reads your CV in the first step. It first goes through an automated system (an ATS, applicant tracking system, I think it’s called), which is designed to filter out as many applications as possible.
The problem with PDF is that it’s terrible to parse because it’s designed for humans to read, not machines. The only reliable way to parse it is to convert it to images and run OCR, which is kind of expensive.
So before you send a PDF, first try converting it to txt and see if the content makes enough sense. Or just make the CV in Word and export it to PDF.
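A minimal sketch of that "convert to txt and check" step, assuming the third-party pypdf package is installed. The helper names `extract_text` and `missing_keywords` are made up for illustration, and a real ATS may extract text differently, so treat the result as a rough signal only.

```python
import re


def extract_text(pdf_path):
    """Return the text embedded in a PDF, page by page (no OCR involved)."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def missing_keywords(text, keywords):
    """Return the keywords that cannot be found in the extracted text."""
    # Collapse whitespace so a keyword broken across two lines still matches.
    normalized = re.sub(r"\s+", " ", text).lower()
    return [kw for kw in keywords if kw.lower() not in normalized]
```

Calling `missing_keywords(extract_text("cv.pdf"), ["machine learning", "PyTorch"])` (with your own file name and keywords) lists the terms a plain text extraction would not see. It is also worth printing the extracted text and reading it top to bottom.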
When I was looking for a job, I remember there was a website that gave you tips on your CV and included an ATS report for it. I was shocked to see that the ATS completely failed to parse the correct info from my LaTeX CV. I have a lot of AI/ML experience, and it completely missed that and thought I had quality assurance experience instead. And I was applying for AI jobs, so no wonder I couldn’t get any interviews. Then I switched to Word, and sent a Word-exported PDF wherever Word files weren’t accepted. I got many more interviews after that.
For my most recent application I submitted a Europass resume. It embeds XML in the PDF, making it machine-readable.
Whether or not the ATS can read it, I don’t know.
I have gotten some responses in the past saying that some people see Europass as somewhat lazy, which is why I moved to LaTeX. Also, my CV got a bit too long with Europass (2-3 pages, I think).
I’ve never heard that. I want my CV to be a representation of what I can do, not how much time I spent making what I can do look good.
My resume was about 4 pages with Europass, but in the end the cover letter did the heavy lifting.
Was it that the PDF produced by LaTeX was less OCR-friendly than the Word one, or just that you didn’t submit the PDF at all most of the time?
I guess if you trained a program to OCR PDFs that are produced by Word, it might get really good at that and less good at PDFs from other sources.
I’m curious if your CV font was Computer Modern?
I think OCR is really good nowadays, but I think old ATS systems don’t use it, or at least use old OCR. If you parse a PDF without OCR, a Word-exported PDF preserves the text order much better than a LaTeX one.
I actually tried some websites and Python libraries to extract the text from my LaTeX PDF, and none of them gave good results; words inside the PDF would come out in the wrong order.
If I use OCR, I get good, coherent text, which is really important for an ATS. But I doubt people use OCR because it’s kind of expensive, or maybe people just use old ATS systems.
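One rough way to put a number on the "out of order" problem, as a sketch only: list a few phrases in the order a human reads them on the page, then see which ones the extracted text places too early. The function name `out_of_order` is hypothetical, and the check only looks at the first occurrence of each phrase.

```python
def out_of_order(text, phrases):
    """Return phrases that appear in `text` before a phrase listed ahead of them.

    `phrases` must be given in human reading order.
    Phrases that are missing from the text are skipped.
    """
    lowered = text.lower()
    misplaced = []
    last_pos = -1
    for phrase in phrases:
        pos = lowered.find(phrase.lower())
        if pos == -1:
            continue  # missing entirely, which is a separate problem
        if pos < last_pos:
            misplaced.append(phrase)
        else:
            last_pos = pos
    return misplaced
```

Running it once on the text extracted from the LaTeX PDF and once on the Word-exported one, with the same phrase list, gives a crude comparison of how well each preserves reading order.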
You can extract text from PDFs without using OCR; they aren’t all images embedded in a file.
I’m sure you’ve opened PDF documents before and selected text in them, or searched for something. That works because the text is embedded in the document.
You can also create PDF documents with the text converted to images, but those are usually larger in size.
Not necessarily. CVs have complicated formatting. Nobody writes (or should write) blocks of text, and you don’t know how many columns the candidate is using. Is the candidate using a specific section to show skill ratings, and are they star-based or word-based? So you can still search for individual keywords, but if you copy the whole PDF and paste it into a txt file (which is what will be forwarded to the ATS), it does not make much sense. The structure is too complicated to extract where you studied, what you studied and your grade, what other experience you have, how long you worked there, etc.
Extracting structured data is a field of science in its own right. There is plenty of recent research on extracting structured data from academic PDFs (I was working on this at a research institute in Germany around 2022). Even when LLMs are used, it can get really complicated, to the point that there are specialized LLMs for just that.
But ATS systems are too cheap, or too low a priority, to even use OCR, let alone LLMs, so unfortunately the responsibility of making an easily parsable CV falls on the candidate.
Try this next time you look at your CV: copy its text into a txt file, then think about whether you could write a program that reliably extracts your experience, education, interests, etc. It’s going to be super difficult, and even then it won’t generalize to thousands of other CVs.
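To make that concrete, here is a deliberately naive section splitter, as a sketch and not how any particular ATS works. The heading list and both sample texts are made up for illustration. It handles a single-column CV, then finds nothing once the same content arrives the way a two-column layout can after row-by-row extraction.

```python
# Hypothetical heading names; real CVs use many variants ("Work history", "Studies", ...).
HEADINGS = ("experience", "education", "skills", "interests")


def split_sections(text):
    """Naively group each line under the most recent heading line."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower() in HEADINGS:
            current = line.lower()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


# Made-up sample: a single-column CV copied to text.
single_column = """Experience
ML Engineer, 2020-2023
Education
MSc Computer Science
"""

# Made-up sample: the same content after a two-column layout is read row by row.
two_column = """Experience Education
ML Engineer, 2020-2023 MSc Computer Science
"""
```

`split_sections(single_column)` returns one entry per section, while `split_sections(two_column)` returns an empty dict because no line is a clean heading anymore. Every rule you add to fix one layout tends to break on another.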
All those “problems” apply to Word too. Maybe you use tables, maybe you use lists, maybe you use stars, maybe … So there’s no advantage in forcing people to use Word “because the machine can understand it better”. Because that’s a lie.
Exactly what I was about to reply. Try copying a crazy multi-column Word document into text, and you’ll get similar results.
Copy-pasting parts of your PDF document is not any more difficult than doing the same thing for a Word document.