Is there a quick and effective way to detect PDF files without using any toolkits?

Modified on Tue, Oct 29, 2024 at 8:35 PM

I am working on a project in .NET Core, and I want to determine whether the content in my Stream object is actually a PDF file. Is there a simple way to tell if the stream content is a PDF?


Yes, there is. 


According to the Adobe PDF specification, a PDF file normally begins with a header whose first five bytes are the ASCII sequence "%PDF-", followed by the version number (for example, "%PDF-1.7"). This header provides a quick way to check whether a file is likely a PDF. By reading the first five bytes from the stream and checking that they match "%PDF-", you can identify PDF content on any platform without extra libraries. A match only shows that the stream starts like a PDF; it does not prove that the rest of the file is valid.


Here’s an example in C#:


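using System;
using System.IO;
using System.Text;

// Place this method inside a class of your choice.
public bool IsPdf(Stream stream)
{
    byte[] buffer = new byte[5];
    stream.Seek(0, SeekOrigin.Begin); // Reset stream position (the stream must be seekable)
    int read = stream.Read(buffer, 0, buffer.Length); // May return fewer than five bytes
    string header = Encoding.ASCII.GetString(buffer, 0, read); // Decode only the bytes actually read
    return header == "%PDF-";
}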


This code snippet reads the first five bytes of a Stream and checks whether they match "%PDF-". For more details, see section 7.5.2 (File Header) of the Adobe PDF Specification; for the latest specification, see the PDF Association’s resource page.
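If the same stream is handed to other code afterwards, it can help to leave its position untouched. The following is a minimal sketch of that variation, not a definitive implementation: the names PdfSniffer, LooksLikePdf, and PdfMagic are hypothetical and chosen only for illustration, and the sketch assumes a seekable stream. It compares raw bytes instead of decoding a string, loops in case Read returns fewer bytes than requested, and restores the original position when done.

```csharp
using System;
using System.IO;

// Hypothetical helper; the class and member names are illustrative only.
public static class PdfSniffer
{
    // ASCII bytes for "%PDF-"
    private static readonly byte[] PdfMagic = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    public static bool LooksLikePdf(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        // Assumption: the caller passes a seekable stream (e.g. FileStream, MemoryStream).
        if (!stream.CanSeek) throw new NotSupportedException("A seekable stream is required.");

        long originalPosition = stream.Position;
        try
        {
            stream.Seek(0, SeekOrigin.Begin);

            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the stream ends.
            byte[] buffer = new byte[PdfMagic.Length];
            int total = 0;
            while (total < buffer.Length)
            {
                int read = stream.Read(buffer, total, buffer.Length - total);
                if (read == 0) return false; // Fewer than five bytes available
                total += read;
            }

            // Compare byte by byte against the expected header.
            for (int i = 0; i < PdfMagic.Length; i++)
            {
                if (buffer[i] != PdfMagic[i]) return false;
            }
            return true;
        }
        finally
        {
            // Put the stream back where the caller left it.
            stream.Position = originalPosition;
        }
    }
}
```

Calling PdfSniffer.LooksLikePdf(stream) on a stream that starts with "%PDF-1.7" should return true and leave stream.Position unchanged, so the stream can then be passed on to whatever consumes the PDF.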
