Is there a quick and effective way to detect PDF files without using any toolkits?

Modified on Tue, Oct 29, 2024 at 8:35 PM

I am working on a project in .NET Core, and I want to determine whether the content in my Stream object is actually a PDF file. Is there a simple way to tell if the stream content is a PDF?

Yes, there is.

According to the Adobe PDF specification, a PDF file begins with the header "%PDF-" followed by a version number (for example, "%PDF-1.7"). This header provides a quick way to check whether a file is likely in PDF format. By reading the first five bytes from the stream and verifying that they match "%PDF-", you can identify PDF content on any platform without extra libraries. Note that this confirms only the header; it does not validate the rest of the file.

Here’s an example in C#:

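using System;
using System.IO;
using System.Text;

public bool IsPdf(Stream stream)
{
    byte[] buffer = new byte[5];
    stream.Seek(0, SeekOrigin.Begin); // Reset stream position (stream must be seekable)

    // Read may return fewer bytes than requested, so loop until the buffer is full.
    int total = 0, read;
    while (total < buffer.Length && (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
        total += read;

    string header = Encoding.ASCII.GetString(buffer, 0, total);
    return header == "%PDF-";
}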

This code snippet reads the first five bytes of a Stream and checks whether they match "%PDF-". For more details, refer to section 7.5.2 of the Adobe PDF Specification; for the latest specification, see the PDF Association’s resource page.
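
If the same stream is passed to other code after the check, it can help to leave the stream position where the caller had it. The following is a minimal sketch of that idea, not a definitive implementation. The names PdfSniffer and StartsWithPdfHeader are hypothetical and chosen only for illustration, and the sketch assumes the stream is seekable:

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfSniffer
{
    // ASCII bytes of the "%PDF-" header.
    private static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    // Hypothetical helper: returns true when the stream starts with "%PDF-".
    // Assumes a seekable stream; the caller's position is restored afterwards.
    public static bool StartsWithPdfHeader(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new NotSupportedException("A seekable stream is required.");

        long originalPosition = stream.Position;
        try
        {
            stream.Seek(0, SeekOrigin.Begin);

            // Read may return fewer bytes than requested, so loop until done.
            byte[] buffer = new byte[Signature.Length];
            int total = 0;
            while (total < buffer.Length)
            {
                int read = stream.Read(buffer, total, buffer.Length - total);
                if (read == 0) break; // End of stream reached early.
                total += read;
            }

            if (total < Signature.Length) return false;

            // Compare byte by byte against the expected header.
            for (int i = 0; i < Signature.Length; i++)
            {
                if (buffer[i] != Signature[i]) return false;
            }
            return true;
        }
        finally
        {
            // Put the stream back where the caller left it.
            stream.Position = originalPosition;
        }
    }
}
```

Calling PdfSniffer.StartsWithPdfHeader(stream) on a MemoryStream or FileStream returns the same result as the check above. For a non-seekable stream such as a network stream, this sketch throws rather than consuming bytes the caller still needs; in that case the data would have to be buffered first.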