Jul 30, 2007
The Robots Exclusion Protocol now supports instructions delivered in HTTP headers when a file is served.
These HTTP header directives can accompany PDF, video, Word, Excel, XML, Flash, and many other non-HTML file types.
We've extended our support for META tags so they can now be associated with any file. Simply add any supported META tag to a new X-Robots-Tag directive in the HTTP header used to serve the file. Here are some illustrative examples:
* Don't display a cache link or snippet for this item in the Google search results:
X-Robots-Tag: noarchive, nosnippet
* Don't include this document in the Google search results:
X-Robots-Tag: noindex
* Tell us that a document will be unavailable after 7th July 2007, 4:30pm GMT:
X-Robots-Tag: unavailable_after: 7 Jul 2007 16:30:00 GMT
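If you'd rather not emit the header from application code, most web servers can attach it in their configuration instead. As a minimal sketch for Apache (this assumes mod_headers is enabled, and the file-extension pattern is just an illustration):

```
<FilesMatch "\.(pdf|doc|xls)$">
  Header set X-Robots-Tag "noarchive, nosnippet"
</FilesMatch>
```

This applies the same directives to every matching file the server delivers, with no per-file scripting required.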
Sometimes you may not want to advertise the existence of certain files in your robots.txt file. You may still want them to be accessible on your website, say by emailing their location to selected people. It can also be easier to add robots exclusions on a file-by-file basis than to maintain an ever-expanding robots.txt file. So if you add the HTTP headers when you serve the file, you can make sure that if anyone does happen to link to it, it will still be excluded from the search results.
The following PHP snippet serves a file under a different download filename from the one in the file structure, while attaching the robots instructions:
&lt;?php
// $filename and $different_filename are assumed to be set (and validated) earlier
header('X-Robots-Tag: noarchive, nosnippet'); // the Google robots instructions
header('Content-Type: application/msword');   // application/zip or application/pdf etc.
header('Content-Disposition: attachment; filename="' . $different_filename . '"');
readfile('/path/to/files/' . $filename);
For more examples of HTTP headers, see Understanding HTTP Headers.