If you are using a sitemap.xml file, you know you need to submit it to the search engines on a regular basis (say, nightly). This is done via a GET request to each search engine's 'ping' URL. Many of the solutions out there for automatically pinging your sitemap.xml file to Google and Bing rely on PHP, Ruby, or another scripting language. On Linux servers, a simple bash script using wget, which ships with most distributions, is sufficient and avoids the extra complexity. As of December 2016, Google and Bing appear to be the only two search engines that support this. Ask.com took their ping script offline, and Yahoo's search is integrated with Bing. I wound up writing this post because I was annoyed with what I found out there. Django has a built-in ping utility, but it only supports Google and never shows output even with verbosity turned up.
The first thing you need to do is URL encode the URL to your sitemap.
This can be done with an online URL encoding tool, or from the command line as shown below.
The URL to the sitemap for my blog is https://www.laurencegellert.com/sitemap.xml. The search engines' ping URLs accept it as a query string parameter, so it needs to be URL encoded.
The URL encoded version of my sitemap URL is https%3A%2F%2Fwww.laurencegellert.com%2Fsitemap.xml.
Use the URL encoded version of your sitemap URL in the script below where indicated.
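If you'd rather not paste your URL into a random website, here is a minimal command line sketch. It assumes python3 is installed, which is the case on most Linux distributions:

# prints https%3A%2F%2Fwww.laurencegellert.com%2Fsitemap.xml
python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=""))' 'https://www.laurencegellert.com/sitemap.xml'

Passing safe="" tells Python to encode the slashes and colon as well, which is what the ping URLs expect.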
Save the following script somewhere on your system as ping_search_engines.sh.
#!/bin/bash

echo "------------- Pinging the search engines with the sitemap.xml URL, starting at $(date): -------------"

echo "Pinging Google..."
# Quoting the URL keeps the shell from treating ? as a glob character.
wget -O- "http://www.google.com/webmasters/tools/ping?sitemap=YOUR_URL_ENCODED_SITEMAP_URL"

echo "Pinging Bing..."
wget -O- "http://www.bing.com/ping?siteMap=YOUR_URL_ENCODED_SITEMAP_URL"

echo "DONE!"
The -O- part tells wget to write the response to standard output instead of saving it to a file, which is wget's default behavior. That means when you run the script manually, the output displays on screen. A less verbose mode is -qO-, which suppresses wget's own status messages, but I prefer to have all that information in the log.
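To see the difference, try both forms against one of the ping URLs (same placeholder as in the script):

# prints wget's status messages plus the response body
wget -O- "http://www.bing.com/ping?siteMap=YOUR_URL_ENCODED_SITEMAP_URL"
# prints only the response body
wget -qO- "http://www.bing.com/ping?siteMap=YOUR_URL_ENCODED_SITEMAP_URL"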
Run chmod +x ping_search_engines.sh so the file is executable.
Add the following entry to your crontab (run crontab -e), which will trigger the script every night at 1am:
0 1 * * * /path/to/ping_search_engines.sh >> ~/cron_ping_search_engines.log 2>&1
This script is a good way to get going for a simple website. For heavy-duty websites that are mission critical, or that your job relies on, I'd take it a few steps further:
- Write the cron log output to a directory that gets logrotated, so the log file doesn't grow unbounded. The output is small, so even after a year or more of nightly runs the file won't be very large, but like all log files it should be set up to rotate automatically (see the sketch after this list).
- Check for the absence of a successful response and alert on failure (a sketch of this follows as well).
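For the first item, here is a minimal logrotate sketch. It assumes the log path from the cron entry above and would live somewhere like /etc/logrotate.d/ping_search_engines; the path, rotation frequency, and retention count are just examples to adapt:

/home/youruser/cron_ping_search_engines.log {
    monthly
    rotate 12
    compress
    missingok
    notifempty
}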
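For the second item, one simple approach, sketched here rather than battle-tested: instead of grepping the output for a 200, rely on wget's exit status, which is non-zero (8, specifically) when the server returns an HTTP error, and send an email on failure. The address is a placeholder, and the mail command assumes a working local mail setup:

#!/bin/bash
# Sketch: ping each engine and alert on failure. wget exits with a
# non-zero status (8) when the server responds with an HTTP error.
MAILTO="you@example.com"  # placeholder: your alert address
SITEMAP="YOUR_URL_ENCODED_SITEMAP_URL"

ping_engine() {
    local name="$1" url="$2"
    if ! wget -qO- "$url" > /dev/null; then
        echo "Sitemap ping to $name failed at $(date)" | mail -s "Sitemap ping failure: $name" "$MAILTO"
    fi
}

ping_engine Google "http://www.google.com/webmasters/tools/ping?sitemap=$SITEMAP"
ping_engine Bing   "http://www.bing.com/ping?siteMap=$SITEMAP"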