
This is my first blog entry on Appropedia. Eventually I will write more personal blog entries, but this seems like a good place to chronicle some of the MediaWiki fun we get to have. Hopefully it will help others as well. Let me know what you think.
So I wanted an Appropedia sitemap to submit to search engines. I am still unsure whether this is necessary, since we are indexed by all the major search engines and have a robots.txt to tell the well-behaved spiders where to go, but the available literature suggests that, yes, we want a sitemap. Last year we installed the MediaWiki (MW) extension Google Sitemap. It took a few hours of hacking, but eventually the extension generated a sitemap every time someone visited Special:GoogleSitemap. One limitation was that the sitemap was capped at 5000 entries, a limit we eventually passed. After we upgraded to a newer version of MW, the extension developed a much bigger problem: it broke other pages, e.g. Special:Version rendered blank.
Well, it turns out MW has a built-in sitemap generator (since MW 1.6), but I could not figure out how to use it. It is amazing how little information is available about it, but with the help of a few other sites and Curt, I finally got it working. It took a while because of the lack of documentation and two wrong assumptions on my part.
My first wrong assumption was that we needed a single sitemap file, so I spent way too long trying to combine the many files output by the MW sitemap script. As this hard-to-find page at Google Support explains, that assumption is wrong: a sitemap index file can point to multiple sitemaps.
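Why several files are fine: the sitemaps.org protocol limits a single sitemap file to 50,000 URLs, and a sitemap index is just a list of <loc> entries pointing at the individual sitemap files. The Python sketch below is an illustration of that structure, not what the MW script does internally; the sitemap file URLs are hypothetical.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split page URLs into groups small enough for one sitemap file each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document listing each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

With more pages than one file allows, each chunk would go into its own (often gzipped) sitemap file, and only the index needs to be submitted.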
My second big error was forgetting to update the Apache rewrite rules while testing the new sitemap with Google. I kept getting an error, which I tried in vain to correct in the sitemap files themselves. So I added the following condition to .htaccess:
RewriteCond %{REQUEST_URI} !^/sitemap*
This lets search engines reach the sitemap index and files instead of having those requests caught by the wiki's rewrite rule, and it worked!
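A quick way to check which requests the condition lets through is to test the same regular expression in Python, whose re module treats this simple pattern the same way Apache's PCRE does. This sketch assumes the .htaccess also has the usual catch-all RewriteRule that sends wiki page URLs to index.php; the sample URIs are made up.

```python
import re

# Pattern from the RewriteCond above. In .htaccess the leading "!" negates it,
# so the wiki's RewriteRule only fires for URIs that do NOT match this pattern.
SITEMAP_PATTERN = re.compile(r"^/sitemap*")


def served_directly(request_uri: str) -> bool:
    """True if the URI matches, i.e. the RewriteCond blocks the wiki rewrite."""
    return SITEMAP_PATTERN.search(request_uri) is not None


# Hypothetical request paths, just to see which ones skip the wiki rewrite.
for uri in ["/sitemap.xml", "/sitemap-wiki-NS_0-0.xml.gz", "/Main_Page", "/sitema"]:
    print(uri, served_directly(uri))
```

The last case shows that the * applies only to the final "p", so the pattern really means "starts with /sitema"; ^/sitemap without the star would be a slightly tighter way to write the same idea.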
The attached file (see the notes below for a breakdown of what the code does) is a simplified adaptation for Appropedia, based on the great code at jrandomhacker.info and the great blog entry at dralspire.com. The adapted code rests on a few assumptions and changes that may not work for you. I dropped the file into our root directory and run it from a cron job (or just log in to the server and run csh FileName).
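For a rough idea of what such a nightly job has to do, here is a Python sketch (not the attached csh file) that calls MediaWiki's maintenance/generateSitemap.php, assumed to be the built-in generator mentioned above, and then publishes the generated index as sitemap.xml. The --fspath and --server options, the paths, and index_name are assumptions; check php maintenance/generateSitemap.php --help on your MW version.

```python
import shutil
import subprocess
from pathlib import Path


def sitemap_command(wiki_root: str, out_dir: str, server: str) -> list[str]:
    """Build the command line for MW's sitemap maintenance script.

    --fspath and --server are assumed options; confirm them with --help.
    """
    script = Path(wiki_root) / "maintenance" / "generateSitemap.php"
    return ["php", str(script), f"--fspath={out_dir}", f"--server={server}"]


def regenerate(wiki_root: str, out_dir: str, server: str, index_name: str) -> None:
    """Run the generator, then copy its index file to sitemap.xml.

    index_name is hypothetical: use whichever sitemap-index-*.xml your run produces.
    """
    subprocess.run(sitemap_command(wiki_root, out_dir, server), check=True)
    shutil.copyfile(Path(out_dir) / index_name, Path(out_dir) / "sitemap.xml")
```

A small script that calls regenerate(...) with your own paths could then be scheduled nightly from cron, in the same way the csh file is run.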
Once you have the sitemap, you can go to the following sites to submit the sitemaps to search engines. You could write a script to automate this (see dralspire.com), but we will be updating the sitemap nightly and I don't want to ping the search engines that often. The search engines seem to update fairly often on their own.
http://www.google.com/webmasters/tools/ping?sitemap=http://www.appropedia.org/sitemap.xml
http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=http://www.appropedia.org/sitemap.xml
http://submissions.ask.com/ping?sitemap=http://www.appropedia.org/sitemap.xml
http://api.moreover.com/ping?u=http://www.appropedia.org/sitemap.xml
http://webmaster.live.com/ping.aspx?siteMap=http://www.appropedia.org/sitemap.xml
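If you do want to script the pings (as dralspire.com does), a Python sketch could look like this. The endpoints are copied from the list above and may not all still respond; percent-encoding the sitemap URL in the query string is assumed to be the safer choice, and ping_all simply records each HTTP status or error.

```python
from urllib.parse import quote
from urllib.request import urlopen

SITEMAP_URL = "http://www.appropedia.org/sitemap.xml"

# Ping endpoints from the list above; the query parameter name differs per service.
PING_ENDPOINTS = {
    "google": "http://www.google.com/webmasters/tools/ping?sitemap=",
    "yahoo": "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=",
    "ask": "http://submissions.ask.com/ping?sitemap=",
    "moreover": "http://api.moreover.com/ping?u=",
    "live": "http://webmaster.live.com/ping.aspx?siteMap=",
}


def ping_urls(sitemap_url: str = SITEMAP_URL) -> dict[str, str]:
    """Return one ping URL per engine, with the sitemap URL percent-encoded."""
    encoded = quote(sitemap_url, safe="")
    return {name: base + encoded for name, base in PING_ENDPOINTS.items()}


def ping_all(sitemap_url: str = SITEMAP_URL, timeout: float = 10.0) -> dict:
    """Send each ping and record the HTTP status, or the error text on failure."""
    results = {}
    for name, url in ping_urls(sitemap_url).items():
        try:
            with urlopen(url, timeout=timeout) as response:
                results[name] = response.status
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            results[name] = str(exc)
    return results
```

Given the nightly regeneration, running ping_all only occasionally (say, weekly) would match the reasoning above about not pinging too often.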
Some notes:
Meanings of the code in the attached file:
To remove the old Google Sitemap extension:
Once these forums and blogs are out of beta, we can add a second sitemap to the root directory; I know Google will accept that. Alternatively, we can add a new sitemap file to the index (sitemap.xml), which I think is the best plan.
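Adding a file to the existing index means appending one more <sitemap> entry, with a <loc> child, to sitemap.xml. Here is a minimal Python sketch using the standard library's ElementTree. The new file URL in any call is up to you; as I understand the sitemaps.org protocol, indexed sitemaps should live on the same host as the index.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # keep the default namespace when re-serialising


def add_to_index(index_xml: str, new_sitemap_url: str) -> str:
    """Append a <sitemap><loc> entry for new_sitemap_url unless it is already listed."""
    root = ET.fromstring(index_xml)
    existing = {loc.text for loc in root.iter(f"{{{NS}}}loc")}
    if new_sitemap_url not in existing:
        entry = ET.SubElement(root, f"{{{NS}}}sitemap")
        ET.SubElement(entry, f"{{{NS}}}loc").text = new_sitemap_url
    return ET.tostring(root, encoding="unicode")
```

The duplicate check makes it safe to call from the same nightly job, e.g. with a hypothetical http://www.appropedia.org/sitemap-forum.xml.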
Comments
hey o.O that's weird -
hey o.O that's weird - what's with the "inline:fixsitemap.txt=attached" URL of the attached file... FF doesn't know how to handle "inline:"
Inline error
Hi. Thanks for the note. I do not know what is wrong with that code, but I am sure Curt or Chris can fix it. For now, I got rid of the offending line.
Still problems
Looks like the script is only fixing the main sitemap.xml and not all the namespace ones. Look for a way to fix that!!!
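One way to make sure every file gets the same treatment is to loop over all sitemap*.xml and sitemap*.xml.gz files instead of only sitemap.xml. This Python sketch takes the actual correction as a function argument, because the attached script's fix isn't shown here; the replace() below is a hypothetical placeholder.

```python
import gzip
from pathlib import Path
from typing import Callable


def fix_all_sitemaps(directory: str, fix: Callable[[str], str]) -> list[str]:
    """Apply `fix` to sitemap.xml and every other sitemap*.xml / sitemap*.xml.gz file."""
    changed = []
    for path in sorted(Path(directory).glob("sitemap*.xml*")):
        # Namespace sitemaps may be gzipped, so pick the matching opener.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8") as f:
            text = f.read()
        fixed = fix(text)
        if fixed != text:
            with opener(path, "wt", encoding="utf-8") as f:
                f.write(fixed)
            changed.append(path.name)
    return changed
```

For example, fix_all_sitemaps("/var/www", lambda s: s.replace("http://localhost/", "http://www.appropedia.org/")) would rewrite a (hypothetical) wrong base URL in every sitemap file and return the names of the files it changed.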
Thanks for the reference
Hey, thanks for the reference. I'm going to be revisiting this topic soon, and I'll come back and let you know how things have gone.
Thanks for the code
I look forward to hearing what comes of your revisit. Thank you for sharing so much of your work at http://jrandomhacker.info/.
FINALLY a fix
Check http://forum.appropedia.org/blog/finally-working-mediawiki-sitemap for the new version.