Finally, a working MediaWiki sitemap

Lonny's picture

This is a follow-up to "Making a sitemap (opensource documentation wild goose chase)".

So, thanks to the OLPC folks, we finally have a working Google-compliant sitemap. The code below is adapted from http://wiki.laptop.org/go/SEO_for_the_OLPC_wiki/sitemapgen. Here are some notes about the implementation:

  1. Save the code as makesitemap.sh in the site root directory (appropedia.org/).
  2. Make the file executable by typing chmod u+x makesitemap.sh
  3. Run it from the site root by typing sh makesitemap.sh, or, even better, run it from cron.
  4. I had to download a new version of generateSitemap.php from http://www.mediawiki.org/wiki/Manual:GenerateSitemap.php. With the old one I was getting a Class 'MWNamespace' error.
  5. I had to get rid of the ticks (apostrophes) on the echo lines that were present in the OLPC version. They were breaking the echo.
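
For the cron option in step 3, something along these lines might do. It is only a sketch: the site root /var/www/appropedia.org, the log file and the 03:15 schedule are all assumptions, so substitute your own. The cd matters because the script expects to start in the site root.

```shell
#!/bin/sh
# Hypothetical values: adjust the site root and log path for your own host.
SITE_ROOT=/var/www/appropedia.org
LOG_FILE=/tmp/makesitemap.log

# Crontab fields: minute hour day-of-month month day-of-week command.
# This entry would run the script every day at 03:15 from the site root.
CRON_LINE="15 3 * * * cd $SITE_ROOT && sh makesitemap.sh >> $LOG_FILE 2>&1"

# Print the entry so it can be pasted in after running: crontab -e
echo "$CRON_LINE"
```

Sending the output to a log file keeps the echo messages around, which helps when a nightly run fails silently.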

----makesitemap.sh----

#!/bin/sh

echo Sitemap script ...
cd maintenance/
/usr/local/php5/bin/php generateSitemap.php
echo Sitemap script ... done

echo Moving files, ungzing ...
mv -f *.gz ../
cd ..
gzip -d *.xml.gz
echo Moving files, ungzing ... done

echo Archiving old sitemap...
# Skip the archive step on the first run, when there is no old sitemap yet.
[ -f sitemap.xml ] && mv sitemap.xml sitemap.xml.$(date "+%Y%m%d%H")
echo Archiving old sitemap... done

echo Cating all of the name spaces ...
cat sitemap-appropedia-w1-NS_*.xml > sitemap.xml
echo Cating all of the name spaces ... done

echo Replacing "localhost" with www.appropedia.org ...
sed -i 's,localhost,www.appropedia.org,g' sitemap.xml
echo Replacing "localhost" with www.appropedia.org ... done

echo Cleaning that up ...
sed -i "/?xml version/d" sitemap.xml
sed -i "/urlset/d" sitemap.xml
echo Cleaning that up ... done

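echo Adding the XML headers back...
# The sed above removed every urlset line, so wrap the merged entries in a single <urlset> element again.
echo "</urlset>" >> sitemap.xml
sed -i '1i\
<?xml version="1.0" encoding="UTF-8"?>\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' sitemap.xml
echo Adding the XML headers back... done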

echo Cleaning up the catd files ...
rm sitemap-appropedia-w1-NS*
echo Cleaning up the catd files ... done

echo Pinging the sitemap update to the search engines...
wget -q -O /dev/null http://www.google.com/webmasters/tools/ping?sitemap=http://www.appropedi...
wget -q -O /dev/null http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=http://w...
wget -q -O /dev/null http://submissions.ask.com/ping?sitemap=http://www.appropedia.org/sitema...
wget -q -O /dev/null http://api.moreover.com/ping?u=http://www.appropedia.org/sitemap.xml
wget -q -O /dev/null http://webmaster.live.com/ping.aspx?siteMap=http://www.appropedia.org/si...
echo Pinging the sitemap update to the search engines... done

echo All done

----
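
Before pinging anyone, it may be worth a quick check that the merged file still has the shape a sitemap needs: an XML declaration on line 1 and exactly one <urlset> pair. The function below is a rough sketch of such a check. The name check_sitemap is made up for illustration, it assumes each tag sits on its own line, and it is not a real XML validator.

```shell
#!/bin/sh
# check_sitemap FILE: rough structural check of a sitemap file.
# Returns 0 if the file looks plausible, 1 otherwise.
check_sitemap() {
    f="$1"
    [ -s "$f" ] || { echo "missing or empty: $f"; return 1; }

    # The XML declaration has to be on the very first line.
    head -n 1 "$f" | grep -q '^<?xml ' || { echo "no XML declaration on line 1"; return 1; }

    # Count lines with an opening and a closing urlset tag; expect one of each.
    opens=$(grep -c '<urlset' "$f")
    closes=$(grep -c '</urlset>' "$f")
    if [ "$opens" -ne 1 ] || [ "$closes" -ne 1 ]; then
        echo "expected one <urlset> pair, found $opens opening and $closes closing"
        return 1
    fi

    echo "looks OK: $(grep -c '<loc>' "$f") lines with a <loc> entry"
}
```

Run it as check_sitemap sitemap.xml from the site root. A non-zero exit status means the file is probably not worth submitting yet.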

Please feel free to ask questions. I hope that this is finally fixed right.

Comments

Chriswaterguy's picture

priority weighting

As I understand it (from chatting with Seth at OLPC), the priority weighting is a zero-sum game: it just guides Google in dividing up whatever weighting they give us.

So, I think we should have 1.0 for our landing pages (i.e. portals), and much lower values for everything else. How does that sound?

-- Chriswaterguy

Lonny's picture

weighting

If you think this would help, we could experiment with 1s for portals and 0.9s for other pages (hopefully, someone will write the script for that).

The easiest way to keep the portal pages high in ranking is to keep them linked to and to keep updating them. How recently pages were updated affects their ranking.
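
As a starting point for the 1s-and-0.9s experiment, here is a rough awk sketch. It rests on assumptions: portal pages are taken to be those with "Portal:" in the URL (a hypothetical marker, change PORTAL_PATTERN to suit), and each <loc> is assumed to come before the <priority> of the same page, as in a typical sitemap. The function name set_priorities is made up for illustration.

```shell
#!/bin/sh
# Hypothetical marker that identifies a portal page by its URL.
PORTAL_PATTERN="${PORTAL_PATTERN:-Portal:}"

# set_priorities IN OUT: copy sitemap IN to OUT, rewriting <priority> values.
# Pages whose <loc> contains PORTAL_PATTERN get 1.0; all others get 0.9.
set_priorities() {
    awk -v pat="$PORTAL_PATTERN" '
        # Remember whether the page currently being read is a portal.
        /<loc>/ { is_portal = (index($0, pat) > 0) }
        # Replace whatever priority the generator wrote for this page.
        /<priority>/ { sub(/<priority>[^<]*<\/priority>/, "<priority>" (is_portal ? "1.0" : "0.9") "</priority>") }
        { print }
    ' "$1" > "$2"
}
```

Something like set_priorities sitemap.xml sitemap.new.xml would write the adjusted copy, leaving the original untouched until the result has been looked over.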

ticks

Whoops. Yeah, I added some extra echos when I put it up on the wiki. Poke around in Google Webmaster Tools when you're done; you can fine-tune a few options there, get your portals set as sub-headings in search results, etc.

--S

Lonny's picture

sub-headings

Thanks. How do you set up pages as sub-headings? I only see where you can block certain pages from being sub-headings, but not where you can suggest which pages should be.

Hrrm... come to think of it

Hrrm... come to think of it, I'm not sure. But I suspect that if you give a high page weight to a given page, it will likely be dynamically generated as one of the options. Also, external links? I thought there was a mechanism to do so directly, but that doesn't appear to be the case.

Thanks, and an update

http://jrandomhacker.info/MediaWiki_tricks_and_tips/google_sitemap_repai...

Thanks for checking in on this sitemap issue. I'm happy you were able to find a solution.

I took your script and adapted it to be a bit more serviceable by everyday people. Just a few environment variables, nothing special.

Didn't work

Your link requires people to register/login. On purpose?

Fixed

Sorry about that, I was playing around with some other software and changed some settings momentarily. It's fixed.

Google webmaster tools says it's an invalid format

Presumably I goofed something up. Perhaps my work would help someone else out there, but at this point I'm not willing to troubleshoot further. I'm going to retire my MediaWiki use anyways.

Getting the MediaWiki sitemap working on GoDaddy

Hey,

I wish I'd found this post earlier. I've just done pretty much the same as you, using generateSitemap.php and a script based on Sy's, but focusing on GoDaddy users. I've included some earlier steps, like making sure the maintenance scripts are enabled and how to set up the cron job using the (slow) GoDaddy interface. The write-up is "Generating a sitemap for MediaWiki hosted on GoDaddy".

I wonder if the latest MediaWiki has fixed generateSitemap.php.

jjh
