
This is a follow-up to "Making a sitemap (opensource documentation wild goose chase)".
So, thanks to the OLPC folks, we finally have a working Google-compliant sitemap. The code below is adapted from http://wiki.laptop.org/go/SEO_for_the_OLPC_wiki/sitemapgen. Here are some notes about the implementation:
Make the script executable: chmod u+x makesitemap.sh
Run it from the wiki root with sh makesitemap.sh or, even better, run it from cron.
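For the cron route, one thing to watch is that the script does cd maintenance/ with a relative path, so the job has to start in the wiki root. Here is a minimal sketch of a crontab entry, assuming a hypothetical install path /path/to/wiki and an arbitrary nightly 03:15 schedule (neither comes from the original setup, and cron_line is just an illustrative helper name):

```shell
#!/bin/sh
# Print a crontab line that runs makesitemap.sh nightly from the wiki root.
# The schedule and the log file name are illustrative assumptions.
cron_line() {
    # $1 = wiki root: the directory holding makesitemap.sh and maintenance/
    printf '15 3 * * * cd %s && sh makesitemap.sh >> %s/sitemap-cron.log 2>&1\n' "$1" "$1"
}

cron_line /path/to/wiki   # hypothetical path; substitute your own
```

To install it, append the printed line to your crontab, for example with (crontab -l 2>/dev/null; cron_line /path/to/wiki) | crontab -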
----makesitemap.sh----
#!/bin/sh
echo Sitemap script ...
cd maintenance/ || exit 1
/usr/local/php5/bin/php generateSitemap.php
echo Sitemap script ... done
echo Moving files, ungzing ...
mv -f *.gz ../
cd ..
gzip -d *.xml.gz
echo Moving files, un'gz'ing ... done
echo Archiving old sitemap...
mv sitemap.xml sitemap.xml.$(date "+%Y%m%d%H")
echo Archiving old sitemap... done
echo Cating all of the name spaces ...
cat sitemap-appropedia-w1-NS_*.xml > sitemap.xml
echo Cating all of the name spaces ... done
echo Replacing "localhost" with www.appropedia.org ...
sed -i 's,localhost,www.appropedia.org,g' sitemap.xml
echo Replacing "localhost" with www.appropedia.org ... done
echo Cleaning that up ...
sed -i "/?xml version/d" sitemap.xml
sed -i "/urlset/d" sitemap.xml
echo Cleaning that up ... done
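echo Adding the XML headers back...
# The urlset lines were deleted above, so re-add one closing and one opening tag.
echo "</urlset>" >> sitemap.xml
sed -i '1i\
<?xml version="1.0" encoding="UTF-8"?>\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
' sitemap.xml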
echo Adding the XML headers back... done
echo Cleaning up the catd files ...
rm sitemap-appropedia-w1-NS*
echo Cleaning up the catd files ... done
echo Pinging the sitemap update to the search engines...
wget -q -O /dev/null http://www.google.com/webmasters/tools/ping?sitemap=http://www.appropedi...
wget -q -O /dev/null http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=http://w...
wget -q -O /dev/null http://submissions.ask.com/ping?sitemap=http://www.appropedia.org/sitema...
wget -q -O /dev/null http://api.moreover.com/ping?u=http://www.appropedia.org/sitemap.xml
wget -q -O /dev/null http://webmaster.live.com/ping.aspx?siteMap=http://www.appropedia.org/si...
echo Pinging the sitemap update to the search engines... done
echo All done
----
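Because the script strips the XML headers and adds them back by hand, it can be worth sanity-checking the merged file before relying on it. The following is only a rough sketch: check_sitemap is a made-up helper name, and the checks are simple line-based greps, not real XML validation.

```shell
#!/bin/sh
# Sanity-check a merged sitemap; prints the first problem found and returns 1.
check_sitemap() {
    f=$1
    # The XML declaration has to be the very first line.
    head -n 1 "$f" | grep -q '^<?xml version=' \
        || { echo "first line is not the XML declaration"; return 1; }
    # After merging, exactly one line should open and one should close <urlset>.
    [ "$(grep -c '<urlset' "$f")" -eq 1 ] \
        || { echo "expected exactly one line with an opening <urlset> tag"; return 1; }
    [ "$(grep -c '</urlset>' "$f")" -eq 1 ] \
        || { echo "expected exactly one line with a closing </urlset> tag"; return 1; }
    # Any surviving "localhost" means the hostname replacement missed something.
    if grep -q 'localhost' "$f"; then
        echo "some URLs still point at localhost"; return 1
    fi
    echo "looks sane"
}
```

A typical use would be check_sitemap sitemap.xml && echo "ok to ping", run from the wiki root after the script finishes.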
Please feel free to ask questions. I hope that this is finally fixed right.
Comments
priority weighting
As I understand it (from chatting with Seth at OLPC), the priority weighting is a zero-sum game: it only guides Google in dividing up whatever weighting it gives us.
So, I think we should have 1.0 for our landing pages (i.e. portals), and much lower values for everything else. How does that sound?
-- Chriswaterguy
weighting
If you think this would help, we could experiment with 1.0 for portals and 0.9 for other pages (hopefully, someone will write the script for that).
The easiest way to keep the portal pages high in the rankings is to keep linking to them and to keep updating them. How recently a page was updated affects its ranking.
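As a possible starting point for that script, here is a rough awk sketch. It assumes (none of this is confirmed here) that sitemap.xml has one element per line, that each <loc> comes before its <priority> inside a <url> entry, and that portal URLs can be recognised by a pattern such as Portal:. The function name set_priorities is made up for illustration.

```shell
#!/bin/sh
# Rewrite <priority> values: 1.0 for URLs matching a portal pattern,
# a lower value for everything else. Output goes to stdout.
set_priorities() {
    # $1 = sitemap file, $2 = regex matching portal URLs, $3 = priority for other pages
    awk -v pat="$2" -v low="$3" '
        BEGIN        { prio = low }
        # Remember which priority the current <url> entry should get.
        /<loc>/      { prio = ($0 ~ pat) ? "1.0" : low }
        # Replace whatever priority generateSitemap.php wrote.
        /<priority>/ { sub(/<priority>[^<]*<\/priority>/, "<priority>" prio "</priority>") }
                     { print }
    ' "$1"
}
```

It could be run as set_priorities sitemap.xml 'Portal:' 0.9 > sitemap.new && mv sitemap.new sitemap.xml, just before the search engines are pinged.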
ticks
Whoops. Yeah, I added some extra echos when I put it up on the wiki. Poke around in Google Webmaster Tools when you're done; you can fine-tune a few options there, get your portals set as sub-headings in search results, etc.
--S
sub-headings
Thanks. How do you set up pages as sub-headings? I only see where you can block certain pages from being sub-headings, but not where you can suggest which pages should be.
Hrrm... come to think of it
Hrrm... come to think of it, I'm not sure. But I suspect that if you give a page a high weight, it will likely be dynamically generated as one of the options. Also, external links? I thought there was a mechanism to do so directly, but that doesn't appear to be the case.
Thanks, and an update
http://jrandomhacker.info/MediaWiki_tricks_and_tips/google_sitemap_repai...
Thanks for checking in on this sitemap issue. I'm happy you were able to find a solution.
I took your script and adapted it to be a bit more serviceable by everyday people. Just a few environment variables, nothing special.
Didn't work
Your link requires people to register/login. On purpose?
Fixed
Sorry about that, I was playing around with some other software and changed some settings momentarily. It's fixed.
Google Webmaster Tools says it's an invalid format
Presumably I goofed something up. Perhaps my work would help someone else out there, but at this point I'm not willing to troubleshoot further. I'm going to retire my MediaWiki use anyway.
Getting the MediaWiki sitemap working on GoDaddy
Hey,
I wish I'd found this post earlier. I've just done pretty much the same as you, using generateSitemap.php and a script based on Sy's, but focusing on GoDaddy users. I've included some earlier steps, like making sure the maintenance scripts are enabled and how to set up the cron job using the (slow) GoDaddy interface. The write-up is "Generating a sitemap for MediaWiki hosted on GoDaddy".
I wonder if the latest MediaWiki has fixed generateSitemap.php.
jjh