Migrating from DasBlog to WordPress

The Ripple Rock blogs have been on DasBlog for a long time. It was a great blog engine in its time, with a small footprint because it stored all of its data in XML and didn’t need a database. However, things have changed and we finally decided to take the plunge and move over to WordPress.

TL;DR? Scroll down to How to do it

The Journey

I had initially thought the migration would be a simple process. There are several tech people out there with blog articles on how they transitioned from DasBlog to WordPress. There were even plugins that would import the XML files that DasBlog uses into WordPress and preserve all the legacy content. However, many of those articles were written several years ago, and one of the plugins for importing content from DasBlog no longer works.

I spent ages looking at various solutions, and with many of them I hit brick walls: certain plugins were no longer supported, others just didn’t work because technology had moved on, and some of the sites that hosted them had long since disappeared!

Eventually I realised I couldn’t export directly from DasBlog to WordPress. I had to export from DasBlog to a format that was still supported by WordPress, and that format was BlogML!

I discovered a DasBlog to BlogML converter on Merill Fernando’s site. He had made a GUI wrapper for the converter, which was originally made by Paul Van Brenk. Unfortunately the link to this converter, which was hosted on a Microsoft site, no longer worked. However, Shital Shah kindly made the application available from his GitHub repo, found here.

Finally I was able to export my blog from DasBlog to BlogML!

Next I needed a BlogML to WordPress WXR converter.

I discovered a BlogML importer created by Saravana using some of the source code from the legacy blog migrator project, which sadly no longer appears to exist. Saravana created this code back in 2012. I then discovered that another chap, Michael Freidgeim, had taken the source code and made some improvements to it, such as adding logging and fixing the importing of comments. You can see the repo he made for it over here.

Michael’s code worked like a charm. However, on importing a large DasBlog into WordPress I ran into an issue where WordPress kept repeating the same article over and over again. I wasn’t sure what was to blame, and I spent ages looking through WordPress forums. Several people had encountered the same thing, but there never really seemed to be a solution. So I decided to look into the PHP code myself to try and work out what was going on. To be clear, I am not a PHP coder; I mainly code in C#.

But what I discovered made perfect sense. MySQL, which is what WordPress uses as its database, can store some pretty high integers, and when people share details about how many articles WordPress can support they quote some high numbers. The problem is that although MySQL can store those numbers, WordPress was taking the post ID from MySQL and converting it to an int. An int in a 32-bit build of PHP can hold nothing greater than 2147483647. If you try to cast anything higher than that to an int, PHP will just give you 2147483647, which was the post ID of the article I kept seeing duplicates of.
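To make the failure mode concrete, here is a minimal sketch in Python (not the WordPress or PHP code itself). It assumes the cast saturates at the 32-bit maximum, as described above; the function name and the sample IDs are made up for illustration.

```python
INT32_MAX = 2_147_483_647  # largest signed 32-bit integer


def clamp_to_int32(value: int) -> int:
    """Mimic an integer cast that saturates at the 32-bit maximum."""
    return min(value, INT32_MAX)


# Hypothetical oversized post IDs, standing in for the IDs derived from GUIDs.
imported_ids = [9_000_000_001, 9_000_000_002, 9_000_000_003]

# Every distinct ID above the limit collapses to the same stored value,
# so lookups for different posts all land on one article.
stored_ids = [clamp_to_int32(post_id) for post_id in imported_ids]
print(stored_ids)
```

Three different posts end up sharing one ID, which matches the symptom of the same article appearing over and over.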

What had happened was that the BlogML importer had kept the GUIDs that DasBlog used for its post IDs. When I imported this into WordPress, it attempted to convert these to integers, but very large integers. To get around the issue, I changed the BlogML to WXR code so that instead of using the existing post IDs it uses a configurable identity seed which you can set yourself. This solved the issue for me. You can access the fork of the repository here which has my changes.
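The idea behind the identity seed can be sketched in a few lines of Python. This is only an illustration of the approach, not the converter’s actual C# code; the function name and the sample GUID strings are hypothetical.

```python
from itertools import count


def assign_post_ids(guids, seed):
    """Map each original post GUID to a small sequential integer ID.

    IDs start at `seed` and go up by one per distinct GUID, so they stay
    far below the 32-bit limit regardless of what the GUIDs look like.
    """
    next_id = count(seed)
    mapping = {}
    for guid in guids:
        if guid not in mapping:  # a repeated GUID keeps its first ID
            mapping[guid] = next(next_id)
    return mapping


# Hypothetical GUIDs purely for illustration.
example = assign_post_ids(["guid-a", "guid-b", "guid-a"], seed=50)
print(example)
```

The seed only needs to be higher than any post ID already present in the target WordPress database.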

How to do it

Convert to BlogML

Convert your DasBlog to BlogML using the DasBlogML converter. The converter is pretty straightforward. You just need to point it to the root of your DasBlog folder and it will do the rest.

Converting from BlogML to WordPress WXR

Convert the BlogML to WordPress WXR format using the converter found here. (Don’t forget to use an identity seed for your post IDs.)

Let’s unpack a bit of what’s happening on the command line here. I have put in my existing blog URL and the target URL where I am currently setting up my WordPress blog. I am also using a BlogPostIDSeed of 50. On a new WordPress blog this seems like a safe number to me. If you are importing into an existing blog, I’d check the highest post ID in your WordPress database just to be on the safe side. For more details on why I use a BlogPostIDSeed, please see The Journey above.

The above will create

  • [filename].wrx.Redirect.txt – This contains the redirect rules in .htaccess format from your old blog URLs to your new ones, so you can keep your SEO traffic. More on this later.
  • [filename].wrx.SourceQA.txt – This is a list of source urls that were processed
  • [filename].wrx.TargetQA.txt – This is a list of their corresponding target urls
  • [filename].wrx.xml – This is the file that contains all of your blog articles.
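Before importing, it can be worth checking that the post IDs in the generated XML file are unique and small enough. The following is a rough Python sketch, assuming the file stores each ID in a post_id element inside the WordPress export namespace, as standard WordPress export files do. I have not confirmed the converter’s exact output, so treat the element name and the sample XML as assumptions.

```python
import xml.etree.ElementTree as ET

INT32_MAX = 2_147_483_647


def collect_post_ids(xml_text):
    """Return every post_id value found in the export XML, in document order."""
    root = ET.fromstring(xml_text)
    ids = []
    for element in root.iter():
        # Namespaced tags look like "{namespace-uri}post_id"; compare the local
        # name only so the export namespace version does not matter.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "post_id" and element.text:
            ids.append(int(element.text.strip()))
    return ids


def find_problem_ids(ids):
    """Return IDs that are duplicated or exceed the 32-bit limit."""
    seen, problems = set(), []
    for post_id in ids:
        if post_id in seen or post_id > INT32_MAX:
            problems.append(post_id)
        seen.add(post_id)
    return problems


# Tiny hand-written sample, not real converter output.
sample = """<rss xmlns:wp="http://wordpress.org/export/1.2/"><channel>
<item><wp:post_id>50</wp:post_id></item>
<item><wp:post_id>51</wp:post_id></item>
</channel></rss>"""

print(find_problem_ids(collect_post_ids(sample)))
```

An empty list means no ID is repeated and none is above 2147483647.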

Importing your WXR file into WordPress

Max WordPress File Upload

Before you get to this step, you will probably need to increase the maximum file upload size that WordPress allows. I made the change in my php.ini file, but depending on your hosting provider you’ll want to check which method is best for you. There is an article here.
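As a rough illustration, the two php.ini directives that usually govern upload size are upload_max_filesize and post_max_size. The 64M values below are placeholders for illustration only; pick something comfortably larger than your export file and check what your host allows.

```ini
; php.ini (illustrative values only, adjust to the size of your export file)
upload_max_filesize = 64M
; post_max_size should be at least as large as upload_max_filesize
post_max_size = 64M
```

After changing php.ini you will typically need to restart the web server or PHP process for the new limits to take effect.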

Importing

WordPress has an Import option under its Tools menu where you can select the import feature you want. In this case we are selecting the WordPress Importer.

On this screen select the wrx.xml file you created in the previous step.

The importer should work. If you have problems with file sizes, you will need to increase the maximum upload file size allowed, as described above.

Redirects

Redirection

WordPress is going to change the URLs of your blog articles. If search engines have your old article URLs indexed, users are going to get 404 errors when visiting them. To prevent this we need to put some redirects in. I made use of a WordPress plugin called Redirection by John Godley, which you can install from the WordPress Plugins menu.

Editing your Redirect Files

In one of the previous steps the BlogML converter created a file called [filename].wrx.Redirect.txt. This file contains redirects of the kind you would usually see in an .htaccess file. If you are happy pasting these into your .htaccess file, go ahead now. I wanted to use the Redirection plugin so I could keep track of errors or any other redirect issues, but I wasn’t able to import the file as it was, so I ended up editing it to simplify it.

Step 1

I imported the file into Excel as a space-separated file and deleted the columns I didn’t want (see the image). I just wanted the Source URL and the Target URL.

Step 2

I made all the URLs relative with a simple search and replace. You can see in the image I have done the first column; I also did this for the second column. I also replaced every .aspx$ with just .aspx. After this I exported my file as a CSV file.
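If you would rather script Steps 1 and 2 than use Excel, something along these lines should work. It is a sketch under assumptions: I am assuming each line of the redirect file is space separated and that you know which columns hold the source and target URLs. The column positions and the sample line below are hypothetical, so check them against your own file.

```python
import csv
import io
from urllib.parse import urlsplit


def to_relative(url):
    """Drop the scheme and host, keeping only the path (and query, if any)."""
    parts = urlsplit(url)
    relative = parts.path or "/"
    if parts.query:
        relative += "?" + parts.query
    return relative


def redirect_lines_to_csv(lines, source_col, target_col):
    """Build two-column CSV text (source, target) from space-separated lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        fields = line.split()
        if len(fields) <= max(source_col, target_col):
            continue  # skip blank or malformed lines
        source = fields[source_col]
        if source.endswith(".aspx$"):
            source = source[:-1]  # turn ".aspx$" into ".aspx"
        writer.writerow([to_relative(source), to_relative(fields[target_col])])
    return out.getvalue()


# Hypothetical input line; the real file layout may differ.
lines = [
    "Redirect 301 http://old.example.com/blog/MyPost.aspx$ http://new.example.com/my-post/",
]
print(redirect_lines_to_csv(lines, source_col=2, target_col=3))
```

The resulting text can be saved as a .csv file and used in the next step in place of the Excel export.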

Step 3

I then imported my CSV file into the Redirection plugin we installed earlier in WordPress.

You can now see I have all my redirects imported. All the legacy URLs will now permanently (301) redirect to their WordPress URLs.

Finishing Touches

If, like me, you made use of plugins to display your code, you may find your code looks a bit odd now.

The above code was formatted with a plugin for Windows Live Writer called Smart Content. The WordPress styles seem to throw this code out a bit. I found I needed to add a bit of CSS to correct that by selecting the Customize option (found at the top left of the page when logged in and on an article page) and then selecting Additional CSS.

If you are late to the migration party like I was, hopefully this article will be helpful to you.