Upgrade PST files to Unicode and run ScanPST from the command line

I identified an issue with old PST files that started to misbehave when Microsoft Office was upgraded to 2010. PST files created by Outlook 97, 2000 or XP are stored using ANSI, whereas those created by Outlook 2003, 2007 and 2010 use Unicode by default. Microsoft recommend that you don’t use ANSI PST files any more.

You can determine what format your PST file is by right-clicking it in Outlook (2003 and higher), selecting Data File Properties…, clicking the Advanced… button and looking in the Format: box. If it says Outlook Data File, you’re already using a Unicode file; if it says Outlook Data File (97-2002), it’s ANSI. You can also check programmatically by reading the 11th byte of the PST file: if (as an integer) it’s 14 or 15, the file is ANSI; if it’s 23, it’s Unicode. See wVer in the PST file format header specification.
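A minimal Python sketch of that programmatic check follows. It assumes the header layout from the [MS-PST] specification: the 4-byte magic `!BDN`, a 4-byte CRC, the 2-byte client magic `SM`, then wVer as a little-endian 16-bit value at offset 10 (the 11th byte). Treating any wVer of 23 or above as Unicode is an assumption, to allow for newer Outlook versions writing higher values. The `find_ansi_psts` helper is illustrative only, for sweeping a folder tree the way you might before a bulk upgrade.

```python
import struct
from pathlib import Path

# [MS-PST] header layout (assumed here): dwMagic "!BDN" (4 bytes), dwCRCPartial
# (4 bytes), wMagicClient "SM" (2 bytes), then wVer as a little-endian
# 16-bit value at offset 10, i.e. starting at the 11th byte of the file.
PST_MAGIC = b"!BDN"
WVER_OFFSET = 10


def pst_format(path):
    """Return "ANSI", "Unicode" or "unknown" from the PST header's wVer field."""
    with open(path, "rb") as f:
        header = f.read(WVER_OFFSET + 2)
    if len(header) < WVER_OFFSET + 2 or header[:4] != PST_MAGIC:
        return "unknown"  # too short, or not a PST file at all
    (wver,) = struct.unpack_from("<H", header, WVER_OFFSET)
    if wver in (14, 15):
        return "ANSI"
    if wver >= 23:  # assumption: later Outlook versions may use values above 23
        return "Unicode"
    return "unknown"


def find_ansi_psts(root):
    """Yield every .pst file (any case) under root whose header reports ANSI."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".pst":
            if pst_format(path) == "ANSI":
                yield path
```

For example, `find_ansi_psts(r"\\fileserver\home")` (a hypothetical share) yields each ANSI PST under that tree. Only the first 12 bytes of each file are read, so the sweep stays cheap even over a network.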

There are manual methods to solve the problem – you basically create a new PST file, open it in Outlook, and move everything from your old format PST file to the new one. Not something that your busy users (or less technically capable ones) will appreciate.

Luckily there’s a great utility called Upstart to the rescue. It gives your users a nice, easy way to upgrade their PST files.

Plus, it has a command line version, which is very useful to system admins in larger organisations. I identified 1200 ANSI format PST files on users’ personal drives that needed converting. Using the Upstart command line utility, I’ve been able to seamlessly upgrade these overnight prior to upgrading my PCs and XenApp servers to Office 2010.

Also, it’s very reasonably priced.

Even better, cupstart.exe (the command line PST upgrade utility) lets you run the Microsoft ScanPST utility, so you can ensure the PST files have no issues before you attempt to upgrade them. As ScanPST has no command line functionality, cupstart is a great way to get this.

Best yet, the guy who wrote it, Pete, is really helpful in the event that you encounter any issues or don’t read the manual properly (ahem…).

So, some hints and tips based on my experiences with the Command Line version, cupstart.exe:

  • You need to run cupstart on a machine that has Outlook 2003 or higher installed as it uses Outlook’s resources.
  • cupstart by default will run ScanPST to fix problems within the PST file before it converts it. If this isn’t run then any major problems in the file could cause the conversion process to fail. However, ScanPST doesn’t always exit nicely and can crash cupstart. The solution to this is to run cupstart with the scan option first, then repair the file using the repair option if necessary, before finally converting the file to Unicode using the cu option.
  • The Outlook profile keeps a record of which PST files a user has open in Outlook, and what type they are. If you convert an ANSI PST file to Unicode, you can sometimes get issues if the Outlook profile still thinks the PST file is ANSI. Luckily, cupstart provides an option, cp, to process the PST entries in the Outlook profile to correct this. I found it was a good idea to also use the /C (continue on errors) switch, because if a user’s Outlook profile lists a PST file that no longer exists, cupstart will stop and not process any further PST entries.
  • cupstart does not seem to multithread: I’m running it on a PC with a fast quad core processor, but it only uses 25% CPU at most (one core’s worth).
  • To eliminate disk-based or network latency bottlenecks, I’ve written a VBScript that copies the user’s PST files to a fast SSD on my processing PC. I then process the files and copy the converted files back to the user’s network storage, also dropping a marker that my logon script picks up to tell it to run cupstart cp /c for the user the next time they log on.
  • PST files on the whole get smaller when converted to Unicode, but a few get bigger. I’ve yet to work out why, and probably won’t be able to, as trawling through people’s email is both time-consuming and probably against some kind of data protection agreement. Here’s a graph of before and after sizes for some of the larger PST files I’ve processed (Y axis is size in MB):
    [Graph: before and after PST file sizes, in MB]
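The staging workflow from the tips above (copy to a local disk, scan, repair only if needed, convert with cu, copy back, drop a marker, then run cupstart cp /c at the next logon) can be sketched in Python. This is not the VBScript mentioned above, just a hypothetical sketch of the same sequence. Only the `cupstart cp /c` form comes from the post; the `<option> <path>` argument order, a non-zero exit code from scan meaning problems were found, cu converting the file in place, and the marker file name and location are all assumptions to check against the cupstart manual. Each step runs as its own process, so a ScanPST crash only takes down that one invocation.

```python
import shutil
import subprocess
from pathlib import Path

CUPSTART = "cupstart.exe"            # assumed to be on PATH
MARKER_NAME = "cupstart_cp.marker"   # hypothetical marker file name


def run_cupstart(args, runner=subprocess.run):
    """Run cupstart as a separate process and return its exit code."""
    return runner([CUPSTART, *args]).returncode


def process_pst(network_pst, staging_dir, runner=subprocess.run):
    """Stage one PST locally, scan/repair/convert it, then copy it back."""
    network_pst = Path(network_pst)
    local_pst = Path(staging_dir) / network_pst.name
    shutil.copy2(network_pst, local_pst)          # avoid network latency

    # ASSUMPTION: "<option> <path>" argument order, and a non-zero exit
    # code from "scan" meaning problems were found.
    if run_cupstart(["scan", str(local_pst)], runner) != 0:
        run_cupstart(["repair", str(local_pst)], runner)
    # ASSUMPTION: "cu" converts the file in place.
    if run_cupstart(["cu", str(local_pst)], runner) != 0:
        # Leave the staged copy for inspection; the network copy is untouched.
        raise RuntimeError(f"conversion failed: {network_pst}")

    shutil.copy2(local_pst, network_pst)          # put the converted PST back
    (network_pst.parent / MARKER_NAME).touch()    # flag the profile fix-up
    local_pst.unlink()


def logon_fixup(user_dir, runner=subprocess.run):
    """Logon-script step: if the marker exists, run "cupstart cp /c" once."""
    marker = Path(user_dir) / MARKER_NAME
    if not marker.exists():
        return False
    run_cupstart(["cp", "/c"], runner)   # fix PST entries in the Outlook profile
    marker.unlink()
    return True
```

The `runner` parameter exists so the sequence can be dry-run with a stub instead of the real executable; in production you would call `process_pst` for each file overnight and `logon_fixup` from the logon script.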
This entry was posted in Applications, Outlook, Scripting.

1 Response to Upgrade PST files to Unicode and run ScanPST from the command line

  1. Pete Maclean says:

    This is the Pete who is the developer of Upstart, the product mentioned in this blog entry. The blog was written half a year ago but I only just came across it. Thank you, Robin, for the glowing and informative mention.

    Robin writes, “PST files on the whole get smaller when converted to Unicode, but a few get bigger. I’ve yet to work out why”. Here’s the answer…

    When a PST is converted from ANSI to Unicode, two forces come into play that affect the resulting size.

    The first of these is the way that text is represented. In an ANSI PST, text is stored in multi-byte form meaning that each character consists of a sequence of between 1 and 6 bytes. Digits, Latin letters (A-Z) and common punctuation symbols are all single bytes while Chinese, Japanese and Korean characters occupy multiple bytes. In a Unicode PST, text is expressed in Unicode, UTF-16 to be precise, meaning that every character requires either 2 or 4 bytes. This means that, if one is converting a PST containing mostly messages in English or another Western European language, the size will increase because the space needed for most characters changes from 1 byte to 2 bytes. By contrast, if one is converting a PST containing mostly Chinese-language messages, the size may decrease because many characters are stored more compactly.

    The second force is the elimination of unused space. When messages in a PST are deleted, the space they occupy is not removed from the file; this space gets reused for new messages and any unused space is removed only when the PST is compacted (which is a manually initiated operation). (Looking at it another way, each PST has its own internal recycle bin.) So very often, PSTs being converted will contain substantial amounts of unused space and that unused space is not carried over to the Unicode version.

    So the bottom line is that conversion can easily result in either a smaller or a larger file. Robin’s histogram shows what is probably a very typical range of variation.
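A deliberately crude toy model makes the two forces concrete. It assumes all message text is single-byte Western text (which doubles in UTF-16), that all free space is dropped, and that everything else in the file (attachments, internal structures) stays the same size. Real PSTs are more complicated, so the numbers below are illustrative inputs, not measurements.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


def estimate_unicode_size_mb(ansi_size_mb: float, text_mb: float, free_mb: float) -> float:
    """Toy estimate of a PST's size after ANSI-to-Unicode conversion.

    text_mb: single-byte text in the ANSI file, assumed to double in UTF-16.
    free_mb: unused space in the ANSI file, assumed not to be carried over.
    """
    return ansi_size_mb - free_mb + text_mb  # doubling text adds text_mb


print(encoded_size("Hello", "cp1252"), encoded_size("Hello", "utf-16-le"))  # 5 10
print(estimate_unicode_size_mb(100, 10, 40))  # 70: free space dominates, file shrinks
print(estimate_unicode_size_mb(100, 30, 5))   # 125: text growth dominates, file grows
```

Whichever term is larger, the free space removed or the extra bytes for text, decides whether a given file shrinks or grows.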

