Tagging Characters

 

And now for something I think is quite beautiful: You can use RazzmaTag to insert typesetting tags that represent special characters, even Unicode characters, which means it’s now possible to convert Unicode characters in Word into something useable in QuarkXPress or other typesetting programs. This is a Big Deal because (at least as of the date of this writing) QuarkXPress will not import Unicode characters, even though more and more Word users are using them.

Here’s a simple example of a master list to tag two special characters, an e with an acute accent and a u with an umlaut:

é|<e-acute>

ü|<u-umlaut>

 

(Of course, the tags should be whatever you need in your production environment.) Please note that we didn’t include a +P, +A, +F, +p, +a, or +f, because these codes tell RazzmaTag to look for formatting or tags. In this case we’re not looking for formatting or tags; we’re just looking for text (characters).
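If it helps to picture what a text-only master list does, here is a minimal Python sketch. It is not RazzmaTag's own code, and the function names (parse_master_list, apply_master_list) are made up for illustration. It assumes each entry is a Find string and a Replace string separated by the first pipe on the line, applied in order as plain text:

```python
def parse_master_list(text):
    """Split each non-blank line at its first pipe into (find, replace)."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines between entries
        find, replace = line.split("|", 1)
        pairs.append((find, replace))
    return pairs


def apply_master_list(pairs, document):
    """Apply every find/replace pair, in order, as plain text."""
    for find, replace in pairs:
        document = document.replace(find, replace)
    return document


# The two-entry master list from the example above.
master_list = "é|<e-acute>\n\nü|<u-umlaut>\n"
pairs = parse_master_list(master_list)
print(apply_master_list(pairs, "café über"))
```

The sketch handles characters only; it knows nothing about the +P, +A, or +F codes, which is exactly why the example master list leaves them out.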

Need to tag Unicode characters? Let’s say you’ve got the Unicode character for the Greek character alpha in your Word document. Let’s also say you’re using QuarkXPress and have access to a Greek font. How do you convert the Word alpha into a Quark alpha? Like this:

1. In your RazzmaTag master list, include a caret and lowercase u (for Unicode) followed by the Unicode decimal (not hex) character number for alpha:

^u945

 

2. Insert a pipe symbol to separate your Find and Replace entries.

 

3. Insert the QuarkXPress tags that apply a character style (which will use the Greek font), followed by the character that will produce the alpha in QuarkXPress and a tag that switches back to the paragraph’s normal character formatting:

<@Greek>a<@$p>

 

The full master-list entry looks like this:

^u945|<@Greek>a<@$p>

 

After you run RazzmaTag and bring the file into QuarkXPress, you’ll need to format the character style sheet (in this case, named Greek) with the Greek font, which will display the lowercase a in the Greek character style as an alpha.

You can use this technique to convert any Unicode character in Word into any character in a special font in QuarkXPress—math characters, foreign scripts, whatever. To do so, you have to know two things: (1) the Unicode decimal character number and (2) the character that will produce the special character when formatted with your special font in QuarkXPress.
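If you have the character itself but not its number, any tool that reports code points will do. As one possibility, here is a short Python sketch that looks up the decimal number and assembles the master-list entry shown earlier. The helper name unicode_entry is hypothetical, and the style name and stand-in character are whatever your own Quark setup requires:

```python
def unicode_entry(char, style, stand_in):
    """Build a master-list line: ^u<decimal>|<@style>stand_in<@$p>."""
    # ord() returns the Unicode code point as a decimal integer,
    # which is the form the ^u code expects (not hex).
    return f"^u{ord(char)}|<@{style}>{stand_in}<@$p>"


alpha = "\u03b1"  # Greek small letter alpha
print(ord(alpha))       # the decimal number to put after ^u
print(hex(ord(alpha)))  # the hex form that most Unicode charts list
print(unicode_entry(alpha, "Greek", "a"))
```

Most Unicode charts list hex values, so converting to decimal is the step that is easiest to get wrong by hand.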

Here’s a more mundane example that doesn’t involve Unicode. Let’s say you’re working in Word 2000, on a PC, and you want to convert em dashes to the appropriate character for QuarkXPress on a Macintosh. Piece of cake:

^0151|<\#209>

The ^0151 is the code to find an em dash in Microsoft Word. <\#209> is the QuarkXPress code for an em dash on a Macintosh.

Why those numbers? The 0151 is simply the extended ASCII number for an em dash on a PC. The 209 is the extended ASCII number for an em dash on a Macintosh (208 is the en dash). You can use the code pattern of ^0??? to find any ASCII character in Word, and you can use the code pattern of <\#???> with any character number to specify a character in QuarkXPress. I’ve provided a list of these numbers for special characters on both Mac and PC in an appendix to this document.

For some special Word characters, there’s an easier way. Consider this Find and Replace string:

^+|<\#209>

The ^+ is the code to find an em dash in Microsoft Word. Again, <\#209> is the QuarkXPress tag for an em dash on a Macintosh.
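If you’d rather check a character number than trust a chart, the following Python sketch is one way to do it. It assumes that Python’s cp1252 codec matches the PC character set Word uses for ^0??? and that its mac_roman codec matches the Macintosh character set QuarkXPress uses for <\#???>. The helper name char_number is made up for illustration. For the em dash these codecs report 151 and 209; Mac OS Roman assigns 208 to the en dash.

```python
def char_number(char, codec):
    """Return the single-byte character number of char in the given codec."""
    encoded = char.encode(codec)  # raises UnicodeEncodeError if not in the set
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {codec}")
    return encoded[0]


em_dash = "\u2014"
pc = char_number(em_dash, "cp1252")      # Windows (PC) number
mac = char_number(em_dash, "mac_roman")  # Macintosh number

# Assemble a Find|Replace entry in the ^0??? and <\#???> patterns.
entry = f"^0{pc:03d}|<\\#{mac}>"
print(entry)
```

A character that doesn’t exist in one of the two character sets raises an error instead of returning a number, which tells you that you need the Unicode technique described earlier.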

For a list of codes (such as ^+) for finding special characters in Microsoft Word, please see the document called “Advanced Searching in Microsoft Word,” which is included with your RazzmaTag files.

You can find lists of Unicode characters in many places on the Internet, but my favorite resource for this purpose is Alan Wood’s Unicode Resources site at http://www.alanwood.net/unicode/.

If you create a RazzmaTag master list for converting Unicode or other special characters and would be willing to share it with others, please email it to me at editor@editorium.com. I’ll be happy to post it on our Web site.