Find Duplicate References

The Find Duplicate References macro is found on the References menu as Duplicate Refs:

The macro works with both unnumbered and numbered reference lists. It works with autonumbered lists, but it works better when the numbers are not autonumbers. It also works whether the reference list is left in the manuscript with the text paragraphs or has been moved temporarily to its own file; like other reference-specific macros in EditTools, it works better when the references are moved to a separate, references-only file.

Like all macros, the Find Duplicate References macro is "dumb"; that is, it only finds identical references. The following image shows references 19 and 78 as submitted for editing.

As the image shows, although references 19 and 78 are the same reference and are likely to appear identical to an editor, they will not appear identical to the Find Duplicate References macro. Items 1 and 2 show a slight difference in the author name (19: "Infant", 78: "Infantile"). The journal names differ in that the abbreviated name is used in 19 (#3) whereas the name is spelled out in 78 (#4). Finally, as #5 and #6 show, there are several differences in the cite information: the order of the elements, the use of a hyphen versus an en-dash to indicate the page range, and the final page number.

Tip: Formatting References

It does not matter what style you apply to the references. What does matter is that you are consistent. The macro does two passes, one forward (from the beginning of the reference) and one backward (from the end of the reference), so as to minimize, but not eliminate, the chance of identical references being missed. Even something as "minor" as an extra space in one reference but not in the other, or a different spelling (e.g., American vs British), will cause the macro to miss pairing identical references. For example, if one article title uses the American spelling "immunization" and the other uses the British spelling "immunisation", the pairing will be missed.
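To see why such small differences matter, it can help to look at two strings character by character. The following Python sketch is not part of EditTools; the function name first_difference and the sample strings are invented for illustration. It reports the first position at which two pieces of text differ and names the characters involved, which makes a hyphen versus an en-dash, or a stray extra space, easy to spot.

```python
import unicodedata


def describe(ch):
    """Return the Unicode name of a character, or its repr if it has no name."""
    return unicodedata.name(ch, repr(ch))


def first_difference(a, b):
    """Return (index, char_in_a, char_in_b) for the first mismatch, or None if equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, describe(ca), describe(cb)
    if len(a) != len(b):
        # One text is a prefix of the other; report where the shorter one ends.
        shorter = min(len(a), len(b))
        rest_a = describe(a[shorter]) if len(a) > shorter else "(end of text)"
        rest_b = describe(b[shorter]) if len(b) > shorter else "(end of text)"
        return shorter, rest_a, rest_b
    return None


# Hypothetical cite fragments: one uses a hyphen, the other an en-dash (U+2013).
print(first_difference("2001;12(3):45-52.", "2001;12(3):45\u201352."))
# American vs British spelling.
print(first_difference("immunization", "immunisation"))
# An extra space after the journal abbreviation.
print(first_difference("J Med. 2001", "J Med.  2001"))
```

Any non-None result means an exact-match comparison of the two strings would fail, which is the situation the tip above warns about.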

Because any one of these differences would prevent the macro from pairing these references and marking them as potentially identical, it is important that the references go through a round of editing first. After editing, which should also include running the Journals macro, the references are likely to look like this:

If you compare the same items (1 and 2, 3 and 4, 5 and 6) in the above image, you will see that they now better match. (Ignore the inserted comments for now; they are discussed below.)

Some additional steps are required before the Find Duplicate References macro can be run successfully. However, the macro will perform those steps for you (see #A below). When you run the Find Duplicate References macro, the opening screen looks like this:

This screen provides all of the instructions for running the macro and describes how the macro will prepare your file before it searches for duplicates (#A).

These steps are important for several reasons. First, you want to be sure that your edited reference list is saved and not disturbed. This is the document you will be sending to your client. Second, to ensure that the original edited version is not disturbed, a work copy of the file needs to be created. Third, the document needs to be clean, so highlighting is removed. Fourth, and most important, when the macro runs, it needs to "see" only the edited version; anything else could interfere with the matching. Consequently, any queries/comments you inserted are deleted and all of the changes that were made are accepted. Remember that in Word, when changes are made with Tracking on, the material marked as deleted is not yet actually deleted; consequently, when the macro is run, the Tracked items will interfere (as will any comments).
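The effect of unaccepted tracked changes can be illustrated with a toy model. The Python sketch below does not use Word or EditTools; the Run class, the two helper functions, and the sample text are all assumptions made for illustration. It treats a reference as a sequence of text runs, each marked as plain, inserted, or deleted, and compares the text that is still stored while changes are tracked with the text that remains once all changes are accepted.

```python
from dataclasses import dataclass


@dataclass
class Run:
    text: str
    kind: str = "plain"  # "plain", "inserted", or "deleted"


def raw_text(runs):
    """Text while changes are still tracked: deleted material is still present."""
    return "".join(run.text for run in runs)


def accepted_text(runs):
    """Text after accepting all changes: deleted runs are dropped, insertions kept."""
    return "".join(run.text for run in runs if run.kind != "deleted")


# Hypothetical edit: "Infant" was changed to "Infantile" with Tracking on.
edited_reference = [
    Run("Infant", "deleted"),
    Run("Infantile", "inserted"),
    Run(" Study Group."),
]

print(raw_text(edited_reference))       # deleted and inserted text run together
print(accepted_text(edited_reference))  # only the edited wording remains
```

In this model, a comparison against the raw text would see both the old and the new wording, so it would not match a clean copy of the same reference. Accepting all changes first removes that interference.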

After accepting all changes and deleting the comments, the entries for references 19 and 78 look like this:

The Find Duplicate References macro assumes that references are in a separate document with no other text. If the references are in the primary manuscript, rather than their own file, you need to either move them to a separate file or tell the macro where the references begin and end by inserting special bookmarks (dupBegin and dupEnd). To insert the bookmarks, close the Find Duplicate References macro dialog and open the EditTools Bookmarks dialog. To make it easier, the Bookmarks macro now has buttons to insert these bookmarks:

After inserting the bookmarks, reopen the Find Duplicate References dialog.

If the references are in a separate file, the macro can be run from the primary dialog, shown here.

The Find Duplicate References macro matches a set number of characters, including spaces. The default is 120 but you can change the number to 36, 48, 60, 72, 84, 96, or 108 using the dropdown arrow shown at #1 in the Find Duplicate References dialog above. Then click Run (#2).

The macro does a two-pass search, one from the beginning of the reference and another from the end of the reference, which is why a list of duplicates may have repetitions.
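The two-pass idea can be sketched in a few lines of Python. This is only an illustration of the approach described above, not the macro's actual code: the function name find_candidate_duplicates, the result format, and the sample references are invented, and the sketch assumes the references are supplied as plain strings without their list numbers.

```python
from collections import defaultdict


def find_candidate_duplicates(references, length=120):
    """Group references that share the same first or last `length` characters."""
    passes = {
        "forward": lambda ref: ref[:length],    # from the beginning of the reference
        "backward": lambda ref: ref[-length:],  # from the end of the reference
    }
    results = []
    for pass_name, key_of in passes.items():
        groups = defaultdict(list)
        for number, ref in enumerate(references, start=1):
            groups[key_of(ref)].append(number)
        for key, numbers in groups.items():
            if len(numbers) > 1:
                results.append((pass_name, key, numbers))
    return results


# Invented references; 1 and 3 differ only in the page-range separator.
references = [
    "Smith J, Jones K. Vaccine uptake in rural clinics. J Hypothetical Med. 2001;12(3):45-52.",
    "Lee A. Another invented title for illustration only. J Hypothetical Med. 1999;8(1):1-9.",
    "Smith J, Jones K. Vaccine uptake in rural clinics. J Hypothetical Med. 2001;12(3):45\u201352.",
]

for pass_name, key, numbers in find_candidate_duplicates(references, length=36):
    print(pass_name, numbers, repr(key))
```

With these sample strings, only the forward pass pairs references 1 and 3, because the en-dash near the end breaks the backward match. Two fully identical references would be reported by both passes, which is the kind of repetition mentioned above.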

The results of the search appear like this:

(They appear as tracked changes only if the macro is run with Tracking on; if Tracking is off, the results appear as normal text.) Note that the title of the list is "Duplicate Entries (Nondefinitive)." The reason for "Nondefinitive" is to remind you that the macro is "dumb" and there is no guarantee that the list includes all duplicates or that all listed items are duplicates. Much of the macro's accuracy depends on the consistency of editing, including formatting.

For these examples, the Find Duplicate References macro was run on a list of 735 references, and the list of possibilities shown represents the likely duplicate references the macro found. Note that references 19 and 78 were found (#19 and #78 indicate the portions of those references found duplicated by each pass of the macro). Suppose, however, that during editing the page range separator had been left as an en-dash in reference 19 and as a hyphen in reference 78. The macro would not have listed the material at #19, because there would not have been a match, but it would have listed #78. Similarly, if the author name had been left as "Infant" in reference 19 and as "Infantile" in reference 78, the macro would not have listed the material at #78, but it would have listed #19. That is the reason for the two passes.

The next step is to determine which are duplicates. This is done using Word's Find Navigation pane, as shown here:

Copy part or all of what was found (#1) into the Find field (#2). Find will display the search results ("3 matches") (#3); clicking the Browse button (the rightmost button at #3) lists the three matches found (#4 to #6). The first entry (#4) is always the text in the duplicates list (#1), which means that, in this example, the possible duplicates are #5 and #6. Click on the text marked #5 to see the complete text of that entry. Then compare that text to the text of the reference at #6. (It is possible for the macro to find more than two possible matches for the same text — and all, some, or none may be duplicates.)

Tip: Use comments to track duplicates

When a duplicate is found, you can insert a prewritten, standardized comment (using EditTools' Insert Query) to tell the client that references x and y are duplicates and that one is being deleted and renumbered (see image below for a sample comment). Insert the comment at each of the duplicate references, slightly modifying it so that it is appropriate for the reference to which it is attached. The comment shown below is inserted at reference 78 and its language is appropriate for that reference. It tells the client that references 19 and 78 are identical and that reference 78 has been deleted and renumbered as 19. This type of comment is added to the version (e.g., the Track Changes version) of the reference list that will be given to the client. The comment is added to the appropriate references as each duplication is confirmed.

The comment, in addition to serving as a message to the client, serves as a reminder during editing of the manuscript. Duplicate references require renumbering so as to keep reference callouts in number order. For example, it may be that reference 78 is called out after the callout for reference 10 and before that for 19. In that case, reference 78 would be moved to position 11 in the list and renumbered as 11, and the comment would be modified (easy to do using EditTools' Comment Editor). A prewritten note (another new EditTools feature) would be inserted at point 78 in EditTools' Reference Number Order Check. Reference 19 would be marked as deleted, the inserted comment (see above) would be modified, and a note would be added to Reference Number Order Check at point 19. (See the discussion below about the report.)

When editing of the manuscript is finished, have the Reference Number Order Check macro export a renumbering report to send with the edited file to the client. A partial sample report is shown here:

Every report bears the creator's identification information (#1) and file title (#2). You set the creator information once and it remains the same for every report until you change it using a manager. The file title is set each time you create a report.

As the report shows, reference 78 was deleted and all callouts numbered 78 were renumbered as 19 (#3). The prewritten, standard message (a new feature) can be inserted with a mouse click; only the numbers need to be inserted or modified. The report shows that the renumbering stopped at callout 176 (#4) and started again at 197 (#5). Item #6 shows another deletion and renumbering.
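The arithmetic behind a simple renumbering can be sketched as follows. This Python example is not how Reference Number Order Check works internally; the function name build_renumbering and the list size of 100 are invented for illustration. It assumes the simplest case: each duplicate is deleted in favor of an earlier reference that is itself kept, every later reference moves up to close the gap, and no references are moved to a new position in the list.

```python
def build_renumbering(total, duplicates):
    """Map each old reference number to its new number.

    `duplicates` maps a deleted reference number to the number of the
    reference that is kept (e.g., {78: 19}). Kept references must not
    themselves appear as deleted numbers.
    """
    mapping = {}
    shift = 0  # how many deleted references precede the current one
    for old in range(1, total + 1):
        if old in duplicates:
            shift += 1
        else:
            mapping[old] = old - shift
    # Callouts for a deleted reference take the new number of the kept one.
    for old, kept in duplicates.items():
        mapping[old] = mapping[kept]
    return mapping


# Hypothetical list of 100 references in which 78 duplicates 19.
renumbering = build_renumbering(100, {78: 19})
for old in (19, 77, 78, 79, 100):
    print(old, "->", renumbering[old])
```

In this sketch, callouts numbered 78 become 19, references 1 through 77 keep their numbers, and references 79 onward each drop by one.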