LIBRES: Library and Information Science Research
Electronic Journal ISSN 1058-6768
1996 Volume 6 Issue 1/2; June.
Quarterly LIBRE6N1 SEAMAN

CIRCULATION DATA MIGRATION: A CARL SYSTEMS TO INNOVATIVE INTERFACES CASE STUDY

January, 1996

ABSTRACT

This essay describes the process used to transfer circulation patron and transaction records from a CARL Systems, Inc. (CSI) online circulation module into an Innovative Interfaces, Inc. (III) circulation module. It describes how the University of Colorado designed and programmed a software interface that reformatted CARL data into a format acceptable to III. This article is based on the author's presentation at the 4th Circulation Open Forum sponsored by the Access Services Coalition, Denver, Colorado, 1995.

In December of 1994, the University of Colorado at Boulder migrated from a CARL Systems, Inc. (CSI) online library system to an Innovative Interfaces, Inc. (III) system. The data migration was performed during the Christmas holidays, and the III online catalog and circulation system went live on January 5th, 1995. Implementing the new III circulation module presented many challenges: an unusually short time frame, a limited budget, equipment delays, new Ethernet wiring, and the need to create new circulation policies suited to the III circulation module.

Undoubtedly, however, the most challenging aspect was migrating the transaction, patron, and financial data from the CSI circulation system to the III circulation system. In recent history, no library had migrated from a CSI system. Consequently, CSI had to create the software to extract the circulation data. In addition, III had no experience bringing CSI data into its circulation system. This was made even more complicated by the unique, almost peculiar, format in which CSI stored some of its circulation data.

Data Migration

Data migration is probably the most important and certainly the most difficult aspect of the conversion process. It is also the most sensitive aspect, because mistakes made in data migration can be profound and unrecoverable. If a mistake is made in setting a parameter, it can always be reset. But if patron names are mistakenly transferred into the address field of the new system or borrower types are recalculated incorrectly, there can be an extraordinary amount of manual cleanup.

So what is data migration? In current circulation systems, whether CARL, Dynix, or DRA, there are patron and transaction records. Those have to be moved from the old computer system to the new one efficiently and accurately.

There are several ways this may be accomplished. One way is to re-key every patron record and then re-key each transaction from a printout of the old patron and transaction files. In some situations, particularly for small libraries, this works very well. When the University of Colorado's Health Science Library migrated from CARL to III, they chose this technique and it proved successful for them.

But files much larger than that require automated migration: that is, transferring the patron and transaction records onto tapes and then reading those tapes into the new system. Of course, it is never quite that simple.

Circulation Record Formats

All librarians understand the MARC format. It is a standard way of storing bibliographic records so they can, among other things, be easily moved from computer system to computer system. A 010, 050, or 245 field always refers to an LC card number, an LC call number, and title information, respectively, regardless of the design of the online system in which the records reside. For that reason, bibliographic records are relatively easy to move from one system to another.

But there is no uniform MARC equivalent for circulation records. There is no common format. Instead, each vendor stores patrons and transactions in its own way. And, to be sure, some vendors store them in a more "unique" format than others. That means migrating circulation data can be a real challenge.

This, however, is likely to change. ALA's Public Services Roundtable is devising standards for patron and transaction records. This process has been going on for about 2 years, and perhaps within another 2 years those standards will be published. But even then it will take several more years for vendors to incorporate the standards into their systems. So any library planning to migrate within the next 5 years will probably have to deal with systems that use different formats.

Vendors & Interfaces

The contract signed with the vendor determines who is responsible for migrating data from the old to the new circulation system. Most often, it is the library that is responsible for ensuring that the old data is compatible with the new system. Generally, the new vendor provides documentation describing their circulation record formats. The library is then responsible for delivering the data in that format. But even if the vendor is paid to perform data migration, libraries will still end up doing a lot of what is described here. The new vendor will not necessarily know how data is structured or what is to be migrated and so on.

But, because there is no universal format for circulation data, it is not possible to simply write patron and transaction files out of one system and read them into another. Instead, it is often necessary to create a software interface. The interface is a piece of software that takes the old data and reformats it to match the new system. If the library is fortunate, the old system outputs data in a format similar to what the new system requires, and the interface does little work and is easy to design. But if the old system outputs data in a dramatically different way than the new system requires, the interface can be complicated and time-consuming to design. For example, some data in the old system may not be needed in the new system, so it must be discarded. Conversely, there could be data needed in the new system that is not in the old file, so this data has to be created. Occasionally the data is there, but it exists as a 3-digit numeric in the old system while the new system uses a 2-character alphabetic to express that same information. New values, then, must be calculated from the old data.

How We Did it (Mostly) Good

Broadly, circulation systems consist of 2 kinds of records: transaction records and patron records. Each kind of record needs a different interface. In fact, the patron file might need 2 or 3 interfaces depending on how much data you choose to migrate. Consequently, considerable time is invested in creating these interfaces.

Throughout this process at the University of Colorado, Boulder, there were 3 people involved: myself; a systems analyst from our campus computing center; and a programmer who was also from our computing center. As Head of Circulation, my primary role was to determine which pieces of CSI data were relevant and to where they were to be transferred. The systems analyst wrote an outline describing how each file would be delivered, what the records were like, and where in the new records the fields were to be transferred. The programmer worked from that outline to code the interface.

Data Profile

The University Libraries at the University of Colorado, Boulder has a main library and five external branch libraries. There are eight circulation desks in the six different buildings. In all, there are forty-five III circulation terminals. At the time of migrating from CARL to III the patron file consisted of about 100,000 records. The transaction file contained about 125,000 transactions and there were nearly $35,000 in outstanding fines. With so many records, then, we had no choice but to perform an automated transfer of records.

The Transaction File

The transaction file consists of "who has what checked out." It is a relatively simple file that contains a patron id number, an item id number, a check out date and a due date.

In our situation, III required the library to supply a file structured such that each line in the file represents one check out transaction (see Illustration 1). Each line has 5 data fields. Each data field is separated by a colon (:). The colons are there to tell the computer where the field stops and starts. The fields are:


1-  transaction code  -  "o" for checkout
2-  transaction time  -  in PC time-stamp format, which is yymmddhrmn
3-  item id number    -  item barcode number, preceded by the text 'b'
4-  patron id number  -  unique patron identifier, preceded by the text 'b'
5-  due date          -  in PC time-stamp format, as above

Innovative Interfaces required the file to be sorted on the 4th element, the patron id number.

This is a very straightforward file structure. Illustration 2 is an example of how an extended portion of the file would look. In our case it would be about 125,000 lines long because we had about 125,000 outstanding transactions.
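As a rough sketch of the III input format described above, the following Python fragment builds one checkout line, reads a time stamp back, and sorts a file on the patron id. The function names are hypothetical and the id values used with them are invented. The exact punctuation of the real file is an assumption here: five fields joined by single colons, with no trailing colon.

```python
from datetime import datetime

STAMP = "%y%m%d%H%M"  # yymmddhrmn, e.g. 9412010930


def format_checkout(checked_out, item_barcode, patron_id, due):
    """Build one colon-delimited checkout line (assumed layout)."""
    return ":".join([
        "o",                          # 1- transaction code
        checked_out.strftime(STAMP),  # 2- transaction time
        "b" + item_barcode,           # 3- item id number
        "b" + patron_id,              # 4- patron id number
        due.strftime(STAMP),          # 5- due date
    ])


def parse_stamp(text):
    """Turn a yymmddhrmn string back into a datetime."""
    return datetime.strptime(text, STAMP)


def sort_by_patron(lines):
    """Sort transaction lines on the 4th element, the patron id."""
    return sorted(lines, key=lambda line: line.split(":")[3])
```

Sorting on the split field, rather than on the whole line, keeps the order independent of the transaction time that precedes the patron id on each line.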

Of course, that is the format in which III required the data to be delivered. Unfortunately, CSI did not deliver data in exactly this format. Instead, CSI output transaction data as outlined in Illustration 3:


1-  transaction code  -  "o" for overdue
                         "c" for charged
                         "l" for lost
2-  transaction time  -  in PC time-stamp format, which is yymmddhrmn
3-  item id number    -  item barcode number, preceded by the text 'b'
4-  patron id number  -  unique patron identifier, preceded by the text 'b'
5-  due date          -  in PC time-stamp format, which is yymmddhrmn

This is, of course, very similar to, but not exactly, what III calls for. There is a single difference in the file structure. CSI defines the transaction code to be 1 of 3 values: o, c, or l. III only allows a value of "o" in that position. Inserting a "c" or "l" in the transaction code field will cause the entire transaction to be rejected. So we had to write a piece of software, an interface, that inserted an "o" for every occurrence of the transaction code. After that, the file could be read into III.
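A minimal sketch of that substitution follows. It assumes the CSI extract is colon-delimited like the III file and that the transaction code is always the first field. The function name is hypothetical and is not taken from the software we actually wrote.

```python
CSI_CODES = {"o", "c", "l"}  # overdue, charged, lost


def force_checkout_code(line):
    """Replace a CSI transaction code with the single code III accepts."""
    code, colon, rest = line.partition(":")
    if code in CSI_CODES:
        return "o" + colon + rest
    # Anything else is left untouched so that later error checking
    # can flag the record instead of silently "repairing" it.
    return line
```

Leaving unrecognized codes alone is a design choice in this sketch: a record with an unexpected first field is more likely damaged than merely mislabeled.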

Because we were already writing the interface, we also decided to design it to perform error checking. We knew, for example, that every patron id was to be 9 digits. We also knew that the item ids were to be 12 digits. So we had the interface check each number to be sure that it contained the proper number of digits. It also checked whether any of the fields were missing, such as the transaction code, date/time, id numbers, or due date. If there was anything suspicious about a number or a field, that record was deleted from the file and printed on a report.
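The checks just described might be sketched as follows. This is an illustration under stated assumptions, not the program we ran: the function names and report wording are hypothetical, the file is assumed to be colon-delimited with 5 fields per line, and each id is assumed to carry the one-letter 'b' prefix ahead of its digits.

```python
import re

ITEM_ID = re.compile(r"b[0-9]{12}")   # 'b' plus 12 digits
PATRON_ID = re.compile(r"b[0-9]{9}")  # 'b' plus 9 digits


def find_problems(line):
    """Return a list of problems found in one transaction line."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 5 or "" in fields:
        return ["missing field"]
    code, checked_out, item_id, patron_id, due = fields
    problems = []
    if not ITEM_ID.fullmatch(item_id):
        problems.append("item id is not 12 digits")
    if not PATRON_ID.fullmatch(patron_id):
        problems.append("patron id is not 9 digits")
    return problems


def split_clean_and_suspect(lines):
    """Separate loadable lines from those that go on the report."""
    clean, report = [], []
    for line in lines:
        problems = find_problems(line)
        if problems:
            report.append((line, problems))
        else:
            clean.append(line)
    return clean, report
```

Keeping the rejected lines together with the reason for rejection gives staff a report they can work through by hand after the load.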

Notice that there is no place in the CSI output file or even the III input file for recall/hold information. That is because recall/hold status does not transfer. Suffice it to say that the recall/hold functions are so different between systems that there is no good way to electronically transfer that information. We overcame this by creating, beforehand, a list of all the outstanding recalls and holds. After we went live on III, we manually recreated them.

So in this case, all that was involved was writing an interface that:

searched for occurrences of "l" and "c" and replaced them with "o",
searched for patron ids that did not have 9 digits, deleted those records, and printed them on a report,
searched for item ids that did not have 12 digits, deleted those records, and printed them on a report, and,
searched for records that seemed to be missing any one of the 5 required fields, deleted them, and printed them on a report.

Again, this was very straightforward. It was a small record, only 5 data elements, and only one change was being made. Moreover, that change was always the same: put an "o" in the first field. That was quick and easy.

The Patron Files

The patron file proved considerably more complicated to transfer. First, there were a lot more data fields. Second, it took several different interfaces to do the job.

Patron files consist of

name/address information,
notes, and,
financial information.

Each of these is stored in a different place in the CSI system. Each represents a different data extract and, consequently, each must have a different interface. So to fully transfer the patron records from CSI to III would have required 3 different interfaces.

After discussion, it was decided that our patron notes were not worth the time and expense to migrate, so they were abandoned. We also chose to abandon our CSI financial information. In our case, we found that we could get more detailed financial data from our University Bursar's computer system. So we extracted fine and lost book charges from our Bursar's accounting system and created an interface to reformat that data to be loaded into III. That was a separate process that is not described in this essay.

What we did migrate out of CSI and into III was the name/address information. And, just as with the transaction file, III requires a very specific record format. The III patron record consists of 2 parts: a 9-element fixed field; and an 8-element variable field. These are:


9 Data Elements In A Single Fixed Field

1-       -  field code
2-       -  patron type
3-       -  patron code 1
4-       -  patron code 2
5-       -  patron code 3
6-       -  home library code
7-       -  patron message code
8-       -  patron block code
9-       -  patron expiration date (mm-dd-yy)

8 Data Elements In Variable Length Fields

1-    u  -  institution assigned id
2-    b  -  barcode, patron id
3-    n  -  name (last name first)
4-    a  -  address
5-    h  -  address 2
6-    t  -  telephone number
7-    p  -  telephone number 2
8-    j  -  major

Illustration 4 depicts the III patron record as the University of Colorado was to supply it. Much of this record is obvious: the name, address, and telephone numbers. There are also the fixed field elements, including the patron codes and message codes. The illustration depicts how III expected the record to be formatted for input into the circulation module.

But CSI output patron data in a format considerably different from that specified by III. As shown in Illustration 5, there are considerably more data fields in the CSI record. Illustration 6 depicts the full record as output by CSI. Unlike the III record, there are no colons distinguishing fields; the only way to know where fields start and stop is to count the position. Also note that every field is fixed length, so there are considerable blank spaces in certain fields.
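Reading a record by position can be sketched as follows. The column offsets and field names below are invented for illustration and are not CSI's actual layout; only the technique (slice by position, then trim the blank padding) follows the description above.

```python
# HYPOTHETICAL layout: field name -> (start, end) character positions.
# These offsets are placeholders, not the real CSI record layout.
LAYOUT = {
    "patron_id": (0, 12),
    "name": (12, 42),
    "block_code": (42, 43),
}


def slice_record(record, layout=LAYOUT):
    """Cut a fixed-length record into named fields and trim padding."""
    return {
        field: record[start:end].strip()
        for field, (start, end) in layout.items()
    }
```

With a layout table like this, adding or dropping a field during design is a one-line change to the table rather than a change to the slicing logic.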

The task the interface must perform is to break apart the CSI record and reassemble the fields into the format required by III. That means, for example, taking the CSI patron id number, stripping out the "A9/", inserting a "u" in front of the number, counting 9 digits, ending the field, and then starting the next data element. The interface discarded CSI's "occurrence," "added borrower," and "institution" fields because they did not relate to anything in the III record. The interface also converted the CSI borrower type and status fields to the equivalent III characters.
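The patron id step might look like the sketch below. It assumes that "A9/" appears as a literal prefix directly ahead of the 9 digits and that any padding is blank space; the function name is hypothetical and the id values used with it are invented.

```python
def to_iii_patron_id(csi_id):
    """Strip CSI's "A9/" prefix and emit the id with III's "u" tag."""
    value = csi_id.strip()
    if value.startswith("A9/"):
        value = value[3:]
    digits = value[:9]  # count 9 digits, then end the field
    if len(digits) != 9 or any(ch not in "0123456789" for ch in digits):
        raise ValueError(f"unexpected patron id: {csi_id!r}")
    return "u" + digits
```

Raising an error on a malformed id, rather than passing it through, keeps a bad value from being loaded into the new system unnoticed.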

Again, some fields would transfer easily. For example, the expiration date from the CSI record could be moved right into the III expiration date field. But in some instances, calculations had to be performed on the data. For example, CSI delivers a patron block code as one of 3 possible values: "g" for good; "s" for soft block; and "x" for hard block. III also has a field for patron block code. But you cannot simply move the CSI code into the III field because III has no "soft block" equivalent. In III, patrons are either blocked or they are not. Consequently, the interface had to look at each block value as it transferred it to III. If it was a "g," the interface discarded it, leaving that field blank. If it was an "s" or "x," it inserted a "c" into the block field, which produces a message of "CARL block" in the III system.
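The block code conversion amounts to a small lookup table, sketched below. The names are hypothetical, and representing a blank field as a single space is an assumption.

```python
BLANK = " "  # assumption: a blank fixed-field position is one space

# CSI block code -> III block code
BLOCK_CODE_MAP = {
    "g": BLANK,  # good: no block in III
    "s": "c",    # soft block: becomes "CARL block"
    "x": "c",    # hard block: becomes "CARL block"
}


def to_iii_block_code(csi_code):
    """Convert a CSI patron block code to its III equivalent."""
    try:
        return BLOCK_CODE_MAP[csi_code]
    except KeyError:
        raise ValueError(f"unknown CSI block code: {csi_code!r}") from None
```

Note that the mapping loses information: once loaded, a former soft block and a former hard block are indistinguishable in III.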

The interface was designed to examine every data field in the CSI record, convert it to the III equivalent, then write the record to a file. Eventually, III personnel would FTP the file to their headquarters in Berkeley and load the data into the circulation system. It took considerable machine time to reformat 100,000 records into the III format: nearly 70 continuous hours. Because of the job size, special permission was required from our campus computing center to monopolize one of their computers for so long.
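For a sense of scale, those figures imply an average of roughly two and a half seconds of machine time per record. The short calculation below treats the reported numbers (about 100,000 records, nearly 70 hours) as exact, so the result is only approximate.

```python
records = 100_000  # approximate size of the patron file
hours = 70         # approximate run time of the interface

seconds_per_record = hours * 3600 / records
records_per_hour = records / hours

print(f"{seconds_per_record:.2f} seconds per record")
print(f"{records_per_hour:.0f} records per hour")
```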

Conclusion

When migrating circulation files, each data element is its own adventure. Some fields are very easily transferred to the new system; some are more difficult. Because of the lack of a standardized record format, migrating circulation data can be very involved and time-consuming. Consequently, migrating data is a kind of "hidden cost" of changing systems. In addition to staff time, some very real costs are involved. These can include systems analysis, programming, and computer costs. But it is possible to successfully migrate circulation data from CARL. My advice is to:

start early;
bring in people who are familiar with data processing. Without help from our computing center, this would never have been completed;
split the tasks of migration between several experienced people;
ask people who have already done it. There are a lot of people who can offer experiences and advice.

Finally, prepare yourself and your staff for the inevitable: some things are going to transfer incorrectly and a few things are not going to transfer at all. Circulation data migration is complicated. You cannot foresee every outcome. There will be mistakes and a certain amount of cleanup afterwards. Just consider that to be part of the process.

________________________________________
This document may be circulated freely with the following statement included in its entirety:

Copyright Scott Seaman 1996.

This article was originally published in _LIBRES: Library and Information Science Electronic Journal_ (ISSN 1058-6768) June, 1996 Volume 6 Issue 1/2.
For any commercial use, or publication (including electronic journals), you must obtain the permission of the author:

Scott Seaman
University Of Colorado
Norlin Library
Cb 184 Rm E157
Boulder, Co 80309
seaman@spot.Colorado.Edu

To subscribe to LIBRES send e-mail message to listproc@info.curtin.edu.au with the text:
subscribe libres [your first name] [your last name]
________________________________________



