A lot of people have been asking me: “What are you doing?” or “So, what is your internship about?” It’s a bit hard to answer these questions because on one hand you can spend ages going on about the actual work, and on the other hand you can spend ages going on about library and information science (LIS). And, as if that wasn’t enough, you can go on and on about the organization itself. What ends up happening for me usually depends on what I’ve been doing most recently.
Let’s Talk About Work
Any internship is going to involve putting in some grunt work, just like a job. It’s inevitable. The catalog at ODC has had some significant issues and the only way to actually get those issues “solved” is to go through the catalog in its entirety, record by record. If this sounds challenging and potentially unbearable, I can concur: it can be like purgatory at times. (Fortunately, as I’ll describe below, it’s offset by other responsibilities.) When I go into the office, there are so many things going on, but there’s always that mountain of data I have to sort through. I look at it like a marathon: it’s not an issue of intellectual impossibility, but rather an endurance test. And so I have the lovely pleasure of splitting up the “work” for better management.
Go figure: I just started my online Project Management course at UW (which I’ll be taking throughout this academic quarter here in Cambodia), and I’m already quickly becoming a pro at project management through this practical experience. I call that lucky! But what it looks like is this: there’s a project management site (a cloud application, which actually just got implemented into the ODC project), and each person on the team has tasks assigned to her/him. I’ve taken the liberty of creating a major task for cataloging revisions, and creating sub-tasks for every letter (I’ve been reviewing the entire catalog systematically, by Author Last Name). I’ve given myself two days to “complete” a letter’s worth of records (remember: I’m doing many other things each day on top of the catalog alone), and every time I finish a letter, I cross it off in the app. On one hand, it keeps track of my progress in a transparent way, but it also provides a nice ego boost by showing what I’ve accomplished and how close I’m getting to the goal.
What does the work actually mean, though? What am I actually doing when I say I’m “revising the catalog”? Over the past year, the catalogers who were assigned to adding new records, new entries, were given the responsibility with little oversight by editors. The first problem is that we have folks who aren’t native English speakers cataloging in English (and sometimes Khmer, actually, using Unicode), and so there are numerous linguistic errors. From grammar to spelling, the records are rough. And anyone who’s into searching for information knows what difference a character in a field can make, and how it can break a search experience. In most records, though, the errors I’ve described are either minimal or non-existent; but they do exist, and the trouble is that there’s no report or query a person can run to highlight all the errors.
The second problem of the catalog, as it stands, is the lack of standardization. In some cases, people have used Z39.50 to find records in other libraries, or have downloaded MARC records from the Library of Congress (or elsewhere) and imported them. In other cases, librarians (presumably Margaret) have created records (usually for the local resources that haven’t found use outside of this environment), and the records are great: they’re descriptive and all the necessary fields (and then some) are filled. And then, of course, you have records that are, simply but accurately put, incomplete. Again, untrained library folks have attempted to do their job, but have left out important fields, or incorrectly filled out fields. One of the troubles I’ve had is reviewing and converting authority files. The majority of the Author authorities have been entered under personal name, when 50% are individuals, and 50% are corporate authors (organizations, the government, corporate bodies, etc.). This has been a hassle but it’s part of the grind. In other cases, name fields and title fields and subtitle fields and extent fields and keyword fields and subject fields are either present but lacking or not present at all. In some cases there haven’t been any major access points: the headings that we would normally think of are completely empty, and the only information in the record describes a resource that lacks a shell to keep everything in place.
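To give a sense of how the personal-versus-corporate authority problem could be triaged before reviewing each heading by hand, here is a minimal Python sketch. It is not the process used at ODC: the hint-word list, the function names, and the sample headings are all made up for illustration, and a simple word match like this will miss corporate names that contain none of the hint words.

```python
import re

# Hypothetical hint words; a real list would be tuned against the actual catalog.
CORPORATE_HINTS = {
    "ministry", "department", "institute", "organization", "organisation",
    "association", "committee", "council", "bank", "university", "agency",
    "government", "commission", "foundation",
}

def looks_corporate(heading):
    """Return True if a name heading contains a word suggesting a corporate body."""
    words = re.findall(r"[a-z]+", heading.lower())
    return any(word in CORPORATE_HINTS for word in words)

def flag_for_review(personal_headings):
    """Return headings filed as personal names that look like corporate names."""
    return [h for h in personal_headings if looks_corporate(h)]

# Made-up headings, all assumed to be filed under personal name.
sample = [
    "Smith, John",
    "Ministry of Environment",
    "World Bank",
    "Sok, Dara",
]

print(flag_for_review(sample))
```

A list like this only narrows down where to look first; every flagged heading would still need a human decision before it is converted to a corporate name authority.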
When I arrived, the first thing I thought I’d be able to do is streamline the cataloging work. But I’m realizing that the project’s needs are very diverse, and the resources of the individuals working on the project are spread thin. I haven’t even mentioned design in this post: to get some very basic design changes (in the OPAC and on the server) might take a lot longer than I thought, even though the changes have been identified and expressed and agreed upon. Cataloging is the same way: there aren’t really any helpers for me. I am the helper. A super helper, sure, but a helper nonetheless. One thing that I’ll be taking care of mid-next-week is standardization. I’m going to be retraining everyone who worked on the library in the past, as well as all the editors, on how to appropriately catalog monograph resources. I haven’t really had a chance to think about cataloging videos and other visual media, though that’s certainly possible. For now, we’ll stick with the books and the reports, since that’s our major influx of material (mostly digital, but that’s another conversation).
It’s about control.
Standardization is all about keeping control over the data, and I’m in the lucky position of identifying what fields are control fields, what vocabulary can be used for those fields, and how those fields should be displayed. During the meeting next week, I will go into the necessary fields: what’s needed for each document at a minimum, as well as the fields that are allowed as supplemental. Here’s the reason I’m not saying “go crazy and use every field.” In a library with a full-time librarian, that might be a wise decision. And ODC might be hiring a Cambodian librarian at some point in the near future. But how long will that take? It’s not immediate. What is immediate is having folks adding new material to the catalog and being consistent about it.
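The idea of a minimum set of fields plus an allowed supplemental set can be sketched as a small check. This is only an illustration in Python: the specific MARC tags chosen as required and supplemental below are my assumptions, not the list that will be presented at the meeting, and the record is represented as a plain dictionary rather than a real MARC file.

```python
# Assumed minimum fields for a monograph record (MARC tag -> label).
REQUIRED = {
    "245": "title",
    "260": "publication details",
    "300": "extent",
}

# Assumed supplemental fields that catalogers may add but need not.
SUPPLEMENTAL = {"020", "100", "110", "520", "650", "654", "856"}

def check_record(record):
    """Return a list of problems for a record given as {tag: [values]}."""
    problems = []
    # A required field counts as missing if absent or filled only with blanks.
    for tag, label in REQUIRED.items():
        values = record.get(tag, [])
        if not any(value.strip() for value in values):
            problems.append(f"missing {tag} ({label})")
    # Anything outside the agreed lists is reported, not silently dropped.
    allowed = set(REQUIRED) | SUPPLEMENTAL
    for tag in sorted(record):
        if tag not in allowed:
            problems.append(f"unexpected field {tag}")
    return problems

# Made-up record: title present, publication details absent, extent blank.
record = {"245": ["Forest cover report"], "300": [""], "999": ["local note"]}

for problem in check_record(record):
    print(problem)
```

Keeping the two lists as data rather than burying them in the logic means a future librarian could widen the supplemental set without touching the check itself.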
Redundancy is a major issue. So is categorization. In some cases records have information repeated (i.e., the keywords are the same as the title or the keywords are the same as the subject headings). In some cases keywords are used in each record and in other cases subject headings. ODC recently rolled out a taxonomy (that has finally seen version 1.0 and is now at 1.1) and Margaret and I decided on using 654 in MARC to identify the taxonomy in each record. The taxonomy fortunately is only two levels deep so it’s possible to be accurately descriptive in MARC. At some point that might need to change. It’s also hard to implement the taxonomy when there are cool things like FAST going on in our RDA universe. But when all is said and done, there are going to be records, they’re going to look like each other, and they’re going to have all the information that the common researcher would need when visiting the site.
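The redundancy described above (keywords that simply repeat the title or a subject heading) is the kind of thing a small script could surface. Here is a hedged Python sketch; the function names and the sample record are invented for illustration, and the normalization is deliberately simple (case, extra spaces, and trailing punctuation only), so it will not catch reworded duplicates.

```python
def normalize(text):
    """Fold case, collapse whitespace, and trim edge punctuation for comparison."""
    return " ".join(text.casefold().split()).strip(".,;:/ ")

def redundant_keywords(title, subjects, keywords):
    """Return keywords that duplicate the title or any subject heading."""
    seen = {normalize(title)} | {normalize(subject) for subject in subjects}
    return [keyword for keyword in keywords if normalize(keyword) in seen]

# Made-up record values for illustration.
title = "Land Concessions in Cambodia"
subjects = ["Land tenure"]
keywords = ["land concessions in Cambodia.", "Land tenure", "mapping"]

print(redundant_keywords(title, subjects, keywords))
```

In this example only "mapping" adds anything a searcher couldn't already find through the title or subject heading, which is the argument for dropping the other two.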
To go back to standardization, I also have to mention review. On one hand, understanding the catalog fields is necessary. On the other hand, ensuring integrity and completeness of the field is an issue. And so I will have to construct (probably over the weekend, or on Monday) a workflow procedure for the editors. I really want to have faith that anyone trained in cataloging will do a perfect job, but after having fixed countless errors, I don’t think that faith can be established. The workflow and procedural content I draft will hopefully become policy (or adapted into policy) by a future librarian at ODC. Until that time, the documentation can be used as a reference resource for folks who need assistance, reminders, and for folks who will have to review each new MARC record. I think it’s feasible.
Holiday in Cambodia
This post is coming a bit early as the work week was only three days this week. It’s currently the height/climax of Pchum Ben Festival, a Buddhist holiday where most Cambodians return to their home villages and spend time with their family, bring food to the monks (to give to the ancestors), and take time to relax and reflect on life. If you’re an expat, you’re most likely going to wind up on some beach or island enjoying yourself and having fun. I was supposed to go to Laos, but this holiday remarkably falls in the period of time at the end of the rainy season, where massive flooding is regular throughout the entire country. The folks who invited me to Laos decided not to risk the roads, and instead went south to Koh Rong, a popular destination for escape and moments of paradise.
I decided to stay in Phnom Penh because I heard the city gets a bit quiet. And so far I’ve noticed it. Actually, just yesterday was one of the largest floods I’ve seen here, conveniently on the first major holiday day. I rode my bike through water that was a foot deep, ate Vietnamese food, and enjoyed spending many hours alone in my apartment, reading a book, listening to music, and catching up on some LIS-related material, such as: metadata, linked data, open libraries (a MUST READ for everyone interested in any library or digital repository), public domain, data visualization and spatial reasoning, and cataloging, to name a few. I also took a moment to review Beyond Access, which I’d love to see get started in Cambodia, maybe through ODC-as-Catalyst, and Open Library, which might inspire some projects around here.
Okay, well, I’ve filled up a post with the reflections above. I think next time I’ll talk a little bit more about design, and eventually I’ll touch on the library as one component of many in the ODC toolbox. Collaboration with tools is something ODC is exploring, and it’s very exciting. I had to get the cataloging piece off my chest, though!