Technical Communication, Data Conversion and the Entrepreneur’s Evolution:

An Interview with Mark Gross, CEO of Data Conversion Laboratory

There are two parts to capitalizing on an opportunity. First, you have to recognize that the opportunity exists. A lot of people can do that. The harder part is the second one: actually taking advantage of that opportunity. The ability to take that step is what separates the dreamer from the entrepreneur.

I had the pleasure of speaking at length with Mark Gross, co-founder and CEO of Data Conversion Laboratory (DCL), and over the last thirty years he has proven that he can handle both parts of the opportunity equation. DCL specializes in data and document conversion: taking information in one format and transforming it into another. Nobody does that better than DCL, and they have been doing it longer than anybody else in the industry.

Evolving Document Conversion from Large Systems to the PC

The company was founded in 1981, before technologies such as SGML and XML had even made their first appearance. There were some markup languages around, such as GML, but there’s no question that companies had very different requirements in those days. Mark identifies several distinct evolutions in the types of conversion DCL performs, but his business has always been about taking existing documents and making them usable in new technologies.

In 1981, the Apple II home computer had already been released, but the IBM Personal Computer had not. At that time Mark was a consultant for Arthur Young, one of the Big Eight accounting and consulting firms, building very large systems for corporate clients. When he saw the Apple II, Mark took a fancy to it and set about figuring out how this new breed of computer might be used in a large corporate environment. He tried to interest Arthur Young in setting up a business unit to support personal computers in corporations, but he was “laughed off” and generally met with skepticism that anybody would use those “little things.” Mark decided the time was right to found his own company, focusing on corporations using personal computers.

Shortly thereafter, IBM released its first personal computer, which triggered greater corporate interest in the new technology. Mark started out implementing systems for his clients, but soon became interested in the process of moving data from the older large systems and time-sharing systems (that era’s version of “big data”) to the new, smaller computers. Over the next year, he gradually developed a specialty in data conversion, including building custom software and hardware. “Pretty soon we were like the expert in the industry, because nobody else was really doing it.”

When Mark first started, the only software available for working with documents was early word processors such as WordStar, so conversions at that time consisted of getting text into word processor formats. A few years later, WYSIWYG publishing systems began to appear, and companies were able to start producing documents in-house instead of sending them out for typesetting and composition. Those systems raised the bar for what companies wanted to do with their documents: they now wanted their content in a format they could use with the new publishing systems. This type of conversion was the norm for the rest of the 1980s. Over the course of ten or twelve years, DCL often converted the same content for a company several times, moving it to succeeding generations of these publishing systems.

Moving to Markup: the Next Stage of Document Conversion

The modern world of document conversion started with the introduction and growing popularity of SGML in the early 1990s. Mark was intrigued by this new technology and began learning more about it. “When I saw SGML, I really thought that suddenly put into an operational context how you could actually move information back and forth and maintain the information content and not just the look.” It took a year or two before SGML projects began showing up on DCL’s radar. Things really began to take off when DCL took on a project for General Motors, converting a very large number of standards documents to SGML. The task was considered impossible at the time, but DCL succeeded, proving that large-scale movement of information into SGML could be done. For the next five to seven years, SGML grew in importance as the target for document conversion projects.

The late 1990s ushered in the next generation of conversion, when XML took the world by storm. XML was not that different from SGML, but it had a great deal more hype and was easier to work with. Because XML opened up many different industries to storing content in a markup language, it became the focus of DCL’s projects for the rest of the 1990s and into the 2000s. During that time, the arrival of DITA caused a mini-generational leap, because companies could use it as an “out of the box solution that fit a range of needs” and no longer had to develop their own document types.

Most recently, eBooks have risen in importance as the latest generation of documents. Mark believes that each new generation has raised the bar for how people use content and access the information in it. eBooks provide a convenient, highly portable way to access information on a number of different devices. Interactive Electronic Technical Manuals (IETMs) were the first generation of eBooks, but they were designed for specialized applications like repairing airplanes. What has really driven the recent explosion in eBooks is the availability of low-cost devices like the Kindle that enable anybody to read an eBook. eBooks can now also support advanced information delivery and access through three-dimensional graphics, dictionary lookups, and so forth.

Drivers for Data Conversion Projects

Metadata has also become increasingly important in the document conversion process. Mark says that “everything’s metadata now,” especially once you get past novels to other types of content. While DCL does convert a lot of novels, it has special expertise in scientific, medical, and similar publications, whose content carries additional and complex information requirements. DCL’s clients now usually want more than legacy content converted into a form they can publish; DCL often has to pull information from other places to make the content richer. It not only extracts metadata from the document content, but also retrieves information from other sources, such as databases. For almost every client DCL works with now, extensive metadata work is part of the conversion.

So why does a company decide to undertake a data conversion project? Mark explains that most companies find it increasingly difficult to manage the vast quantities of information they are collecting. A typical company in the industrial world might have hundreds or thousands of different manuals to track, many of which contain similar information. “Managing all of this information means it just makes commercial sense for them to take all of the information and turn it into DITA or some other kind of form that they can manage better.” Also, companies have become international and must translate their content into other languages, especially for consumer products. While the typical company translates into more than 20 languages, Mark has some clients that translate content into sixty, seventy, or even eighty languages. Translation can be an enormous cost for a company, considering that there are 23 official languages in the European Union alone, so having the content in XML makes the process easier and cheaper.

Another driver is the sheer number of products, and variations on those products, that some companies produce. Maintaining the documents for those products and their variations practically requires a content management system (CMS) and a document format that can be managed effectively in that CMS. In addition, organizations that produce scientific, technical, and standards information now find that the internet and eBooks make their information more accessible than ever before. Such an organization “suddenly discovers that their market is no longer just their own organization. It’s a much wider market, because there’s lots of people interested in those documents. They just never could find them before. Now they have a bigger market, but in order to sell to that bigger market they’ve got to have their material in a form that is easy to sell.” A lot of these organizations are finding that content they produced as long as sixty years ago still has value; the people who want to buy it simply could never find it before. Converting that legacy content to XML makes it much easier to sell.

Companies with significant government compliance requirements often turn to a data conversion project to reduce the maintenance burden. In the pharmaceutical industry, for example, companies are required to provide a specific set of information with prescription medications. DCL has around two hundred customers for whom it takes the information required for these medications and provides it to the FDA as a service. Having DCL handle that is often the least expensive option, especially if the company has a small number of products. Mark also points out that the federal government increasingly requires companies in various regulated industries to provide the required information in XML.

Choosing a Path for Data Conversion: In-House or Vendor Support?

Once a company determines that it needs a conversion project, how can it decide whether the work should be handled in-house or by a vendor like DCL? Mark believes that the main determining factor is the amount of subject matter expertise that needs to be transferred. Unless a company is going to do conversion projects on an ongoing basis, it is unlikely to save money by doing its own conversion. The exception is when the converted information is so specialized that the company’s experts have to review almost every page of the converted material. Even in that case, Mark feels “most companies overestimate how much subject matter expertise is really required to do the work – and often you can do the work with less skilled people as long as there is a good method for subject matter experts to review ambiguities.”

In Mark’s experience, companies underestimate what’s involved in a conversion project. While small conversion projects, or those involving specialized, inconsistent content, might be better handled in-house, the direct cost of the conversion itself is just the tip of the iceberg and might be only ten percent of the real cost. The often hidden costs include the time spent planning what needs to be done; the quality review and quality control required to ensure the conversion was done correctly; fixing any errors found in the conversion; and the overall cost of managing the effort.

Most companies radically underestimate the costs of a conversion project, not just in money but in the distraction an internal effort causes. For example, a company might take a developer who specializes in writing GUIs and put him to work for three weeks developing conversion software. In that case, you’ve lost that person for the work you really need him doing. And since conversion is not the developer’s area of expertise, he is also likely to underestimate the time and effort the work involves.

Mark feels DCL really excels at managing the conversion effort. In any project, you have to not only do the conversion but also check the converted material to make sure it’s correct. DCL prides itself on returning converted material to its customers that doesn’t need much additional review; it’s “ready to go.” DCL even provides customers with a tool to compute their return on investment for conversion projects.

DCL’s primary competition for a data conversion project is a company deciding to do the work itself, perhaps using a tool from another vendor. There are also companies in India doing conversions of all kinds. However, Mark feels that many of those companies only do the conversion itself, without providing the essential overall management and quality checking that DCL offers. DCL uses software to automate as much of the work as possible, but the work still requires proofreading and review, a very labor-intensive process. DCL uses offshore resources when possible to reduce those costs. However, for projects with significant data security concerns, such as those in the government and financial sectors where the information should not leave the country, DCL uses onshore resources. Mark feels DCL is very competitive, not just on data conversion itself but also on the management and quality assurance work it brings to a project.

Pricing Considerations for Data Conversion Projects

As for pricing, Mark offers the following: “There are many variables that affect conversion pricing, including source and target formats, volume, delivery time, content complexity, and accuracy expectations. For example, converting a typical trade book to an eBook will cost less than converting complex books such as textbooks, art books, children’s books, and other books requiring special formatting. In addition, you need to evaluate what’s included in the price – it’s not just the ‘price per page.’ Rather, it is important to determine if the pricing presented is all inclusive and guaranteed. Ask the vendor: will they provide you the support to make sure that everything gets done on time, and that you don’t have to go back to recheck for accuracy and reproof everything?”

The DCL Project Methodology

A typical DCL conversion project follows a defined methodology. The first phase is analysis, in which DCL works with the customer to develop a set of specifications outlining what needs to be done. Specifications can be simple, as in an eBook project, but even those projects require defining ten or fifteen parameters, such as the eBook format, colors, and so forth. For large projects, DCL then does a small sample conversion based on the specifications (called a “hand-coded sample”), which the customer reviews and, if necessary, tests in its systems. This phase often results in revisions to the specifications. The next phase is a “production sample,” in which DCL configures its conversion software for that customer’s unique needs and then runs a larger sample of a few hundred pages through the entire system to demonstrate how the full-volume conversion will look. Once the customer reviews and approves the production sample, the project moves into the production phase, where thousands of pages a week are run through the automated process. Mark notes that the pre-production phases generally take three to five weeks, most of it spent on customer reviews of the initial output.

The Future of Data Conversion and DCL Services

DCL continues to focus on new opportunities and recently announced an upgraded service called DCL On Demand, which enables customers to request a conversion project through the DCL website for as little as a single document. DCL On Demand grew out of a predecessor project called “books2bytes,” started about ten years ago and aimed at mid-sized publishers with out-of-print books that they wanted to put back into print. About two years ago, DCL decided to further automate the service and roll it out to a wider market than the initial few hundred publishers. Today, thanks to companies like Amazon, people are used to ordering things online, and DCL wanted to take advantage of that trend.

DCL On Demand is intended for simpler data conversion projects where customers can define the parameters themselves, and this self-service approach helps keep costs down. For example, converting a document to an eBook typically costs only a few hundred dollars. The service offers rapid turnaround, averaging three to four days to produce a converted eBook. Even in this short time frame, DCL reviews every page of an eBook before sending it to the customer. The original on-demand service handled only conversions to eBooks, but DCL recently added a second service for more general projects and plans to add more on-demand services over the next couple of years.

When Mark looks to the future, he doesn’t feel he can really predict what DCL will be converting. Five years ago, he would not have predicted that eBooks would be such a large part of DCL’s business today. He also notes that a lot of DCL’s conversion work involves historical documents. In one recent example, a DCL client had searched various university libraries to find and license a very complete collection of Civil War newspapers, and wanted to make that information available to both historians and Civil War buffs. DCL scanned the newspapers and produced XML versions of them for that purpose.

In general, Mark is not really concerned about the types of devices on which converted information might be viewed in the future. Today the trend is toward mobile platforms like smartphones and tablets, but in a few years we might be displaying information directly on the retina. Regardless, the key is having the converted information available in “good” XML. Once the information is encoded in that format, it retains its value for years to come, regardless of how it is presented.

Another cool thing DCL offers is the DCL Learning Series, a free series of webinars that often features industry leaders such as JoAnn Hackos and Ann Rockley, covering subjects that might have nothing to do with document conversion but everything to do with the concerns of technical communicators. Mark reports that the webinars have been well received, with some attracting 250 to 300 people who stay on for the full hour. “People are thirsting for this type of information,” he says. DCL started the series in 2011, originally holding one webinar every six weeks, but now often holds more than one a month. He expects the frequency to increase over time, because of the large number of topics to cover and the value the webinars provide to DCL’s customers.

Mark says that he often runs into people in the industry he has not seen in a few years and they ask him what he’s been up to. Mark always tells them “converting data.” He expects he’ll have the same answer five years from now.

Chris Goolsby

Chris Goolsby first posted on TECHWR-L in the early ’90s, soon after he turned from creative to technical writing, a switch brought on by the pressing need to consistently put food on the table. He started working with hypertext before the internet was invented, was a Usenet early adopter, and has watched the World Wide Web build up strand by strand over the course of his career. He’s used markup languages from the very beginning, starting with nroff/troff on Unix workstations and currently working in DITA. Chris works for PTC Arbortext in Ann Arbor, Michigan. He is also an active member of the OASIS DITA technical committees. Chris lives with his wife, children, dog, and cat in a 100-year-old house. He’s still working on that novel in his spare time.
