Can Computers Write? Advances in Automatic Content Production

Background: Automatic Content in the Legal Profession

In the summer of 1993, I had an enjoyable university summer job. The research interests of the institute that employed me included creating an artificial intelligence system that could produce automatic précis of legal judgments. The idea was that humans like me would create a précis by hand, selecting enough sentences from the judgment to provide a reliable summary. My selections were recorded, and the plan was then to use neural network software to learn from the work of the human editors so that the process of creating a précis could be automated. Along the way, the researchers had uncovered certain linguistic insights. For example, the phrase “with all due respect” invariably indicated severe judicial disagreement with the argument in question, and strongly indicated that the next few sentences were significant. Although the FLAIR (Faculty of Law Artificial Intelligence) research project was never commercialized as hoped, it was an early innovator in the field of machine-created documents.
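To make the cue-phrase insight concrete, here is a minimal Python sketch of extractive summarization by sentence selection. It is not FLAIR's method, which aimed to learn from human selections with neural network software; it swaps in a simple hand-written heuristic instead. The cue list, the three-sentence window, and the naive sentence splitter are all assumptions for illustration.

```python
import re

# Hypothetical cue list: only "with all due respect" comes from the text above.
CUE_PHRASES = ("with all due respect",)
WINDOW = 3  # assumed number of sentences after a cue that are treated as significant

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text: str, max_sentences: int = 3) -> list[str]:
    sentences = split_sentences(text)
    scores = [0.0] * len(sentences)
    for i, sentence in enumerate(sentences):
        if any(cue in sentence.lower() for cue in CUE_PHRASES):
            # Boost the cue sentence and the next WINDOW sentences.
            for j in range(i, min(i + 1 + WINDOW, len(sentences))):
                scores[j] += 1.0
    # Rank by score (ties broken by position), then restore original order.
    ranked = sorted(range(len(sentences)), key=lambda k: (-scores[k], k))
    chosen = sorted(ranked[:max_sentences])
    return [sentences[k] for k in chosen]
```

Given a judgment containing the cue phrase, the sketch keeps the cue sentence and the sentences that follow it, in their original order, up to `max_sentences`. A learned system would replace the fixed scores with weights inferred from the human editors' selections.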

Contracts that are made mostly from boilerplate are increasingly being generated by computer. Take a look at these sites for generating wills or divorce agreements or other contracts: http://www.legalwills.ca/ and http://www.lawdepot.ca/.

Consulting a human lawyer is still recommended, particularly if your circumstances involve any complication whatsoever. Clearly, though, when a task can be automated, it increasingly is being automated.

But What About Automatic Article Writing in Journalism?

But when it comes to journalism, writers might start to feel threatened. Surely writing articles about events and experiences that matter to humans requires human involvement. How can a software program, even with the most clever algorithms, replace a sportswriter like Walter “Red” Smith? At this time, we cannot expect a computer program to generate unique philosophical insights from a game, nor can we expect any original metaphors. But if you just want a reasonably well-written article about a discrete set of data, it turns out that software can do a pretty good job.

One goal of the software company Narrative Science, formed in 2010 in Chicago, is to create automatically written articles from data sets. The data set might be the stats from a Little League game, or it might be a quarterly financial report.

If you have read sports or financial articles, you have undoubtedly noticed that they tend to sound formulaic. In baseball, for example, there are pitchers, hitters, strikeouts, home runs, and innings: a finite list of recurring elements that are simply mixed and matched from game to game. Narrative Science does use the skills of expert journalists, but it uses these experts to create the templates, control the vocabulary, and set up rules about the production of content.
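The division of work described above (experts write templates and rules, software fills in the data) can be sketched in a few lines of Python. This is not Narrative Science's system; the box-score fields, the player name, the margin thresholds, and the wording are all hypothetical, chosen only to show how a template plus a vocabulary rule turns data into a sentence.

```python
# Hypothetical box-score data for a Little League game (invented for illustration).
game = {
    "home": "Tigers", "away": "Hawks",
    "home_runs": 7, "away_runs": 2,
    "star": "Sam Ortiz", "star_hits": 3, "star_rbi": 4,
}

def margin_phrase(margin: int) -> str:
    # A rule an expert writer might set up: controlled vocabulary keyed to the margin.
    if margin >= 5:
        return "cruised past"
    if margin >= 2:
        return "beat"
    return "edged"

# The template an expert writer might author; the software only fills in the slots.
TEMPLATE = ("The {winner} {verb} the {loser} {w_runs}-{l_runs}. "
            "{star} led the way with {hits} hits and {rbi} RBIs.")

def write_recap(g: dict) -> str:
    # Assumes the game did not end in a tie.
    home_won = g["home_runs"] > g["away_runs"]
    winner, loser = (g["home"], g["away"]) if home_won else (g["away"], g["home"])
    w_runs = max(g["home_runs"], g["away_runs"])
    l_runs = min(g["home_runs"], g["away_runs"])
    return TEMPLATE.format(winner=winner, loser=loser,
                           verb=margin_phrase(w_runs - l_runs),
                           w_runs=w_runs, l_runs=l_runs, star=g["star"],
                           hits=g["star_hits"], rbi=g["star_rbi"])
```

Calling `write_recap(game)` yields "The Tigers cruised past the Hawks 7-2. Sam Ortiz led the way with 3 hits and 4 RBIs." The creative work lives in the template and the rules; the data only selects among them.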

Will automatic production of articles take away the jobs of journalists, who may already feel under siege as newspapers continue to fold practically every day? Narrative Science says no. So much data is being collected that it is impossible for humans to analyze it all manually, let alone write about it. Some examples of what Narrative Science can do are shown here:

[Image: automatic content example]

Companies such as Narrative Science can help produce automatic content from data.

Building Automatic Content: Writing, or Data Analysis?

For example, do you have a written narrative of your energy usage, even if you have a smart meter? In theory, intelligent software could analyze the data set that is your energy usage and write you a personalized advisory about it. I am imagining something along these lines (I have not confirmed this idea with Narrative Science, nor used their software):

If you set your dryer on a timer, you could run it between 2 AM and 4 AM, rather than during peak hours as you do currently, and reduce your energy costs by 5%. If you reduced the temperature of your hot water by 5 degrees, you would reduce your energy costs by 1%. Most importantly, however, based on last month’s patterns, if you turned the heat down to 60 degrees when you are out, you would save 15% of your energy costs.

Seen in this light, the software simply enables more data interpretation. Undoubtedly, if you had sat down with your energy data set and Microsoft Excel, you could eventually have come to the same conclusions. But given that you might not be an expert in the type of analysis required, why not rely on a software program designed to do exactly that? Narrative Science software is not just for writing; it blends analysis and writing.
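The imagined dryer advice blends exactly these two steps, analysis and writing. A minimal Python sketch might look like the following, assuming hypothetical time-of-use prices, a hypothetical peak window, and hourly kWh readings; none of these come from a real utility or from Narrative Science.

```python
# Hypothetical time-of-use prices in dollars per kWh (not real utility rates).
PEAK_PRICE = 0.20
OFF_PEAK_PRICE = 0.08
PEAK_HOURS = range(7, 19)  # assumed peak window: 7 AM up to 7 PM

def cost(usage: dict[int, float]) -> float:
    """usage maps hour of day (0-23) to kWh used in that hour."""
    return sum(kwh * (PEAK_PRICE if hour in PEAK_HOURS else OFF_PEAK_PRICE)
               for hour, kwh in usage.items())

def dryer_advice(total: dict[int, float], dryer: dict[int, float]) -> str:
    # Analysis step: reprice the dryer's peak-hour load at the off-peak rate.
    peak_dryer_kwh = sum(kwh for hour, kwh in dryer.items() if hour in PEAK_HOURS)
    if peak_dryer_kwh == 0:
        return "Your dryer already runs off-peak; no change needed."
    saving = peak_dryer_kwh * (PEAK_PRICE - OFF_PEAK_PRICE)
    pct = 100 * saving / cost(total)
    # Writing step: turn the number into a sentence.
    return (f"If you set your dryer on a timer to run between 2 AM and 4 AM, "
            f"you could reduce your energy costs by about {pct:.0f}%.")
```

The percentage depends entirely on the assumed prices and usage; the point is that the sentence is generated from a calculation rather than written by hand.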

Narrative Science is banking on the idea that there are abundant micro-markets for articles about relatively minor events that would otherwise never warrant even a high-school cub reporter’s time. It’s a fair bet that not every volleyball or curling match gets its own article. With the judicious use of templates and rules, Narrative Science claims that its software can adopt any desired tone. For Little League games, snarky articles would likely be unwelcome, but a triumphant tone might suit an article aimed at fans of the winning college football team.
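Tone adaptation through templates can be sketched the same way: the data stays fixed and only the template changes. The tone names and wording below are hypothetical, not Narrative Science's.

```python
# Hypothetical tone-specific templates; tone names and wording are invented.
TEMPLATES = {
    "neutral": "The {winner} defeated the {loser} {ws}-{ls}.",
    "encouraging": "The {winner} won {ws}-{ls}, and the {loser} showed great effort all game long.",
    "triumphant": "The {winner} dominated the {loser} in a {ws}-{ls} rout!",
}

def recap(winner: str, loser: str, winner_score: int, loser_score: int,
          tone: str = "neutral") -> str:
    # Pick the template for the requested tone; fall back to neutral if it is unknown.
    template = TEMPLATES.get(tone, TEMPLATES["neutral"])
    return template.format(winner=winner, loser=loser, ws=winner_score, ls=loser_score)
```

A Little League recap might request `"encouraging"`, while `recap("Wildcats", "Bears", 35, 10, "triumphant")` produces "The Wildcats dominated the Bears in a 35-10 rout!" for the winning team's fans.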

Machine Translation

Machine translation is another form of automatic content production. Companies like SDL International are finding machine translation increasingly cost-effective and useful in domains with carefully structured text and limited vocabulary. Much to my surprise, I learned in a conversation at LavaCon 2012 that a major retailer uses machine translation for the Spanish translation of its product catalog. SDL International claims that machine translation is “up to 50% faster and 40% less cost than traditional human translation.” Typically, machine translation is not used alone; the workflow includes a round of human editing. As someone who started using AltaVista Babel Fish back in the late 1990s and now uses Google Translate (for fun, not profit), I have noticed definite improvements in contextual interpretation and general quality.

Computational Linguistics

If you are interested in the theoretical underpinnings of automatic content production and machine translation, look up computational linguistics, a field devoted to discovering the rules that underpin communication and to approaching natural language from a computational standpoint. Scarcely any field better demonstrates the essential unity of the humanities and the sciences. The Association for Computational Linguistics can serve as a good starting place for information on this burgeoning field.

Automation: It’s Everywhere!

Writing is not the only human art form that is increasingly subject to automation. Various experts are working on computer-generated music and visual art as well. Human creativity is not threatened just because a machine can be programmed to do particular tasks. First, programming the machine takes a huge amount of creative effort on the part of both the software developer and the creative experts, whether journalists, musicians, or artists. Second, automation may free up large chunks of time for the aspects of the creative process that cannot yet be programmed. As technical communicators, we will not get anywhere through fear, whether our concern is being replaced by technology or by outsourcing. We have to find ways to add value beyond mundane automatic content production.

Disclaimer: I work for a company that uses Narrative Science for financial reports. This work takes place in a different division from the one that employs me, and I have no involvement with it.

Further Reading

Atlantic article: Can the Computers at Narrative Science Replace Paid Writers?