Structure and Content


As part of developing sustainable business models for online media, I have tried to catch up with what Reg Chua is thinking about the structure of journalism at his blog “(Re)Structuring Journalism”.  I have followed Reg on this topic for about a year.  But I continue to struggle with what we are structuring and what problem we are solving.  I think Reg would argue that through structure we create enduring context, which leads to engagement, which should lead to value.  But there are a lot of turns in that argument, and to put some of this into practice, I have tried to break it down a bit further.

As I read through my notes on this – yes, I still keep a real, paper notebook – I see that the discussion has stretched the term “structure” quite a bit.  So, I am going to break it up into types of structure and see if that helps me get to something that I can take to market.  Let’s start with a few definitions:

  • Data (or facts):  These are the elements that can be verified and make up the main actors in online content – person, place, institution, location, date, and the relationships among people/things/institutions.
  • Rhetoric: The organization of statements about this data to deliver a message/thought/story/argument.  Rhetoric exists within the content.
  • Topic:  These are subjective classifications of content by some organizing principle – topic of content, purpose of writing (review, news, opinion), or something else entirely.  Topic exists above the content.
  • Audience:  Stretching here.  But there are clearly some structures that social networks are enabling that create audience structures around engagement with specific content.

These all apply to something I will call an element of content (story, essay, video, review, game, …).  I will leave it to the J-School grads to determine what qualities these content elements must have to be labeled “journalism”.

Data Structures:  Many people have written about the value of linked data, microformats, and so on for content, especially news and directory content.  Tim Berners-Lee even made an impassioned plea for open data at TED last year – The Next Web, he called it.  I have not been through all the literature, but the two discussions or practices that I am most familiar with are microformats and structured vertical databases.

As a friend in the vertical search/database space reminds me, the enthusiasm for microformats has waxed and waned a couple of times over the past five years, but there are still very few broadly accepted standards in use beyond hCard and hReview.  The vertical database efforts continue to evolve and grow around different specialty interests, but the data is usually captive within the site that created the original structure.  Companies are finally beginning to open their datasets and the accompanying structure to other companies.  I have always encountered two problems with creating proprietary databases within media companies – I have presented and worked on them in at least four news organizations.  The first problem is at creation: what types of data to focus on and how to resource building the initial dataset (proof of concept).  This is the classic business model question: what do we do well, what do we know more about than others, and how much should we invest to generate a return?  But the real killer, and it is linked to return, is the second problem: how do I create a work process to maintain, audit and improve the dataset?  This is where these projects notoriously fall apart in mainstream newsrooms.  This is traditionally not a reporter’s job and not an editor’s job; it gets handed to librarians and archivists, if they haven’t been fired.

As Reg mentions in his blog, probably the most important innovation required to solve the second problem is improvements to major CMS’s that “require” key data to be identified/linked to some internal database, and then, of course, a matching work flow to make sure the data is created.  For example, if you write an article on Richard Li, you should have to link his name to an internal Wikipedia-like entry that provides the background and links to past articles on Richard Li.  If this is his first mention, then you create an entry.  The same can be done for places, organizations, etc.  Newsrooms have always had unique access to this data, but never the discipline or the financial incentives to change their work flows enough to make capturing it as important as getting the individual story on the site or on press.
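To make the Richard Li example concrete, here is a minimal sketch of what a “required link” step in a CMS might look like.  It assumes the writer or editor supplies the key names and their kinds by hand (no automatic entity extraction), and every name in it (`Entity`, `EntityRegistry`, `publish`) is hypothetical, not any real CMS’s API.  The point it shows is the work-flow one: linking is mandatory at publish time, a first mention creates an entry, and entries without background come back as follow-up work someone has to own.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """A Wikipedia-like internal entry for a person, place, or organization."""
    name: str
    kind: str                  # e.g. "person", "place", "organization"
    background: str = ""       # written and maintained by whoever owns the entry
    article_ids: list[str] = field(default_factory=list)  # past coverage, oldest first


class EntityRegistry:
    """Hypothetical internal database a CMS could consult at publish time."""

    def __init__(self) -> None:
        self._entities: dict[str, Entity] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Naive normalization (collapse whitespace, lowercase); a real system
        # would need disambiguation for namesakes and spelling variants.
        return " ".join(name.split()).lower()

    def get(self, name: str) -> Entity | None:
        return self._entities.get(self._key(name))

    def link(self, name: str, kind: str, article_id: str) -> tuple[Entity, bool]:
        """Link an article to an entity, creating the entry on first mention.

        Returns the entity and True if a new entry had to be created.
        """
        key = self._key(name)
        created = key not in self._entities
        if created:
            self._entities[key] = Entity(name=" ".join(name.split()), kind=kind)
        entity = self._entities[key]
        if article_id not in entity.article_ids:
            entity.article_ids.append(article_id)
        return entity, created


def publish(article_id: str, mentions: list[tuple[str, str]],
            registry: EntityRegistry) -> list[str]:
    """Hypothetical publish hook: every key mention must be linked first.

    Returns the names whose entries still lack background, i.e. the
    follow-up work the newsroom work flow has to assign to someone.
    """
    if not mentions:
        raise ValueError("identify at least one key person, place, or organization")
    needs_background = []
    for name, kind in mentions:
        entity, _created = registry.link(name, kind, article_id)
        if not entity.background:
            needs_background.append(entity.name)
    return needs_background
```

In use, the first story on Richard Li returns `["Richard Li"]` because the entry is new and empty; once someone fills in `background`, later stories link to the same entry and return an empty list.  The design choice worth noting is that the sketch does not stop publication when background is missing: it only refuses unlinked stories and surfaces the maintenance gap, which is exactly the part that tends to fall through the cracks.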

I have hope for this category of structured data from news organizations, since it does seem to have value, and there are examples to look at for guidance: Wen Wei Publishing Company’s ChinaVitae.com and Techcrunch’s Crunchbase.  Reg also highlights a few on ReStructuring Journalism – Politifact from Poynter and WhoRunsGov.com from The Washington Post.  I will feel especially encouraged when one company believes it has created something of sufficient value to build a business around, whether through advertising or, more likely, through subscriptions of some sort.

Rhetorical Structures:  I still find it very hard to understand where, or how much, value is created by structures like this, with the exception of some very specific examples that Reg highlights in the blog, e.g. sports results, market closings, disaster stories, stock performance, perhaps movie reviews, …  Even these may make sense only within a single large organization where the rhetoric/content structure for the Stock Market Opening and other regular reports has been standardized into something that looks like boilerplate.  Other than these, I am not sure that there are enough real, identifiable standards in how content is organized to extract value from them in aggregate.  So, for the time being, it is hard to figure out what to build in order to productize this one.

Topical Structures:  Organizing content units (stories, video, photos, …) into topical groups holds some promise.  There are two executions of this.  One execution is the attempt to organize individual stories around a topic to give the stories context in aggregate.  It is, in a way, a reaction to the traditionally episodic nature of journalism and to the new, expandable media that allow a story to be told on a continuous basis.  Any site that has organized stories around a topic has tried this.  But with the exception of Google’s Living Stories beta, they all seem to add little value beyond tagging the stories and then displaying them in reverse chronological order.  Few editors have organized these pages to tell the story better in aggregate than the individual stories do.  There are rarely synopses of the topic.  Why is/was the topic important?  Who were the key actors (remember that database of facts)?  Generally, the newsroom treats tagging stories by topic as an added nuisance, so the value created is small.  Show me a news organization that has elevated managing the “topics” they cover to a regular “beat” and I will change my mind.  For the most part, these pages are really aggregation pages built to improve search results, not to add context to individual stories.
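The gap between a tag page and a topic page with real editorial value can be sketched as two data structures.  This is a minimal illustration, assuming stories carry a set of plain tag strings and a publication date; `Story`, `tag_page`, `TopicPage`, `build_topic_page` and the “is it curated?” rule are all hypothetical names and criteria, not how Living Stories or any other site actually works.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Story:
    headline: str
    published: date
    tags: set[str]


def tag_page(stories: list[Story], topic: str) -> list[str]:
    """What most topic pages do today: filter by tag, newest first."""
    tagged = [s for s in stories if topic in s.tags]
    tagged.sort(key=lambda s: s.published, reverse=True)
    return [s.headline for s in tagged]


@dataclass
class TopicPage:
    """The editorial layer a topic "beat" would maintain on top of the tag."""
    topic: str
    synopsis: str = ""          # what the topic is about
    why_it_matters: str = ""    # why it is/was important
    key_actors: list[str] = field(default_factory=list)  # ideally links into an entity database
    timeline: list[str] = field(default_factory=list)    # headlines, oldest first

    def is_curated(self) -> bool:
        # Hypothetical editorial rule: without a synopsis, a reason it matters,
        # and named actors, the page adds nothing beyond the tag listing.
        return bool(self.synopsis.strip() and self.why_it_matters.strip()
                    and self.key_actors)


def build_topic_page(stories: list[Story], topic: str, synopsis: str,
                     why_it_matters: str, key_actors: list[str]) -> TopicPage:
    """Assemble a curated page; the timeline reads oldest first, like a narrative."""
    tagged = sorted((s for s in stories if topic in s.tags), key=lambda s: s.published)
    return TopicPage(topic, synopsis, why_it_matters, list(key_actors),
                     [s.headline for s in tagged])
```

Everything `tag_page` produces can be generated automatically, which is why it costs the newsroom nothing and adds little.  The extra fields on `TopicPage` are the ones only an editor can fill in, so whether they get filled is a work-flow question rather than a technology one.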

The second execution of building topical structures is revenue-driven.  Topix.com is the best example that I have encountered.  Topix crawls thousands of websites and then indexes their content by topic and location.  The result is an index of thousands of local and topical pages with accompanying discussion groups and advertising as an overlay.  But other than combining the pieces, there is very little value added to the topical categories.  So, the site actually serves as a somewhat more precise search.  The question is how long that advantage will last as Google and Bing continue to fine-tune their own news search engines to deliver the same results.

Audience Structures:  The last area is one that is really just emerging: using content to create structures of individual interest.  Think of the content as the bait that entices an interested consumer to subscribe, follow, friend, like, share – take some action of engagement.  Choose your word.  There is a lot of discussion around this given the rapid growth of Facebook and Twitter.  There are a handful of examples of success here.  And in the end, you quickly find yourself combining these examples with subscription models and special access models.  I will expand on this structure more as part of a product development effort in SE Asia.

As I look back at the musings above, there are many barriers to execution.  One, unfortunately, is the traditional workflow and workload of the mainstream newsroom.  It is not for lack of desire; it is usually just fatigue.  But it is a barrier nonetheless.  The second obstacle is clearly technology: to create data, topical or audience structures from content in the news, you need the technology capabilities and the technology flexibility to build the databases and the data capture tools that you need.  I have to believe that the need for these tools would not be a surprise to the major CMS vendors.  So, one of my next steps will be to survey the open source and proprietary CMS’s to understand their capabilities.  Finally, there is the planning and execution involved in making this kind of move a regular part of the news gathering and reporting work flow.

For my work with ICFJ as a Knight Fellow, I am exploring two areas of structure for clients in Southeast Asia.  One is topical structures, both in changes to the CMS and in working with the newsrooms to create a work flow that encourages and rewards topic editing and management.  The other is working with a very avid contributor group to use content to build out audience structures and engagement.  This last area will receive several more posts as we start building out the product requirements for the “engaged social user base”.

I have found trying to stay on top of all the various developments in the online content space a challenge while trying to work with clients, so please add examples, suggested readings, or counter examples. Hat tip to Reg in Hong Kong for pushing me on these topics.

3 thoughts on “Structure and Content”

  1. Structured data will come of age soon – pundits predicted 2010 was the year, but as with your Richard Li wiki example above, it will take data overload for it to be accepted and used routinely. That day can’t be far away – personal data storage is becoming harder to manage as we are able to store more and more. hCard, hReview and hProduct will be the start, but soon all data will need to be better structured just to simplify the sorting and sifting processes we are likely to apply.

  2. Ross, interesting post. I wrote up a tad more at my blog (http://structureofnews.wordpress.com/2010/09/24/structuring-structure/), but the gist is that audience structure may be the fastest way forward, since it doesn’t entail overcoming newsroom culture. The only problem there is that newsroom culture in theory brings verified data, and it would be great if we could marry that part of the culture with a willingness to contribute data.
    The other structures probably require too many moving parts to link up at the same time – unless we can start with a small team and a small project.
    What’s probably worth trying is to figure out some marriage of audience and public interest data that they’re able to contribute, and see if that works as an experiment. (I’m sure there must be some out there.)

  3. Hey Ross,
    Greetings from various of your HK pals. Typing this from SCMP, where I just had a very interesting conversation with Reg and Yolanda, mostly about structured journalism. Look forward to catching up via SKYPE before long.
    – Bill