How Outside, FT and Dow Jones use AI for search and solutions


These days, digital media companies are all trying to figure out how to best incorporate AI into their products, services and capabilities, via partnerships or by building their own. The goal is to gain a competitive edge as they tailor AI capabilities to their audiences, subscribers and clients’ specific needs.

By leveraging proprietary large language models (LLMs), digital media companies have a new tool in their toolboxes. These offerings provide differentiation and added value, along with enhanced audience engagement and user experience. Proprietary LLMs also set these companies apart from those opting for licensing partnerships with other LLMs, which offer more generalized knowledge bases and draw from sources that vary widely in subject matter and quality.

A growing number of digital media companies are rolling out their own LLM-based generative AI features for search and data-based purposes to enhance user experience and create fine-tuned solutions. In addition to looking at several of the offerings media companies are bringing to market, we spoke to Dow Jones, Financial Times and Outside Inc. about the generative AI tools they’ve built and explored the strategies behind them.

Digital media companies are harnessing the power of generative AI to unlock the full potential of their own proprietary information, which is sometimes vast. These new products allow them to offer valuable, personalized, and accessible content to their audiences, subscribers, customers and clients.

Take, for example, Bloomberg, which released a research paper in March detailing the development of its new large-scale generative AI model called BloombergGPT. The LLM was trained on a wide range of financial data to help Bloomberg improve existing financial natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, news classification, and question answering, among others. In addition, the tool will help Bloomberg customers organize the vast quantities of data available on the Bloomberg Terminal in ways that suit their specific needs.

Fortune partnered with Accenture to create a generative AI product called Fortune Analytics, which launched in beta on June 4. The tool delivers ChatGPT-style responses based on 20 years of financial data from the Fortune 500 and Global 500 lists, as well as related articles, and helps customers build graphic visualizations.

Generative AI helps customers speed up processes

A deeper discussion of how digital media companies are using AI provides insights to help others understand the potential to leverage the technology for their own needs. Dow Jones, for example, uses generative AI for a platform that helps customers meet compliance requirements.

Dow Jones Risk & Compliance is a global provider of risk and compliance solutions for banks and corporations, helping organizations perform checks on their counterparties. Those checks support compliance with anti-money laundering and anti-corruption regulation, and also help mitigate supply chain risk and reputational issues. Dow Jones Risk & Compliance provides tools that allow customers to search data sets and help manage regulatory and reputational risk.

In April, Dow Jones Risk & Compliance launched an AI-powered research platform for clients that enables organizations to build an investigative due diligence report covering multiple sources in as little as five minutes. Called Dow Jones Integrity Check, the research platform is a fully automated solution that goes beyond screening to identify risks and red flags from thousands of data sources.

The planning for Dow Jones Integrity Check goes back a few years, as the company sought to provide its customers with a quicker way to do due diligence on their counterparties, explained Joel Lange, Executive Vice President and General Manager, Risk and Research at Dow Jones.

Lange said that Dow Jones effectively built a platform that automatically creates a report for customers on a person or company, using technology from AI firm Xapien. It brings together Dow Jones’ data with other data sets, corporate registrar information, and wider web content. It then leverages the platform’s generative AI capability to produce a piece of analysis or a report.

Dow Jones Risk & Compliance customers use its technology to make critical, often complex, business decisions. The data collection process can be incredibly time-consuming, often taking days if not weeks.

The new tool “provides investigations teams, banks and corporations with initial due diligence. Essentially it’s a starting point for them to conduct their due diligence, effectively automating a lot of that data collection process,” according to Lange.

Lange points out that the compliance field is always in need of increased efficiency. However, the work carries great reputational risk. Dow Jones Integrity Check was designed to reshape compliance workflows, creating an additional layer of investigation that can be deployed at scale. “What we’re doing here is enabling them to more rapidly and efficiently aggregate, consolidate, and bring information to the fore, which they can then analyze and then take that investigation further to finalize an outcome,” Lange said.

Regardless of the quality of the generated results, most experts believe that it is important to have a human in the loop in order to maintain content accuracy, mitigate bias, and enhance the credibility of the content. Lange also said that it’s critical to have “that human in the loop to evaluate the information and then to make a decision in relation to the action that the customer wants to take.”

When Google first unveiled Bard in February 2023, a promotional demo included an incorrect claim about the James Webb Space Telescope. This raised concerns about the accuracy and reliability of generative AI, and wiped roughly $100 billion off Alphabet’s market value. Digital media companies’ generative AI products may avoid similar issues because they use their own content archives and data, rather than content scraped willy-nilly from the open web.

In recent months, digital media companies have been launching their own generative AI tools that allow users to ask questions in natural language and receive accurate and relevant results.

To name a few: The Washington Post is partnering with Virginia Tech’s Sanghani Center for Artificial Intelligence and Data Analytics to develop a new generative AI project where readers can get answers to questions, using data taken from The Post’s previous coverage.

The Associated Press created Merlin, an AI-powered search tool that makes searching the AP archive more accurate. “Merlin pinpoints key moments in our videos to exact second and can be used for older archive material that lacks modern keywords or metadata,” explained AP Editor in Chief Julie Pace at The International Journalism Festival in Perugia in April.

Outside’s Scout: AI search with useful results

Chatbots have become a popular form of search. Originally pre-programmed and only able to answer select questions included in their programming, chatbots have evolved and increased engagement by providing a conversational interface. Used for everything from organizing schedules and news updates to customer service inquiries, generative AI-based chatbots help users find information more efficiently across a wide range of industries.

In January, Outside Inc. launched a custom chatbot, built on OpenAI’s LLM and trained on Outside’s library of content as a knowledge base, to provide summaries and answers to audience questions. Called Scout, the chatbot is integrated into the top navigation bar of Outside’s 20+ websites and serves as the primary search engine across the Outside network.

Outside CEO Robin Thurston explained that, much like The Guardian, The Washington Post, The New York Times and other digital media organizations that blocked OpenAI from using their content to power artificial intelligence, Outside Inc. wasn’t going to let third parties scrape its platforms to train LLMs.

Instead, they looked at leveraging their own content and data. “We had a lot of proprietary content that we felt was not easily accessible. It’s almost what I’d call the front page problem, which is you put something on the front page and then it kind of disappears into the ether,” Thurston said. 

“We asked ourselves: How do we create something leveraging all this proprietary data? How do we leverage that in a way that really brings value to our user?” Thurston said. The answer was Scout, Outside Inc.’s custom-developed AI search assistant.

The company could see that generative AI offered a way to make that content accessible and even more useful to its readers. Outside had a lot of evergreen content that wasn’t adding value once it left the front page. Its brands inspire and inform audiences about outdoor adventures, new destinations and gear. Much of that content is evergreen and proprietary, and it still has value if the audience can easily surface it. The chat interface keeps this content accessible to readers after it is no longer front and center on the website.

Scout gives users a summary answer to their question, leveraging Outside Inc.’s proprietary data, and surfaces the articles that it references. “It’s just a much more advanced search mechanism than our old tool was. Not only does it summarize, but it then returns the things that are most relevant,” Thurston explained.

Additionally, Outside Inc.’s old search function worked brand by brand. Scout searches across the 20+ properties owned by the parent company, which include Backpacker, Climbing, SKI Magazine, and Yoga Journal, among others. It brings results from all of those brands together in one place, from the best camping destinations and trails to outdoor activities for the family, gear, equipment and food.

One aspect that sets Outside Inc.’s model apart is their customer base, which differs from general news media customers. Outside’s customers engage in a different type of interaction, not just a quick transactional skim of a news story. “We have a bit of a different relationship in that they’re not only getting inspiration from us, which trip should I take? What gear should I buy? But then because of our portfolio, they’re kind of looking at what’s next,” Thurston said.

It was important to Thurston to use the LLM in a number of different ways, so Outside Inc. launched a local newsletter initiative with the help of AI. “On Monday mornings we do a local running, cycling and outdoor newsletter that goes to people that sign up for it, and it uses that same LLM to pick what types of routes and content for that local newsletter that we’re now delivering in 64,000 ZIP codes in the U.S.,” he said.

Thurston said they had a team working on Scout and it took about six months. “Luckily, we had already built a lot of infrastructure in preparation for this in terms of how we were going to leverage our data. Even for something like traditional search, we were building a backend so that we could do that across the board. But this is obviously a much more complicated model that allows us to do it in a completely new way,” he said. 

Connecting AI search to a real subscriber need

In late March, The Financial Times released its first generative AI feature for subscribers called Ask FT. Like Scout, the chat-based search tool allows users to ask any question and receive a response using FT content published over the last two decades. The feature is currently available to approximately 500 FT Professional subscribers. It is powered by the FT’s own internal search capabilities, combined with a third-party LLM.

The tool is designed to help users understand complicated issues or topics, like Ireland’s offshore energy policy, rather than just searching for specific information. Ask FT searches through FT content, generates a summary and cites the sources.

“It works particularly well for people who are trying to understand quite complex issues that might have been going on over time or have lots of different elements,” explained Lindsey Jayne, the chief product officer of the Financial Times.


Jayne explained that they spend a lot of time understanding why people choose the FT and how they use it. People read the FT to understand the world around them, to have a deep background knowledge of emerging events and affairs. “With any kind of technology, it’s always important to look at how technology is evolving to see what it can do. But I think it’s really important to connect that back to a real need that your customers have, something they’re trying to get done. Otherwise it’s just tech for the sake of tech and people might play with it, but not stick with it,” she said. 

Trusted sources and GenAI attribution

Solutions like those from Dow Jones, FT and Outside Inc. highlight the power of a brand with a trusted audience relationship to create deep, authentic relationships built on reliability and credibility. Trusted media brands are considered authoritative because their content is based on credible sources and facts, which ensures accuracy.

Generative AI tools have so far shown inconsistent accuracy and pose challenges for sourcing and attribution. Attribution is a central feature for digital media companies that roll out their own generative AI solutions. For Dow Jones compliance customers, attribution is critical whenever they are going to make a decision based on information that is available in the media, according to Lange.

“They need to have that attributed to within the solution so that if it’s flowing into their audit trails or they have to present that in a court of law, or if they would need to present it to our internal audit, the attribution is really key. (Attribution) is going to be critical for a lot of the solutions that will come to market,” he said. “The attribution has to be there in order to rely on it for a compliance use case or really any other use case. You really need to know where that fact or that piece of information or data actually came from and be able to source it back to the underlying article.”

The Financial Times’ generative AI tool also offers attribution to FT articles in all of its answers. Ask FT pulls together lots of different source material, generates an answer, and attributes it to various FT articles. “What we ask the large language model to do is to read those segments of the articles and to turn them into a summary that explains the things you need to know and then to also cite them so that you have the opportunity to check it,” Jayne said.

They also ask the model to infer from people’s questions what time period it should search. “Maybe you’re really interested in what’s happened in the last year or so, and we also get the model to reread the answer, reread all of the segments and check that, as kind of a guard against hallucination. You can never get rid of hallucination totally, but you can do lots to mitigate it,” Jayne said.
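The steps Jayne describes (find relevant article segments, narrow them by time period, produce a cited answer, then re-check the answer against the segments) can be illustrated with a minimal Python sketch. This is not the FT's implementation: the `Segment` type, the function names, the keyword-overlap ranking and the sample archive are all hypothetical, and the language model call is replaced by a stand-in that simply quotes each segment.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Segment:
    """Hypothetical unit of archive text tied to the article it came from."""
    article_id: str
    published: date
    text: str


def retrieve(archive, question, since=None, top_k=2):
    """Rank segments by naive keyword overlap, optionally filtering by date."""
    terms = set(question.lower().split())
    scored = []
    for seg in archive:
        if since is not None and seg.published < since:
            continue  # outside the time period inferred from the question
        overlap = len(terms & set(seg.text.lower().split()))
        if overlap > 0:
            scored.append((overlap, seg))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in scored[:top_k]]


def answer_with_citations(segments):
    """Stand-in for the LLM step: quote each segment and tag its source."""
    return [(seg.text, seg.article_id) for seg in segments]


def unsupported_claims(answer, segments):
    """Re-check pass: return claims that their cited article does not contain."""
    return [
        (claim, source)
        for claim, source in answer
        if not any(s.article_id == source and claim in s.text for s in segments)
    ]


# Invented sample data, for illustration only.
archive = [
    Segment("ft-001", date(2019, 5, 1), "Ireland sets early offshore wind targets"),
    Segment("ft-002", date(2023, 9, 12), "Ireland revises offshore wind energy policy"),
    Segment("ft-003", date(2024, 2, 3), "Central bank holds rates steady"),
]
question = "Ireland offshore energy policy"

hits = retrieve(archive, question, since=date(2023, 1, 1))
answer = answer_with_citations(hits)
problems = unsupported_claims(answer, hits)
```

In a real system the ranking would come from a proper search index and the summary from a language model, but the shape is the same: every claim in the answer carries a source, and anything the source does not support is flagged before it reaches the reader.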

The Financial Times is also asking for feedback from the subscribers using the tool. “We’re literally reading all of the feedback to help understand what kinds of questions work, where it falls down, where it doesn’t, and who’s using it, why and when.”

Generative AI seems to have created unlimited opportunities and also considerable challenges, questions and concerns. However, it is clear that many media companies possess a valuable asset in their deep reservoirs of quality content, and it is good for business to extract the most value from the investment in creating it. Leveraging their own content to train and program generative AI tools that serve readers seems like a very promising application.

In fact, generative AI can give trustworthy sources a bit of a superpower. Jayne from the FT offered the example of scientists using the technology to read through hundreds of thousands of research papers and find patterns, making important connections in a process that would otherwise take years.

While scraped-content LLMs pose risks to authenticity, accuracy and attribution, proprietary language models offer a promising alternative.

As Jayne put it, the media has “an opportunity to harness what AI could mean for the user experience, what it could mean for journalism, in a way that’s very thoughtful, very clear and in line with our values and principles.” At the same time, she cautions that we shouldn’t be “getting overly excited because it’s not the answer to everything – even though we can’t escape the buzz at the moment.”

We are seeing many efforts bump up against the limits of what generative AI is able to do right now. However, media companies can avoid some of generative AI’s current pitfalls by employing the technology’s powerful language prediction, data processing and summarization capabilities while leaning into their own strengths of authenticity and accuracy.