French news outlet Les Echos is launching a new business news aggregation platform, using semantic analysis to sort and rank sources.

The project - called LesEchos360 - was presented by Frederic Filloux, head of digital at Les Echos and editor of the Monday Note, at the Global Editors Network in Paris last week. The project, developed with Syllabs, is in private beta. Publishers who want to take part in the public beta launch later this year are invited to email the project.

Filloux said he was saddened that news aggregation platforms "are not being invested in by media companies", with the exception of Zite, which is now owned by CNN.

Speaking to Journalism.co.uk later, he added that this lack of investment by legacy media "reflects our inability to innovate".

The task now is to "restore some agility in our big companies," he said. "That's what I'm willing to do at Les Echos."

During his presentation Filloux reflected on sites such as TechMe, whose "underlying engine is extremely clever", before explaining how Les Echos embarked on its own aggregation project.

The idea was to "use our brand, our label", he said, and give a business focus to the aggregation.

This was a "simple idea", but had a "very complex execution", he said.

The first issue was with choosing sources for the aggregation. Les Echos did not want to simply "collect the 10 biggest business sites in France", he explained.

The first step was to look at the Twitter accounts of its own journalists, of which there were 72 in total.

Their tweets "yielded 5,000 URLs", he said, but these were often related to the journalists' own work.

Les Echos therefore looked instead at who its journalists were following on Twitter, which offered up 872,000 URLs.

From there it was necessary to sort and rank these, based on areas such as "relevancy" and "serendipity", Filloux added. They wanted to surface content "people might enjoy to discover", he said.

LesEchos360 is also using a "scoring system" to help rank the sources, which is achieved by looking at factors such as retweets, sending to original URLs and finding original domains.

By applying these filters the platform narrowed its sources down to around 160 URLs, which now form the "base of our leaderboard", Filloux said.
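
As a rough illustration of how tweeted links might be reduced to a ranked list of source domains, here is a minimal Python sketch. It is not LesEchos360's actual scoring system: the function names, the sample links and the weighting (one point per share plus one per retweet) are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def rank_sources(shared_links, top_n=160):
    """Rank domains by a simple score.

    shared_links is an iterable of (url, retweet_count) pairs. Each share
    earns one point plus one point per retweet; this weighting is an
    assumption for the sketch, not the project's real formula.
    """
    scores = defaultdict(int)
    for url, retweets in shared_links:
        scores[domain_of(url)] += 1 + retweets
    # Highest score first; ties are broken alphabetically by domain.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical sample data: (URL shared in a tweet, number of retweets).
links = [
    ("http://www.example.com/markets/story-1", 4),
    ("http://example.com/markets/story-2", 1),
    ("http://blog.example.org/post", 2),
]
leaderboard = rank_sources(links)
```

Grouping by domain rather than by individual URL is what would turn hundreds of thousands of links into a short leaderboard of sources.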

The platform is also performing "semantic analysis of every story we collect".

Speaking to Journalism.co.uk, Filloux said there was a desire to have a collection of content "which both reflects some quality regarding the information", as well as "the relative weight of news in the news cycle".

In order to "cluster the news" being aggregated, each piece of content is taken through a semantic analysis, in a bid to hunt out "the important stuff".

This includes looking for names of people or companies, dates, events or key business event terms such as 'takeover', Filloux explained.
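
To give a sense of what this kind of clustering could look like in its very simplest form, the Python sketch below groups headlines that share a name and a business event term. It is an assumption-laden toy rather than the semantic analysis used by LesEchos360: the term list, the stopword list and the function names are hypothetical, and treating capitalised words as names is a crude stand-in for real entity recognition.

```python
import re
from collections import defaultdict

# Hypothetical list of business event terms; only 'takeover' comes from the article.
EVENT_TERMS = {"takeover", "merger", "acquisition", "bankruptcy"}
# Capitalised words that should not be treated as names.
STOPWORDS = {"The", "A", "An", "In", "On", "For"}


def extract_features(text):
    """Return (names, events) found in a headline.

    Names are approximated by capitalised words, which is a crude proxy
    for proper entity recognition.
    """
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    events = {w.lower() for w in words if w.lower() in EVENT_TERMS}
    names = {w for w in words if w[0].isupper() and w not in STOPWORDS}
    return names, events


def cluster_stories(headlines):
    """Group headlines under every (name, event term) pair they mention."""
    clusters = defaultdict(list)
    for headline in headlines:
        names, events = extract_features(headline)
        for name in sorted(names):
            for event in sorted(events):
                clusters[(name, event)].append(headline)
    return dict(clusters)


# Hypothetical headlines for illustration.
headlines = [
    "Acme launches takeover bid for Globex",
    "Globex board rejects Acme takeover",
    "Initech files for bankruptcy",
]
clusters = cluster_stories(headlines)
```

Here the two takeover stories land in the same clusters because they mention the same companies and the same event term, while the bankruptcy story stands alone.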

While the main process will be "managed by a machine", Filloux added that some human oversight may be necessary, to help hunt out those stories with less obvious business links.

"We need to have some manual tweaking," he said.

The platform will publish a "snippet" of the story, the length of which is currently being decided, but is likely to be around two lines, to encourage readers to click through to the original source.
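
A snippet of roughly two lines could be produced by cutting the story text at a word boundary, as in this short Python sketch. The 160-character limit and the function name are assumptions for illustration; the article says only that the length is still being decided.

```python
import textwrap


def make_snippet(text, max_chars=160):
    """Collapse whitespace and truncate at a word boundary, adding an ellipsis."""
    return textwrap.shorten(text, width=max_chars, placeholder=" ...")


# Hypothetical story text, far longer than the snippet limit.
story = "Shares rose sharply after the announcement. " * 10
snippet = make_snippet(story)
```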

"By itself it will only be interesting if [you] click on the story and move elsewhere," Filloux said.

One future aim for the project is perhaps to offer it as a "vertical for a business company", he said, if they were "interested in being able to do some news gathering" of content relevant to their industry.

"So what we're building is a fairly light engine that could allow us to do exactly this kind of thing," he added.

For now, though, he said the focus is on producing the prototype and using it "to learn a lot of things".
