Making the most of RSS: readers reviewed

By Colin Meek (@colinmeek) and Judith Townend (@JTownend)

Reports of the “death of RSS” are not only greatly exaggerated – they are simply wrong. Not only is RSS part of the fabric of the internet, tapping that resource through RSS Readers is still an important and valuable component of a researcher’s toolset.

As a quick re-cap, RSS (Really Simple Syndication) readers allow you to track the content feeds from different websites. In other words, you can monitor a number of sites without visiting those sites individually. Use an RSS reader to work more efficiently – that’s the theory.
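To make the idea concrete, here is a minimal sketch of what a reader does with a feed: it parses the XML and pulls out each item's title and link. The sample feed, its URLs and the function name `read_items` are made up for illustration. A real reader would also fetch the feed over the network and handle other feed formats.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical RSS 2.0 document standing in for a real feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example site</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


for title, link in read_items(SAMPLE_FEED):
    print(title, link)
```

Running the same function over many feeds and remembering which links you have already seen is, in essence, what a reader does for you.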

The practice is often disappointing however. People often find RSS readers (and web-based readers in particular) frustrating and difficult to configure. If you subscribe to a range of general feeds (such as BBC news feeds) and content-specific feeds you can feel overwhelmed by the tsunami of posts that appear. Subscribe to several dozen feeds and you quickly need some way to filter and focus the content further.

Of course, Twitter and email alert services can be used in conjunction with RSS, but think of Twitter as the live broadcast, and RSS as the TV catch-up and what’s on guide. No matter how long you’ve been offline, it’s really easy to see what new content has been published on your chosen sites.

If you do need to monitor many active feeds then you need to choose a reader that can filter out the noise so you can monitor posts for specific keywords, categories or tags. If you’re determined to see everything written about a subject, follow the results of a specific search term by RSS – it can be easier to track than Google alerts to your email inbox, for example.

In this post, we review a number of different RSS readers that will help you stay on top of a range of feeds from a number of sources.

Web-based: Google Reader

Google Reader is a fantastic basic reader to collect feeds, but can be infuriating to use as your main tool. It can lag slightly and is a little cumbersome when organising your feeds into different folders.

It has its advantages however: it’s easy to add feeds to Google Reader from your browser and you can sync it with another desktop or mobile reader. It’s easy to publicly share stories with your followers, and send stories to other social networks, such as Twitter, StumbleUpon and Tumblr, for example.

Another nice feature is that Google recommends feeds to you. The recommendations are automatically generated and take into account your existing feed subscriptions.

Items can be tagged and ordered fairly easily. Google has a “magic” sort, which is its attempt to put your feeds in the “most relevant and interesting order”. You can view feeds by ‘list’ or ‘expanded’ view.

Here’s how Google Reader looks in your browser:

The other reviewed readers can be synced with your base Google Reader. The advantage of this is that you have a web-based back up of your feeds and it’s easy to switch news aggregator when something better comes along. But the big plus for doing this is that you can view your feeds from wherever you are – on phone, laptop, work and home computers – and they’ll all be synced.

There are a number of Google Reader bookmarklets to pop in your browser bar for quick read, subscription and share options.

Additionally, there are a number of shortcuts you can use while in Google Reader. See below:

Finally, for a bit of fun, have a play with Google’s ‘Reader Play’. Perhaps not the most efficient way of scanning stories, but it could be a pleasant way of flicking through your reader while you’re eating your breakfast:

Google Reader does not yet offer more sophisticated filter options, but its forum suggests using an external service such as FeedRinse to create filters within your different feed folders. FeedRinse is an effective way to filter your feeds, but if you have to monitor a range of feeds for various terms it is definitely not the most elegant solution.

So, which readers offer a filter feature that is easy to configure?

Desktop: NetNewsWire

For Mac, we recommend NetNewsWire. One big benefit is that it’s fast loading and easy to scroll through.

You can bookmark pages using the Delicious service, or by flagging them or adding them to a clippings folder.

If you wish, it can order feeds by attention and unread count, with the aim of bringing the most important items to the top.

It has a “tabbed browser feature” which means you can click on an article to view it and then click back to continue scrolling through the next stories, without losing your place.

You can create “special subscriptions” which let you search tags on Flickr or Delicious, or search terms on Twitter and Yahoo – giving you an integrated way to monitor social media and your RSS feeds from one interface.

Smart Lists are also handy: they organise items from your feeds according to pre-determined rules. For example, you can specify a keyword that must appear in the title and NetNewsWire will show items that fit this rule in the relevant Smart List. There are other criteria you can specify as well: author, link or ‘read status’, for example.

Here’s a bit more detail. First, select the ‘New Smart List’ option under the ‘File’ tab.

Next, specify which key words you would like to filter. You can create more filter boxes by using the ‘+’ option on the right hand side. In the example below we’ve added four extra filter boxes:

There are several options for filtering as you can see below. It’s useful to specify the group name of a set of feeds (eg. ‘photography’) and within that, stories that match a specific search term (eg. title contains ‘police’).

The end result? A feed with all the stories matching those conditions.
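The logic behind a rule like this is simple enough to sketch. The snippet below is only an illustration of the idea, not how NetNewsWire implements it: the `Item` class, the `smart_list` function and the sample headlines are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One feed item, with the feed group it belongs to (hypothetical model)."""
    group: str
    title: str


def smart_list(items, group, title_contains):
    """Keep items from one feed group whose title contains a keyword.

    The match is case-insensitive, which is an assumption of this sketch.
    """
    needle = title_contains.lower()
    return [it for it in items
            if it.group == group and needle in it.title.lower()]


ITEMS = [
    Item("photography", "Police stop photographer in London"),
    Item("photography", "Ten tips for night shots"),
    Item("politics", "Police budget debated"),
]

for item in smart_list(ITEMS, group="photography", title_contains="police"):
    print(item.title)
```

Both conditions must hold, so the politics story is dropped even though it mentions the keyword.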

Another option for Macs is NewsFire which has an uncluttered and intuitive interface but less functionality.

As a Windows equivalent, FeedDemon works very well. It has some really nifty features, which make it especially good for research purposes.

You can create Search Feeds that tell you when a keyword appears in any feed; whether it’s one you subscribe to or not. You can manually tag items by keyword and then view by tag.

It also has filtering options, called ‘Content Filters’, but these are only available in the paid-for Pro version. However, in the basic free version you can set up Watch alerts for when certain keywords appear in any feeds you’re subscribed to.

Resulting in a feed like this:

Mobile: Reeder

We tried out a few readers for iPhone, but found Reeder to perform really well. It syncs efficiently with Google Reader. The bottom navigation allows you to jump between ‘starred’, ‘unread’ and ‘all items’.

It’s a very simple interface, but for reading RSS by your phone that’s all you really need. You can easily scroll up and down stories to read and there are a range of sharing options via social media and also quick functions for copying and emailing links, saving to ‘pinboard’ or making a note.

Of course, there are plenty of alternative RSS readers and aggregators to try out too. You can find some more suggestions on this list provided by BBC News. Download software and find even more via CNet at this link. Wikipedia has a very comprehensive list here. We’ll be reviewing more in the near future.

Twitter advanced research techniques 1: searching twitter

By Colin Meek (@colinmeek) and Judith Townend (@JTownend)

If you are engaged in serious research on topical issues you can’t ignore Twitter. And, if you can’t ignore Twitter, then you need to grasp the best ways to tap the resource. Understanding how Twitter is used is key to getting what you want without ending up knee deep in posts about coffee breaks and late trains.

Pivotal to success is understanding that the information carried within Twitter is often not the content with value. The real value for the serious researcher lies in the networks of people you can access, the connections that exist between them and the links people post. You can also locate those posts in time and place.

As with ordinary search engine research, you need to be precise and have a clear idea of what content you are after and where it is likely to be found. For example, the screen grab below shows results from a search for posts originating in Cairo before 27 February.

Tactics like this are easy to master. In this insite series we’ll group tactics and tools into a range of research categories including: Finding the right People; #Hashtags and Trends; and Searching the Twitter Archive. This first post explores Advanced Twitter Searching.

Searching Twitter

A lot of confusion exists about advanced searching in search engines and services such as Twitter. Most, including Twitter, offer an ‘Advanced Search’ form. This gives you clearly useful tools to help you interrogate the Twittersphere. But more useful, flexible and powerful are the ‘advanced operators’ that are typed directly into Twitter’s search box and let you construct much more precise search strings. This guide explains how to get to grips with these tools. See also the insite guide to Google’s advanced operators.

The Twitter guide to each advanced operator is set out below with links to the corresponding search results. The ‘operators’ are highlighted in red.

Operator                            Finds tweets…
twitter search                      containing both “twitter” and “search”. This is the default operator.
"happy hour"                        containing the exact phrase “happy hour”.
love OR hate                        containing either “love” or “hate” (or both).
beer -root                          containing “beer” but not “root”.
#haiku                              containing the hashtag “haiku”.
from:alexiskold                     sent from person “alexiskold”.
to:techcrunch                       sent to person “techcrunch”.
@mashable                           referencing person “mashable”.
"happy hour" near:"san francisco"   containing the exact phrase “happy hour” and sent near “san francisco”.
near:NYC within:15mi                sent within 15 miles of “NYC”.
superhero since:2011-02-26          containing “superhero” and sent since date “2011-02-26” (year-month-day).
ftw until:2011-02-26                containing “ftw” and sent up to date “2011-02-26”.
movie -scary :)                     containing “movie”, but not “scary”, and with a positive attitude.
flight :(                           containing “flight” and with a negative attitude.
traffic ?                           containing “traffic” and asking a question.
hilarious filter:links              containing “hilarious” and linking to URLs.
news source:twitterfeed             containing “news” and entered via TwitterFeed.

[Source: Twitter.com]

Operators such as the ‘OR’ command or double quote marks to designate phrase searching are common to many search engines and can be extremely useful when treated carefully. Turning to the Twitter-specific operators – some are more important for serious research than others. Here are real examples from the best of the bunch.

The ‘people’ operators let you trace posts ‘from’ and ‘to’ specific people. Here we used the search term from:blibrahim to obtain all the recent posts from that user in Cairo.

You can use the ‘to’ operator to trace all posts sent directly to a user and the @ operator to obtain posts that reference a specific user.

The ‘location’ operators help you identify Twitter users geographically. At the height of the uprising in Libya we used this search to find Twitter users in Tripoli: near:Tripoli

You can use the ‘within’ operator to search for posts within a certain distance of a specific location. For example, your search string can be near:tripoli within:25mi to search within 25 miles of that city. The Twitter search page also gives you drop-down options for searching ‘within’ specific distances.

You can search for answers by searching for questions using the ? operator. Use this operator to find the questions other Twitter users are asking about a specific topic. For example, this query: Ajdabiya ? was used to find posts about that town on the day it was bombed by pro-Gaddafi forces.

Finding links. Discovering other resources and breaking news about a search term is one of the main reasons researchers and journalists turn to Twitter and the best way to do that is to use the filter:links operator. This looks for posts that contain your search term and internet links. For example, here we used this search string: Ajdabiya filter:links.

Searching within specific time periods. To search for posts up to a specific date use the ‘until’ operator like this: until:2010-02-28. Equally, you can search for posts after a specific date like this: since:2010-02-28. This is particularly effective when you want to examine the reaction to a specific event before and after it has occurred. Searching by timeframe is not always reliable, however. If you use a common search term then Twitter may tell you your specified date is too old.
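If you often compare reaction before and after an event, it can help to generate the two search strings rather than type the dates by hand. This is a minimal sketch that only builds text for the search box. The function name `window_queries`, the search term and the window length are assumptions, and exactly how Twitter treats the boundary dates is not something this sketch knows.

```python
from datetime import date, timedelta


def window_queries(term, event_day, days=2):
    """Build 'before' and 'after' search strings around an event date.

    Dates are written year-month-day, as the since: and until: operators expect.
    """
    before_start = event_day - timedelta(days=days)
    after_end = event_day + timedelta(days=days)
    before = (f"{term} since:{before_start.isoformat()} "
              f"until:{event_day.isoformat()}")
    after = (f"{term} since:{event_day.isoformat()} "
             f"until:{after_end.isoformat()}")
    return before, after


before, after = window_queries("superhero", date(2011, 2, 26))
print(before)
print(after)
```

Paste each string into the search box in turn to see the two periods side by side.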

Combining operators. As with Google, the most powerful way to use advanced operators is to combine them in innovative ways to get to the material you want with the minimum of fuss. In this example, we searched for posts that contain the term “tahrir square” and originate in Cairo. The results contain leads that are exactly on topic.
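Combined search strings are easy to get wrong when typed by hand, so here is a small sketch of a helper that assembles them from the operators in the table above. The function `build_query` and its parameter names are hypothetical. It only produces text for the search box (plus a URL-encoded `q` parameter) and does not call any Twitter service.

```python
from urllib.parse import urlencode


def build_query(phrase=None, words=(), sender=None, near=None,
                within=None, since=None, until=None, links_only=False):
    """Join Twitter search operators into one search string."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')          # exact-phrase search
    parts.extend(words)                       # plain keywords
    if sender:
        parts.append(f"from:{sender}")
    if near:
        # Quote place names that contain a space, as in the table above.
        place = f'"{near}"' if " " in near else near
        parts.append(f"near:{place}")
    if within:
        parts.append(f"within:{within}")
    if since:
        parts.append(f"since:{since}")
    if until:
        parts.append(f"until:{until}")
    if links_only:
        parts.append("filter:links")
    return " ".join(parts)


query = build_query(phrase="tahrir square", near="Cairo")
encoded = urlencode({"q": query})  # safe to append to a search URL
print(query)
print(encoded)
```

Keeping each operator as a named argument makes it easier to see which constraints a saved search actually applies.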

The next post in this series will look at ways you can identify the right people who are using Twitter (tweeps) and how you can identify networks and connections between people.

** Learn more about sophisticated search techniques on our one-day Advanced Internet Research course, Wednesday 16 March in London. Book soon though, as only two places were available at the time of writing. **

Google’s advanced operators for journalists

Master commands for precision surfing. Presentation.

Confusion is rife about how and when you can use Google’s advanced operators. Used effectively they can transform your research by helping you get better results faster. Here’s my recently updated presentation on advanced operators with some context and example results.


Research anonymity using Tor

Over the next few weeks I intend to describe some of the best ways to protect your privacy if you are carrying out online investigations and need to keep a low profile, at the very least by not revealing your IP address.

One solution is the Tor (Onion Router network) Project which ‘protects you by bouncing your communications around a distributed network of relays run by volunteers all around the world.’ It prevents sites from analysing your browsing and from working out where you are.

It is a splendid application that allows you to switch your anonymity on and off using your Firefox browser. I recently came across this basic introduction from Unwired which does a very good job. More soon.



UPDATE: More information about Tor for Mac users here.


Semantic search – an Interview with Brooke Aker

December 15th, 2008 | 3 Comments | Posted by Colin Meek in Advanced Techniques, Featured, Search engines - advanced, Semantic web

For my second expert interview on the semantic web I set out to find a key commentator who is currently involved at the heart of commercial semantic search. Can someone like that describe how these web developments will impact on coal-face journalists and researchers? I didn’t need to look far. Brooke Aker is an expert in competitive intelligence and before taking up his post at Expert System he formed both Acuity Software and Cipher Systems. He has worked with 130 of the Global 2000 in the formation and operation of successful intelligence and is a key commentator on the semantic web.

Expert System is a leading provider of semantic software which discovers, classifies and interprets text information. Its semantic software, Cogito, has been deployed across most industry sectors and the company’s clients include Eni Group, Pirelli, Microsoft and Telecom Italia.

Social networks and semantic search…
Q: Do you think the explosion in growth of niche sites such as Xing and Peer Trainer will accelerate the demand for semantic-type applications that allow people to travel seamlessly through various social networking services?

Brooke Aker: I would agree with this. Facebook and Myspace are good examples of getting people used to the idea that they can, not just search, but connect to people and content. So they set the stage for users to migrate quickly to Web 3.0 properties where users can search for and connect, analyze, and assemble very specific people and document objects in ways that are uniquely designed by them.

Transforming the way you work…
Q: You recently released a (very useful) presentation on ‘what is semantic search’. It remains, however, difficult for coal-face researchers (insite readers!) to grasp its significance. What, in your view, are the best examples of semantic search that hold the most promise? I’m thinking here of apps like Juice. Not tools that help publishers – but tools that can currently help people in their day to day work?

Brooke Aker: We have been involved with applications that use semantics and combine search with discovery, or search with analysis. Let me explain….
Because semantics expands and connects similar concepts, from where I begin my search, I may end up in a place I did not expect. Say I run a search for “stock” and ask to limit my search to the concept of stock in the sense of soup. This helps avoid stock as in “inventory” or stock as in “equities.” Now, I tell the system to expand the concept of soup stock and I get bouillon, stock, base, and a completely new word to me called fumet. I can then reduce my search results further by noting Emeril Lagasse is mentioned as a chef in one of the documents extracted. So in the end, I used semantics to search for a recipe on soup stock and ended up in a precise but completely new place with a recipe called “Emeril Lagasse’s classic fish fumet.” The document had no mention of the word soup or stock. This is something I would have otherwise missed.
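A toy version of this search can be sketched in a few lines. It is only an illustration under loud assumptions: the `SENSES` table is hand-written rather than drawn from any real semantic network, the ‘reject’ cue words are a crude stand-in for real disambiguation, and none of it reflects how Cogito works.

```python
import re

# Hypothetical, hand-built sense table: words to expand to, and cue words
# that suggest a different sense of "stock".
SENSES = {
    "soup": {
        "expand": {"stock", "bouillon", "base", "fumet"},
        "reject": {"inventory", "warehouse", "equities", "shares"},
    },
}

DOCS = [
    ("Emeril Lagasse's classic fish fumet",
     "Simmer fish bones with onion and herbs."),
    ("Quarterly inventory report", "Warehouse stock levels rose."),
    ("Vegetable bouillon", "A light base for risotto."),
]


def semantic_search(docs, sense, must_mention=None):
    """Return titles of documents matching the expanded concept."""
    expand = SENSES[sense]["expand"]
    reject = SENSES[sense]["reject"]
    hits = []
    for title, body in docs:
        text = f"{title} {body}".lower()
        words = set(re.findall(r"[a-z]+", text))
        if not (words & expand) or (words & reject):
            continue  # no related term, or a cue for the wrong sense
        if must_mention and must_mention.lower() not in text:
            continue  # narrowing step, e.g. by a named chef
        hits.append(title)
    return hits


print(semantic_search(DOCS, "soup"))
print(semantic_search(DOCS, "soup", must_mention="Emeril Lagasse"))
```

The fumet recipe is found even though it never uses the words ‘soup’ or ‘stock’, which is the point of the example.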

For search combined with analysis, we often will employ semantics in a modeling sense. Think about a competitor who may be preparing to launch a new product, but the company has not made anything public yet. We know what steps that company must be taking in order to launch a new product: things like ramping up production lines, buying new machines, contracting with ad agencies, hiring new people with specific skills, etc. These actions are likely to be public. Semantics are employed to broadly find these indicators, which feed a model. If enough of the indicators are present, the model concludes a new product is forthcoming. So here, semantics plays a predictive role. Such foreknowledge of such things is many times more valuable than simply knowing the moment something is reported in the press as having already occurred.
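The modelling idea can be sketched as a simple checklist. Everything here is an assumption made for illustration: the indicator names, the cue phrases, the threshold and the sample reports are invented, and a real system would find indicators semantically rather than by matching fixed phrases.

```python
# Hypothetical indicators of a product launch, each with cue phrases.
INDICATORS = {
    "production": ("production line", "new machines"),
    "marketing": ("ad agency", "advertising agency"),
    "hiring": ("hiring", "job opening"),
}


def indicators_present(reports):
    """Return the names of indicators whose cue phrases appear in any report."""
    text = " ".join(reports).lower()
    return {name for name, cues in INDICATORS.items()
            if any(cue in text for cue in cues)}


def launch_likely(reports, threshold=2):
    """Conclude a launch is forthcoming if enough indicators are present."""
    return len(indicators_present(reports)) >= threshold


REPORTS = [
    "Acme ramps up a second production line",
    "Acme is hiring packaging engineers",
]

print(sorted(indicators_present(REPORTS)))
print(launch_likely(REPORTS))
```

No single report announces a launch, but two independent indicators together are enough for this toy model to flag one.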

Tech stacks…
Q: Many people, I think, assume that the semantic web will usher in a new period of improved search. But, in fact, developments such as the ‘social semantic desktop’ like Nepomuk may accelerate the development of semantic web technology. Do you agree?

Brooke Aker: I agree. The ideal architecture would be to re-index the entire Web semantically and have new browsers to read it. But that seems like a long shot for the time being. So instead, if you embed the semantic processing of every html page the standard browser reads, stores and retrieves locally, you have in effect federated the problem across the Internet. And of course those same special browsers or browser plug-ins could also peer-to-peer share their semantic results if directed by the end user.

Filter failure…
Q: I really like your graph visualising the downside of web2.0 (in your presentation) – as more and more information is mass produced there is a danger that productivity may slide. Do you agree with Clay Shirky’s recent argument that ‘information overload is just filter failure?’ What we’ll all have to get used to is using the right filters at the right time and learn how to maintain them?

Brooke Aker: Filters are a blunt instrument for a more delicate problem. And it implies a lot more work on the user’s end. This spells failure to me. People want convenience and simplicity, and are already overwhelmed.

About Cogito….
Q: Can you describe some of the practical applications to which your software, Cogito, has been applied (and therefore, where Expert System USA is positioning itself)?

Brooke Aker: One of the best examples we have now is in customer service. We support the online help function of mobile devices using a natural language interface. So users type a question they have about how to operate their devices into the handset. We return one precise answer to them. This prevents the user from doing two things. First, they don’t return the device having found it too difficult to operate. This also saves the company the cost of acquiring the customer before they can earn it back. The second thing it does is deflect an inbound call to the customer service center where the average cost is $20. We can give the direct, accurate answer for about half a penny.

Q: In your presentation you draw a green figure that demonstrates how web3.0 will allow people to better ‘filter’ or pick the content that is relevant to them. In your view, what single application/product best demonstrates the power of this technology?

Brooke Aker: Yes, this is the example I gave about the fish fumet. The product we have made to do this is Cogito Focus. It is a corporate search tool that includes crawlers, semantic indexers and a nicely done interface that is not much of a stretch over a conventional search box interface (e.g. little training).

Viable applications….
Q: I was interested to read “On the cusp” by David Provost. In it he concludes that companies are on the verge of constructing very practical and commercially viable semantic applications. (Provost makes the point that Twine is succeeding because it has ditched semantic terminology and focused on the ‘business mission’. While the terminology isn’t obvious, the semantics under the hood are.) Do you agree?

Brooke Aker: Sure I do. Business users don’t care how it works, only that it does work and provides some visible, measurable value. Same is true for consumer-facing applications as well. Can I do something helpful and valuable that I could not do before? That’s what matters.


Journalists and the social web – Oslo Seminar

I am just back from the seminar on Journalists and the Social Web in Oslo organised by the Norwegian Journalist Kristine Lowe, Journalism.co.uk and Journalisten.no. The day went really well with some fascinating discussion and I’d like to thank the hosts for their generous hospitality. I spoke at the seminar on several subjects including Mining Social Networks for Information, Monitoring News and The Semantic Web and journalists. Here are my presentations:

Journalists and the Social Web 1 – Mining for Information

Journalists and the Social Web 2 – Monitoring your Beat

Journalists and the Social Web 3 – Journalists and the Semantic Web


The Semantic Web today – An Interview with John Breslin

Not many people are as close to the heart of the Semantic Web as John Breslin. John is the founder of the Semantically-Interlinked Online Communities (SIOC) project, a member of the W3C Advisory Committee, lecturer at the National University of Ireland and an associate researcher on the semantic web at the Digital Research Institute in Galway.

I caught up with John recently for this analysis of the semantic web and journalism published in journalism.co.uk. We thought John’s points were so interesting that we’ve brought you the full interview in this post.

Niche social networks…
Q: Some have predicted that the rise in the universal social network sites such as Facebook and Myspace will be mirrored soon by the explosion in growth of niche sites such as Xing and Peer Trainer and the expanding interest in ‘enterprise2.0’. In many ways this may accelerate the demand for semantic-type applications that allow people to travel seamlessly through various social networking services. What do you think?

John Breslin: I think that even though some have argued against the need for niche social networking services (SNSs) due to the widespread use of large sites like Facebook and MySpace, these niche SNSs can provide a breath of fresh air when one wants to escape from the bigger “overcrowded SNS cities”. As long as a niche SNS or community site provides regularly updated and relevant content to a steady or growing set of users, there is no reason that such sites should not survive or even flourish on the Web. As pointed out by Paul Gibler in his online article “The Expanding World of Social Networking”, it is the fine-grained and targeted communities such as CafeMom, BOOMj and PEERtrainer that are experiencing recent growth. This also ties in with the idea of object-centered sociality, where people don’t just connect randomly online but rather through the (niche) interests that they have in common. Mark O’Neill sums it up nicely: “…by organizing networks centrifugally around objects, social networking sites have meaning, even when they do not have 200 million users and even when they are centered around minority interests (like Thomas Kinkade paintings!). The point is that they are centered on objects which are in common.” As you say, a key is to allow people to seamlessly find and navigate through these niche interests, and that’s where projects like OpenID, FOAF and SIOC can help – from the point of view of having a single login that’s tied to your interests which can then be semantically matched to content items created across many communities.

Social network portability…
Q: There are several projects set up to address the issue of social network portability – allowing you to interact with various social networks more easily. In your view, will most people need to get used to the concept of a single global online identity such as FOAF?

John Breslin: I think that people are tired of repeating the same information in multiple places, and through standard signon systems like OpenID and profile representation mechanisms like FOAF, you can allow someone to define their identity and to reuse it wherever they choose to use it.

Tech stacks…
Q: You’ve described how a ‘social networking layer may be folded into tech stacks’ where your web and desktop application layer can tap into an integrated social networking stack. For me, this opened my eyes to how important the shift to the semantic will be. I think many people assume that the semantic web will usher in a new period of improved search. But, in fact, it will utterly change the way we interact with the internet?

John Breslin: A lot of the focus from the public or media regarding the semantic web has been in relation to search. But it’s not solely about finding those relevant objects (people, places, etc.) through “Google killers”, and it’s not only about the Internet (despite being called Web3.0!), but it’s also about providing ways to allow systems (on the desktop, or the Web, or media servers, whatever) to interoperate with each other as well. The social networking stack is one nice example, and indeed efforts like the Social Semantic Desktop and Social Semantic Web can interoperate through such a stack. It may also be for migration between different collaborative workspaces or social software systems, as we are doing with the SIOC project.

Your online identity…
Q: You’ve also suggested that online communities should provide their data in a common, machine-understandable way and should use common semantics to define this data (SIOC and FOAF). The way semantic services will be deployed is unpredictable but do you envisage people signing up to new social networks and setting up a profile automatically using their FOAF file? In the future, do you think people who want to network with each other will swap FOAF files and these files will include relevant information about social network membership?

John Breslin: Yes, and this is being done to some extent already. But also it’d be nice to not just bring your personal profile and your friends with you (for example, via FOAF) but perhaps your content as well (maybe defined using SIOC). There are some issues related to both transporting your friends (need their permission) and comments attached to your content (may need the permission of those commenters too), but you should at the very least be able to bring what belongs to you (your profile and your content), for example along the guidelines of the “Bill of Rights for Users of the Social Web” by Canter et al.
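For readers who have never seen one, a FOAF file is an RDF document describing a person and the people they know. The sketch below reads a tiny, made-up example using Python's standard XML parser. The names and URLs are hypothetical, and because RDF/XML can be written in many equivalent ways, a real importer would use a proper RDF parser rather than assume this exact layout.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A minimal, hypothetical FOAF profile.
SAMPLE_FOAF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Alice Example</foaf:name>
    <foaf:homepage rdf:resource="http://example.org/alice"/>
    <foaf:knows>
      <foaf:Person><foaf:name>Bob Example</foaf:name></foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def read_profile(foaf_xml):
    """Extract name, homepage and known people from a simple FOAF file."""
    root = ET.fromstring(foaf_xml)
    person = root.find(f"{{{FOAF}}}Person")
    homepage = person.find(f"{{{FOAF}}}homepage")
    friends = person.findall(f"{{{FOAF}}}knows/{{{FOAF}}}Person")
    return {
        "name": person.findtext(f"{{{FOAF}}}name"),
        "homepage": (homepage.get(f"{{{RDF}}}resource")
                     if homepage is not None else None),
        "knows": [f.findtext(f"{{{FOAF}}}name") for f in friends],
    }


print(read_profile(SAMPLE_FOAF))
```

A new social network could, in principle, pre-fill a sign-up form from a dictionary like this one.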

Meshing of networks…
Q: A practical consequence of SIOC might be that you might do a search in Facebook using the term ‘bog-snorkelling’ and get results back that may include profile pages that include that term, but also blog results from Technorati, comments from Flickr albums and YouTube videos? Equally, a practical consequence of SKOS, FOAF and SIOC could be that you click on a tag for ‘bog snorkelling’ in Delicious and get results from a range of social network sites?

John Breslin: Exactly! I’m delighted that Yahoo! SearchMonkey have listed SIOC as one of their recommended vocabularies – and that people are now starting to get the idea of being able to retrieve user-generated content items from all or from specific types of social websites (blogs, forums, mailing lists, photo albums) using mechanisms like SIOC and FOAF. Through people defining interests explicitly using something like a foaf:interest field or implicitly by clicking on tags of interest, relevant content can be easily returned from social websites with appropriate dc:subject or sioc:topic metadata.
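The matching John describes can be sketched with plain Python dictionaries standing in for the RDF data. This is an illustration only: the profile, the content items and the function names are invented, and real data would be published as RDF and read with an RDF library.

```python
# Hypothetical data: a person's declared interests, and content items
# tagged with sioc:topic or dc:subject on different kinds of site.
PROFILE = {"foaf:interest": {"bog snorkelling", "photography"}}

ITEMS = [
    {"title": "Bog snorkelling race report", "source": "blog",
     "sioc:topic": {"bog snorkelling"}},
    {"title": "Knitting patterns thread", "source": "forum",
     "dc:subject": {"knitting"}},
    {"title": "Snowdonia at dawn", "source": "photo album",
     "dc:subject": {"photography", "wales"}},
]


def topics(item):
    """Collect an item's topics from either metadata field."""
    return item.get("sioc:topic", set()) | item.get("dc:subject", set())


def relevant_items(profile, items, sources=None):
    """Return titles of items sharing a topic with the profile's interests.

    Pass `sources` to limit results to specific types of social website.
    """
    interests = profile["foaf:interest"]
    return [item["title"] for item in items
            if topics(item) & interests
            and (sources is None or item["source"] in sources)]


print(relevant_items(PROFILE, ITEMS))
print(relevant_items(PROFILE, ITEMS, sources={"blog"}))
```

Because every site describes its content with the same small vocabulary, one query can span blogs, forums and photo albums.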

Practical implications…
Q: A practical result being that you create a new account with a new social network and that SN can identify other people on that Network who are listed in Bob’s defined relationships. Have any social networks already deployed this service?

John Breslin: There are many sites (e.g. Dopplr) that are starting to allow you to bring your friends with you by specifying something like your GMail account details (and then matching e-mail addresses you use) or your Twitter account details (and then retrieving a list of those whose microblogs you follow), but it is certainly useful to have a smaller set of reusable relationship formats that can make this more widespread (and that extends the number of services that you can import from). The Google SocialGraph API is a nice example of something that can enable this, as it allows applications to reuse social graph information extracted from sources all over the Web and represented using the open formats XFN and FOAF.

Searching the semantic layer…
Q: I’m a bit confused by the SIOC RDF Browser and if there are any applications that currently allow one to browse information expressed in RDF and SIOC ontology – I assume you need specific URLs to use this?

John Breslin: The SIOC RDF browser is simply a way to view RDF information in a more human friendly form. One of the motivations for creating this was to enable people to view semantic information easily because it may have different aspects that can be of interest – it may be the same information you see on a normal web page, but it may also contain extra information that is not normally displayed on a web page but is rather hidden or locked into a database and that information may prove useful for some third-party applications (e.g. a modification date, incoming links), or perhaps some extra information can be calculated or inferred for a semantic page (related content on the same topic, tag usage frequencies, etc.)

Semantic search…
Q: From the perspective of the non-technical lay researcher – where does Sindice (the semantic web index) and other semantic search tools fit in?

John Breslin: Sindice can be thought of as a big semantic index of the Web. It allows you to find pointers to relevant pages or URIs where particular keywords are mentioned, where certain property values are used (e.g. pages where a person says their e-mail address is john.breslin@deri.org), or where certain facts or semantic triples appear. If you’re looking for a “semantic search engine”, it depends on what you need. Sindice gives you pointers to where stuff is, whereas many other engines give you the stuff as well (without you having to go to the source page).
SWSE (also from DERI) and Swoogle allow query capabilities over the collections of all Semantic Web statements – so if you search for Galway, it can show you the relevant statements as well as pointing you to the pages they were obtained from.
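The difference between "pointers to where stuff is" and "the stuff itself" can be sketched with a toy index. The statements, document URLs and function names below are hypothetical, and this is only an assumption about the general shape of such an index, not Sindice's actual design. Each lookup returns the source documents, not the statements themselves.

```python
from collections import defaultdict

# Hypothetical statements: (source document, subject, property, value).
STATEMENTS = [
    ("http://example.org/alice/foaf.rdf", "alice", "foaf:mbox",
     "mailto:alice@example.org"),
    ("http://example.org/alice/foaf.rdf", "alice", "foaf:based_near",
     "Galway"),
    ("http://example.org/blog/sioc.rdf", "post1", "dcterms:title",
     "Walking in Galway"),
]


def build_index(statements):
    """Map keywords and (property, value) pairs to their source documents."""
    by_keyword = defaultdict(set)
    by_property_value = defaultdict(set)
    for source, _subject, prop, value in statements:
        by_property_value[(prop, value)].add(source)
        for word in value.lower().split():
            by_keyword[word].add(source)
    return by_keyword, by_property_value


def find_sources(index, key):
    """Return a sorted list of documents that mention the key."""
    return sorted(index.get(key, set()))


if __name__ == "__main__":
    by_keyword, by_property_value = build_index(STATEMENTS)
    # Keyword lookup: which documents mention Galway?
    print(find_sources(by_keyword, "galway"))
    # Property-value lookup: which documents state this e-mail address?
    print(find_sources(by_property_value,
                       ("foaf:mbox", "mailto:alice@example.org")))
```

A third-party application would then fetch the documents it is pointed to and read the statements from them directly.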

But I think the applications of Sindice, i.e. finding pointers to where stuff is, and using that in third-party applications, are quite interesting. For example, the SIOC Widget for WordPress is powered through a combination of distributed SIOC documents and the Sindice index. So, when you are browsing a blog that has this widget installed, you may see little balloons appearing beside commenters’ names. Clicking on these balloons shows a pop-up with a list of content (posts, comments, topics) that that commenter has created not just on the blog site you are viewing but across a range of SIOC-enabled websites (blogs, forums, mailing lists, whatever) as indexed in Sindice. Here is a picture. So you can see and navigate to the content a person has created across a range of sites from just one place that they post to.

On the cusp…
Q: Moving on to practical applications. I was interested to read “On the cusp” by David Provost. In it he concludes that companies are on the verge of constructing very practical and commercially viable semantic applications. Do you agree?

John Breslin: I think that we are now beginning to see the real commercial applications of what can be done when all kinds of things on the Web are connected together using semantics. This is obvious in the attention being given to startup companies in this space like Powerset, Metaweb (Freebase) and Radar Networks (Twine), and also since many big companies including Reuters (Calais API), Yahoo! (SearchMonkey) and Google (Social Graph API) have all announced in 2008 what they are doing with semantic data.

There has been a lot of talk this past year about the social graph (notably from Google’s Brad Fitzpatrick), which looks at how people are connected together (friends, colleagues, neighbours, etc.), and how such connections can be leveraged across websites. In the Semantic Web, it is not just people who are connected together in some meaningful way, but documents, events, places, hobbies, pictures, you name it! And it is the commercial applications that exploit these connections that are now becoming interesting. But it is very important that the users aren’t exposed to any RDF or semantic terminology – through usage, they just “get” the fact that everything is interconnected.

And the best product?…
Q: In your view – what are the most exciting semantic product developments to have emerged in the last year?

John Breslin: I really like Radar’s Twine, the “knowledge networking” application that allows users to share, organise, and find information with people they trust. I find Twine very interesting, and as well as using it to gather information about SIOC for regular blog entries I write (“Tales from the SIOC-o-sphere”), I also use it to gather and publish personal interests that I think will be of interest to the public, and for passing on interesting stuff to work colleagues.

Privacy…
Q: What about the privacy angle? Are the privacy safeguards in place capable of evolving to meet this challenge? Does the average LiveJournal user know that their profile has been converted to a FOAF file and is now translatable by any number of new semantic products? Speaking as a journalist, my hunch is that the vast majority of people are going to be surprised and, perhaps, shocked to know that a public comment they make on LiveJournal may end up in a database that is searchable by people on LinkedIn.

No, certainly people aren’t aware that many sites are making semantic forms of their content available which can be reused elsewhere. Tribe.net recently turned off their FOAF exports after a user complained that his/her profile was being copied for use elsewhere (the original developer team had moved on so the new developers weren’t sympathetic to the possibilities of the Semantic Web). Similar things happened with people blogging and finding that content from their RSS feeds was popping up on other sites. There certainly has to be more thought put into educating users and towards having opt-in-opt-out mechanisms when implementing semantic exports, especially for personal content and profiles.

Thanks to John for his time with this.


Ask.com rearms with semantics, rich media in search war

October 13th, 2008 | No Comments | Posted by Colin Meek in Search engines - advanced, Search tools and tricks

Ask.com have improved their search with the natural language semantic technology Direct Answers from Search (DAFS). This Ars Technica post describes the move: “Ask.com rearms with semantics, rich media in search war”.


New UK search engine – insite talks to lead programmer

October 10th, 2008 | No Comments | Posted by Colin Meek in Featured, Search engines - advanced, Search tools and tricks

The beta version of the new UK-based search engine MSE360 has attracted praise from both sides of the Atlantic with a three-tier display, clean design and other unique features such as virus alerts. I caught up with its Lead Programmer Daniel Clarke to talk about his plans, what MSE360 can offer journalists and researchers, and how a UK search engine can find elbow room in a crowded market.

Insite: ‘MSE360’ – what does the name mean?

“MSE stands for Multi Search Engine, 360 for 360 degrees.”

Are you a UK team based in the UK?

“Yeah, we’re based in Kent, UK. All staff are based in the UK and we pride ourselves on that. Britain has some of the best minds when it comes to technology, after all, the internet was born from a British Inventor, Tim Berners-Lee.”

There are lots of search engines out there – some doing a great job and some that are not so good. Some of your friends must have suggested that the market is already crowded. How did you respond?

“Indeed, very much so, the market is at a stage where it’s hard to make any sense of a push, and we’ve been told many times that the market has no room for another search engine. We respond with the facts. 90% of the new search engines that launch do so without bringing new or exciting features. Cuil has a slightly different user interface, but that was limited and the technology really didn’t live up to the hype. We don’t know if we shall succeed or die out, but if no one tries then what’s to push the larger search engines into improving? Percentage of search market is irrelevant to us. Some people have told us that they will be switching to MSE from other search engines, so as long as we make a few people happy, I think we’ve done our job.”

OK – insite and journalism.co.uk readers are busy people. Why should they turn to MSE360 instead of ask.com? What is your main message to the heavy internet users?

“MSE360 was created to speed up the search process. Why navigate between blog search engines, web search engines, image search and Wikipedia if you can find it in one resource? MSE360 brings a lot of sources together in a simple layout. Our anti-virus features also keep the average user safer. We also, unlike other search engines, store no personal information. The fact of the matter is simple, MSE is like marmite, you’ll either love it or hate it.”

I once spent about an hour looking for someone on Google only to find that person as the top hit when I eventually gave up and used another search engine. What will it take to convince internet users – particularly those in the UK – that Google doesn’t have all the answers?

“Google is entrenched in the minds of the British population, and that’s the main challenge for us. We’ve got to change the perception that Google has all the answers. To do that we will be investing in schools to make sure students are not just informed about Google, but the wider range of search engines (such as Ask, Live and MSE). We also are going to focus an advertising campaign on the fact that Google doesn’t have all the answers. But the main factor in this is the technology; we can advertise as much as we want but unless we focus on improving the search our biggest advertising challenge – word of mouth – won’t succeed.”

On your site you say you use your own robots and algorithms but you also use partners. Which other search engines do you partner with?

We’re slowly phasing out the external engines, but we use resources from Yahoo and Live Search.

OK – moving on to your unique selling points. Your service is fast, you flag sites that contain viruses, you allow community results and you offer a clean and easy-to-understand 3 tier layout. What else can users expect to see over the coming months?

We have some exciting new features in store. First of all our indexing methods will be changing to provide better results and we’ll be expanding community results and adding a voting system on all results. We plan to allow full customization of the search – from simple layout changes to algorithm changes. The user interface will be improved as will the speed. In the next month we’ll have user added modals which will allow users to add their own search methods, for example torrents or a certain site. 60% of our features come from user suggestions, so there is plenty to come!

One blog recently suggested that MSE360 doesn’t support advanced operator searches. It was very wrong because I’ve used your engine to carry out some complex searches using the advanced operators I can use on Google. Are there any operators you support that Google doesn’t?

We use the standard operators currently (AND, OR, ELSE, etc.) but I think this is something we have to improve on. We’re going to be adding content license operators (e.g. CC25) and algorithm operators, so the user will be able to find sites without adverts, with ‘x’ degree of adverts, etc. I can’t go into too many details, but we’ve got more coming!

Are you planning to implement an advanced search page?

Our beta version currently has an advanced search page; this will go live in the coming weeks.

Are you planning to incorporate any semantic technology into your search service?

We are, but I’m afraid I can’t go into detail on this area yet. In the coming months we’ll release more details in this exciting new area.


Welcome

Welcome to insite, a www.journalism.co.uk blog that will cover everything related to internet research. insite is written and edited by Colin Meek, who delivers training courses in Advanced Internet Research for journalism.co.uk, the National Union of Journalists and other clients on an in-house basis.