Archive: October 2008


Browsers compete on privacy controls

October 31st, 2008 | No Comments | Posted by Colin Meek in Your own privacy

There are very good reasons why journalists need to worry about their privacy more than most, and I outlined those reasons in this article for journalism.co.uk a few months ago. So it’s good to see that browsers are increasingly competing on the way they allow users to protect their online privacy. The Center for Democracy and Technology has published a report on the issue covering Firefox, Internet Explorer, Google’s Chrome and Safari. Here’s a link to the Resource Shelf post on the full report: Privacy Controls: Privacy focus means more choice for consumers protecting their personal data

Privacy and the Semantic Web

October 31st, 2008 | No Comments | Posted by Colin Meek in Semantic web, Your own privacy

Semantic Web pioneer John Breslin has responded to my articles on Web 3.0 in a post on his Cloudlands blog. His post covers two broad issues. First, he adds some important points about Semantic Web search and the differences between the various Semantic Web search engines that exist. I am reviewing all of these engines from the perspective of a coal-face journalist, so more on that soon.

Second, John argues that the Semantic Web community needs to be ‘very aware’ that Web 3.0 may bring new tools and techniques that make it even easier for journalists and others to access sensitive personal information.

During the recent seminar in Oslo on the Social Web, I argued that journalists need to be aware that personal information found on the internet falls into three categories: information people intentionally publish; information about them that they have no control over; and, lastly, information they make available to specific sites under certain conditions. My view is that the boundaries between these categories are already blurring and that Web 3.0 is likely to blur them further.

Most importantly, journalists should be aware of the distinction between what people intentionally publish and what they make accessible. There will always be journalists who think any personal information made accessible is fair game, but publications should perhaps consider issuing guidance on what information and content should and should not be re-hashed from social networking sites, and under what circumstances. The Semantic Web can only make this more pressing.

John Breslin states: “Educating site owners about what semantic data they may be publishing (knowingly or unknowingly, even if it’s just RSS feeds) is needed, and developers should determine exactly what opt-in or opt-out mechanisms are required before implementing semantic solutions,” and he goes on to argue that the Semantic Web community needs to think more about educating people on the benefits, as well as on how it can minimise any hazards.

Perhaps the first step in reassuring people about Web 3.0 is for the industry and the Semantic Web community to agree on a range of specific privacy guarantees, such as the Bill of Rights for Users of the Social Web, before privacy problems start to dominate the headlines (for example, the automatic export of FOAF files or the transfer of content from one social network to another without members’ knowledge).

Journalists and the social web – Oslo Seminar

I am just back from the seminar on Journalists and the Social Web in Oslo, organised by Norwegian journalist Kristine Lowe, Journalism.co.uk and Journalisten.no. The day went really well, with some fascinating discussion, and I’d like to thank the hosts for their generous hospitality. I spoke at the seminar on several subjects, including Mining Social Networks for Information, Monitoring News, and the Semantic Web and journalists. Here are my presentations:

Journalists and the Social Web 1 – Mining for Information

Journalists and the Social Web 2 – Monitoring your Beat

Journalists and the Social Web 3 – Journalists and the Semantic Web

Seminar on Social Networks and Journalists – Norway

October 24th, 2008 | No Comments | Posted by Colin Meek in Social Networks

I am currently at the Seminar on Social Networks and Journalists in Oslo, organised by Norwegian journalist and blogger Kristine Lowe. I’ll be speaking on ‘Web 2.0 and Web 3.0 – tools for journalists’ and I’ll post my presentations here soon. The full seminar programme is available here.

The Semantic Web today – An Interview with John Breslin

Not many people are as close to the heart of the Semantic Web as John Breslin. John is the founder of the Semantically-Interlinked Online Communities (SIOC) project, a member of the W3C Advisory Committee, a lecturer at the National University of Ireland and an associate researcher on the Semantic Web at the Digital Enterprise Research Institute (DERI) in Galway.

I caught up with John recently for this analysis of the Semantic Web and journalism, published on journalism.co.uk. We thought John’s points were so interesting that we’ve brought you the full interview in this post.

Niche social networks…
Q: Some have predicted that the rise of universal social networking sites such as Facebook and MySpace will soon be mirrored by explosive growth in niche sites such as Xing and PEERtrainer, and by expanding interest in ‘Enterprise 2.0’. In many ways this may accelerate the demand for semantic-type applications that allow people to travel seamlessly through various social networking services. What do you think?

John Breslin: I think that even though some have argued against the need for niche social networking services (SNSs) due to the widespread use of large sites like Facebook and MySpace, these niche SNSs can provide a breath of fresh air when one wants to escape from the bigger “overcrowded SNS cities”. As long as a niche SNS or community site provides regularly updated and relevant content to a steady or growing set of users, there is no reason that such sites should not survive or even flourish on the Web. As pointed out by Paul Gibler in his online article “The Expanding World of Social Networking”, it is the fine-grained and targeted communities such as CafeMom, BOOMj and PEERtrainer that are experiencing recent growth. This also ties in with the idea of object-centered sociality, where people don’t just connect randomly online but rather through the (niche) interests that they have in common. Mark O’Neill sums it up nicely: “…by organizing networks centrifugally around objects, social networking sites have meaning, even when they do not have 200 million users and even when they are centered around minority interests (like Thomas Kinkade paintings!). The point is that they are centered on objects which are in common.” As you say, a key is to allow people to seamlessly find and navigate through these niche interests, and that’s where projects like OpenID, FOAF and SIOC can help – from the point of view of having a single login that’s tied to your interests which can then be semantically matched to content items created across many communities.

Social network portability…
Q: There are several projects set up to address the issue of social network portability – allowing you to interact with various social networks more easily. In your view, will most people need to get used to the concept of a single global online identity such as FOAF?

John Breslin: I think that people are tired of repeating the same information in multiple places. Through standard sign-on systems like OpenID and profile representation mechanisms like FOAF, you can allow someone to define their identity and reuse it wherever they choose.

Tech stacks…
Q: You’ve described how a ‘social networking layer may be folded into tech stacks’, where your web and desktop application layer can tap into an integrated social networking stack. For me, this opened my eyes to how important the shift to the semantic will be. I think many people assume that the Semantic Web will usher in a new period of improved search. But, in fact, it will utterly change the way we interact with the internet?

John Breslin: A lot of the focus from the public or media regarding the Semantic Web has been in relation to search. But it’s not solely about finding those relevant objects (people, places, etc.) through “Google killers”, and it’s not only about the Internet (despite being called Web 3.0!); it’s also about providing ways to allow systems (on the desktop, or the Web, or media servers, whatever) to interoperate with each other. The social networking stack is one nice example, and indeed efforts like the Social Semantic Desktop and Social Semantic Web can interoperate through such a stack. It may also be for migration between different collaborative workspaces or social software systems, as we are doing with the SIOC project.

Your online identity…
Q: You’ve also suggested that online communities should provide their data in a common, machine-understandable way and should use common semantics to define this data (SIOC and FOAF). The way semantic services will be deployed is unpredictable, but do you envisage people signing up to new social networks and setting up a profile automatically using their FOAF file? In the future, do you think people who want to network with each other will swap FOAF files, and that these files will include relevant information about social network membership?

John Breslin: Yes, and this is being done to some extent already. But also it’d be nice to not just bring your personal profile and your friends with you (for example, via FOAF) but perhaps your content as well (maybe defined using SIOC). There are some issues related to both transporting your friends (need their permission) and comments attached to your content (may need the permission of those commenters too), but you should at the very least be able to bring what belongs to you (your profile and your content), for example along the guidelines of the “Bill of Rights for Users of the Social Web” by Canter et al.

Meshing of networks…
Q: A practical consequence of SIOC might be that you do a search in Facebook using the term ‘bog-snorkelling’ and get back results that include not only profile pages containing that term but also blog results from Technorati, comments from Flickr albums and YouTube videos? Equally, a practical consequence of SKOS, FOAF and SIOC could be that you click on a tag for ‘bog-snorkelling’ in Delicious and get results from a range of social network sites?

John Breslin: Exactly! I’m delighted that Yahoo! SearchMonkey have listed SIOC as one of their recommended vocabularies – and that people are now starting to get the idea of being able to retrieve user-generated content items from all or from specific types of social websites (blogs, forums, mailing lists, photo albums) using mechanisms like SIOC and FOAF. Through people defining interests explicitly using something like a foaf:interest field or implicitly by clicking on tags of interest, relevant content can be easily returned from social websites with appropriate dc:subject or sioc:topic metadata.
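For readers wondering what that matching looks like in practice, here is a minimal sketch in Python. It is not how SearchMonkey or any particular site works: it simply treats a handful of RDF statements as (subject, property, value) tuples and returns content whose sioc:topic or dc:subject equals a topic the person lists with foaf:interest. The property names are real vocabulary terms, but every URI below is invented for illustration, and the sketch assumes, for simplicity, that the profile and the content point at the same topic URI.

```python
# Toy, in-memory sketch of interest-based content matching.
# All URIs are hypothetical; only foaf:interest, sioc:topic and
# dc:subject are real vocabulary terms.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    # A person's profile declares an interest explicitly (FOAF).
    ("ex:alice", "foaf:interest", "ex:topic/bog-snorkelling"),
    # Content items from different (hypothetical) sites, tagged with topics.
    ("ex:blog/post-17", "sioc:topic", "ex:topic/bog-snorkelling"),
    ("ex:forum/thread-3", "dc:subject", "ex:topic/bog-snorkelling"),
    ("ex:photos/album-9", "sioc:topic", "ex:topic/gardening"),
]

# Properties we treat as "this item is about that topic".
TOPIC_PROPERTIES = {"sioc:topic", "dc:subject"}


def interests_of(person: str, triples: list[Triple]) -> set[str]:
    """Return the topics a person has declared with foaf:interest."""
    return {o for s, p, o in triples if s == person and p == "foaf:interest"}


def matching_content(person: str, triples: list[Triple]) -> list[str]:
    """Return content items whose sioc:topic or dc:subject is one of the
    person's declared interests, sorted for stable output."""
    wanted = interests_of(person, triples)
    return sorted({s for s, p, o in triples if p in TOPIC_PROPERTIES and o in wanted})


if __name__ == "__main__":
    print(matching_content("ex:alice", TRIPLES))
    # → ['ex:blog/post-17', 'ex:forum/thread-3']
```

The photo album is left out because its topic does not match, which is the whole point: the link between person and content runs through shared topic metadata rather than through any one site.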

Practical implications…
Q: A practical result would be that you create a new account with a new social network, and that network can identify other people on it who are listed in a user’s (say, Bob’s) defined relationships. Have any social networks already deployed this service?

John Breslin: There are many sites (e.g. Dopplr) that are starting to allow you to bring your friends with you by specifying something like your GMail account details (and then matching e-mail addresses you use) or your Twitter account details (and then retrieving a list of those whose microblogs you follow), but it is certainly useful to have a smaller set of reusable relationship formats that can make this more widespread (and that extends the number of services that you can import from). The Google SocialGraph API is a nice example of something that can enable this, as it allows applications to reuse social graph information extracted from sources all over the Web and represented using the open formats XFN and FOAF.
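The e-mail matching John describes can be sketched in a few lines. FOAF lets a profile publish foaf:mbox_sha1sum, the SHA-1 hash of a “mailto:” URI, instead of the address itself, so a new service can compare hashes of your address book against hashes held for its members. The helper names and sample addresses below are hypothetical, and this is not a description of how Dopplr or the Social Graph API actually match people. Note the privacy flip side: anyone who already knows an address can compute the same hash and find that person.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """SHA-1 hex digest of the mailbox URI ("mailto:" + address), the value
    FOAF's foaf:mbox_sha1sum carries. Stripping whitespace is our own choice."""
    uri = "mailto:" + email.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def find_existing_members(my_contacts: list[str], site_members: dict[str, str]) -> list[str]:
    """Hypothetical helper: given the e-mail addresses in a user's address book
    and a mapping of hashed mailboxes to member names on a new site,
    return the members the user already knows."""
    hashes = {mbox_sha1sum(addr) for addr in my_contacts}
    return sorted(name for h, name in site_members.items() if h in hashes)


if __name__ == "__main__":
    # The new site only stores hashes, never the plain addresses.
    members = {
        mbox_sha1sum("bob@example.org"): "Bob",
        mbox_sha1sum("carol@example.net"): "Carol",
    }
    print(find_existing_members(["bob@example.org", "dave@example.com"], members))
    # → ['Bob']
```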

Searching the semantic layer…
Q: I’m a bit confused by the SIOC RDF Browser and whether there are any applications that currently allow one to browse information expressed in RDF and the SIOC ontology. I assume you need specific URLs to use this?

John Breslin: The SIOC RDF Browser is simply a way to view RDF information in a more human-friendly form. One of the motivations for creating it was to let people view semantic information easily, because that information may have different aspects of interest. It may be the same information you see on a normal web page, but it may also contain extra information that is not normally displayed on a web page, being hidden or locked into a database, and that information may prove useful for some third-party applications (e.g. a modification date, incoming links). Or perhaps some extra information can be calculated or inferred for a semantic page (related content on the same topic, tag usage frequencies, etc.).

Semantic search…
Q: From the perspective of the non-technical lay researcher – where does Sindice (the semantic web index) and other semantic search tools fit in?

John Breslin: Sindice can be thought of as a big semantic index of the Web. It allows you to find pointers to relevant pages or URIs where particular keywords are mentioned, where certain property values are used (e.g. pages where a person says their e-mail address is john.breslin@deri.org), or where certain facts or semantic triples appear. If you’re looking for a “semantic search engine”, it depends on what you need. Sindice gives you pointers to where stuff is, whereas many other engines give you the stuff as well (without you having to go to the source page).
SWSE (also from DERI) and Swoogle provide query capabilities over their collections of Semantic Web statements, so if you search for Galway, they can show you the relevant statements as well as point you to the pages they were obtained from.
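To make the difference between Sindice-style pointers and SWSE/Swoogle-style answers concrete, here is a toy sketch in Python. It indexes a few hypothetical pages by (property, value) pairs and offers two lookups: one returns only the pages where a statement occurs, the other returns the statements themselves alongside their source pages. The page URLs, the triples and the ex:location property are all invented for illustration; real engines parse RDF properly, support query languages and work at vastly larger scale.

```python
from collections import defaultdict

# Hypothetical pages and the (subject, property, value) statements found on each.
PAGES = {
    "http://example.org/people/john": [
        ("ex:john", "foaf:mbox", "mailto:john@example.org"),
        ("ex:john", "foaf:based_near", "ex:Galway"),
    ],
    "http://example.net/events/2008": [
        ("ex:seminar", "ex:location", "ex:Galway"),  # ex:location is made up
    ],
}


def build_index(pages):
    """Map each (property, value) pair to the set of pages it appears on."""
    index = defaultdict(set)
    for url, triples in pages.items():
        for _subject, prop, value in triples:
            index[(prop, value)].add(url)
    return index


def find_pointers(index, prop, value):
    """Pointer-style lookup: return only *where* a statement occurs."""
    return sorted(index.get((prop, value), set()))


def find_statements(pages, value):
    """Statement-style lookup: return every statement mentioning the value,
    paired with the page it came from."""
    return [
        (url, triple)
        for url, triples in pages.items()
        for triple in triples
        if value in triple
    ]


if __name__ == "__main__":
    index = build_index(PAGES)
    print(find_pointers(index, "foaf:mbox", "mailto:john@example.org"))
    # → ['http://example.org/people/john']
    for url, statement in find_statements(PAGES, "ex:Galway"):
        print(url, statement)
```

The pointer lookup sends you off to the source page, while the statement lookup hands you the data directly, which mirrors the trade-off John describes.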

But I think the applications of Sindice, i.e. finding pointers to where stuff is and using that in third-party applications, are quite interesting. For example, the SIOC Widget for WordPress is powered through a combination of distributed SIOC documents and the Sindice index. So, when you are browsing a blog that has this widget installed, you may see little balloons appearing beside commenters’ names. Clicking on these balloons shows a pop-up with a list of content (posts, comments, topics) that that commenter has created, not just on the blog you are viewing but across a range of SIOC-enabled websites (blogs, forums, mailing lists, whatever) as indexed in Sindice. Here is a picture. So you can see and navigate to the content a person has created across a range of sites from just one place that they post to.

On the cusp…
Q: Moving on to practical applications, I was interested to read “On the cusp” by David Provost. In it he concludes that companies are on the verge of building very practical and commercially viable semantic applications. Do you agree?

John Breslin: I think that we are now beginning to see the real commercial applications of what can be done when all kinds of things on the Web are connected together using semantics. This is obvious in the attention being given to startup companies in this space like Powerset, Metaweb (Freebase) and Radar Networks (Twine), and also since many big companies including Reuters (Calais API), Yahoo! (SearchMonkey) and Google (Social Graph API) have all announced in 2008 what they are doing with semantic data.

There has been a lot of talk this past year about the social graph (notably from Google’s Brad Fitzpatrick), which looks at how people are connected together (friends, colleagues, neighbours, etc.), and how such connections can be leveraged across websites. In the Semantic Web, it is not just people who are connected together in some meaningful way, but documents, events, places, hobbies, pictures, you name it! And it is the commercial applications that exploit these connections that are now becoming interesting. But it is very important that the users aren’t exposed to any RDF or semantic terminology – through usage, they just “get” the fact that everything is interconnected.

And the best product?…
Q: In your view – what are the most exciting semantic product developments to have emerged in the last year?

John Breslin: I really like Radar’s Twine, the “knowledge networking” application that allows users to share, organise, and find information with people they trust. I find Twine very interesting, and as well as using it to gather information about SIOC for regular blog entries I write (“Tales from the SIOC-o-sphere”), I also use it to gather and publish personal interests that I think will be of interest to the public, and for passing on interesting stuff to work colleagues.

Privacy…
Q: What about the privacy angle? Are the privacy safeguards in place capable of evolving to meet this challenge? Does the average LiveJournal user know that their profile has been converted to a FOAF file and is now translatable by any number of new semantic products? Speaking as a journalist, my hunch is that the vast majority of people are going to be surprised and, perhaps, shocked to learn that a public comment they make on LiveJournal may end up in a database that is searchable by people on LinkedIn.

John Breslin: No, certainly people aren’t aware that many sites are making semantic forms of their content available for reuse elsewhere. Tribe.net recently turned off its FOAF exports after a user complained that his or her profile was being copied for use elsewhere (the original developer team had moved on, so the new developers weren’t sympathetic to the possibilities of the Semantic Web). Similar things happened when people who blog found content from their RSS feeds popping up on other sites. There certainly has to be more thought put into educating users and into providing opt-in/opt-out mechanisms when implementing semantic exports, especially for personal content and profiles.

Thanks to John for his time with this.

When you have to cite your online sources

October 20th, 2008 | 3 Comments | Posted by Colin Meek in Featured, investigative strategies, Sorting and Storing

There is no doubt that the many social bookmarking tools that exist often offer great networking services and effective ways to monitor areas of interest. But you can run into problems if you rely on a social bookmarking site as a place to store your online sources. If you are working on a sensitive story that relies on web-sourced evidence, that evidence can evaporate if the site owner pulls the page, alters the relevant text or changes an image.

I worked for years on the BMJ Best Treatments project and out-of-date links to academic sources and public-targeted government advice were a continual headache. One study has found that 13% of internet references in scholarly articles become inactive after only 27 months. But perhaps an even bigger problem is when the link stays intact but the content changes. If your story depends on a particular source and it changes then where do you turn? Does your publication have a policy on this?

Some social bookmarking sites offer a partial solution. Furl.net lets you save the entire page to Furl’s servers where you can access an exact copy of the page without going to the source URL. I always use Furl when I’m working on anything even remotely sensitive. Another tool – WebCite – allows you to copy web pages and store them remotely. The difference here is that WebCite gives you a way of citing a source permanently. When you use WebCite in a published document you cite the original source URL and a WebCite reference and you can be sure the WebCite link won’t change. Readers can then click on the WebCite link for the archived version. WebCite is supported by a range of academic and scientific publishers who already have an incentive to keep the service running. Check out its FAQ for information about its funding and security.

While many journalists find services and tools such as the Wayback Machine useful, you cannot rely on them to cite a source. WebCite offers a range of other tools, including a ‘WebCite This’ button for bloggers. See here for more on that.

Government gives more detail on communications surveillance plan | OUT-LAW.COM

October 20th, 2008 | No Comments | Posted by Colin Meek in Your own privacy

Confused by the recent headlines about Orwellian government surveillance of our online lives? The stories relate to the EU’s Data Retention Directive, which must be transposed into UK law soon. The UK, or rather Home Secretary Jacqui Smith, has decided that records of our email, mobile and internet activity should be held centrally in a vast database under Government control. If you think that sounds scary, you’d be right. If you want a really sober assessment of the proposal and links to great background material, you can’t do better than OUT-LAW: Government gives more detail on communications surveillance plan | OUT-LAW.COM

German court says IP addresses in server logs are not personal data | OUT-LAW.COM

October 20th, 2008 | 1 Comment | Posted by Colin Meek in Your own privacy

Publishers are allowed to store internet protocol (IP) addresses without violating data protection legislation, according to a German court ruling. The decision is a blow to privacy campaigners across Europe who have argued that IP addresses can be used to identify specific individuals. Privacy groups have argued that publishers should not be able to store IP addresses to track patterns of use. The court reasoned that publishers cannot identify the owners of IP addresses without more information from internet service providers, and that can only happen – legally – with court permission. German court says IP addresses in server logs are not personal data | OUT-LAW.COM

Ask.com rearms with semantics, rich media in search war

October 13th, 2008 | No Comments | Posted by Colin Meek in Search engines - advanced, Search tools and tricks

Ask.com have improved their search with the natural language semantic technology Direct Answers from Search (DAFS). This Ars Technica post describes the move: Ask.com rearms with semantics, rich media in search war

Databasing My Social Graph with BatchBook – Mashable

October 13th, 2008 | No Comments | Posted by Colin Meek in People, Search tools and tricks

Looking for a better way to organise and reach your contacts? BatchBook might be useful, and Mashable have posted this review: Databasing My Social Graph with BatchBook – Mashable
