Lak11 Week 3 and 4 (and 5): Semantic Web, Tools and Corporate Use of Analytics

Two weeks ago I visited Learning Technologies 2011 in London (blog post forthcoming). This meant I had less time to write down some thoughts on Lak11. I did manage to read most of the reading materials from the syllabus and did some experimenting with the different tools that are out there. Here are my reflections on week 3 and 4 (and a little bit of 5) of the course.

The Semantic Web and Linked Data

This was the main topic of week three of the course. The semantic web has a couple of defining characteristics: it separates the presentation of the data from the data itself, and it structures the data in a way that allows all of it to be linked up. Technically this is done through so-called RDF triples: a subject, a predicate and an object.
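To make the triple idea concrete, here is a minimal sketch using the Python rdflib library; the example.org namespace and resource names are made up purely for illustration.

```python
# A minimal, illustrative sketch of RDF triples using rdflib.
# The example.org namespace and resource names are invented.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()

# Two triples: "lak11 is a Course" and "lak11 has the title '...'"
g.add((EX.lak11, RDF.type, EX.Course))
g.add((EX.lak11, EX.title, Literal("Learning and Knowledge Analytics")))

# Serialising shows the subject-predicate-object structure explicitly.
print(g.serialize(format="turtle"))
```

Because every subject and predicate is a URI, two datasets that refer to the same URI can be merged and queried together, which is what makes the "linked" part of linked data possible.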

Although he is a better writer than speaker, I still enjoyed this video of Tim Berners-Lee (the inventor of the web) explaining the concept of linked data. His point that we cannot predict what we are going to make with this technology is well taken: “If we end up only building the things I can imagine, we would have failed.”

[youtube=http://www.youtube.com/watch?v=OM6XIICm_qo]

The benefits of this are easy to see. In the forums there was a lot of discussion about whether the semantic web is feasible and whether it is actually worth putting effort into. People seemed to think that putting in a lot of human effort to make something easier to read for machines is turning the world upside down. I don’t think that has to be the case: I don’t believe we need strict ontologies, but I do think we could define simpler machine-readable formats and create great interfaces for inputting data into these formats.

Use cases for analytics in corporate learning

Weeks ago Bert De Coutere started creating a set of use cases for analytics in corporate learning. I have been wanting to add some of my own ideas, but wasn’t able to create enough “thinking time” earlier. This week I finally managed to take part in the discussion. Thinking about the problem I noticed that I often found it difficult to make a distinction between learning and improving performance. In the end I decided not to worry about it. I also did not stick to the format: it should be pretty obvious what kind of analytics could deliver these use cases. These are the ideas that I added:

  • Portfolio management through monitoring search terms
    You are responsible for the project management learning portfolio. In the past you mostly worried about “closing skill gaps” by making sure there were enough courses on the topic. In recent years you have switched to making sure the community is healthy, and from developing “just in case” learning interventions towards “just in time” learning interventions. One thing that really helps you in doing your work is the weekly list of trending questions/topics/problems you get in your mailbox. It is an ever-changing list of things that have recently been discussed and searched for in the project management space. It wasn’t until you saw this dashboard that you noticed a sharp increase in demand for information about privacy laws in China. Because of it you were able to create a document with some relevant links that you now show as a recommended result when people search for privacy and China.
  • Social Contextualization of Content
    Whenever you look at any piece of content in your company (e.g. a video on the internal YouTube, an office document from a SharePoint site or a news article on the intranet), you will not only see the content itself, but also which other people in the company have seen it, what tags they gave it, which passages they highlighted or annotated and what rating they gave it. There are easy ways for you to manage which “social context” you want to see. You can limit it to the people in your direct team, to your personal network or to the experts (either as defined by you or by an algorithm). You love the “aggregated highlights view”, where you can see a heat-map overlay of the important passages of a document. Another great feature is how you can play back chronologically who looked at each URL, seeing how it spread through the organization.
  • Data enabled meetings
    Just before you go into a meeting you open the invite. Below the title of the meeting and the location you see the list of participants. Next to each participant you see which other people in your network they have met with before, which people in your network they have emailed with, and how recent those engagements were. This gives you more context for the meeting. You no longer have to ask the vendor whether your company is already using their product in some other part of the business. The list also jogs your memory: often you vaguely remember speaking to somebody but cannot seem to remember when you spoke and what you spoke about. This tool also gives you easy access to notes on and recordings of past conversations.
  • Automatic “getting-to-know-yous”
    About once a week you get an invite created by “The Connector”. It invites you to get to know a person that you haven’t met before and always picks a convenient time to do it. Each time you and the other invitee accept one of these invites, you are both surprised that you have never met before, as you work with similar stakeholders, on similar topics or on similar challenges. In your settings you have stated your preference for face-to-face meetings, so “The Connector” does not bother you with those video-conferencing sessions that other people seem to like so much.
  • “Train me now!”
    You are in the lobby of the head office waiting for your appointment to arrive. She has just texted you that she will be 10 minutes late because she has been delayed by traffic. You open the “Train me now!” app and tell it you have 8 minutes to spare. The app looks at the required training that is coming up for you, at the expiration dates of your certificates and at your current projects and interests. It also looks at the most popular pieces of learning content in the company and checks whether any of your peers have recommended something to you (actually it also checks whether they have recommended it to somebody else, because the algorithm has learned that this is a useful signal too). It then eliminates anything that is longer than 8 minutes, anything that you have looked at before (and haven’t marked as something that could be shown to you again) and anything from a content provider that is on your blacklist. All of this happens in a fraction of a second, after which it presents you with a shortlist of videos to watch. The fact that you chose the second pick instead of the first is of course fed back into the system to make an even better recommendation next time.
  • Using micro formats for CVs
    A simple structured data format has been used to capture all CVs in the central HR management system. In combination with the API that was put on top of it, this has enabled a wealth of applications built on the structured data (see the sketch after this list).
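To make that last use case a bit more concrete, here is a toy sketch of what such a machine-readable CV record and a query against it could look like. The format is loosely inspired by microformats such as hResume; the field names are my own invention, not an actual HR-system schema.

```python
# A toy, hypothetical CV record in a simple machine-readable format
# (loosely inspired by hResume); field names are invented for illustration.
cv = {
    "name": "Jane Doe",
    "roles": [
        {"title": "Project Manager", "from": "2008-01", "to": "2010-12"},
        {"title": "Learning Consultant", "from": "2011-01", "to": None},
    ],
    "skills": ["project management", "stakeholder management", "PRINCE2"],
    "certifications": [{"name": "PMP", "expires": "2013-06"}],
}

def people_with_skill(cvs, skill):
    """The kind of query a simple API on top of this data makes trivial."""
    return [record["name"] for record in cvs if skill in record["skills"]]

print(people_with_skill([cv], "PRINCE2"))
```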

There are three more titles that I wanted to work out, but have not had the chance to do yet.

  • Using external information inside the company
  • Suggested learning groups to self-organize
  • Linking performance data to learning excellence

Book: Head First Data Analysis

I have always been intrigued by O’Reilly’s Head First series of books. I don’t know any other publisher that is as explicit about how their books try to implement research-based good practices like an informal style, repetition and the use of visuals. So when I encountered Data Analysis in the series I decided to give it a go. I wrote the following review on Goodreads:

The “Head First” series has a refreshing ambition: to create books that help people learn. They try to do this by following a set of evidence-based learning principles: things like repetition, visual information and practice are all incorporated into the book. It is a good introduction to data analysis, but in the end it only scratches the surface and was a bit too simplistic for my taste. I liked the refreshers around hypothesis testing, Solver optimisation in Excel, simple linear regression, cleaning up data and visualisation. The best thing about the book is how it introduced me to the open source multi-platform statistical package “R”.

Learning impact measurement and KnowledgeAdvisors

The day before Learning Technologies, Bersin and KnowledgeAdvisors organized a seminar about measuring the impact of learning. David Mallon, analyst at Bersin, presented their High-Impact Measurement framework.

Bersin High-Impact Measurement Framework

What I found most interesting was how the maturity of your measurement strategy is basically a function of how far your learning organization has moved towards performance consulting. How can you measure business impact if your planning and gap analysis aren’t close to the business?

Jeffrey Berk from KnowledgeAdvisors then tried to show how their Metrics that Matter product allows measurement, and then dashboarding, around all the parts of the Bersin framework. They basically do this by asking participants to fill in surveys after they have attended any kind of learning event. Their name for these surveys is “smart sheets” (a much improved iteration of the familiar “happy sheets”). KnowledgeAdvisors has a complete software-as-a-service infrastructure for sending out these digital surveys and collating the results. Because they have all this data they can benchmark your scores against yourself or against their other customers (in aggregate of course). They have done all the sensible statistics for you, so you don’t have to filter out the bias in self-reporting or think about cultural differences in the way people respond to these surveys. Another thing you can do is pull in real business data (think of things like sales volumes). By doing some fancy regression analysis it is then possible to see what part of the improvement can be attributed, with some level of confidence, to the learning intervention, allowing you to calculate a return on investment (ROI) for the learning programs.
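As a rough illustration of the idea behind that last step (my own sketch, not KnowledgeAdvisors’ actual method), the attribution boils down to regressing a business metric on training participation while controlling for other drivers. All numbers below are made up purely for illustration.

```python
# Toy ordinary-least-squares sketch: estimate how much of the change in a
# business metric can be attributed to training, controlling for another
# driver. All numbers are fabricated for illustration only.
import numpy as np

training_hours  = np.array([ 2.0,  8.0,  5.0, 12.0,  0.0,  9.0,  4.0, 15.0])
marketing_spend = np.array([10.0, 12.0,  9.0, 14.0, 11.0, 10.0, 13.0, 15.0])
sales           = np.array([50.0, 66.0, 58.0, 80.0, 48.0, 70.0, 60.0, 88.0])

# Design matrix: intercept plus the two candidate drivers.
X = np.column_stack([np.ones_like(sales), training_hours, marketing_spend])
coefficients, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, per_training_hour, per_marketing_unit = coefficients

# The training coefficient is the estimated uplift per hour of training,
# which can then be set against the cost of the programme to get an ROI.
print(f"estimated uplift per training hour: {per_training_hour:.2f}")
```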

All in all I was quite impressed with the toolset that they can provide and I do think they will probably serve a genuine need for many businesses.

The best question of the day came from Charles Jennings, who pointed out to David Mallon that his talk had referred to the increasing importance of on-the-job and informal learning, but that the learning measurement framework only addresses measurement strategies for top-down and formal learning. Why was that the case? Unfortunately I cannot remember Mallon’s answer (which probably says something about its quality or relevance!).

Experimenting with Needlebase, R, Google charts, Gephi and ManyEyes

The first tool that I tried out this week was Needlebase. This tool allows you to create a data model by defining the nodes in the model and their relations. You can then train it on a web page of your choice to teach it how to scrape the information from the page. Once you have done that, Needlebase will go out to collect all the information and display it in a way that allows you to sort and graph it. Watch this video to get a better idea of how this works:

[youtube=http://www.youtube.com/watch?v=58Gzlq4zSDk]

I decided to see if I could use Needlebase to get some insights into resources on Delicious that are tagged with the “lak11” tag. Once you understand how it works, it only takes about 10 minutes to create the model and start scraping the page.

I wanted to get answers to the following questions:

  • Which five users have added the most links and what is the distribution of links over users?
  • Which twenty links were added the most with a “lak11” tag?
  • Which twenty links with a “lak11” tag are the most popular on Delicious?
  • Can the tags be put into a tag cloud based on the frequency of their use?
  • In which week were the Delicious users the most active when it came to bookmarking “lak11” resources?
  • Imagine that the answers to the questions above were all somebody was able to see about this Learning and Knowledge Analytics course. Would they get a relatively balanced idea of the key topics, resources and people related to the course? What are some of the key things that they would miss?

Unfortunately after I had done all the machine learning (and had written the above) I learned that Delicious explicitly blocks Needlebase from accessing the site. I therefore had to switch plans.

The Twapperkeeper service keeps a copy of all the tweets with a particular hashtag (Twitter itself only gives access to the last two weeks of messages through its search interface). I managed to train Needlebase to scrape all the tweets, the username, the URL to the user picture and the userid of the person adding the tweet, who the tweet was a reply to, the unique ID of the tweet, the longitude and latitude, the client that was used and the date of the tweet.
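The questions from the Delicious plan translate readily to this tweet table. Here is a rough pandas sketch of how a couple of them could be answered; this is my own illustration rather than Needlebase output, and the file and column names are assumptions based on the fields listed above.

```python
# Rough sketch of querying the scraped tweet table with pandas.
# "lak11_tweets.csv" and the column names are assumptions, not a real export.
import pandas as pd

tweets = pd.read_csv("lak11_tweets.csv", parse_dates=["date"])

# Which five users tweeted the most with the #lak11 hashtag?
top_users = tweets["username"].value_counts().head(5)

# In which week were people most active?
per_week = tweets.set_index("date").resample("W").size()

print(top_users)
print(per_week.sort_values(ascending=False).head())
```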

I had to change my questions too:

Another great resource that I re-encountered in these weeks of the course was Hans Rosling’s Gapminder project:

[youtube=http://www.youtube.com/watch?v=BPt8ElTQMIg]

Google has acquired part of that technology and now allows a similar kind of visualization on top of their spreadsheet data. What makes the visualization smart is the way it shows three variables (x-axis, y-axis and size of the bubble) and how they change over time. I thought hard about how I could use the Twitter data in this way, but couldn’t find anything sensible. I still wanted to play with the visualization, so from the World Bank’s Open Data Initiative I downloaded data about population size, investment in education and unemployment figures for a set of countries per year (they have a nice iPhone app too). When I loaded that data I got the following result:

Click to be able to play the motion graph
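For a sense of how little code a static version of such a chart needs, here is a minimal matplotlib sketch of the same idea: x-axis, y-axis and bubble size encode three variables for one moment in time, and animating over the years gives the motion chart. The countries and numbers are placeholders, not the actual World Bank figures I used.

```python
# Minimal bubble-chart sketch of the Gapminder idea. Countries and numbers
# are placeholders for illustration, not the real World Bank data.
import matplotlib.pyplot as plt

countries    = ["Country A", "Country B", "Country C"]
edu_spend    = [4.1, 5.6, 3.2]      # education spend, % of GDP (x-axis)
unemployment = [7.5, 4.2, 11.0]     # unemployment rate, % (y-axis)
population   = [16e6, 60e6, 5e6]    # population (bubble size)

plt.scatter(edu_spend, unemployment, s=[p / 2e5 for p in population], alpha=0.5)
for x, y, name in zip(edu_spend, unemployment, countries):
    plt.annotate(name, (x, y))
plt.xlabel("Education spend (% of GDP)")
plt.ylabel("Unemployment (%)")
plt.title("One year of the motion chart, frozen")
plt.show()
```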

The last tool I installed and took a look at was Gephi. I first used SNAPP on the forums of week 1 and exported that data into an XML-based format. I then loaded that into Gephi and could play around a bit:

Week 1 forum relations in Gephi
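Gephi is point-and-click, but the same export can also be explored programmatically. Here is a minimal networkx sketch, assuming the SNAPP export was saved as GraphML (the file name is made up):

```python
# Load the (assumed) GraphML export of the week 1 forum network and look at
# who sits most centrally in the discussions. The file name is hypothetical.
import networkx as nx

g = nx.read_graphml("week1_forum_relations.graphml")
print(g.number_of_nodes(), "participants,", g.number_of_edges(), "reply relations")

centrality = nx.degree_centrality(g)
for node, score in sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(node, round(score, 3))
```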

My participation in numbers

I will have to add up my participation for these two (to three) weeks: in week 3 and week 4 of the course I wrote 6 Moodle posts, tweeted 3 times about Lak11, wrote 1 blog post and saved 49 bookmarks to Diigo.

The hours I spent playing with all the different tools mentioned above are not captured in my self-measurement. I did, however, really enjoy playing with these tools and learned a lot of new things.

Lak11 Week 1: Introduction to Learning and Knowledge Analytics

Every week I will try to write down some reflections on the Open Online Course: Learning and Knowledge Analytics. These will be written for myself as much as for anybody else, so I have to apologise in advance for the fact that there will be nearly no narrative and a mix of thoughts on the contents of the course and on the process of the course.

So what do I have to write about this week?

My tooling for the course

There is a lot of stuff happening in these distributed courses and keeping up with the course required some setup and preparation on my side (I like to call that my “tooling”). So what tools do I use?

A lot of new material to read is created every day: tweets with the #lak11 hashtag, posts in all the different Moodle forums, Google Groups and Learninganalytics.net messages from George Siemens, and Diigo/Delicious bookmarks. Thankfully all of these information resources provide RSS feeds, and I have been able to add them all to a special-made Lak11 folder in my Google Reader (RSS feed). That folder sorts its messages based on time (oldest first), giving me some understanding of the temporal aspects of the course and making sure I read a reply after the original message. A couple of times a day I use the excellent MobileRSS reader on my iPad to read through all the messages.

There is quite a lot of reading to do. At the beginning of the week I read through the syllabus and make sure that I download all the PDF files to GoodReader on the iPad. All web articles are stored for later reading using the Instapaper service. I have given both GoodReader and Instapaper Lak11 folders. I do most of the reading of these articles on the train. GoodReader allows me to highlight passages and store bookmarks in the PDF file itself. With Instapaper this is a bit more difficult: when I read a very interesting paragraph I have to highlight it and email it to myself for later processing.

Each and every resource that I touch for the course gets its own bookmark on Diigo. Next to the relevant tags for the resource I also tag them with lak11 and weekx (where x is the number of the week) and share them to the Learning Analytics group on Diigo. These will provide me with a history of the interesting things I have seen during the course and should help me in writing a weekly reflective post.

So much for the “consumer” side of things. As a “producer” I participate in the Moodle forums. I can easily find all my own posts through my Moodle profile, and I hope to use some form of screen scraper at the end of the course to pull a copy of everything that I have written. I use this WordPress.com hosted blog to write and reflect on the course materials and tag my course-related posts with “lak11” so that they show up on their own page (and have their own feed in case you are interested). On Twitter I occasionally tweet with #lak11, mostly to refer to a Moodle or blog post that I have written or to ask the group a direct question.

What is missing? The one thing that I don’t use yet is a mind mapping or concept mapping tool. The syllabus recommends VUE and CMAP, and one of the assignments each week is to keep updating a map for the course. These tools don’t seem to have an iPad equivalent. There are some good mind mapping tools for the iPad (my favourite is probably iThoughtsHD, watch this space for a mind mapping comparison of iPad apps), but I haven’t been able to fit one of them into my workflow for the course. Maybe I should just try a little harder.

My inability to “skim and dive”

This week I reconfirmed my inability to “skim and dive”. For these things I seem to be an all-or-nothing guy. There are magazines that I read completely, from the first page to the last (e.g. Wired). This course seems to be one of those things too: I read every single thing. It is a bit much currently, but I expect the volume of Moodle and Twitter messages to go down quite significantly as the course progresses. So if I can just about manage now, it should become relatively easy later on.

The readings of this week

There were quite a few academic papers in the readings of this week. Most of them provided an overview of educational data mining or academic/learning analytics. Many of the discussions in these papers seemed quite nominal to me. They are probably good references to keep and have a wealth of bibliographical material that I could look at at some point in the future, but for now they lacked any truly new insights for me and appeared to be pretty basic.

Live sessions

Unfortunately I wasn’t able to attend any of the Elluminate sessions and I haven’t listened to them yet either. I hope to catch up this week with the recordings and maybe even attend the guest speaker live tomorrow evening.

Marginalia

It has been a while since I last actively participated in a Moodle-facilitated course. Moodle has again proven to be a very effective host for forum-based discussions. One interesting Moodle add-on that I had not seen before is Marginalia, a way to annotate forum posts in Moodle itself, with annotations that can be private or public. Look at the following video to see it in action.

[blip.tv ?posts_id=4054581&dest=-1]

I wonder if I will use it extensively in the next few weeks.

Hunch

One thing that we were asked to try out as an activity was Hunch. For me it was interesting to see all the different interpretations that people in the course had of how to pick up this task and what the question (What are the educational uses of a Hunch-like tool for learning?) actually meant. A distributed course like this creates a lot of redundancy in the answers. I also noted that people kept repeating a falsehood (that you need to use Twitter/Facebook to log in). My explanation of how Hunch could be used by the wary was not really picked up. It is good to be reminded at times that most people in the world do not share my perspective on computers and my literacy with the medium. Thinking otherwise is a hard-to-escape consequence of living in a techno-bubble with the other “digerati”.

I wrote the following on the topic (in the Moodle forum for week 1):

Indeed, the complete US-centricness of the service was the first thing that I noticed. I believe it asked me at some point what continent I live on. How come it still asks me questions to which I would never have an answer? Are these questions crowdsourced too? Do we get them randomly or do we get certain questions based on our answers? It feels like the former to me.

The recommendations that it gave me seemed to be pretty random too: the occasional hit and then a lot of misses. I had the ambition to try out the top 5 music albums it would recommend me, but couldn’t bear the thought of listening to all that rock. This did sneak a little thought into my head: could it be that I am very special? Am I so eclectic that I can defeat all data mining efforts? Am I the Napoleon Dynamite of people? Of course I am not, but the question remains: does this work better for some people than for others?

One other thing that I noticed was how the site seemed to use some of the tricks of an astrologer: who wouldn’t like “Insalata Caprese”? That seems like a safe recommendation to me.

In the learning domain I could see an application as an Electronic Performance Support System. It would know what I need in my work and could recommend the right website to order business cards (when it sees I go to a conference) or an interesting resource relating to the work that I am doing. Kind of like a new version of Clippy, but one that works.

BTW, in an earlier blog post I have written about how recommendation systems could turn us all into mussels (although I don’t really believe that).

Corporate represent!

Because of a very good intervention by George Siemens, the main facilitator of the course, we are now starting to have a good discussion about analytics in corporate settings here. The corporate world treats learning as a secondary process (very much a means to an end), and that creates a slightly different viewpoint. I assume the corporate people will form their own subgroup in some way in this course. Before the end of next week I will attempt to flesh out some more use cases following Bert De Coutere’s examples here.

Bersin/KnowledgeAdvisors Lunch and Learn

At the end of January I will be attending a free Bersin/KnowledgeAdvisors lunch and learn titled Innovation in Learning Measurement – High Impact Measurement Framework in London (this is one day before the Learning Technologies 2011 exhibit/conference). I would love to meet other Lak11 participants there. Will that happen?

My participation in numbers

Every week I will try and give a numerical update about my course participation. This week I bookmarked 33 items on Diigo, wrote 10 Lak11 related tweets, wrote 25 Moodle forums post and 2 blog posts.

Workflow Driven Apps Versus App Driven Workflow

Arjen Vrielink and I write a monthly series titled: Parallax. We both agree on a title for the post and on some other arbitrary restrictions to induce our creative process. This month we write about how the constant flux of new apps and platforms influences your workflow. We do this by (re-)viewing our workflow from different perspectives. After a general introduction we write a paragraph of 200 words each from the perspective of 1. apps, 2. platform and 3. workflow itself. You can read Arjen’s post with the same title here.

Instapaper on my iPhone

To me a workflow is mainly about two things: the ability to capture things and the ability to time-shift. Both of these need to be done effectively and efficiently. So let’s take a look at three separate processes and see how they currently work for me: task/todo management, sharing with others, and reading news and interesting articles (not books). How do I work nowadays for each of these three things?

Workflow
I use Toodledo for my task/todo management. Whenever I “take an action” or think of something that I need to do at some point in the future, I fire up Toodledo and jot it down. Each item is put in a folder (private, work, etc.), gets a due date (sometimes with a timed email reminder if I really cannot afford to forget it) and is given a priority (which I usually ignore). At the beginning and end of every day I run through all the tasks and decide in my head what will get done.

For me it is important to share what I encounter on the web, and my thoughts about it, with the rest of the world. I do this in a couple of different ways: directly through Twitter, through Twitter by using a Bit.ly sidebar in my browser, in Yammer if it is purely for work, on this WordPress.com blog, through public bookmarks on Diigo, by sending a direct email or by clicking the share button in Google Reader.

I have subscribed to 300+ RSS feeds, and often when I am scanning them I find something interesting but don’t have the opportunity to read it at that time. I use Instapaper to capture these articles and make them available for easy reading later on. Instapaper doesn’t work with PDF-based articles, so I send those to a special email address so that I can pick them up with my iPad and save them to GoodReader when it is convenient.

Platform
“Platform” can have multiple meanings. The operating system was often called a platform. When you had invested heavily in one platform it would become difficult to move any of your workflows to a different platform (at my employer this has been the case for many years with Microsoft and Exchange: hard to use anything else). Rich web applications have now turned the Internet itself into a workflow platform. This makes the choice of operating system nearly, if not totally, irrelevant. I regularly use Ubuntu (10.04, too lazy to upgrade so far), Windows Vista (at work) and iOS (both on the iPhone and the iPad). All of the products and services mentioned either have specialised applications for the platform or are usable through any modern web browser. The model I prefer right now is one with transparent two-way syncing between a central server/service and the different local apps, allowing me access to my latest information even if I am not online (Dropbox for example uses this model and is wonderful).

What I have noticed, though, is that I have strong preferences for using a particular platform (actually a particular device) for certain tasks. The iPad is my preference for any reading of news or of articles: the “paginate” option on Instapaper is beautiful. Sharing is best done with something that has a decent keyboard, and Toodledo is probably used the most on my iPhone because that is usually closest at hand.

Apps
Sharing is a good example of something where the app very much drives my behaviour: the app where I initially encounter the thing I want to share needs to support my sharing means of choice. This isn’t optimal at all: if I read something interesting in MobileRSS on the iPad that I want to share on Yammer, I usually email the link from MobileRSS to my work email address; once at work, I copy it from my mail client into the browser version of Yammer and add my comments. This is mainly because Yammer (necessarily) has to be closed off from the rest of the world with its APIs.

The services that create the fewest hiccups in my workflow are those with a large separation between the content/data of the service and the interface. Google Reader and Toodledo both provide very complete APIs that allow anybody to create an app that accesses the data and displays it in a smart way. The disadvantage of these services is that I am usually dependent on a single provider for the data. In the long term this is probably not sustainable. Things like Unhosted are already pointing to the future: an even stricter separation between data and app. Maybe in that future the workflow can start driving the app instead of the other way around.