2008FEB010000
Who's pushing their data to 'the cloud'?
"... Just ordered a new laptop and I've decided to try to push as much of my data online as I can to make life more flexible. I'm just curious to see who here does this and what do you use if so? ..."
WHY
I do. Any post I make I have squared away. This allows me to make my site the definitive collection of my data, and it also allows Google to index it. I have control over my own content, and if for some reason a third-party site wants to exert control I still have my stuff. Unlike Slashdot, which didn't allow any tools to save posts; hence I lost about six years' worth of comments, from roughly 1996 to 2002. [0]
HOW
There are two ways of looking at this. You can either generate your "stuff" from a single point [0] on your machine, save it and pump it out, OR use the web apps as clients and suck the data back via RSS, Atom, JSON etc.
I've been doing a bit of both: pushing stuff from my blog engine after I've cached it, and now beginning to suck data back from the various websites I frequent. So far I have (there's a rough pull sketch after these lists):
* OUT: flickr (blog, tags, images), twitter (snippets), hackerid (hackernews data), links (various links I save, including links to hackernews & export to delicious)
* IN: hackernews (all posts every 15m, friends)
* IN TODO: wordy (words I use), spock (tags), colourlovers (colours), librarything (my library), amazon (new books), lastfm (what I'm currently listening to), delicious (new links I find), twitter (friends), flickr (friends, processed images, text, tags)
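For the IN side, this is roughly the shape of it. A minimal pull sketch in Python, assuming the feedparser library and SQLite as the local store; the feed URL, database name and table layout are placeholders rather than my actual setup:

# Pull a feed and keep a local copy of anything new.
import sqlite3
import feedparser

DB_PATH = "mirror.db"                      # placeholder local store
FEED_URL = "http://example.com/feed.rss"   # placeholder feed address

def pull(feed_url, source):
    conn = sqlite3.connect(DB_PATH)
    conn.execute("""CREATE TABLE IF NOT EXISTS items (
                        source TEXT,
                        guid   TEXT PRIMARY KEY,
                        stamp  TEXT,
                        title  TEXT,
                        body   TEXT)""")
    parsed = feedparser.parse(feed_url)
    for entry in parsed.entries:
        conn.execute(
            "INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?, ?)",
            (source,
             entry.get("id", entry.get("link", "")),
             entry.get("published", ""),
             entry.get("title", ""),
             entry.get("summary", "")))
    conn.commit()
    conn.close()

if __name__ == "__main__":
    pull(FEED_URL, "hackernews")   # run from cron every 15 minutes or so

Run it from cron and the local table slowly becomes the mirror.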
Now as you can see that's a lot of data. Some of the things I'm finding:
* It's easier to push than pull if you want an accurate copy because you save before you export
* pulling data means you don't have to write the capture interface first - you simply call the RSS feed
* not every site has an API or good RSS feed
* Linking data together is not easy except by time, though you could try to match by friend (ie: the same friend is on twitter, flickr and hackernews - see the sketch after this list)
* displaying the data effectively is difficult, simply because of its volume and complexity. A good example of how to do this is http://friendfeed.com - clear, simple and pretty much allows for good reading.
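On the friend-matching point, the only realistic way I can see is a hand-maintained identity map. A rough Python sketch, with every username made up:

# Map per-service usernames back to one canonical friend.
FRIENDS = {
    "alice": {"twitter": "alice_t", "flickr": "alice99", "hackernews": "alice_hn"},
    "bob":   {"twitter": "bobby",   "flickr": "bob_pics", "hackernews": "bob"},
}

def who_is(service, username):
    for friend, accounts in FRIENDS.items():
        if accounts.get(service) == username:
            return friend
    return None

print(who_is("flickr", "alice99"))   # -> "alice"

It's manual, but friend lists are small and change rarely, so it may be good enough.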
I'm now at a stumbling block with the templating engine I'm using, so I'm pretty keen to just extract the data as Atom, RSS and JSON - as individual feeds or a mashed feed by date - and write a JavaScript-based website to avoid having to deal with heavyweight blog engines. Let the data go free and see how people use it.
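The mashed feed is the easy bit once everything is timestamped: pull rows out of the local store, sort newest first and dump JSON for the JavaScript front end to chew on. A sketch, reusing the items table from the pull sketch above (still just an assumed schema):

import json
import sqlite3

def mashed_feed(db_path="mirror.db", limit=50):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT source, stamp, title, body FROM items "
        "ORDER BY stamp DESC LIMIT ?", (limit,)).fetchall()
    conn.close()
    return json.dumps(
        [{"source": s, "stamp": t, "title": ti, "body": b}
         for (s, t, ti, b) in rows],
        indent=2)

print(mashed_feed())

The sort only works because the timestamps are plain text that orders correctly, which is the next point.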
The key thing to realise is that when you are compiling your data you should timestamp it at DB level (if you are using one) in ISO1606 format, and maybe add a tag layer over the top so you get the benefit of tagging across data layers.
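Concretely, something like this is what I mean - a sketch assuming SQLite, with the same items table as above plus a generic tags table so a tag can hang off any item no matter which site it came from:

import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    source TEXT,
    guid   TEXT PRIMARY KEY,
    stamp  TEXT,      -- sortable text timestamp, e.g. 2008-02-01T00:00:00Z
    title  TEXT,
    body   TEXT);
CREATE TABLE IF NOT EXISTS tags (
    guid TEXT,        -- any item, regardless of source
    tag  TEXT,
    PRIMARY KEY (guid, tag));
"""

def save(conn, source, guid, title, body, tags=()):
    # Stamp the row at write time so every data layer sorts the same way.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    conn.execute("INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?, ?)",
                 (source, guid, stamp, title, body))
    for tag in tags:
        conn.execute("INSERT OR IGNORE INTO tags VALUES (?, ?)", (guid, tag))

conn = sqlite3.connect("mirror.db")
conn.executescript(SCHEMA)
save(conn, "blog", "example-1", "a post", "some text", tags=("cloud", "data"))
conn.commit()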
It's turning out to be an interesting project.
Some time later ...
"... I've heard of ISO 8601 timestamps but never 1606 (and neither has google it seems). What is that? ..."
You are dead right. I was wrong. I meant "ISO 8601". Late night post and I should have checked ~ http://www.ietf.org/rfc/rfc3339.txt
Some time later ...
"... From this RFC I learned that: ..."
The bit I use is a bastardisation of "1985-04-12T23:20:50.52Z": a shorter version as a string, say "20070202T1422", stripping out the hyphens, colons, the seconds and the Zulu marker.
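In code the compact stamp is just a cut-down format string - a quick Python illustration, local time, no timezone marker by design:

from datetime import datetime

COMPACT = "%Y%m%dT%H%M"

stamp = datetime.now().strftime(COMPACT)            # e.g. "20070202T1422"
back = datetime.strptime("20070202T1422", COMPACT)  # and parse it back
print(stamp, back)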
And then another reply ...
"... Actually I don't believe "20070202T1422" is a legal 8601 timestamp ..."
20070202T1422 - true, but close enough for me with one timezone and limited space for display. It does double duty as both an 8601 hack and a human-readable title. Adding the extra "-" and "Z" makes it harder to read, and accuracy down to the second is simply not required. Trade-offs I'm willing to make.
Prat.
Reference
[0] http://goonmail.customer.netspace.net.au/2005DEC131709.html
<<< start