I am researching the possibility of building a portal for an organization. The site would grab about 200 different RSS feeds daily. Some of them will need to be updated hourly. Drupal has extensive aggregator functions, but I would prefer to build this site in EE.
I can’t use Magpie because the pages strain under pulling multiple feeds, but FeedGrab may be able to do what I want. One problem I need to overcome is the automation of grabbing feeds. I have seen in another post a reference to using the EE cron plugin with FeedGrab. Based on what I read, my code might look something like this:
{exp:cron minute="30" hour="*" day="*" month="*" plugin="feedgrab:FeedGrab"}
{exp:feedgrab url="path_to_rss_feed" weblog="id_number" title="title" date="pubDate" use="link|description" fields="rss_url|rss_body"}
{/exp:cron}
The cron should fire once an hour (at minute 30), every day of every month. I have the impression from the cron documentation that I need to define both the name of the plugin and the function that needs to be called.
Unfortunately, this doesn’t work, and other variations don’t work, either. Can anyone offer suggestions?
Hello, I’m getting duplicate entries, but only on the 3 newest entries:
Here is my code:
{exp:feedgrab url="http://api.flickr.com/services/feeds/photos_public.gne?id=96729633@N00&tags=unexpectedlyquitcom&lang=en-us&format=rss_200"
weblog="1"
title="title"
date="pubDate"
use="media:content@url|media:content@height|media:content@width|link"
fields="flickrimageurl|flickrimageheight|flickrimagewidth|flickrurl"
unique="date,flickrurl"}
I thought adding the unique setting would fix it but no dice. Even weirder is that it doesn’t always do it.
You can see the site here: http://www.unexpectedlyquit.com
Great news!
At least for anyone interested in grabbing lower-level information from RSS feeds (like klick and me above).
I figured out a way for FeedGrab to be able to grab this information by using two Yahoo Pipes. For my purposes I was able to grab out my FriendFeed comments and feed them into my database.
The whole story, with links to the Pipes so you can mash your own, can be found here:
Integrating My FriendFeed Comments Into My Personal Blog
Hope this saves someone time in the future.
I think you would use Magpie for that (built into EE if I’m not mistaken).
I think the difference is:
FeedGrab is for taking the information in RSS feeds and putting it into your weblog. Magpie is for simply displaying RSS feeds within your template (no information copied over).
I could be wrong, but that is my understanding.
I’m not sure why you are getting duplicates. This is how I import flickr feeds:
{exp:feedgrab
url="http://api.flickr.com/services/feeds/photos_public.gne?id=25509357@N00&format=rss_200"
weblog="1"
title="title"
date="dc:date.Taken"
use="link|media:content@url|media:thumbnail@url|description|guid"
fields="flickr_link|flickr_image|flickr_thumbnail|flickr_description|flickr_guid"
unique="flickr_guid"
category_field="media:category"
category_group="2"
category_delimiter="SPACE"
}
I use the guid field as the unique value. Let me know if this helps.
Just so everyone knows, now that the excitement has died down with my whole Yahoo Pipes solution in combination with FeedGrab - it’s not really working out as well as I thought. There is some caching issue with Yahoo Pipes that isn’t allowing the feeds to be updated. Consequently it doesn’t work so well.
I’m hoping XMLGrab will be the solution; if not, I’ll have to write my own plugin (which I’ve never done!)
Regarding Travis’s duplicate issue above (http://ellislab.com/forums/viewthread/37598/P144/#433002):
It appears that the text of values inserted into the database is urlencoded. If you’re using a URL value as your unique identifier, and if the URL includes certain characters, is_entry_unique() will always return true, even if that URL is already present in the database (“%2b” does not equal “+”, for example).
We added a urldecode to the is_entry_unique() function (at approximately lines 586-588):
// MODIFIED BY JUSTIN CRAWFORD
//$sql .= " AND " . $name . "=\"" . $DB->escape_str( $post[ $value ] ) . "\"";
$sql .= " AND " . $name . "=\"" . $DB->escape_str( urldecode( $post[ $value ] ) ) . "\"";
-Justin
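To illustrate the mismatch Justin describes, here is a minimal sketch in Python rather than the plugin's PHP. The function name looks_unique, the example URLs, and the in-memory set standing in for the database are all hypothetical; unquote_plus is used because it behaves roughly like PHP's urldecode(). It only shows why an encoded value never matches its decoded twin unless one side is normalised first.

```python
from urllib.parse import unquote_plus


def looks_unique(candidate, stored_values, decode=True):
    """Return True when candidate matches none of the stored values.

    Hypothetical stand-in for a uniqueness check; stored_values plays
    the role of the column that would be queried in the database.
    """
    if decode:
        # Normalise the incoming value so "%2B" and "+" compare equal.
        candidate = unquote_plus(candidate)
    return candidate not in stored_values


# Assumed example data: the stored URL is decoded, the incoming one is encoded.
stored = {"http://example.com/photos/?tags=a+b"}
incoming = "http://example.com/photos/?tags=a%2Bb"

# Without decoding, the entry looks new, so a duplicate would be inserted.
print(looks_unique(incoming, stored, decode=False))  # True
# With decoding, the existing entry is recognised.
print(looks_unique(incoming, stored))  # False
```

Whichever side is normalised, the point is that both values go through the same encoding before they are compared.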
Update: You can ignore the following post. I found what I needed to fix it.
I built out test pages on our existing site (version 1.6.3), and everything worked great. Then I copied the pages over to a new site (v1.6.4) along with the plugin. Suddenly on the new site I’m getting url output errors.
This code:
<div class="rss_description">{rss_body}</div>
Works correctly on the first site. But on the second site, {rss_url} is outputting a complete path including the <a> tag, so that I end up with two <a> tags.
I have looked over the admin options for setting url output but don’t see anything that would indicate why I’m getting the errors. Any suggestions?
And I’m with klick above - need to be able to access that lower level data in FriendFeed AND be able to extract the two different titles from Google Reader shared items which currently look like this:
still haven’t figured that out in detail. 😉
plus having another problem: my delicious feed keeps sticking empty entries (all with timestamp 0100) between the other weblog entries when updating the “plugin call” template.
it also echoes this error code:
Notice: strtotime() [function.strtotime]: Called with empty time parameter in …/xx/plugins/pi.xmlgrab.php on line 489
this causes my feeds to break. if anyone has got a clue what’s happening let me know 😊
ah and the twitter stream has some character encoding problems. i’m on that.
besides that: wonderful plugin! thanks!
best klick
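The strtotime() notice quoted above suggests that some feed items reach the date parser with an empty date. Why klick's feed produces such items is not known from this thread, so the following is only a sketch of one possible guard, written in Python rather than the plugin's PHP. The function name parse_pub_date and the sample items are hypothetical, and it assumes RFC 822 style dates as used by RSS pubDate.

```python
from email.utils import parsedate_to_datetime


def parse_pub_date(raw):
    """Return a datetime for an RFC 822 date string, or None if unusable."""
    if raw is None or not raw.strip():
        return None  # empty date: let the caller skip or flag the item
    try:
        return parsedate_to_datetime(raw.strip())
    except (TypeError, ValueError):
        return None  # present but not parseable


# Assumed sample items; the second one has no date at all.
items = [
    {"title": "bookmark one", "pubDate": "Tue, 10 Jun 2008 04:00:00 GMT"},
    {"title": "bookmark two", "pubDate": ""},
]

# Keep only items whose date could be parsed instead of importing
# them with a zero timestamp.
importable = [item for item in items if parse_pub_date(item["pubDate"]) is not None]
```

Skipping undated items is one design choice; falling back to the import time would be another, depending on whether empty entries should appear at all.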
Does this look right?
{exp:cron minute="30" hour="*" day="*" month="*" plugin="feedgrab:FeedGrab"}
{exp:feedgrab url="my-rss"
weblog="70"
title="title"
date="pubDate"
use="link|description"
fields="music-url|music-body" }
{/exp:cron}
I’ve been trying to figure out how the cron plugin works with FeedGrab for the last 3 hours and I haven’t been able to find an answer. Can someone please tell me if this is right because it doesn’t seem to work?
Thanks for any help.
While I can’t tell you if the parameters are correct, I can tell you that ee:cron only works if the page is visited frequently. It’s not a true cron; it just checks the time elapsed between page views. So if you have a test page nobody is visiting, ee:cron will never fire, regardless of how much time has passed. I use a real cron job to make sure the page is visited regularly.
Markus
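The behaviour Markus describes can be sketched as follows. This is a simplified Python model, not the cron plugin's actual code: the class name VisitTriggeredJob is hypothetical, and a fixed interval in seconds is assumed in place of the plugin's minute/hour/day/month parameters. It shows that a job checked only on page views never runs while nobody visits, and runs once rather than catching up on missed hours.

```python
class VisitTriggeredJob:
    """A job that is only checked when a page view happens (hypothetical model)."""

    def __init__(self, interval_seconds, job):
        self.interval = interval_seconds
        self.job = job
        self.last_run = None  # timestamp of the last execution, if any

    def on_page_view(self, now):
        """Run the job if it is due; return True if it ran."""
        due = self.last_run is None or now - self.last_run >= self.interval
        if due:
            self.job()
            self.last_run = now
        return due


runs = []
hourly = VisitTriggeredJob(3600, lambda: runs.append("grabbed"))

# Nobody visits for five hours, so nothing has run yet.
# The first visit arrives at t = 5h and triggers a single run.
hourly.on_page_view(5 * 3600)
# A second visit one minute later is within the interval and does nothing.
hourly.on_page_view(5 * 3600 + 60)

print(len(runs))  # 1
```

A real cron job that requests the template at the desired interval, as Markus suggests, supplies the regular page views this model depends on.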