Commit Graph

28 Commits

Author SHA1 Message Date
Gerhard Schlager a115aae45f Use rchardet instead of charlock_holmes gem 2018-08-01 10:41:20 +02:00
Gerhard Schlager 5d421fb946 FIX: Try respecting charset in HTTP header of RSS feed 2018-08-01 10:41:20 +02:00
Gerhard Schlager ff942ed2f3 FIX: Try detecting encoding of RSS feed 2018-08-01 10:41:20 +02:00
Arpit Jalan 1841dd48dc FIX: revert utf-8 encode changes 2018-05-20 17:50:36 +05:30
Arpit Jalan 9512796ef6 FIX: check for blank response when polling feed 2018-05-18 17:00:44 +05:30
Arpit Jalan 146f634c8f FIX: UTF-8 encode feed response body 2018-05-16 12:59:23 +05:30
Sam d8b4627fc8 we have to define this for tests to pass 2018-02-15 13:30:34 +11:00
Sam b5b866aab3 oops 2018-02-15 13:13:31 +11:00
Sam c89b42c488 PERF: only require the rss library if used
Before:

Total allocated: 257909321 bytes (2514134 objects)
Total retained:  39681579 bytes (343387 objects)

allocated memory by gem
-----------------------------------
  42875979  rss

retained memory by gem
-----------------------------------
   2080188  rss

retained objects by gem
-----------------------------------
     13052  rss

After:

Total allocated: 210562047 bytes (2252030 objects)
Total retained:  37433816 bytes (328635 objects)

----

So, 2 less megabytes on boot and 13000 objects stuck in ruby heaps forever.
2018-02-15 13:11:33 +11:00
Kyle Zhao 5f318a5241 FEATURE: Replace SimpleRSS with Ruby RSS module (#5311)
* SPEC: PollFeedJob parsing atom feed

* add FeedItemAccessor

It is to provide a consistent interface to access a feed item's tag
content.

* add FeedElementInstaller

to install non-standard and non-namespaced feed elements

* FEATURE: replace SimpleRSS with Ruby RSS module

* get FinalDestination and download with Excon

* support namespaced element with FeedElementInstaller
2017-12-06 10:45:09 +11:00
Kyle Zhao ac666ddf17 PollFeed: check 'content:encoded' for content first 2017-10-02 01:16:11 -04:00
Neil Lalonde 9edc490d3f FIX: remove memoized values in jobs 2017-05-22 16:26:30 -04:00
Robin Ward 4db76796b9 FEATURE: Setting to poll feeds more frequently 2017-05-10 14:30:12 -04:00
Robin Ward 6e4ba8a33e Catch RSS Parsing errors 2017-05-09 15:07:06 -04:00
Robin Ward 424fc8e2e2 FIX: Don't raise an error if the RSS endpoint is 404 2016-12-05 12:29:14 -05:00
Matt Palmer 394cd43d77 Scrub only after converting strings to UTF-8
Scrubbing an ASCII-8BIT string isn't ever going to remove anything, because
there's no code point that isn't valid 8-bit ASCII.  Since we'd really
prefer it if everything were UTF-8 anyway, we'll just assume, for now, that
whatever comes out of SimpleRSS is probably UTF-8, and just nuke anything
that isn't a valid UTF-8 codepoint.

Of course, the *real* bug here is that SimpleRSS [unilaterally converts
everything to
ASCII-8BIT](https://github.com/cardmagic/simple-rss/issues/15).  It's
presumably *far* too much to ask that it detects the encoding of the source
RSS feed and marks the parsed strings with the correct encoding...
2016-08-25 16:09:07 +10:00
Guo Xiang Tan bc4087b9bb FIX: RSS description might be `nil`. 2016-03-07 17:42:17 +08:00
Robin Ward 40934e595a FIX: Some RSS feeds do unsafe redirects
There are people who have RSS feeds set up that do HTTPS -> HTTP
redirects which throw errors. Since RSS feeds are all configured
by admins I think it's OK if they allow an unsafe redirect as the
content is public anyway. This will reduce many server side errors.
2015-09-18 13:25:09 -04:00
5minpause 4ee1bc6320 Changes RSS item creation to prevent encoding errors
SimpleRss is unreliable with parsing RSS feeds that contain German Umlauts.
For example this feed http://www.lauffeuer-lb.de/api/v2/articles.xml can't be
parsed by SimpleRss. Discourse's logs are full of

```
Job exception: Wrapped Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
Job exception: incompatible character encodings: ASCII-8BIT and UTF-8
```

The embedding fails because the feed can't be parsed.

This change forces the encoding (using #scrub)  which prevents the numerous
encoding errors.
2015-07-20 14:30:42 +02:00
Neil Lalonde 4e857bfa12 FIX: add http:// to feed_polling_url if it's missing 2014-08-19 17:57:14 -04:00
Sam 198731de23 FIX: 100% cpu while parsing feeds 2014-07-02 13:53:04 +10:00
Justin Leveck a78df3d57d Add custom embed_by_username feature
Feature to allow each imported post to be created using a different discourse
username. A possible use case of this is a multi-author blog where discourse
is being used to track comments. This feature allows authors to receive
updates when someone leaves a comment on one of their articles because each of
the imported posts can be created using the discourse username of the author.
2014-06-09 12:35:38 -07:00
Louis Rose 1574485443 Perform the where(...).first to find_by(...) refactoring.
This refactoring was automated using the command: bundle exec "ruby refactorings/where_dot_first_to_find_by/app.rb"
2014-05-06 14:41:59 +01:00
Sam e1f293ad66 FEATURE: new scheduler
Removed sidetiq, introduced new scheduler

- add basic UI
- add schedule discover
- add scheduling in initializer
2014-02-06 10:26:16 +11:00
Robin Ward ed2e53bb06 FIX: Support feeds with `description` as well as `content` 2014-01-02 14:29:27 -05:00
Robin Ward 4f8aed295a FEATURE: Embeddable Discourse comments, now with simple-rss instead of feedzirra 2013-12-31 15:01:22 -05:00
Robin Ward 62db063e1e Revert "Support for Embeddable Comments via IFRAME" - it depends on Curl
which not every server has. Have to rethink this.

This reverts commit e3e4c62887.
2013-12-31 12:52:31 -05:00
Robin Ward e3e4c62887 Support for Embeddable Comments via IFRAME 2013-12-31 12:26:24 -05:00