Commit Graph

106 Commits

Author SHA1 Message Date
Natalie Tay 489aac3fdd
FIX: Disallow table cells to be weighted actual articles can be main content (#27508)
For Topic Embeds, we would prefer <article> to be the main article in a topic, rather than a table cell <td> with potentially a lot of data. However, in an example URL like here, the table cell (the very large code snippet) is seen as the Topic Embed's article due to the determined content weight by the Readability library we use.

In the newly released 0.7.1 cantino/ruby-readability#94, the library has a new option to exclude the library's default <td> element into content weighting. This is more in line with the original library where they only weighted <p>. So this PR excludes the td, as seen in the tests, to allow the actual article to be seen as the article. This PR also adds the details tag into the allow-list.
2024-06-19 09:50:49 +08:00
Sérgio Saquetim 766231b102
FIX: Prevent crash importing topics on a tagged embeddable host (#27254) 2024-05-30 12:04:36 -03:00
Loïc Guitaut 2a28cda15c DEV: Update to lastest rubocop-discourse 2024-05-27 18:06:14 +02:00
Jean 63b7a36fac
FEATURE: Extend embeddable hosts with Individual tags and author assignments (#26868)
* FEATURE: Extend embeddable hosts with tags and author assignments
2024-05-16 15:47:01 -04:00
Sérgio Saquetim 175d2451a9
DEV: Add `topic_embed_import_create_args` plugin modifier (#26527)
* DEV: Add `topic_embed_import_create_args` plugin modifier

This modifier allows a plugin to change the arguments used when creating
 a new topic for an imported article.

For example: let's say you want to prepend "Imported: " to the title of
every imported topic. You could use this modifier like so:

```ruby
# In your plugin's code
plugin.register_modifier(:topic_embed_import_create_args) do |args|
  args[:title] = "Imported: #{args[:title]}"
  args
end
```

In this example, the modifier is prepending "Imported: " to the `title` in the `create_args` hash. This modified title would then be used when the new topic is created.
2024-04-05 12:37:53 -03:00
Angus McLeod 7dc552c9cc
DEV: Add `import_embed_unlisted` site setting (#26222) 2024-03-27 08:57:43 -04:00
Rafael dos Santos Silva 13735f35fb
FEATURE: Cache embed contents in the database (#25133)
* FEATURE: Cache embed contents in the database

This will be useful for features that rely on the semantic content of topics, like the many AI features



Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
2024-01-05 10:09:31 -03:00
Angus McLeod 95c61b88dc
Apply embed unlisted setting consistently (#24294)
Applies the embed_unlisted site setting consistently across topic embeds, including those created via the WP Discourse plugin. Relatedly, adds a embed exception to can_create_unlisted_topic? check. Users creating embedded topics are not always staff.
2023-12-12 09:35:26 -05:00
Krzysztof Kotlarek 5f20748e40 SECURITY: SSRF vulnerability in TopicEmbed
Block redirects when making the final request in TopicEmbed to prevent Server Side Request Forgery (SSRF)
2023-11-09 13:39:08 +11:00
Blake Erickson 644dded000
SECURITY: Use canonical url for topic embeddings (#22085)
This prevents duplicate topics from being created when using embed_urls
that only differ on query params.
2023-06-13 11:08:08 -06:00
Sam 2ccc5fc66e
FEATURE: add support for figure and figcaption tags in embeddings (#21276)
Many blog posts use these to illustrate and images were previously omitted

Additionally strip superfluous HTML and BODY tags from embed HTML.

This was incorrectly returned from server.
2023-04-27 19:57:06 +10:00
Ted Johansson f3f30d6865
SECURITY: Encode embed url (#21133)
The embed_url in "This is a companion discussion..." could be used for
XSS.

Co-authored-by: Blake Erickson <o.blakeerickson@gmail.com>
2023-04-18 15:05:29 +08:00
Bianca Nenciu ccb345bd88
FEATURE: Update topic/comment embedding parameters (#20181)
This commit implements many changes to topic and comments embedding. It
deprecates the class_name field from EmbeddableHost and suggests using
the className parameter. discourse_username parameter has been
deprecated and it will fetch it from embedded site from the author or
discourse-username meta.

See the updated code sample from Admin > Customize > Embedding page.

* FEATURE: Add className parameter for Discourse embed

* DEV: Hide class_name from EmbeddableHost

* DEV: Deprecate class_name field of EmbeddableHost

* FEATURE: Use either author or discourse-username meta tag

* DEV: Deprecate discourse_username parameter

* DEV: Improve embed code sample
2023-02-28 14:31:59 +02:00
Ted Johansson 25a226279a
DEV: Replace #pluck_first freedom patch with AR #pick in core (#19893)
The #pluck_first freedom patch, first introduced by @danielwaterworth has served us well, and is used widely throughout both core and plugins. It seems to have been a common enough use case that Rails 6 introduced it's own method #pick with the exact same implementation. This allows us to retire the freedom patch and switch over to the built-in ActiveRecord method.

There is no replacement for #pluck_first!, but a quick search shows we are using this in a very limited capacity, and in some cases incorrectly (by assuming a nil return rather than an exception), which can quite easily be replaced with #pick plus some extra handling.
2023-02-13 12:39:45 +08:00
Daniel Waterworth 666536cbd1
DEV: Prefer \A and \z over ^ and $ in regexes (#19936) 2023-01-20 12:52:49 -06:00
David Taylor 5a003715d3
DEV: Apply syntax_tree formatting to `app/*` 2023-01-09 14:14:59 +00:00
David Taylor 3c81683955 DEV: Rename `UriHelper.escape_uri` to `.normalized_encode`
This is a much better description of its function. It performs idempotent normalization of a URL. If consumers truly need to `encode` a URL (including double-encoding of existing encoded entities), they can use the existing `.encode` method.
2022-08-09 11:55:25 +01:00
Gerhard Schlager f3b2ee8e1b
FIX: Use default locale for footer of embedded topics (#17760)
The content from the remote site and the footer get cached for 10 minutes, so Discourse should use the default locale instead of the user locale for the footer. Otherwise Discourse might cache the message in a different language.
2022-08-02 20:49:28 +02:00
Loïc Guitaut 357011eb3b DEV: Clean up freedom patches
This patch removes some of our freedom patches that have been deprecated
for some time now.
Some of them have been updated so we’re not shipping code based on an
old version of Rails.
2022-04-06 10:07:14 +02:00
Alan Guo Xiang Tan e4e37257cc FIX: Handle malformed URLs in `TopicEmbed.absolutize_urls`. 2022-01-21 11:18:54 +08:00
Bianca Nenciu cc1b45f58b
FIX: Convert URLs embedded topics to absolute form (#14975)
Sometimes the expanded post contained broken relative URLs because they
were not converted to their absolute form.
2021-11-17 16:39:49 +11:00
Dan Ungureanu 69f0f48dc0
DEV: Fix rubocop issues (#14715) 2021-10-27 11:39:28 +03:00
Yasuo Honda 2944d2cdd6
FIX: Ruby 3 does not freeze interpolated string (#14567)
Ruby 2.7 or earlier `+contents` returns self.dup
when `frozen_string_literal: true`. However, Ruby 3.0 returns self
because this string is interpolated one, which is not frozen anymore.

This commit uses self.dup to return duplicated string regardless Ruby
versions.
https://bugs.ruby-lang.org/issues/17104
2021-10-11 13:20:18 +11:00
Roman Rizzi 4c2d5158c5
FIX: Follow the canonical URL when importing a remote topic. (#14489)
FinalDestination now supports the `follow_canonical` option, which will perform an initial GET request, parse the canonical link if present, and perform a HEAD request to it.

We use this mode during embeds to avoid treating URLs with different query parameters as different topics.
2021-10-01 12:48:21 -03:00
Rafael dos Santos Silva 2e0992c757
DEV: Allow TopicEmbed.import to optionally receive a list of tags (#14301)
This will be used by the rss-polling plugin
2021-09-13 17:01:59 -03:00
Rafael dos Santos Silva ef0471d7ae
DEV: Allow passing cook_method to TopicEmbed.import to override default (#14209)
DEV: Allow passing cook_method to TopicEmbed.import to override default

This will be used in the rss-polling plugin when we want to have
oneboxes on feed content, like youtube for example.
2021-09-01 15:46:39 -03:00
Rafael dos Santos Silva 560c13211a
DEV: Allow passing a category parameter when importing a topic (#14069)
This will be used in the rss pooling plugin to address the feature
request at https://meta.discourse.org/t/-/200644?u=falco
2021-08-17 18:17:07 -03:00
Josh Soref 59097b207f
DEV: Correct typos and spelling mistakes (#12812)
Over the years we accrued many spelling mistakes in the code base. 

This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change 

- comments
- test descriptions
- other low risk areas
2021-05-21 11:43:47 +10:00
Joffrey JAFFEUX ed818a4a19
FIX: prevents malformed href to crash TopicEmbed (#12910)
If the associated page of a remote url passed to `TopicEmber.new(remote_url)` contained a malformed link like: `<a href="(http://foo.bar)">Baz</a>` it would raise an uncaught exception:

```
Job exception: Invalid scheme format: (http
```
2021-04-30 11:10:19 +02:00
Daniel Waterworth 27386ba714
FIX: Kernel.open is deprecated (#12387) 2021-03-12 12:42:32 -06:00
Simon Cossar 3d9eb3e085
Rename options passed to Readability::Document back to whitelist and blacklist (#10340) 2020-07-30 12:56:48 -07:00
Krzysztof Kotlarek e0d9232259
FIX: use allowlist and blocklist terminology (#10209)
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
Roman Rizzi a41476800b
FIX: Don't raise an exception if a topic cannot be retrieved (#9906) 2020-05-28 11:59:20 -03:00
Blake Erickson da839e6d26 SECURITY: Use FinalDestination for topic embeds 2020-05-27 09:26:09 -06:00
Michael Brown d9a02d1336
Revert "Revert "Merge branch 'master' of https://github.com/discourse/discourse""
This reverts commit 20780a1eee.

* SECURITY: re-adds accidentally reverted commit:
  03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
  instead of the 03d26cd6 parent (which contains security fixes)
2020-05-23 00:56:13 -04:00
Jeff Atwood 20780a1eee Revert "Merge branch 'master' of https://github.com/discourse/discourse"
This reverts commit e62a85cf6f, reversing
changes made to 2660c2e21d.
2020-05-22 20:25:56 -07:00
Blake Erickson 03d26cd6f0 SECURITY: ensure embed_url contains valid http(s) uri 2020-05-22 14:54:56 -06:00
Krzysztof Kotlarek 9bff0882c3
FEATURE: Nokogumbo (#9577)
* FEATURE: Nokogumbo

Use Nokogumbo HTML parser.
2020-05-05 13:46:57 +10:00
Robin Ward 98729c8e6e FIX: Allow embed updates of just the title 2020-04-20 14:31:24 -04:00
Robin Ward 56a23c68f1 FIX: Embedded topics couldn't update their titles 2020-04-20 14:27:43 -04:00
Robin Ward b6b92a562c
FEATURE: New site setting `embed_unlisted` (#9391)
If enabled, posts imported to discourse via embeddings will default to
unlisted until they receive a reply.
2020-04-13 15:17:02 -04:00
romanrizzi 55b8620b43 FIX: TopicEmbed#absolutize_urls was trying to modify a frozen string 2020-03-25 12:57:54 -03:00
Penar Musaraj 99fd65328c FIX: Skip absolutizing URLs when source URI is invalid 2020-02-07 10:54:24 -05:00
Sam Saffron 0fb497eb23 DEV: use Discourse.cache over Rails.cache
Discourse.cache is a more consistent method to use and offers clean fallback
if you are skipping redis

This is part of a larger change that both optimizes Discoruse.cache and omits
use of setex on $redis in favor of consistently using discourse cache

Bench does reveal that use of Rails.cache and Discourse.cache is 1.25x slower
than redis.setex / get so a re-implementation will follow prior to porting
2019-11-27 12:36:19 +11:00
Daniel Waterworth 55a1394342 DEV: pluck_first
Doing .pluck(:column).first is a very common pattern in Discourse and in
most cases, a limit cause isn't being added. Instead of adding a limit
clause to all these callsites, this commit adds two new methods to
ActiveRecord::Relation:

pluck_first, equivalent to limit(1).pluck(*columns).first

and pluck_first! which, like other finder methods, raises an exception
when no record is found
2019-10-21 12:08:20 +01:00
Krzysztof Kotlarek 427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. 

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
romanrizzi d01c938e1a Revert "FIX: Use #dup instead of #+@ since content could be an instance of Nokogiri::XML::Element."
This reverts commit 50afe59306.
2019-08-09 11:35:22 -03:00
romanrizzi 50afe59306 FIX: Use #dup instead of #+@ since content could be an instance of Nokogiri::XML::Element. 2019-08-09 11:13:09 -03:00
Sam Saffron 2408d55551 FIX: embedding topics would fail with some HTML
When truncating content we try to search for first paragraph, if HTML had
no P it would fallback to first div which may have nested elements.
2019-08-07 12:45:55 +10:00
Kyle Zhao 0e1d6151b9 FIX: Frozen string error in `TopicEmbed.import` (#7938)
When `SiteSetting.embed_truncate` is enabled (by default), the truncated
string is mutatable and does not raise an error.

However, when the setting is disabled, the `contents` string is frozen
and immutable, and will raise a `FrozenError`.
2019-07-25 09:21:01 -04:00