Currently when generating a onebox for Discourse topics, some important
context is missing such as categories and tags.
This patch addresses this issue by introducing a new onebox engine
dedicated to display this information when available. Indeed to get this
new information, categories and tags are exposed in the topic metadata
as opengraph tags.
* DEV: Fix png optimization test flakyness
Update fixture with oxipng 7
This test broke when the pngoptimizer got better so the pre-optimized png in the fixtures was compressed further on upload creation, breaking the expected size.
Previously we would unconditionally fetch all images via HTTP to grab
original sizing from cooked post processor in 2 different spots.
This was wasteful as we already calculate and cache this info in upload records.
This also simplifies some specs and reduces use of mocks.
Raw paths like `/test/path` are not supported natively in the CSP. This commit prepends the site's base URL to these paths. This allows plugins to add 'local' assets to the CSP without needing to hardcode the site's hostname.
* FEATURE: Add case-sensitivity flag to watched_words
Currently, all watched words are matched case-insensitively. This flag
allows a watched word to be flagged for case-sensitive matching.
To allow allow for backwards compatibility the flag is set to false by
default.
* FEATURE: Support case-sensitive creation of Watched Words via API
Extend admin creation and upload of Watched Words to support case
sensitive flag. This lays the ground work for supporting
case-insensitive matching of Watched Words.
Support for an extra column has also been introduced for the Watched
Words upload CSV file. The new column structure is as follows:
word,replacement,case_sentive
* FEATURE: Enable case-sensitive matching of Watched Words
WordWatcher's word_matcher_regexp now returns a list of regular
expressions instead of one case-insensitive regular expression.
With the ability to flag a Watched Word as case-sensitive, an action
can have words of both sensitivities.This makes the use of the global
Regexp::IGNORECASE flag added to all words problematic.
To get around platform limitations around the use of subexpression level
switches/flags, a list of regular expressions is returned instead, one for each
case sensitivity.
Word matching has also been updated to use this list of regular expressions
instead of one.
* FEATURE: Use case-sensitive regular expressions for Watched Words
Update Watched Words regular expressions matching and processing to handle
the extra metadata which comes along with the introduction of
case-sensitive Watched Words.
This allows case-sensitive Watched Words to matched as such.
* DEV: Simplify type casting of case-sensitive flag from uploads
Use builtin semantics instead of a custom method for converting
string case flags in uploaded Watched Words to boolean.
* UX: Add case-sensitivity details to Admin Watched Words UI
Update Watched Word form to include a toggle for case-sensitivity.
This also adds support for, case-sensitive testing and matching of Watched Word
in the admin UI.
* DEV: Code improvements from review feedback
- Extract watched word regex creation out to a utility function
- Make JS array presence check more explicit and readable
* DEV: Extract Watched Word regex creation to utility function
Clean-up work from review feedback. Reduce code duplication.
* DEV: Rename word_matcher_regexp to word_matcher_regexp_list
Since a list is returned now instead of a single regular expression,
change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate
this change.
* DEV: Incorporate WordWatcher updates from upstream
Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches
that aren't at the beginning of the line.
This happened only for languages other than "en" and when `I18n.t` was called without any interpolation keys. The lib still tried to interpolate keys because it interpreted the `overrides` option as interpolation key.
Some product pages on Amazon are using a new HTML structure, meaning the previous Onebox engine was unable to gather the price and/or description. This change should allow these pages to be Oneboxed.
1. `html_doc.css('.Box.md')` always returns a truthy value (e.g. `[]`) so the second branch of the if-elsif never ran
2. `node&.css('text()')` was invalid code that would raise an error
3. Matching on h3 elements is no longer correct with the current html structure returned by GitHub
When attempting to Onebox a page if there is no `meta property="og:description"` tag but there is a `meta name="description"` tag, Onebox should try to use that value.
Similar to site settings, adds support for `refresh` option to theme settings.
```yaml
super_feature_enabled:
type: bool
default: false
refresh: true
```
We are no longer able to display the image returned by Instagram directly within a Discourse site (either in the composer, or within a cooked post within a topic), so:
- Display an image placeholder in the composer preview
- A cooked post should use an iframe to display the Instagram 'embed' content
Usually, when an email is received a user lookup is performed using the
email address found in the `From` header. When an email has an
`X-Original-From` header, if it is equal to `Reply-To` then it uses that
one instead. The comparison was sensitive to whitespaces and other
insignificant characters such as quotes because it reconstructed the
`From` header.
For the fixture added in this commit, it compared the reconstructed
`From` header `John Doe <johndoe@example.com>` with the `Reply-To`
header `"John Doe" <johndoe@example.com>`.
The display name can have quotes around it, which does not work
with our current comparison of a from field (in this case Reply-To)
and another header (X-Original-From), because we are not comparing
the two values in the same way. This causes an issue where the
commit here: b88d8c8 will not
work properly; the forwarded email gets the From address instead
of the Reply-To address as intended.
When forwarding emails into the group inbox, we now use the
original sender email as the from_address since
2ac9fd9dff. However, we have not
been saving the original CC addresses of the forwarded email,
which are needed to include those recipients in on the conversation
when replying via the group inbox.
This commit captures the CC addresses on the incoming email, and
makes sure the emails are created as staged users and added to the
list of topic allowed users so they are included on CC's sent by
the GroupSmtpEmail and other jobs.
When 2ac9fd9dff was done, this
affected the small post that is created when forwarding an email
into the group inbox. Instead of using the name and the email of
the user who forwarded the email, it used the original from email
and name to create the small post. So instead of something like
"Discourse Team forwarded the above email" we ended up with
"John Smith forwarded the above email" which is incorrect.
This fixes the issue by creating a staged user for the forwarding
email address (if such a user does not yet exist) and uses that
for the "forwarded" small post instead.
When emails were forwarded to a group inbox by the email address
of the group, for example when an email ends up in spam and must
be manually forwarded to the group+site@discoursemail.com address,
the OP of the topic ended up being the group's email address instead
of the sender who originally sent the email to the group inbox.
This commit detects that an email has been forwarded using existing
tools, and if the from address matches one of the group incoming
email addresses, then we look at the forwarded email's from address
and use that instead for the incoming email from address as well as
the staged/regular user used for the Topic.user.
This will make it much cleaner to forward emails into a group inbox,
and will prevent issues with PostAlerter where the OP is double-notified
for these emails.
Emails can include the marker in a different language, depending on
site and user settings. The email receiver always looked for the marker
in default language.
When the Reply-To header is present for incoming emails we
want to use it instead of the from address. This is usually the
case when forwarding an email via a mailing list into Discourse.
For now we are only using the Reply-To header if the email has
been forwarded via Google Groups, which is why we are checking the
X-Original-From header too. In future we may want to use the Reply-To
header in more cases.
Fixes two issues:
- ignores invalid XML in custom icon sprite SVG file (and outputs an error if sprite was uploaded via admin UI)
- clears SVG sprite cache when deleting an `icons-sprite` upload in a theme
By default, Twitter will return the URL for the avatar image of the tweet poster as the `og:image` value.
However, if the `user_generated` attribute is true, we should not use this as the avatar URL as this will be an URL of an image in the tweet itself (e.g., an image belonging to a tweeted news story).
User flair was given by user's primary group. This PR separates the
two, adds a new field to the user model for flair group ID and users
can select their flair from user preferences now.
Previously, we were storing custom svg sprite paths in the cache. This is a problem because sprites in themes get stored as uploads, and the returned paths were files in the temporary download cache which could sometimes be cleaned up, resulting in a broken cache.
I previously tried to fix this by skipping the missing files and clearing the cache, but that didn't work out well with CDNs. This PR stores the contents of the files in the custom_svg_sprites cache to avoid the problem of missing temp files.
Also, plugin custom icons are only included if the plugin is enabled.
* FIX: Allow SVG uploads if dimensions are a fraction of a unit
`UploadCreator` counts the number of pixels in an file to determine if it is valid. `pixels` is calculated by multiplying the width and height of the image, as determined by FastImage.
SVG files can have their width/height expressed in a variety of different units of measurement. For example, ‘px’, ‘in’, ‘cm’, ‘mm’, ‘pt’, ‘pc’, etc are all valid within SVG files. If an image has a width of `0.5in`, FastImage may interpret this as being a width of `0`, meaning it will report the `size` as being `0`.
However, we don’t need to concern ourselves with the number of ‘pixels’ in a SVG files, as that is irrelevant for this file format, so we can skip over the check for `pixels == 0` when processing this file type.
* DEV: Speed up getting SVG dimensions
The `-ping` flag prevents the entire image from being rasterized before a result is returned. See:
https://imagemagick.org/script/command-line-options.php#ping
When replying to a user_private_message email originating from
a group PM that does _not_ have a reply key (e.g. when replying
directly to the group's SMTP address), we were mistakenly linking
the new post created from the reply to the OP and the user who
created the topic, based on the first IncomingEmail message ID in
the topic, rather than using the correct reply to user and post number
that the user actually replied to.
We now use the In-Reply-To header to look up the corresponding EmailLog
record when the user who replied was sent a user_private_message email,
and use the post from that as the reply_to_user/post.
This also removes superfluous filtering of incoming_email records. After
already filtering by message_id and then addressed_to_user (which only
returns incoming emails where the to, from, or cc address includes any
of the user's emails), we were filtering again but in the ruby code for
the exact same conditions. After removing this all existing tests still
pass.
IMDb movie links were being rendered as posters. This was because
IMDb was sending `og:type` as `image` randomly in some cases. To
fix this we'll now default all IMDb links as article type. This will
ensure that the IMDb onebox link includes all the information instead
of showing just a poster without any context.
This PR changes the `UserNotification` class to send outbound `user_private_message` using the group's SMTP settings, but only if:
* The first allowed_group on the topic has SMTP configured and enabled
* SiteSetting.enable_smtp is true
* The group does not have IMAP enabled, if this is enabled the `GroupSMTPMailer` handles things
The email is sent using the group's `email_username` as both the `from` and `reply-to` address, so when the user replies from their email it will go through the group's SMTP inbox, which needs to have email forwarding set up to send the message on to a location (such as a hosted site email address like meta@discoursemail.com) where it can be POSTed into discourse's handle_mail route.
Also includes a fix to `EmailReceiver#group_incoming_emails_regex` to include the `group.email_username` so the group does not get a staged user created and invited to the topic (which was a problem for IMAP), as well as updating `Group.find_by_email` to find using the `email_username` as well for inbound emails with that as the TO address.
#### Note
This is safe to merge without impacting anyone seriously. If people had SMTP enabled for a group they would have IMAP enabled too currently, and that is a very small amount of users because IMAP is an alpha product, and also because the UserNotification change has a guard to make sure it is not used if IMAP is enabled for the group. The existing IMAP tests work, and I tested this functionality by manually POSTing replies to the SMTP address into my local discourse.
There will probably be more work needed on this, but it needs to be tested further in a real hosted environment to continue.
* FIX: return an empty result if response from Amazon is missing attributes
Check we have the basic attributes requires to construct a Onebox for Amazon.
This is an attempt to handle scenarios where we receive a valid 200-status response from an Amazon request that does not include the data we’re expecting.
* Update lib/onebox/engine/amazon_onebox.rb
Co-authored-by: Régis Hanol <regis@hanol.fr>
Co-authored-by: Régis Hanol <regis@hanol.fr>
* FIX: Improve GitHub folder regexp in Onebox
It used to match any GitHub URL that was not matched by the other GitHub
Oneboxes and it did not do a good job at handling those. With this
change, the generic Onebox will handle the remaining URLs.
* FEATURE: Add Onebox for GitHub Actions
* FEATURE: Add Onebox for PR check runs
* FIX: Remove image from GitHub folder Oneboxes
It is a generic, auto-generated image which does not provide any value.
* DEV: Add tests
* FIX: Strip HTML comments from PR body
* Move onebox gem in core library
* Update template file path
* Remove warning for onebox gem caching
* Remove onebox version file
* Remove onebox gem
* Add sanitize gem
* Require onebox library in lazy-yt plugin
* Remove onebox web specific code
This code was used in standalone onebox Sinatra application
* Merge Discourse specific AllowlistedGenericOnebox engine in core
* Fix onebox engine filenames to match class name casing
* Move onebox specs from gem into core
* DEV: Rename `response` helper to `onebox_response`
Fixes a naming collision.
* Require rails_helper
* Don't use `before/after(:all)`
* Whitespace
* Remove fakeweb
* Remove poor unit tests
* DEV: Re-add fakeweb, plugins are using it
* Move onebox helpers
* Stub Instagram API
* FIX: Follow additional redirect status codes (#476)
Don’t throw errors if we encounter 303, 307 or 308 HTTP status codes in responses
* Remove an empty file
* DEV: Update the license file
Using the copy from https://choosealicense.com/licenses/gpl-2.0/#
Hopefully this will enable GitHub to show the license UI?
* DEV: Update embedded copyrights
* DEV: Add Onebox copyright notice
* DEV: Add MIT license, convert COPYRIGHT.txt to md
* DEV: Remove an incorrect copyright claim
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
Co-authored-by: jbrw <jamie@goatforce5.org>
Over the years we accrued many spelling mistakes in the code base.
This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change
- comments
- test descriptions
- other low risk areas
Some emails coming in via the mail receiver can still end up
with bad encoding when trying to enqueue the job. This catches
the last encoding issue and forces iso-8559-1 and encodes to
UTF-8 to circumvent the issue.