Usually, when an email is received a user lookup is performed using the
email address found in the `From` header. When an email has an
`X-Original-From` header, if it is equal to `Reply-To` then it uses that
one instead. The comparison was sensitive to whitespaces and other
insignificant characters such as quotes because it reconstructed the
`From` header.
For the fixture added in this commit, it compared the reconstructed
`From` header `John Doe <johndoe@example.com>` with the `Reply-To`
header `"John Doe" <johndoe@example.com>`.
* DEV: Remove HTML setting type and sanitization logic.
We concluded that we don't want settings to contain HTML, so I'm removing the setting type and sanitization logic. Additionally, we no longer allow the global-notice text to contain HTML.
I searched for usages of this setting type in the `all-the-plugins` repo and found none, so I haven't added a migration for existing settings.
* Mark Global notices containing links as HTML Safe.
FinalDestination now supports the `follow_canonical` option, which will perform an initial GET request, parse the canonical link if present, and perform a HEAD request to it.
We use this mode during embeds to avoid treating URLs with different query parameters as different topics.
This commit resolves refactors can_invite_to? to use
can_invite_to_forum? for checking the site-wide permissions and then
perform topic specific checkups.
Similarly, can_invite_to? is always used with a topic object and this is
now enforced.
There was another problem before when `must_approve_users` site setting
was not checked when inviting users to forum, but was checked when
inviting to a topic.
Another minor security issue was that group owners could invite to
group topics even if they did not have the minimum trust level to do
it.
* PERF: Improve database query perf when loading topics for a category.
Instead of left joining the `topics` table against `categories` by filtering with `categories.id`,
we can improve the query plan by filtering against `topics.category_id`
first before joining which helps to reduce the number of rows in the
topics table that has to be joined against the other tables and also
make better use of our existing index.
The following is a before and after of the query plan for a category
with many subcategories.
Before:
```
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
Limit (cost=1.28..747.09 rows=30 width=12) (actual time=85.502..2453.727 rows=30 loops=1)
-> Nested Loop Left Join (cost=1.28..566518.36 rows=22788 width=12) (actual time=85.501..2453.722 rows=30 loops=1)
Join Filter: (category_users.category_id = topics.category_id)
Filter: ((topics.category_id = 11) OR (COALESCE(category_users.notification_level, 1) <> 0) OR (tu.notification_level > 1))
-> Nested Loop Left Join (cost=1.00..566001.58 rows=22866 width=20) (actual time=85.494..2453.702 rows=30 loops=1)
Filter: ((COALESCE(tu.notification_level, 1) > 0) AND ((topics.category_id <> 11) OR (topics.pinned_at IS NULL) OR ((t
opics.pinned_at <= tu.cleared_pinned_at) AND (tu.cleared_pinned_at IS NOT NULL))))
Rows Removed by Filter: 1
-> Nested Loop (cost=0.57..528561.75 rows=68606 width=24) (actual time=85.472..2453.562 rows=31 loops=1)
Join Filter: ((topics.category_id = categories.id) AND ((categories.topic_id <> topics.id) OR (categories.id = 1
1)))
Rows Removed by Join Filter: 13938306
-> Index Scan using index_topics_on_bumped_at on topics (cost=0.42..100480.05 rows=715549 width=24) (actual ti
me=0.010..633.015 rows=464623 loops=1)
Filter: ((deleted_at IS NULL) AND ((archetype)::text <> 'private_message'::text))
Rows Removed by Filter: 105321
-> Materialize (cost=0.14..36.04 rows=30 width=8) (actual time=0.000..0.002 rows=30 loops=464623)
-> Index Scan using categories_pkey on categories (cost=0.14..35.89 rows=30 width=8) (actual time=0.006.
.0.040 rows=30 loops=1)
Index Cond: (id = ANY ('{11,53,57,55,54,56,112,94,107,115,116,117,97,95,102,103,101,105,99,114,106,1
13,104,98,100,96,108,109,110,111}'::integer[]))
-> Index Scan using index_topic_users_on_topic_id_and_user_id on topic_users tu (cost=0.43..0.53 rows=1 width=16) (a
ctual time=0.004..0.004 rows=0 loops=31)
Index Cond: ((topic_id = topics.id) AND (user_id = 1103877))
-> Materialize (cost=0.28..2.30 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=30)
-> Index Scan using index_category_users_on_user_id_and_last_seen_at on category_users (cost=0.28..2.29 rows=1 width
=8) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (user_id = 1103877)
Planning Time: 1.359 ms
Execution Time: 2453.765 ms
(23 rows)
```
After:
```
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.28..438.55 rows=30 width=12) (actual time=38.297..657.215 rows=30 loops=1)
-> Nested Loop Left Join (cost=1.28..195944.68 rows=13443 width=12) (actual time=38.296..657.211 rows=30 loops=1)
Filter: ((categories.topic_id <> topics.id) OR (topics.category_id = 11))
Rows Removed by Filter: 29
-> Nested Loop Left Join (cost=1.13..193462.59 rows=13443 width=16) (actual time=38.289..657.092 rows=59 loops=1)
Join Filter: (category_users.category_id = topics.category_id)
Filter: ((topics.category_id = 11) OR (COALESCE(category_users.notification_level, 1) <> 0) OR (tu.notification_level > 1))
-> Nested Loop Left Join (cost=0.85..193156.79 rows=13489 width=20) (actual time=38.282..657.059 rows=59 loops=1)
Filter: ((COALESCE(tu.notification_level, 1) > 0) AND ((topics.category_id <> 11) OR (topics.pinned_at IS NULL) OR ((topics.pinned_at <= tu.cleared_pinned_at) AND (tu.cleared_pinned_at IS NOT NULL))))
Rows Removed by Filter: 1
-> Index Scan using index_topics_on_bumped_at on topics (cost=0.42..134521.06 rows=40470 width=24) (actual time=38.267..656.850 rows=60 loops=1)
Filter: ((deleted_at IS NULL) AND ((archetype)::text <> 'private_message'::text) AND (category_id = ANY ('{11,53,57,55,54,56,112,94,107,115,116,117,97,95,102,103,101,105,99,114,106,113,104,98,100,96,108,109,110,111}'::integer[])))
Rows Removed by Filter: 569895
-> Index Scan using index_topic_users_on_topic_id_and_user_id on topic_users tu (cost=0.43..1.43 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=60)
Index Cond: ((topic_id = topics.id) AND (user_id = 1103877))
-> Materialize (cost=0.28..2.30 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=59)
-> Index Scan using index_category_users_on_user_id_and_last_seen_at on category_users (cost=0.28..2.29 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (user_id = 1103877)
-> Index Scan using categories_pkey on categories (cost=0.14..0.17 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=59)
Index Cond: (id = topics.category_id)
Planning Time: 1.633 ms
Execution Time: 657.255 ms
(22 rows)
```
* PERF: Optimize index on topics bumped_at.
Replace `index_topics_on_bumped_at` index with a partial index on `Topic#bumped_at` filtered by archetype since there is already another index that covers private topics.
The file size error messages for max_image_size_kb and
max_attachment_size_kb are shown to the user in the KB
format, regardless of how large the limit is. Since we
are going to support uploading much larger files soon,
this KB-based limit soon becomes unfriendly to the end
user.
For example, if the max attachment size is set to 512000
KB, this is what the user sees:
> Sorry, the file you are trying to upload is too big (maximum
size is 512000KB)
This makes the user do math. In almost all file explorers that
a regular user would be familiar width, the file size is shown
in a format based on the maximum increment (e.g. KB, MB, GB).
This commit changes the behaviour to output a humanized file size
instead of the raw KB. For the above example, it would now say:
> Sorry, the file you are trying to upload is too big (maximum
size is 512 MB)
This humanization also handles decimals, e.g. 1536KB = 1.5 MB
Allows creating a bookmark with the `for_topic` flag introduced in d1d2298a4c set to true. This happens when clicking on the Bookmark button in the topic footer when no other posts are bookmarked. In a later PR, when clicking on these topic-level bookmarks the user will be taken to the last unread post in the topic, not the OP. Only the OP can have a topic level bookmark, and users can also make a post-level bookmark on the OP of the topic.
I had to do some pretty heavy refactors because most of the bookmark code in the JS topics controller was centred around instances of Post JS models, but the topic level bookmark is not centred around a post. Some refactors were just for readability as well.
Also removes some missed reminderType code from the purge in 41e19adb0d
The display name can have quotes around it, which does not work
with our current comparison of a from field (in this case Reply-To)
and another header (X-Original-From), because we are not comparing
the two values in the same way. This causes an issue where the
commit here: b88d8c8 will not
work properly; the forwarded email gets the From address instead
of the Reply-To address as intended.
We don't need no stinkin' denormalization! This commit ignores
the topic_id column on bookmarks, to be deleted at a later date.
We don't really need this column and it's better to rely on the
post.topic_id as the canonical topic_id for bookmarks, then we
don't need to remember to update both columns if the bookmarked
post moves to another topic.
When copying an existing upload stub temporary object
on S3 to its final destination we were not copying across
its additional headers such as content-disposition and
cache-control, which led to issues like attachments not
downloading with their original filename when clicking
the download links in posts.
This is because the metadata_directive = REPLACE option
was not being passed to object.copy_from(), so only the
source object's headers were being used. Added an option
for apply_metadata_to_destination to apply this option
conditionally, because we may not always want to replace
this metadata, but we definitely do when copying a temporary
upload.
When a user archives a personal message, they are redirected back to the
inbox and will refresh the list of the topics for the given filter.
Publishing an event to the user results in an incorrect incoming message
because the list of topics has already been refreshed.
This does mean that if a user has two tabs opened, the non-active tab
will not receive the incoming message but at this point we do not think
the technical trade-offs are worth it to support this feature. We
basically have to somehow exclude a client from an incoming message
which is not easy to do.
Follow-up to fc1fd1b416
Watched words of type 'replace' or 'link' replaced the text inside
mentions or hashtags too, which broke these. These types of watched
words must skip any match that has an @ or # before it.
When forwarding emails into the group inbox, we now use the
original sender email as the from_address since
2ac9fd9dff. However, we have not
been saving the original CC addresses of the forwarded email,
which are needed to include those recipients in on the conversation
when replying via the group inbox.
This commit captures the CC addresses on the incoming email, and
makes sure the emails are created as staged users and added to the
list of topic allowed users so they are included on CC's sent by
the GroupSmtpEmail and other jobs.
When 2ac9fd9dff was done, this
affected the small post that is created when forwarding an email
into the group inbox. Instead of using the name and the email of
the user who forwarded the email, it used the original from email
and name to create the small post. So instead of something like
"Discourse Team forwarded the above email" we ended up with
"John Smith forwarded the above email" which is incorrect.
This fixes the issue by creating a staged user for the forwarding
email address (if such a user does not yet exist) and uses that
for the "forwarded" small post instead.
The seed-fu gem resets the sequence on all the tables it touches. In some situations, this can cause primary keys to be re-used. This commit introduces a freedom patch which ensures seed-fu only touches the sequence when it is less than the id of one of the seeded records.
PresenceChannel aims to be a generic system for allow the server, and end-users, to track the number and identity of users performing a specific task on the site. For example, it might be used to track who is currently 'replying' to a specific topic, editing a specific wiki post, etc.
A few key pieces of information about the system:
- PresenceChannels are identified by a name of the format `/prefix/blah`, where `prefix` has been configured by some core/plugin implementation, and `blah` can be any string the implementation wants to use.
- Presence is a boolean thing - each user is either present, or not present. If a user has multiple clients 'present' in a channel, they will be deduplicated so that the user is only counted once
- Developers can configure the existence and configuration of channels 'just in time' using a callback. The result of this is cached for 2 minutes.
- Configuration of a channel can specify permissions in a similar way to MessageBus (public boolean, a list of allowed_user_ids, and a list of allowed_group_ids). A channel can also be placed in 'count_only' mode, where the identity of present users is not revealed to end-users.
- The backend implementation uses redis lua scripts, and is designed to scale well. In the future, hard limits may be introduced on the maximum number of users that can be present in a channel.
- Clients can enter/leave at will. If a client has not marked itself 'present' in the last 60 seconds, they will automatically 'leave' the channel. The JS implementation takes care of this regular check-in.
- On the client-side, PresenceChannel instances can be fetched from the `presence` ember service. Each PresenceChannel can be used entered/left/subscribed/unsubscribed, and the service will automatically deduplicate information before interacting with the server.
- When a client joins a PresenceChannel, the JS implementation will automatically make a GET request for the current channel state. To avoid this, the channel state can be serialized into one of your existing endpoints, and then passed to the `subscribe` method on the channel.
- The PresenceChannel JS object is an ember object. The `users` and `count` property can be used directly in ember templates, and in computed properties.
- It is important to make sure that you `unsubscribe()` and `leave()` any PresenceChannel objects after use
An example implementation may look something like this. On the server:
```ruby
register_presence_channel_prefix("site") do |channel|
next nil unless channel == "/site/online"
PresenceChannel::Config.new(public: true)
end
```
And on the client, a component could be implemented like this:
```javascript
import Component from "@ember/component";
import { inject as service } from "@ember/service";
export default Component.extend({
presence: service(),
init() {
this._super(...arguments);
this.set("presenceChannel", this.presence.getChannel("/site/online"));
},
didInsertElement() {
this.presenceChannel.enter();
this.presenceChannel.subscribe();
},
willDestroyElement() {
this.presenceChannel.leave();
this.presenceChannel.unsubscribe();
},
});
```
With this template:
```handlebars
Online: {{presenceChannel.count}}
<ul>
{{#each presenceChannel.users as |user|}}
<li>{{avatar user imageSize="tiny"}} {{user.username}}</li>
{{/each}}
</ul>
```
This is unnecessary, as when the temporary key is created
in S3Store we already include the s3_bucket_folder_path, and
the key will always start with temp/ to assist with lifecycle
rules for multipart uploads.
This was affecting Discourse.store.object_from_path,
Discourse.store.signed_url_for_path, and possibly others.
See also: e0102a5
This pull request introduces the endpoints required, and the JavaScript functionality in the `ComposerUppyUpload` mixin, for direct S3 multipart uploads. There are four new endpoints in the uploads controller:
* `create-multipart.json` - Creates the multipart upload in S3 along with an `ExternalUploadStub` record, storing information about the file in the same way as `generate-presigned-put.json` does for regular direct S3 uploads
* `batch-presign-multipart-parts.json` - Takes a list of part numbers and the unique identifier for an `ExternalUploadStub` record, and generates the presigned URLs for those parts if the multipart upload still exists and if the user has permission to access that upload
* `complete-multipart.json` - Completes the multipart upload in S3. Needs the full list of part numbers and their associated ETags which are returned when the part is uploaded to the presigned URL above. Only works if the user has permission to access the associated `ExternalUploadStub` record and the multipart upload still exists.
After we confirm the upload is complete in S3, we go through the regular `UploadCreator` flow, the same as `complete-external-upload.json`, and promote the temporary upload S3 into a full `Upload` record, moving it to its final destination.
* `abort-multipart.json` - Aborts the multipart upload on S3 and destroys the `ExternalUploadStub` record if the user has permission to access that upload.
Also added are a few new columns to `ExternalUploadStub`:
* multipart - Whether or not this is a multipart upload
* external_upload_identifier - The "upload ID" for an S3 multipart upload
* filesize - The size of the file when the `create-multipart.json` or `generate-presigned-put.json` is called. This is used for validation.
When the user completes a direct S3 upload, either regular or multipart, we take the `filesize` that was captured when the `ExternalUploadStub` was first created and compare it with the final `Content-Length` size of the file where it is stored in S3. Then, if the two do not match, we throw an error, delete the file on S3, and ban the user from uploading files for N (default 5) minutes. This would only happen if the user uploads a different file than what they first specified, or in the case of multipart uploads uploaded larger chunks than needed. This is done to prevent abuse of S3 storage by bad actors.
Also included in this PR is an update to vendor/uppy.js. This has been built locally from the latest uppy source at d613b849a6. This must be done so that I can get my multipart upload changes into Discourse. When the Uppy team cuts a proper release, we can bump the package.json versions instead.
When emails were forwarded to a group inbox by the email address
of the group, for example when an email ends up in spam and must
be manually forwarded to the group+site@discoursemail.com address,
the OP of the topic ended up being the group's email address instead
of the sender who originally sent the email to the group inbox.
This commit detects that an email has been forwarded using existing
tools, and if the from address matches one of the group incoming
email addresses, then we look at the forwarded email's from address
and use that instead for the incoming email from address as well as
the staged/regular user used for the Topic.user.
This will make it much cleaner to forward emails into a group inbox,
and will prevent issues with PostAlerter where the OP is double-notified
for these emails.
Currently, pinned topics are ordered by the `bumped_at` column. This behavior is not desired because it gives admins no control over the order of pinned topics. This PR makes pinned topics ordered by the `pinned_at` column. A topic that is pinned last appears first in topic lists. If an admin wants an already pinned topic to appear first in the list of pinned topics, they'll have to unpin that topic and pin it again.
Meta topic: https://meta.discourse.org/t/how-do-i-set-the-order-of-pinned-topics/16935/23?u=osama.
Emails can include the marker in a different language, depending on
site and user settings. The email receiver always looked for the marker
in default language.
Allow admins to configure exceptions to our Rails rate limiter.
Configuration happens in the environment variables, and work with both
IPs and CIDR blocks.
Example:
```
env:
DISCOURSE_MAX_REQS_PER_IP_EXCEPTIONS: >-
14.15.16.32/27
216.148.1.2
```
In 2018 check was added that TL1 welcome message is sent unless user already has BasicBadge granted.
I think we should also check if BasicBadge is even enabled. Otherwise, each time group is assigned to a user and trust level is recalculated, they will receive a welcome message.
This disallows putting URLs in topic titles for TL0 users, which means that:
If a TL-0 user puts a link into the title, a topic featured link won't be generated (as if it was disabled in the site settings)
Server methods for creating and updating topics will be refusing featured links when they are called by TL-0 users
TL-0 users won't be able to put any link into the topic title. For example, the title "Hey, take a look at https://my-site.com" will be rejected.
Also, it improves a bit server behavior when creating or updating feature links on topics in the categories with disabled featured links. Before the server just silently ignored a featured link field that was passed to him, now it will be returning 422 response.
* FIX: Update draft count when sequence is increased
Sometimes users ended up having a draft count higher than the actual
number of drafts.
* FIX: Do not update draft count twice
The call to DraftSequence.next! above already does it.
Inlining secure images with the same name was not possible because they
were indexed by filename. If an email contained two files with the same
name, only the first image was used for both of them. The other file
was still attached to the email.
When the Reply-To header is present for incoming emails we
want to use it instead of the from address. This is usually the
case when forwarding an email via a mailing list into Discourse.
For now we are only using the Reply-To header if the email has
been forwarded via Google Groups, which is why we are checking the
X-Original-From header too. In future we may want to use the Reply-To
header in more cases.
* FEATURE: Onebox can match engines based on the content_type
`FinalDestination` now returns the `content_type` of a resolved URL.
`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.
`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.
This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.