Previously to_tsquery would split terms and join with &
In PG 14 terms are split and use <-> which means followed directly by.
In PG 13:
discourse_test=# SELECT to_tsquery('english', '''hello world''');
to_tsquery
---------------------
'hello' & 'world'
(1 row)
In PG 14:
discourse_test=# SELECT to_tsquery('english', '''hello world''');
to_tsquery
---------------------
'hello' <-> 'world'
(1 row)
Change is very unobtrosive, we simply amend our to_tsquery to behave like
it used to behave and make no use of the `<->` operator
More detail at: https://akorotkov.github.io/blog/2021/05/22/pg-14-query-parsing/
Note that plainto_tsquery used elsewhere in Discourse keeps the exact
same function.
This also corrects a faulty test that was passing by a fluke on older
version of PG
Currently, when doing `@mention` for users we have 0 tolerance for typos and misspellings.
With this patch, if a user search doesn't return enough results we go and use `pg_trgm` features to try and find more matches based on trigrams of usernames and names.
It also introduces GiST indexes on those fields in order to improve performance of this search, going from 130ms down to 15ms in my tests.
This is all gated in a feature flag and can be enabled by running `SiteSetting.user_search_similar_results = true` in the rails console.
Posts with self-mentions aren't updated with username updates. This happens
because mention `UserAction` entries aren't logged for self-mentions.
This change updates the lookup of `Post` and `PostRevision` with mentions to bypass
`UserAction` entries.
1. What is the problem here?
When a user's reviewables count changes, the changes are published via
MessageBus in a background Sidekiq job which means there is a delay before the
client receives the MessageBus message with the updated count. During
the time the reviewables count for a user has been updated and the time
when the client receives the MessageBus message with the updated count,
a user may view the reviewables list in the user menu. When that happens, the number of
reviewables in the list may be out of sync with the count shown.
2. What is the fix?
Going forward, the response for the `ReviewablesController#user_menu_list` action will include the user's reviewables count as
the `reviewables_count` attribute. This is then used by the client side
to update the user's reviewables count to ensure that the reviewables
list and count are kept in sync.
Previous regex did not allow for cases where a lexeme contains a : (colon)
This can happen when parsing URLs. New algorithm allows for this.
Test was amended to more clearly call out index problems
Due to the way templates work, the incorrect variable (user instead of item) was not causing any error, and just failing silently to display the avatar.
This commit is also providing a basic spec for completion of users and groups.
* DEV: Rnemae channel path to just c
Also swap the channel id and channel slug params to be consistent with core.
* linting
* channel_path
* Drop slugify helper and channel route without slug
* Request slug and route models through the channel model if possible
* DEV: Pass messageId as a dynamic segment instead of a query param
* Ensure change is backwards-compatible
* drop query param from oneboxes
* Correctly extract channelId from routes
* Better route organization using siblings for regular and near-message
* Ensures sessions are unique even when using parallelism
* prevents didReceiveAttrs to clear input mid test
* we disable animations in capybara so sometimes the message was barely showing
* adds wait
* ensures finished loading
* is it causing more harm than good?
* this check is slowing things for no reason
* actually target the button
* more resilient select chat message
* apply similar fix to bookmark
* fix
---------
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
We've had a couple of problems with the R2 gem where it generated a broken RTL CSS bundle that caused a badly broken layout when Discourse is used in an RTL language, see a3ce93b and 5926386. For this reason, we're replacing R2 with `rtlcss` that can handle modern CSS features better than R2 does.
`rltcss` is written in JS and available as an npm package. Calling the `rltcss` from rubyland is done via the `rtlcss_wrapper` gem which contains a distributable copy of the `rtlcss` package and loads/calls it with Mini Racer. See https://github.com/discourse/rtlcss_wrapper for more details.
Internal topic: t/76263.
`--d-hover` is calculated to be equivalent to primary-100 in light mode, or primary-low in dark mode
`--d-selected` is calculated to be equivalent to primary-low in light mode, or primary-100 in dark mode
`lib/color_math` is introduced to provide some utilities for making these calculations.
## Why do we need this change?
When loading the ember app, [MessageBus does not start polling immediately](f31f0b70f8/app/assets/javascripts/discourse/app/initializers/message-bus.js (L71-L81)) and instead waits for `document.readyState` to be `complete`. What this means is that if there are new messages being created while we have yet to start polling, those messages will not be received by the client.
With sidebar being the default navigation menu, the counts derived from `topic-tracking-state.js` on the client side is prominently displayed on every page. Therefore, we want to ensure that we are not dropping any messages on the channels that `topic-tracking-state.js` subscribes to.
## What does this change do?
This includes the `MessageBus.last_id`s for the MessageBus channels which `topic-tracking-state.js` subscribes to as part of the preloaded data when loading a page. The last ids are then used when we subscribe the MessageBus channels so that messages which are published before MessageBus starts polling will not be missed.
## Review Notes
1. See https://github.com/discourse/message_bus#client-support for documentation about subscribing from a given message id.
When we introduce new color scheme colors, they are not immediately persisted to the database for all color schemes. Previously, this meant that they would be unavailable in the admin UI for editing. The only way to work with the new colors was to create a new color scheme.
This commit updates the serializer so that all colors are serialized, even if they are not yet persisted to the database for the current scheme. This means that they now show up in the admin UI and can be edited.
* DEV: Change default bootstrap min users for private sites
Private sites should have a lower min users to escape bootstrap mode.
* reset back to 50 if site is changed to public, added some tests
* fix formatting
* Remove comment
* Move constant declaration
* Update config/initializers/014-track-setting-changes.rb
Shaving a bit of repetition
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
* Remove commented out code
* stree
---------
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
The new `prioritize_exact_search_match` can be used to force the search
algorithm to prioritize exact term matches in title when ranking results.
This is scoped narrowly to titles for cases such as a topic titled:
"organisation chart" and a search of "org chart".
If we scoped this wider, all discussion about "org chart" would float to
the top and leave a very common title de-prioritized.
This is a hidden site setting and it has some performance impact due
to double ranking.
That said, performance impact is somewhat mitigated cause ranking on
title alone is a very cheap operation.
Currently `Topic#pm_topic_count` is a count of all personal messages tagged for a given tag. As a result, any user with access to PM tags can poll a sensitive tag to determine if a new personal message has been created using that tag even if the user does not have access to the personal message. We classify this as a minor leak in sensitive information.
With this commit, `Topic#pm_topic_count` is hidden from users by default unless the `display_personal_messages_tag_counts` site setting is enabled.
* FEATURE: allow restricting duplication in search index
This introduces the site setting `max_duplicate_search_index_terms`.
Using this number we limit the amount of duplication in our search index.
This allows us to more correctly weight title searches, so bloated posts
don't unfairly bump to the top of search results.
This feature is completely disabled by default and behind a site setting
We will experiment with it first. Note entire search index must be rebuilt
for it to take effect.
---------
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
In some situations, these HTTP calls would cause some cache to warmup and send a `/distributed_hash` message-bus message. We can avoid tracking those by passing a specific channel name to `track_publish`.
Using a shared channel with per-message permissions means that every client is updated with the channel's 'last_id', even if there are no messages available to them. Per-user channel names avoid this problem - the last_id will only be incremented when there is a message for the given user.
This commits adds a database migration to limit the user status to 100
characters, limits the user status in the UI and makes sure that the
emoji is valid.
Follow up to commit b6f75e231c.
This simplifies the crawler-linkback-list to only be a point of reference to the actual DiscussionForumPosting objects.
See "Summary page": https://developers.google.com/search/docs/advanced/structured-data/carousel?hl=en#summary-page
> [It] defines an ItemList, where each ListItem has only three properties: @type (set to ListItem), position (the position in the list), and url (the URL of a page with full details about that item).
This replaces the position declared as `#123` with the more simple version `123`.
The property position may be of type Integer or Text. A value of type Integer, or more precise of type Text which simply casts to integer, is sufficient here.
See: https://schema.org/position
In category-view the topic-list already uses this notation for the position of topics:
`<meta itemprop="position" content="123">`
Many users seems surprised by prefix matching in search leading to
unexpected results.
Over the years we always would return results starting with a search term
and not expect exact matches.
Meaning a search for `abra` would find `abracadabra`
This introduces the Site Setting `enable_search_prefix_matching` which
defaults to true. (behavior unchanged)
We plan to experiment on select sites with exact matches to see if the
results are less surprising
When the `tags_listed_by_group` site setting is disable, we were seeing
the N+1 queries problem when multiple `CategoryTag` records are listed.
This commit fixes that by ensuring that we are not filtering through the
category `tags` association after the association has been eager loaded.
* FIX: Ensure soft-deleted topics can be deleted
The topic was not found during the deletion process because it was
deleted and `@post.topic` was nil.
* DEV: Use @topic instead of finding the topic every time
Under some situations, we would inadvertently return a public (unauthenticated) result to an authenticated API request. This commit adds the `Api-Key` header to our anonymous cache bypass logic.
This assertion was failing in internal builds. I can repro locally if I
set `foobarbaz` to be created after `quxbarbaz`.
For now, I think this complication in the test is unnecessary, hence this
removes the `quxbarbaz` case.
The check used to be necessary because we validated the referrer too and
this bypass was a workaround a bug that is present in some browsers that
do not send the correct referrer.
When creating a group membership request, there is no character
limit on the 'reason' field. This can be potentially be used by
an attacker to create enormous amount of data in the database.
Co-authored-by: Ted Johansson <ted@discourse.org>
Currently we don’t have an association between reviewables and posts.
This sometimes leads to inconsistencies in the DB as a post can have
been deleted but an associated reviewable is still present.
This patch addresses this issue simply by adding a new association to
the `Post` model and by using the `dependent: :destroy` option.
This is just cleaning up a TODO I had to add more specs
to this controller -- there are more thorough tests on the
actual HashtagService class and the type-specific hashtag
classes.
In dev/prod, these are absorbed by unicorn. Most commonly, they occur when a client interrupts a message-bus long-polling request.
Also reverts the EPIPE workaround introduced in 011c9b9973
There was an issue where if hashtag-cooked HTML was sent
to the ExcerptParser without the keep_svg option, we would
end up with empty </use> and </svg> tags on the parts of the
excerpt where the hashtag was, in this case when a post
push notification was sent.
Fixed this, and also added a way to only display a plaintext
version of the hashtag for cases like this via PrettyText#excerpt.
We've had the UploadReference table for some time now in core,
but it was added after ChatUpload was and chat was just never
moved over to this new system.
This commit changes all chat code dealing with uploads to create/
update/delete/query UploadReference records instead of ChatUpload
records for consistency. At a later date we will drop the ChatUpload
table, but for now keeping it for data backup.
The migration + post migration are the same, we need both in case
any chat uploads are added/removed during deploy.
When checking whether an existing upload should be secure
based on upload references, do not count deleted posts, since
there is still a reference attached to them. This can lead to
issues where e.g. an upload is used for a post then later on
a custom emoji.
This commit allows us to set the channel slug when creating new chat
channels. As well as this, it introduces a new `SlugsController` which can
generate a slug using `Slug.for` and a name string for input. We call this
after the user finishes typing the channel name (debounced) and fill in
the autogenerated slug in the background, and update the slug input
placeholder.
This autogenerated slug is used by default, but if the user writes anything
else in the input it will be used instead.
When EmbeddableHost is configured for a specific category and that category is deleted, then EmbeddableHost should be deleted as well.
In addition, migration was added to fix existing data.
Currently, `Tag#topic_count` is a count of all regular topics regardless of whether the topic is in a read restricted category or not. As a result, any users can technically poll a sensitive tag to determine if a new topic is created in a category which the user has not excess to. We classify this as a minor leak in sensitive information.
The following changes are introduced in this commit:
1. Introduce `Tag#public_topic_count` which only count topics which have been tagged with a given tag in public categories.
2. Rename `Tag#topic_count` to `Tag#staff_topic_count` which counts the same way as `Tag#topic_count`. In other words, it counts all topics tagged with a given tag regardless of the category the topic is in. The rename is also done so that we indicate that this column contains sensitive information.
3. Change all previous spots which relied on `Topic#topic_count` to rely on `Tag.topic_column_count(guardian)` which will return the right "topic count" column to use based on the current scope.
4. Introduce `SiteSetting.include_secure_categories_in_tag_counts` site setting to allow site administrators to always display the tag topics count using `Tag#staff_topic_count` instead.
This fixes a longstanding issue for sites with the
secure_uploads setting enabled. What would happen is a scenario
like this, since we did not check all places an upload could be
linked to whenever we used UploadSecurity to check whether an
upload should be secure:
* Upload is created and used for site setting, set to secure: false
since site setting uploads should not be secure. Let's say favicon
* Favicon for the site is used inside a post in a private category,
e.g. via a Onebox
* We changed the secure status for the upload to true, since it's been
used in a private category and we don't check if it's originator
was a public place
* The site favicon breaks :'(
This was a source of constant consternation. Now, when an upload is _not_
being created, and we are checking if an existing upload should be
secure, we now check to see what the first record in the UploadReference
table is for that upload. If it's something public like a site setting,
then we will never change the upload to `secure`.
Prior to this change, we were parsing `Post#cooked` every time we
serialize a post to extract the usernames of mentioned users in the
post. However, the only reason we have to do this is to support
displaying a user's status beside each mention in a post on the client side when
the `enable_user_status` site setting is enabled. When
`enable_user_status` is disabled, we should avoid having to parse
`Post#cooked` since there is no point in doing so.
Since the new hashtag format has been added, we want site
admins to be able to rebake old posts with the old hashtag
format. This can now be done with `rake hashtags:mark_old_format_for_rebake`
which goes and marks posts with the old cooked version of hashtags
in this format for rebake:
```
<a class=\"hashtag\" href=\"/c/ux/14\">#<span>ux</span></a>
```
c.f. https://meta.discourse.org/t/what-rebake-is-required-for-the-new-autocomplete-styling/249642/12
This commit fixes the following issue:
* User creates a post
* Akismet or some other thing like requiring posts to be approved puts
the post in the review queue, deleting it
* Admin approves the post
* Email is never sent to mailing list mode subscribers
We intentionally do not enqueue this for every single post when
recovering a topic (i.e. recovering the first post) since the topics
could have a lot of posts with emails already sent, and we don't want
to clog sidekiq with thousands of notify jobs.
Our JS files reference sourcemaps relative to their current path. On sites with non-S3 CDN setups, we use a special path for brotli assets (39a524aa). This caused the sourcemap requests to 404.
This commit fixes the issue by allowing the `.map` files to be accessed under `/brotli_asset/*`.
TL4 users can already list and unlist topics, but they can't see
the unlisted topics. This change brings this to par by allowing
TL4 users to also see unlisted topics.
When the `tags_listed_by_group` site setting is enabled, we were seeing
the N+1 queries problem when multiple `TagGroup` records are listed.
This commit fixes that by ensuring that we are not filtering through the
`tags` association after the association has been eager loaded.
So it can easily be overwritten in a plugin for example.
### Added more tests to provide better coverage
We previously only had `u.silenced_till IS NULL` but I made it consistent with pretty much every other places where we check for "active" users.
These two new lines do change the query a tiny bit though.
**Before**
- You could not get the badge if you were currently silenced (no matter what period is being checked)
- You could get the badge if you were suspended 😬
**After**
- You can't get the badge if you were silenced during the past year
- You can't get the badge if you were suspended during the past year
### Improved the performance of the query by using `NOT EXISTS` instead of `LEFT JOIN / COUNT() = 0`
There is no difference in behaviour between
```sql
LEFT JOIN user_badges AS ub ON ub.user_id = u.id AND ...
[...]
HAVING COUNT(ub.*) = 0
```
and
```sql
NOT EXISTS (SELECT 1 FROM user_badges AS ub WHERE ub.user_id = u.id AND ...)
```
The only difference is performance-wise. The `NOT EXISTS` is 10-30% faster on very large databases (aka. posts and users in X millions). I checked on 3 of the largest datasets I could find.
Added in c2013865d7,
this migration was supposed to only turn off the hashtag
setting for existing sites (since that was the old default)
but its doing it for new ones too because we run all migrations
on new sites.
Instead, we should only run this if the first migration was
only just created, meaning its a new site.
When the thread is aborted, an exception is raised before the `start` of a job is set, and therefore raises an exception in the `ensure` block. This commit checks that `start` exists, and also adds `abort_on_exception=true` so that this issue would have caused test failures.
The `enable_new_notifications_menu` site setting allows sites that have
`navigation_menu` set to `legacy` to use the redesigned notifications
menu before switching to the new sidebar navigation menu.
This patch introduces a cookies rotator as indicated in the Rails
upgrade guide. This allows to migrate from the old SHA1 digest to the
new SHA256 digest.
The latter can be called directly from the Topic page object,
so we can remove some duplication between the two. There are
levels of page objects (e.g. entire page, component, complete flow)
and its perfectly valid to call one from another.
We've been doing some work to support new keyword argument semantics in Ruby 3. As part of that we made some changes to `DiscourseEvent::TestHelper`. The backwards compatibility fix doesn't work if the method is called with an empty hash as the final argument. This fix adds a valid option to the final hash in the particular test.
In "GlobalSetting.redirect_avatar_requests" mode, when the application gets
an avatar request it returns a "redirect" to the S3 CDN.
This shields the application from caching avatars and downloading from S3.
However clients will make 2 requests per avatar. (one to get redirect,
second to get avatar)
A one hour cache on a redirect means there may be an increase in CDN
traffic, given more clients will ask for the redirect every hour.
This may also lead to an increase in origin requests to the application.
To mitigate lets cache the CDN URL for 1 day.
The downside is that any changes to S3 CDN need extra care to allow for
the extra 1 day delay. (leave data around for 1 extra day)
This will give us some aggregate stats on the defer queue performance.
It is limited to 100 entries (for safety) which is stored in an LRU cache.
Scheduler::Defer.stats can then be used to get an array that denotes:
- number of runs and completions (queued, finished)
- error count (errors)
- total duration (duration)
We can look later at exposing these metrics to gain visibility on the reason
the defer queue is clogged.
0403cda1d1 introduced a regression where
topics in non read-restricted categories have its TopicTrackingState
MessageBus messages published with the `group_ids: [nil]` option. This
essentially means that no one would be able to view the message.
Currently when generating a onebox for Discourse topics, some important
context is missing such as categories and tags.
This patch addresses this issue by introducing a new onebox engine
dedicated to display this information when available. Indeed to get this
new information, categories and tags are exposed in the topic metadata
as opengraph tags.
When a topic belongs to category that is read restricted but permission
has not been granted to any groups, publishing ceratin topic tracking state
updates for the topic will result in the `MessageBus::InvalidMessageTarget` error being raised
because we're passing `nil` to `group_ids` which is not support by
MessageBus.
This commit ensures that for said category above, we will publish the
updates to the admin groups.
```
class Jobs::DummyDelayedJob < Jobs::Base
def execute(args = {})
end
end
RSpec.describe "Jobs.run_immediately!" do
before { Jobs.run_immediately! }
it "explodes" do
current_user = Fabricate(:user)
Jobs.enqueue_in(1.seconds, :dummy_delayed_job)
sign_in(current_user)
end
end
```
The test above will fail with the following error if `ActiveRecord::Base.connection_handler.clear_active_connections!` is called before the configured Capybara server checks out a connection from the connection pool.
```
ActiveRecord::ActiveRecordError:
Cannot expire connection, it is owned by a different thread: #<Thread:0x00007f437391df58@puma srv tp 001 /home/tgxworld/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/gems/puma-6.0.2/lib/puma/thread_pool.rb:106 sleep_forever>. Current thread: #<Thread:0x00007f437d6cfc60 run>.
```
We're not exactly sure if this is an ActiveRecord bug or not but we've
invested too much time into investigating this problem. Fundamentally,
we also no longer understand why `ActiveRecord::Base.connection_handler.clear_active_connections!` is being called in an ensure block
within `Jobs::Base#perform` which was added in
ceddb6e0da 10 years ago. This
commit moves the logic for running jobs immediately out of the
`Jobs::Base#perform` method into another `Jobs::Base#perform_immediately` method such that
`ActiveRecord::Base.connection_handler.clear_active_connections!` is not
called. This change will only impact the test environment.