- Reduce duplication of terms in post index from unlimited to 6. This will
result in reduced index size and reduced weighting for posts containing
a huge amount of duplicate terms. (Eg: a post containing "sam sam sam sam
sam sam sam sam", will index as "sam sam sam sam sam sam", only including
the word up to 6 times.) This corrects a flaw where title weighting could
be ignored.
- Prioritize exact matches of words in titles. Our search always performs
a prefix match. However we want to give special weight to exact title matches
meaning that a search for "sum" will find topics such as "the sum of us" vs
"summer in spring".
- Pick up fixes to our search algorithm which are missing from old indexes.
Specifically pick up the fix that indexes URLs properly. (`https://happy.com`
was stemmed to `happi` in keywords and then was not searchable)
see also:
https://meta.discourse.org/t/refinements-to-search-being-tested-on-meta/254158
Indexing will take a while and work in batches, in the background.
* FIX: do not notify admins on suppressed categories
Avoid notifying admins on categories where they are not explicitly members
in cases where SiteSetting.suppress_secured_categories_from_admin is
enabled.
This helps keep notification stream clean and avoids admins mistakenly
being invited to discussions that should be suppressed
The #pluck_first freedom patch, first introduced by @danielwaterworth has served us well, and is used widely throughout both core and plugins. It seems to have been a common enough use case that Rails 6 introduced it's own method #pick with the exact same implementation. This allows us to retire the freedom patch and switch over to the built-in ActiveRecord method.
There is no replacement for #pluck_first!, but a quick search shows we are using this in a very limited capacity, and in some cases incorrectly (by assuming a nil return rather than an exception), which can quite easily be replaced with #pick plus some extra handling.
The algorithm failed to find the correct category by slug when there are multiple sub-sub-categories with the same child-category name and the first child doesn't have the correct grandchild.
So, searching for "child / grandchild" worked in the following case, it found (3):
- (1) parent 1
- (2) child
- (3) grandchild
- (4) parent 2
- (5) child
- (6) grandchild
But it failed to find the grandchild in the following case:
- (1) parent 1
- (2) child
- (4) parent 2
- (5) child
- (6) grandchild
And this also fixes a flaky spec by forcing categories to always order by by `parent_category_id` and `id`.
This makes it possible to partly revert 60990aab55
If a post contains domain with a word that stems to a non prefix single
words will not match it.
For example: in happy.com, `happy` stems to `happi`. Thus searches for happy
will not find URLs with it included.
This bloats the index a tiny bit, but impact is limited.
Will require a full reindex of search to take effect.
When we are done refining search we can consider a full version bump.
Previous regex did not allow for cases where a lexeme contains a : (colon)
This can happen when parsing URLs. New algorithm allows for this.
Test was amended to more clearly call out index problems
* FEATURE: allow restricting duplication in search index
This introduces the site setting `max_duplicate_search_index_terms`.
Using this number we limit the amount of duplication in our search index.
This allows us to more correctly weight title searches, so bloated posts
don't unfairly bump to the top of search results.
This feature is completely disabled by default and behind a site setting
We will experiment with it first. Note entire search index must be rebuilt
for it to take effect.
---------
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
Using a shared channel with per-message permissions means that every client is updated with the channel's 'last_id', even if there are no messages available to them. Per-user channel names avoid this problem - the last_id will only be incremented when there is a message for the given user.
There was an issue where if hashtag-cooked HTML was sent
to the ExcerptParser without the keep_svg option, we would
end up with empty </use> and </svg> tags on the parts of the
excerpt where the hashtag was, in this case when a post
push notification was sent.
Fixed this, and also added a way to only display a plaintext
version of the hashtag for cases like this via PrettyText#excerpt.
Currently, `Tag#topic_count` is a count of all regular topics regardless of whether the topic is in a read restricted category or not. As a result, any users can technically poll a sensitive tag to determine if a new topic is created in a category which the user has not excess to. We classify this as a minor leak in sensitive information.
The following changes are introduced in this commit:
1. Introduce `Tag#public_topic_count` which only count topics which have been tagged with a given tag in public categories.
2. Rename `Tag#topic_count` to `Tag#staff_topic_count` which counts the same way as `Tag#topic_count`. In other words, it counts all topics tagged with a given tag regardless of the category the topic is in. The rename is also done so that we indicate that this column contains sensitive information.
3. Change all previous spots which relied on `Topic#topic_count` to rely on `Tag.topic_column_count(guardian)` which will return the right "topic count" column to use based on the current scope.
4. Introduce `SiteSetting.include_secure_categories_in_tag_counts` site setting to allow site administrators to always display the tag topics count using `Tag#staff_topic_count` instead.
* Firefox now finally returns PerformanceMeasure from performance.measure
* Some TODOs were really more NOTE or FIXME material or no longer relevant
* retain_hours is not needed in ExternalUploadsManager, it doesn't seem like anywhere in the UI sends this as a param for uploads
* https://github.com/discourse/discourse/pull/18413 was merged so we can remove JS test workaround for settings
Using a shared channel means that every user receives an update to the 'last_id' when *any* other user is logged out. If many users are being programmatically logged out at the same time, this can cause a very large number of message-bus polls.
This commit switches to use a user-specific channel, which means that each user has its own 'last id' which will only increment when they are logged out
We were using the `for_input: true` param when calling
DiscourseTagging, which is really meant for selecting tags
in the UI, which often need a parent tag selected first
before the child tags in tag group will show. We just
want to show all tags regardless of grouping in hashtag
search.`
We generally do not return muted child categories to the user
if they have muted the parent category, this commit respects that
rule for CategoryHashtagDataSource
* DEV: Skip push notifications for active online users
Currently, users with active push subscriptions get push notifications
regardless of their "presence" on the site.
This change introduces a `push_notification_time_window_mins`
site setting which is used in conjunction with a user's `last_seen_at` to
determine if push notifications should be sent. A user is considered to
be actively online if their `last_seen_at` is within `push_notification_time_window_mins`
minutes. `push_notification_time_window_mins` is set to 10 by default.
* DEV: Remove client param for push_notification_time_window_mins site setting
Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>
Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>
* UX: added fadeout + hashtag styling
UX: add full name to autocomplete
UX: autocomplete mentions styling
UX: emoji styling user status
UX: autocomplete emoji
* DEV: Move hashtag tag counts into new secondary_text prop
* FIX: Add is-online style to mention users via chat
UX: make is-online avatar styling globally available
* DEV: Fix specs
* DEV: Test fix
Co-authored-by: Martin Brennan <martin@discourse.org>
Follow up to a review in #18937, this commit changes the HashtagAutocompleteService to no longer use class variables to register hashtag data sources or types in context priority order. This is to address multisite concerns, where one site could e.g. have chat disabled and another might not. The filtered plugin registers I added will not be included if the plugin is disabled.
We were changing the user's user_option.bookmark_auto_delete_preference
to whatever they changed it to in the bookmark modal to use as default
for future bookmarks. However this was leading to a lot of confusion
since if you wanted to set it for one bookmark you had to remember to
change it back on the next one.
This commit removes that automatic functionality, and instead moves
the bookmark auto delete preference to User Preferences > Interface
in an explicit dropdown.
This commit adds a new notification that gets sent to admins when the site gets new features after an upgrade/deploy. Clicking on the notification takes the admin to the admin dashboard at `/admin` where they can see the new features under the "New Features" section.
Internal topic: t/87166.
This introduces another "section" of queries to the
hashtag autocomplete search, which returns results for
each type that start with the search term. So now results
will be in this order, and within these sections ordered
by the types in priority order:
1. Exact matches sorted by type
2. "starts with" sorted by type
3. Everything else sorted by type then name within type
When user is watching category or tag (watching or watching first post) notifications are moved to other tab.
To achieve that and distinguish between post create to directly watched topics and indirectly watched topics, new notification type called `watching_category_or_tag` was introduced.
* FIX: Use Category.secured(guardian) for hashtag datasource
Follow up to comments in #19219, changing the category
hashtag datasource to use Category.secured(guardian) instead
of Site.new(guardian).categories here since the latter does
more work for not much benefit, and the query time is the
same. Also eliminates some Hash -> Model back and forth
busywork. Add some more specs too.
* FIX: Server-side hashtag lookup cooking user loading
When we were using the PrettyText.options.currentUser
and parsing back and forth with JSON for the hashtag
lookups server-side, we had a bug where the user's
secure categories were not loaded since we never actually
loaded a User model from the database, only parsed it
from JSON.
This commit fixes the issue by instead using the
PretyText.options.userId and looking up the user directly
from the database when calling hashtag_lookup via the
PrettyText::Helpers code when cooking server-side. Added
the missing spec to check for this as well.
This commit allows us to type # in the UI and present autocomplete
results immediately with the following logic for the topic composer,
and reversed for the chat composer:
* Categories the user can access and has not muted sorted by `topic_count`
* Tags the user can access and has not muted sorted by `topic_count`
* Chat channels the user is a member of sorted by `messages_count`
So in effect, we allow searching for hashtags without a search term.
To do this we add a new `search_without_term` to each data source so
each one can define how it wants to handle this logic.
When looking up hashtags which were conflicting (e.g.
management::tag and management) where the user did
not have permission for one of them, we ended up returning
the one they did have permission to (e.g. the tag) twice
because of the way the lookup fallback code worked. This
fixes the issue, and another related one where the
::type was not added to the found item's .ref, and
so the hashtag replacement on the client was not working
correctly.
* FIX: Save only visible fields from the sidebar page
* FIX: Do not reset seen popups when set to false
If the option was unchecked, but it was not changed at all by the user
it was still sent to the server as a 'false' value which reset all seen
popups. This removes that behavior and resetting the list of seen popups
must be done using the "skip new user tips" button.
The centralization helps in reducing code duplication in our code base
and more importantly, centralizing logic for guardian checks into a
single spot.
This commit introduce a new API for registering callbacks, which we'll execute when a user gets destroyed, and the `delete_posts` opt is true. The chat plugin registers one callback and queues a job to destroy every message from that user in batches.
When searching for categories it is possible for
a child category to have a slug that matches the term
exactly, but will not be found by .lookup since we
don't return these categories unless the ref matches
parent:child.
Introduces a search_sort method to each hashtag data
source so they can provide their custom sort logic of
results, in category's case putting all matching slugs
to the top regardless of parent/child relationship
then sorting by text.
This changes the hashtag search to first do a lookup to find
results where the slug exactly matches the
search term. Now when we search for hashtags, the
exact matches will be found first and put at the top of
the results.
`ChatChannelFetcher` has also been modified here to allow
for more options for performance -- we do not need to
query DM channels for secured IDs when looking up or searching
channels for hashtags, since they should never show in
results there (they have no slugs). Nor do we need to include
the channel archive records.
Also changes the limit of hashtag results to 20 by default
with a hidden site setting, and makes it so the scroll for the
results is overflowed.
Adds the description as a title="" attribute on the hashtag
autocomplete search items for tags, categories, and channels.
These descriptions can be seen by the user since they are
able to see the results that are returned by the search via
Guardian checks.
The user attributes are not updated between clients and that is a
problem with user tips because the same user tip will be displayed
multiple times, once for every client.
The tag ordering was inconsistent, because we were not
passing the correct order option to DiscourseTagging.filter_allowed_tags.
The order would change based on the limit provided. Now,
we can have a consistent order which is term exact match -> topic count ->
name.
This commit fleshes out and adds functionality for the new `#hashtag` search and
lookup system, still hidden behind the `enable_experimental_hashtag_autocomplete`
feature flag.
**Serverside**
We have two plugin API registration methods that are used to define data sources
(`register_hashtag_data_source`) and hashtag result type priorities depending on
the context (`register_hashtag_type_in_context`). Reading the comments in plugin.rb
should make it clear what these are doing. Reading the `HashtagAutocompleteService`
in full will likely help a lot as well.
Each data source is responsible for providing its own **lookup** and **search**
method that returns hashtag results based on the arguments provided. For example,
the category hashtag data source has to take into account parent categories and
how they relate, and each data source has to define their own icon to use for the
hashtag, and so on.
The `Site` serializer has two new attributes that source data from `HashtagAutocompleteService`.
There is `hashtag_icons` that is just a simple array of all the different icons that
can be used for allowlisting in our markdown pipeline, and there is `hashtag_context_configurations`
that is used to store the type priority orders for each registered context.
When sending emails, we cannot render the SVG icons for hashtags, so
we need to change the HTML hashtags to the normal `#hashtag` text.
**Markdown**
The `hashtag-autocomplete.js` file is where I have added the new `hashtag-autocomplete`
markdown rule, and like all of our rules this is used to cook the raw text on both the clientside
and on the serverside using MiniRacer. Only on the server side do we actually reach out to
the database with the `hashtagLookup` function, on the clientside we just render a plainer
version of the hashtag HTML. Only in the composer preview do we do further lookups based
on this.
This rule is the first one (that I can find) that uses the `currentUser` based on a passed
in `user_id` for guardian checks in markdown rendering code. This is the `last_editor_id`
for both the post and chat message. In some cases we need to cook without a user present,
so the `Discourse.system_user` is used in this case.
**Chat Channels**
This also contains the changes required for chat so that chat channels can be used
as a data source for hashtag searches and lookups. This data source will only be
used when `enable_experimental_hashtag_autocomplete` is `true`, so we don't have
to worry about channel results suddenly turning up.
------
**Known Rough Edges**
- Onebox excerpts will not render the icon svg/use tags, I plan to address that in a follow up PR
- Selecting a hashtag + pressing the Quote button will result in weird behaviour, I plan to address that in a follow up PR
- Mixed hashtag contexts for hashtags without a type suffix will not work correctly, e.g. #ux which is both a category and a channel slug will resolve to a category when used inside a post or within a [chat] transcript in that post. Users can get around this manually by adding the correct suffix, for example ::channel. We may get to this at some point in future
- Icons will not show for the hashtags in emails since SVG support is so terrible in email (this is not likely to be resolved, but still noting for posterity)
- Additional refinements and review fixes wil
* FEATURE: API to update user's discourse connect external id
This adds a special handling of updates to DiscourseConnect external_id
in the general user update API endpoint.
Admins can create, update or delete a user SingleSignOn record using
PUT /u/:username.json
{
"external_ids": {
"discourse_connect": "new-external-id"
}
}
The problem was reported as a problem with changing theme in user preferences, after saving a new theme the previously set user status was disappearing (https://meta.discourse.org/t/user-status/240335/42). Turned out though that the problem was more wide, changing pretty much any setting in user preferences apart from user status itself led to clearing the status.
The previous sidebar default tags and categories implementation did not
allow for a user to configure their sidebar to have no categories or
tags. This commit changes how the defaults are applied. When a user is being created,
we create the SidebarSectionLink records based on the `default_sidebar_categories` and
`default_sidebar_tags` site settings. SidebarSectionLink records are
only created for categories and tags which the user has visibility on at
the point of user creation.
With this change, we're also adding the ability for admins to apply
changes to the `default_sidebar_categories` and `default_sidebar_tags`
site settings historically when changing their site setting. When a new
category/tag has been added to the default, the new category/tag will be
added to the sidebar for all users if the admin elects to apply the changes historically.
Like wise when a tag/category is removed, the tag/category will be
removed from the sidebar for all users if the admin elects to apply the
changes historically.
Internal Ref: /t/73500
Related to aeee7ed.
Before the change in aeee7ed, notifications for direct replies to your posts and notifications for replies in watched topics looked the same in the notifications menu -- they both used the arrow icon.
We decided in aeee7ed to distinguish them by changing "watched topics" notifications to use the bell icon because it was confusing for users who watch topics to see the same icon for direct replies and "watched topics". However, that change also means that non-power/new users who receive replies to topics _they create_ will get notifications with the bell icon because technically they're watching the topic, but the arrow icon is more appropriate for this case because we use it throughout the app to indicate "replies".
This commit adds a special-case so that if a user is watching a topic AND the topic is created by them, they receive notifications with the arrow icon (type `replied`) instead of the bell icon (type `posted`) for new posts in the topic.
Internal topic: t/79051.