If the post ids keep loading, we might end up in a situations where
we're always loading the same post ids over and over again without
indexing anything new.
Follow up to daeda80ada.
The default ranking options ranks by the number of matches which is
highly problematic when posts are stuffed with a keyword. The ranking
will now be divided by the document length which is a much fairer way to
rank.
This commit fixes the follow quality issue with `PostSearchData#raw_data`:
1. URLs are being tokenized and links with similar href and characters
are being duplicated in the raw data.
`Post#cooked`:
```
<p><a href=\"https://meta.discourse.org/some.png\" class=\"onebox\" target=\"_blank\" rel=\"nofollow noopener\">https://meta.discourse.org/some.png</a></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png discourse org/some png https://meta.discourse.org/some.png discourse org/some png
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png meta discourse org
```
2. Ligthbox being included in search pollutes the
`PostSearchData#raw_data` unncessarily.
From 28 March 2018 to 28 March 2019, searches for the term `image` on
`meta.discourse.org` had a click through rate of 2.1%. Non-lightboxed images are not included in indexing for search yet we were indexing content within a lightbox. Also, search for terms like `image` was affected we were using `Pasted image` as the filename for
uploads that were pasted.
`Post#cooked`
```
<p>Let me see how I can fix this image<br>\n<div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"https://meta.discourse.org/some.png\" title=\"some.png\" rel=\"nofollow noopener\"><img src=\"https://meta.discourse.org/some.png\" width=\"275\" height=\"299\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#far-image\"></use></svg><span class=\"filename\">some.png</span><span class=\"informations\">1750×2000</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#discourse-expand\"></use></svg>\n</div></a></div></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image some.png png https://meta.discourse.org/some.png discourse org/some png some.png png 1750×2000
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image
```
In terms of indexing performance, we now have to parse the given HTML
through nokogiri twice. However performance is not a huge worry here since a string length of 194170 takes only 30ms
to scrub plus the indexing takes place in a background job.
Includes support for flags, reviewable users and queued posts, with REST API
backwards compatibility.
Co-Authored-By: romanrizzi <romanalejandro@gmail.com>
Co-Authored-By: jjaffeux <j.jaffeux@gmail.com>
Previously we would bypass touching `Topic.updated_at` for whispers and post
recovery / deletions.
This meant that certain types of caching can not be done where we rely on
this information for cache accuracy.
For example if we know we have zero unread topics as of yesterday and whisper
is made I need to bump this date so the cache remains accurate
This is only half of a larger change but provides the groundwork.
Confirmed none of our serializers leak out Topic.updated_at so this is safe
spot for this info
At the moment edits still do not change this but it is not relevant for the
unread cache.
This commit also cleans up some specs to use the new `eq_time` matcher for
millisecond fidelity comparison of times
Previously `freeze_time` would fudge this which is not that clean.
- The test_email job is removed, because it was always being run synchronously (not in sidekiq)
- 34b29f62 added a bypass for critical emails, to match the spec. This removes the bypass, and removes the spec.
- This adapts the specs for 72ffabf6, so that they check for emails being sent
- This reimplements c2797921, allowing test emails to be sent even when emails are disabled
Fixes two issues:
1. Redirecting to an external origin's path after login did not work
2. User would be erroneously redirected to the external origin after logout
https://meta.discourse.org/t/109755
* This is causing certain posts to appear in searches incorrectly as `PostSearchData#raw_data` contains the outdated title, category name and tag names.
* Remove use of 0 in favor of `TrustLevel.levels[:newuser]`.
* Consolidate two tests into a single one.
* Test that disabling the feature works.
* Avoid loading full ActiveRecord object in test when we only need to
know the existence of the record.
* improved emoji support
- always optimize images as part of the task
- use the unicode standard ordering/naming for sections
* UX: more height for when there are recently used
Migrates email user options to a new data structure, where `email_always`, `email_direct` and `email_private_messages` are replace by
* `email_messages_level`, with options: `always`, `only_when_away` and `never` (defaults to `always`)
* `email_level`, with options: `always`, `only_when_away` and `never` (defaults to `only_when_away`)
* First take
* Add support for sprites in themes
Automatically register any custom icons added via themes or plugins
* Fix theme sprite caching
* Simplify test
* Update lib/svg_sprite/svg_sprite.rb
Co-Authored-By: pmusaraj <pmusaraj@gmail.com>
* Fix /svg-sprite/search request
* FEATURE: Add `IgnoredUsersSummary` daily job
## Why?
This is part of the [Ability to ignore a user feature](https://meta.discourse.org/t/ability-to-ignore-a-user/110254/8).
We want to:
1. Send an automatic group PM that goes out to moderators
2. When {x} users have Ignored the same user, threshold defined by a site setting, default of 5
3. Only send this message every X days which is defined by another site setting
It is not a setting, and only relevant in specs. The new API is:
```
Jobs.run_later! # jobs will be thrown on the queue
Jobs.run_immediately! # jobs will run right away, avoid the queue
```
If the existing email address for a user ends in `.invalid`, we should take the email address from an authentication payload, and replace the invalid address. This typically happens when we import users from a system without email addresses.
This commit also adds some extensibility so that plugin authenticators can define `always_update_user_email?`
Since uploads site settings are now backed by an actual upload, we don't
have to reach over the network just to fetch the favicon. Instead, we
can just read the upload directly from disk.
When using the api and you provide an http header based api key any other
auth based information (username, external_id, or user_id) passed in as
query params will not be used and vice versa.
Followup to f03b293e6a
* FEATURE: Add `Top Ignored Users` report
## Why?
This is part of the [Ability to ignore a user feature](https://meta.discourse.org/t/ability-to-ignore-a-user/110254/8), and also part of [this PR](https://github.com/discourse/discourse/pull/7144).
We want to send a System Message daily when a specific count threshold for an ignored is reached. To make this system message informative, we want to link to a report for the Top Ignored Users too.
We can only be sure that an email is sent when we get a mailer in
`ActionMailer::Deliveries`. A couple of tests were actually incorrect
because it didn't flow through our email sender where there are more
conditions in determining whether an email is sent or not.
Previously if you wanted to have jobs execute in test mode, you'd have
to do `SiteSetting.queue_jobs = false`, because the opposite of queue
is to execute.
I found this very confusing, so I created a test helper called
`run_jobs_synchronously!` which is much more clear about what it does.
* FEATURE: Account for `ignored_users` when merging two users
## Why?
This is part of the [Ability to ignore a user feature](https://meta.discourse.org/t/ability-to-ignore-a-user/110254/8).
When we merge two users, we need to account for merging their list of `ignored_users` too.
- Notices are visible only by poster and trust level 2+ users.
- Notices are not generated for non-human or staged users.
- Notices are deleted when post is deleted.
It seems that due to jobs being asynchronous and wrapping code in a
DistributedMutex that by the time we run the
`UserAvatar#update_gravatar!` job that the user/user email might be
destroyed.
This patch checks before a call to `user.email_hash` to make sure
the user and primary email exist to prevent the exception. If not
present, the job exits as there's nothing to do because we are
probably running after the user was destroyed for some reason.
Now you can also make authenticated API requests by passing the
`api_key` and `api_username` in the HTTP header instead of query params.
The new header values are: `Api-key` and `Api-Username`.
Here is an example in cURL:
``` text
curl -i -sS -X POST "http://127.0.0.1:3000/categories" \
-H "Content-Type: multipart/form-data;" \
-H "Api-Key: 7aa202bec1ff70563bc0a3d102feac0a7dd2af96b5b772a9feaf27485f9d31a2" \
-H "Api-Username: system" \
-F "name=7c1c0ed93583cba7124b745d1bd56b32" \
-F "color=49d9e9" \
-F "text_color=f0fcfd"
```
There is also support for `Api-User-Id` and `Api-User-External-Id`
instead of specifying the username along with the key.
If you reply to an email with the word "mute" a topic will be muted
If you reply to an email with the word "track" a topic will be tracked
If you reply to an email with the word "watch" a topic will be watched
These ninja command can help advanced mailing list ex-users, saves a trip
to the website
Mods require visibility to everyone group cause category dialogs need to
know about this.
If the site setting `allow moderators to create categories` will not function
without this
Note there is no security expansion of rights here, the group is technically
empty anyway and it always looks exactly the same on all discourse instances
This is a common pattern we see in tests. The `id` of the upload
is used to create the URL and we assume the `id` will always be
in a certain range which depends on the database.
Uses github.com/discourse/moment-timezone-names-translations to translate timezone names.
Plugins can also provide their own timezone name translations.
Previously with had `in:title` and `in:first` search shortcuts for
searching in first post or title only. They are a bit of handful to type.
This add 2 shortcuts (t and f) for searching titles of first posts.
This commit also cleans up all advanced filters, they were not properly
regex terminated allowing for weird clauses like `in:firstinator` acting
the same as `in:first`
Attempt to force NGINX to include content length when doing X-SendFile
This does not seem to be required when bypassing NGINX.
Without this header some CDNs may have issues caching
When a new post is triggered via message bus post stream will attempt to load
it, previously the `/topic/TOPIC_ID/posts.json` would unconditionally include
suggested topics, this caused excessive load on the server.
New pattern defaults to exclude suggested and related topics from this API
unless people explicitly ask for suggested.
When the S3 store was enabled, we were only applying the S3 CDN.
So all images stored locally, like the emojis, were never put on the local CDN.
Fixed a bunch of CookedPostProcessor test by adding a call to 'optimize_urls'
in order to get final URLs.
I also removed the unnecessary PrettyText.add_s3_cdn method since this is already
handled in the CookedPostProcessor.
Do not allow `/u/search/users.json` to list any group matches unless a
specific `term` is specified in the API call.
Adding groups should always be done when an actual search term exists,
blank search is only supported for users within a topic
Following this change when a user hits `@` and is replying to a topic they
will see usernames of people who were last seen and participated in the topic
This is somewhat experimental, we may tweak this, or make it optional.
Also, a regression in a423a938 where hitting TAB would eat a post you were writing:
Eg this would eat a post:
``` text
@hello, testing 123 <tab>
```
It was getting caught in a `DistributedMutex` deadlock (twice!), which
meant this test was taking 120s to run.
I'm not sure why queue jobs was turned off here, because when I turn it
on the test passes and takes <2s instead.
If a theme setting contained invalid SCSS, it would cause an error 500 on the site, with no way to recover. This commit stops loading theme settings in the core stylesheets, and instead only loads the color scheme variables. This change also makes `common/foundation/variables.scss` available to themes without an explicit import.
Co-authored-by: Sam Saffron <sam.saffron@gmail.com>
Co-authored-by: David Taylor <david@taylorhq.com>
This gives more control over the request. In particular we can easily
lookup DNS dynamically, instead of only upon NGINX startup.
Previously, NGINX was looking up IP for the letter avatar service and
caching the CDN IP address, this caused issues if CDN changed IP, in
which letter avatars would be broken till a container restarted.
NGINX config has been updated to add caching. This change will require
a container rebuild.
The proxy will now function in development environments, so the patch
for `letter_avatar_proxy` has been removed.
We had a missing formats: string on our render partial that caused logs to
spam when CSS files got 404s.
Due to magic discourse_public_exceptions.rb was actually returning the
correct 404 cause it switched format when rendering the error.
This adjusts 53d592ad by @tgxworld
- Adds Sidekiq.upause_all! to unpause all sites
- Adds Sidekiq.paused_dbs to list dbs that are currently paused
- Handles some edge cases where unpause thread could extend expiry on
sites that were unpaused from a different process
- Ensures tests always terminates background thread used for pause
keepalive