Considering document length in search introduced too much variance in
our search results such that it makes certain searches better but at the
same time made certain searches worst. Instead, we want to have a more
determistic way of ranking search so that it is easier to reason about
why a post is rank higher in search than another.
The long term plan to tackle repeated terms is to restrict the number of
positions for a given lexeme in our search index.
```
discourse_development=# SELECT alias, lexemes FROM TS_DEBUG('www.discourse.org');
alias | lexemes
-------+---------------------
host | {www.discourse.org}
discourse_development=# SELECT TO_TSVECTOR('www.discourse.org');
to_tsvector
-----------------------
'www.discourse.org':1
```
Given the above lexeme, we will inject additional lexeme by splitting
the host on `.`. The actual tsvector stored will look something like
```
tsvector
---------------------------------------
'discourse':1 'discourse.org':1 'org':1 'www':1 'www.discourse.org':1
```
`Nokogiri::HTML.fragment` is a huge hack (a comment in the source code
admits this). The current behavior of `Email::Styles` is to try to
emulate `fragment` using nokogumbo, but it misses some edge cases. In
particular, meta tags in a email template don't make it through to the
final email.
Instead of treating the provided HTML as an indeterminate fragment, this
commit makes `Email::Styles` treat the HTML as a complete document. This
means that the generated HTML for an email will now always contain top
level structure (a doctype, html, head and body tags).
This new behavior is behind a hidden site setting for now and defaults
off.
Previously we would include this section, unfortunately
1. It is usually elided in gmail
2. It can make the emails longer and more confusing
3. Omission is a feature, it means people need to visit site to get context
There is a feature in search where we take over from the tokenizer
in postgres and attempt to inject more words into search.
So for example: sam.i.am will inject the words i and am.
This is not ideal cause there are many edge cases and this can
cause extreme index bloat.
This is an opening move commit to make it configurable, over the
next few weeks we will evaluate and decide if we disable this by
default or simply remove.
Adds new hidden site settings for rate limits:
30 for logged in users, 15 for anon
Adds an anon cache for searching, caches results of searches for 1 minute
* FEATURE: notify admins about old credentials
Security and API keys should be renewed periodically.
This additional notification should help admins keep their Discourse safe and secure.
* FEATURE: notify admins about old credentials
Security and API keys should be renewed periodically.
This additional notification should help admins keep their Discourse safe and secure.
This reverts commit 20780a1eee.
* SECURITY: re-adds accidentally reverted commit:
03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
instead of the 03d26cd6 parent (which contains security fixes)
Adds a new topic_excerpt_maxlength site setting.
* When topic excerpt is requested for a post, use the new topic_excerpt_maxlength site setting to limit the size of the excerpt
* Remove code for getting/setting Post.excerpt_size as it is not used anywhere
Some providers don't implement the Expect: 100-continue support,
which results in a mismatch in the object signature.
With this settings, users can disable the header and use such providers.
* Rename all instances of bookmarkWithReminder and bookmark_with_reminder to just bookmark
* Delete old bookmark code at the same time
* Add migration to remove the bookmarkWithReminder post menu item if people have it set in site settings
This adds a site setting (default off) to optionally show a user's local time and timezone in their user card. For example, I live in Brisbane, and if at 3:30PM my time I were to open a user who lives in California's card I would see 22:30 (PST).
This reverts commit 6f9177e2ed.
We decided on a completely different approach to the problem.
Instead we will let blocked emails be treated as canonical.
The main thrust of this PR is to take all the conditional checks based on the `enable_bookmarks_with_reminders` away and only keep the code from the `true` path, making bookmarks with reminders the core bookmarks feature. There is also a migration to create `Bookmark` records out of `PostAction` bookmarks for a site.
### Summary
* Remove logic based on whether enable_bookmarks_with_reminders is true. This site setting is now obsolete, the old bookmark functionality is being removed. Retain the setting and set the value to `true` in a migration.
* Use the code from the rake task to create a database migration that creates bookmarks from post actions.
* Change the bookmark report to read from the new table.
* Get rid of old endpoints for bookmarks
* Link to the new bookmarks list from the user summary page
The new `enforce_canonical_emails` site setting ensures that emails in the
canonical form are unique.
This mean that if `s.a.m+1@gmail.com` is registered `sam@gmail.com` will
not be allowed.
The commit contains a blanket "tag strip" (stripping everything after +)
it also contains special handling of a "dot strip" for googlemail and gmail.
The setting only impacts new registrations after `enforce_canonical_emails`
The setting is default false so it will not impact any existing installs.
If the feature is enabled, staff members can construct a URL and publish a
topic for others to browse without the regular Discourse chrome.
This is useful if you want to use Discourse like a CMS and publish
topics as articles, which can then be embedded into other systems.
* FEATURE: add setting `auto_approve_email_domains` to auto approve users
This commit adds a new site setting `auto_approve_email_domains` to
auto approve users based on their email address domain.
Note that if a domain already exists in `email_domains_whitelist` then
`auto_approve_email_domains` needs to be duplicated there as well,
since users won’t be able to register with email address that is
not allowed in `email_domains_whitelist`.
* Update config/locales/server.en.yml
Co-Authored-By: Robin Ward <robin.ward@gmail.com>
On some sites when bootstrapping communities it is helpful to bootstrap
with a "light weight" invite code.
Use the site setting `invite_code` to set a global invite code.
In this case the administrator can share the code with
a community which is very easy to remember and then anyone who has
that code can easily register accounts.
People without the invite code are not allowed account registration.
Global invite codes are less secure than indevidual codes, in that they
tend to leak in the community however in some cases when starting a brand
new community the security guarantees of invites are not needed.
Add enable_bookmark_at_desktop_reminders site setting default to false a new hidden site setting to hide the "At Desktop" reminder option so we can restrict this further until it is polished.