Google insists on indexing pages so it can figure out if they
can be removed from the index.
see: https://support.google.com/webmasters/answer/6332384?hl=en
This change ensures the we have special behavior for Googlebot
where we allow indexing, but block the actual indexing via
X-Robots-Tag
Expand SiteSetting.allow_index_in_robots_txt so it also adds a
noindex header if set to false.
This makes sure that nothing is indexed even if it somehow reaches
Google.
1. Total 6 attempts per day per user
2. Total of 5 per unique email/login that is not found per hour
3. If an admin blocks an IP that IP can not request a reset
There were two constants here, `INLINE_ONEBOX_LOADING_CSS_CLASS` and
`INLINE_ONEBOX_CSS_CLASS` that were both longer than the strings they
were DRYing up: `inline-onebox-loading` and `inline-onebox`
I normally appreciate constants, but in this case it meant that we had
a lot of JS imports resulting in many more lines of code (and CPU cycles
spent figuring them out.)
It also meant we had an `.erb` file and had to invoke Ruby to create the
JS file, which meant the app was harder to port to Ember CLI.
I removed the constants. It's less DRY but faster and simpler, and
arguably the loss of DRYness is not significant as you can still search
for the `inline-onebox-loading` and `inline-onebox` strings easily if
you are refactoring.
We now show an options gear icon next to the bookmark name.
When expanded we show the "delete bookmark when reminder sent" option. The value of this checkbox is saved in local storage for the user.
If this is ticked, when a reminder is sent for the bookmark the bookmark itself is deleted. This is so people can use the reminder functionality by itself.
Also remove the blue alert reminder section from the "Edit Bookmark" modal as it just added clutter, because the user can already see they had a reminder set:
Adds a default false boolean column `delete_when_reminder_sent` to bookmarks.
Locale files get precompiled after deployment and they contained translations from the `default_locale`. That's especially bad in multisites, because the initial `default_locale` is `en_US`. Sites where the `default_locale` isn't `en_US` could see missing translations. The same thing could happen when users are allowed to chose a different locale.
This change simplifies the logic by not using the `default_locale` in the locale chain. It always falls back to `en` in case of missing translations.
The failover spec is very fragile and tests specific implementation
vs actual behavior
We rely on a different script during the build process to test
failover operates correctly
We were sharing `Discourse` both as an application object and a
namespace which complicated things for Ember CLI. This patch
moves raw templates into `__DISCOURSE_RAW_TEMPLATES` and adds
a couple helper methods to create/remove them.
This introduces new APIs for obtaining optimized thumbnails for topics. There are a few building blocks required for this:
- Introduces new `image_upload_id` columns on the `posts` and `topics` table. This replaces the old `image_url` column, which means that thumbnails are now restricted to uploads. Hotlinked thumbnails are no longer possible. In normal use (with pull_hotlinked_images enabled), this has no noticeable impact
- A migration attempts to match existing urls to upload records. If a match cannot be found then the posts will be queued for rebake
- Optimized thumbnails are generated during post_process_cooked. If thumbnails are missing when serializing a topic list, then a sidekiq job is queued
- Topic lists and topics now include a `thumbnails` key, which includes all the available images:
```
"thumbnails": [
{
"max_width": null,
"max_height": null,
"url": "//example.com/original-image.png",
"width": 1380,
"height": 1840
},
{
"max_width": 1024,
"max_height": 1024,
"url": "//example.com/optimized-image.png",
"width": 768,
"height": 1024
}
]
```
- Themes can request additional thumbnail sizes by using a modifier in their `about.json` file:
```
"modifiers": {
"topic_thumbnail_sizes": [
[200, 200],
[800, 800]
],
...
```
Remember that these are generated asynchronously, so your theme should include logic to fallback to other available thumbnails if your requested size has not yet been generated
- Two new raw plugin outlets are introduced, to improve the customisability of the topic list. `topic-list-before-columns` and `topic-list-before-link`
This ensures that at a minimum you are notified once a day of
repeat edits by the same user.
Long term we may consider winding this down to say 1 hour or
making it configurable.
Due to a refactor in e90f9e5cc4 we stopped notifying on edits if
a user liked a post and then edited.
The like could have happened a long time ago so this gets extra
confusing.
This change makes the suppression more deliberate. We only want
to suppress quite/link/mention if the user already got a reply
notification.
We can expand this suppression if it is not enough.
* When hovering over the bookmark icon for a post, show the name of the bookmark at the end of the tooltip _if_ it has been set.
* Order bookmarks by `updated_at DESC` in the user list and show that instead of created at.
Recently, we added feature that we are sending `/muted` to users who muted specific topic just before `/latest` so the client knows to ignore those messages - https://github.com/discourse/discourse/pull/9482
Same `/muted` message should be included when the post is edited
TLDR; this commit vastly improves how whitespaces are handled when converting from HTML to Markdown.
It also adds support for converting HTML <tables> to markdown tables.
The previous 'remove_whitespaces!' method was traversing the whole HTML tree and used a heuristic to remove
leading and trailing whitespaces whenever it was appropriate (ie. mostly before and after HTML block elements)
It was a good idea, but it was very limited and leaded to bad conversion when the html had leading whitespaces on several lines for example.
One such example can be found [here](https://meta.discourse.org/t/86782).
For various reasons, most of the whitespaces in a HTML file is ignored when the page is being displayed in a browser.
The rules that the browsers follow are the [CSS' White Space Processing Rules](https://www.w3.org/TR/css-text-3/#white-space-rules).
They can be quite complicated when you take into account RTL languages and other various tidbits but they boils down to the following:
- Collapse whitespaces down to one space (0x20) inside an inline context (ie. nodes/tags that are being displaying on the same line)
- Remove any leading/trailing whitespaces inside an inline context
One quick & dirty way of getting this 90% solved would be to do 'HTML.gsub!(/[[:space:]]+/, " ")'.
We would also need to hoist <pre> elements in order to not mess with their whitespaces.
Unfortunately, this solution let some whitespaces creep around HTML tags which leads to more '.strip!' calls than I can bear.
I decided to "emulate" the browser's handling of whitespaces and came up with a solution in 4 parts
1. remove_not_allowed!
The HtmlToMarkdown library is recursively "visiting" all the nodes in the HTML in order to convert them to Markdown.
All the nodes that aren't handled by the library (eg. <script>, <style> or any non-textual HTML tags) are "swallowed".
In order to reduce the number of nodes visited, the method 'remove_not_allowed!' will automatically delete all the nodes
that have no "visitor" (eg. a 'visit_<tag>' method) defined.
2. remove_hidden!
Similar purpose as the previous method (eg. reducing number of nodes visited), there's no point trying to convert something that is hidden.
The 'remove_hidden!' method removes any nodes that was hidden using the "hidden" HTML attribute, some CSS or with a width or height equal to 0.
3. hoist_line_breaks!
The 'hoist_line_breaks!' method is there to handle <br> tags. I know those tiny <br> don't do much but they can be quite annoying.
The <br> tags are inline elements but they visually work like a block element (ie. they create a new line).
If you have the following HTML "<i>Foo<br>Bar</i>", it ends up visually similar to "<i>Foo</i><br><i>Bar</i>".
The latter being much more easy to process than the former, so that's what this method is doing.
The "hoist_line_breaks" will hoist <br> tags out of inline tags until their parent is a block element.
4. remove_whitespaces!
The "remove_whitespaces!" is where all the whitespace removal is happening. It's broken down into 4 methods as well
- remove_whitespaces!
- is_inline?
- collapse_spaces!
- remove_trailing_space!
The 'remove_whitespace!' method is recursively walking the HTML tree (skipping <pre> tags).
If a node has any children, they will be chunked into groups of inline elements vs block elements.
For each chunks of inline elements, it will call the "collapse_space!" and "remove_trailing_space!" methods.
For each chunks of block elements, it will call "remote_whitespace!" to keep walking the HTML tree recursively.
The "is_inline?" method determines whether a node is part of a inline context.
A node is inline iif it's a text node or it's an inline tag, but not <br>, and all its children are also inline.
The "collapse_spaces!" method will collapse any kind of (white) space into a single space (" ") character, even accros tags.
For example, if we have " Foo \n<i> Bar </i>\t42", it will return "Foo <i>Bar </i>42".
Finally, the "remove_trailing_space!" method is there to remove any trailing space that might creep in at the end of the inline chunk.
This solution is not 100% bullet-proof.
It does not support RTL languages at all and has some caveats that I felt were not worth the work to get properly fixed.
FIX: better detection of hidden elements when converting HTML to Markdown
FIX: take into account the 'allowed_href_schemes' site setting when converting HTML <a> to Markdown
FIX: added support for 'mailto:' scheme when converting <a> from HTML to Markdown
FIX: added support for <img> dimensions when converting from HTML to Markdown
FIX: added support for <dl>, <dd> and <dt> when converting from HTML to Markdown
FIX: added support for multilines emphases, strongs and strikes when converting from HTML to Markdown
FIX: added support for <acronym> when converting from HTML to Markdown
DEV: remove unused 'sanitize' gem
Wow, did you just read all that?! Congratz, here's a cookie: 🍪.
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.
For files with `# frozen_string_literal: true`
```
puts %w{a b}[0].frozen?
=> true
puts "hi".frozen?
=> true
puts "a #{1} b".frozen?
=> true
puts ("a " + "b").frozen?
=> false
puts (-("a " + "b")).frozen?
=> true
```
For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
* Rename all instances of bookmarkWithReminder and bookmark_with_reminder to just bookmark
* Delete old bookmark code at the same time
* Add migration to remove the bookmarkWithReminder post menu item if people have it set in site settings
Make sure the topic_user.bookmarked column is set correctly when user bookmarks/unbookmarks any post in a topic. For example, you bookmarked a post in the topic that was not the OP, the bookmark icon in the topic list would not be shown. Same if deleting a bookmark for the last bookmarked post in a topic, the bookmark icon in the topic list would not be removed.
Previously this was only setting it to true if bookmarking the OP/topic, which was not correct -- we want to show the icon on the topic list if any post is bookmarked.
Also set to false if unbookmarking the last bookmarked post in the topic.
Also in this PR is a migration to correct any out of sync topic_user.bookmarked columns, based on the new logic.
Previously we relied entirely on levenshtein_distance_spammer_emails site
setting to handle "similar looking" emails.
This commit improves the situation by always preferring to block (and check)
canonical emails.
This means that if:
`samevil+test@domain.com` is blocked the system will block `samevil@domain.com`
This means that `samevil+2@domain.com` (ad infinitum) will be blocked
This reverts commit 6f9177e2ed.
We decided on a completely different approach to the problem.
Instead we will let blocked emails be treated as canonical.
* When copying the markdown for an image between posts, we were not adding the srcset and data-small-image attributes which are done by calling optimize_image! in cooked post processor
* Refactored the code which was confusing in its current state (the consider_for_reuse method was super confusing) and fixed the issue