It's very simple import script and currently imports only the following content:
* Users
* Messages as Discourse topics/posts
* Attachments
Each channel can be mapped to a category and tags. It uses regular expressions to convert formatted messages ("rich text") into Markdown used by Discourse. In the future we could convert the `blocks` attribute from each message into Markdown instead of applying regular expressions on the `text` attribute.
Various migration scripts define a normalize_raw method to do custom processing of post contents before storing it in the Post.raw and other fields.
They normally do not handle nil inputs, but it's a relatively common occurrence in data dumps.
Since this method is used from various points in the migration script, as it stands, the experience of using a migration script is that it will fail multiple times at different points, forcing you to fix the data or apply logic hacks every time then restarting.
This PR generalizes handling of nil input by returning a <missing> string.
Pros:
no more messy repeated crashes + restarts
consistency
Cons:
it might hide data issues
OTOH we can't print a warning on that method because it will flood the console since it's called from inside loops.
* FIX: zendesk import script: support nil inputs in normalize_raw
* FIX: return '<missing>' instead of empty string; do it for all methods
This commit implements many changes to topic and comments embedding. It
deprecates the class_name field from EmbeddableHost and suggests using
the className parameter. discourse_username parameter has been
deprecated and it will fetch it from embedded site from the author or
discourse-username meta.
See the updated code sample from Admin > Customize > Embedding page.
* FEATURE: Add className parameter for Discourse embed
* DEV: Hide class_name from EmbeddableHost
* DEV: Deprecate class_name field of EmbeddableHost
* FEATURE: Use either author or discourse-username meta tag
* DEV: Deprecate discourse_username parameter
* DEV: Improve embed code sample
The #pluck_first freedom patch, first introduced by @danielwaterworth has served us well, and is used widely throughout both core and plugins. It seems to have been a common enough use case that Rails 6 introduced it's own method #pick with the exact same implementation. This allows us to retire the freedom patch and switch over to the built-in ActiveRecord method.
There is no replacement for #pluck_first!, but a quick search shows we are using this in a very limited capacity, and in some cases incorrectly (by assuming a nil return rather than an exception), which can quite easily be replaced with #pick plus some extra handling.
This bug is actually a Drupal issue where some edited posts have their `created` and `changed` timestamps set to the same value. But even when that happens in Drupal it still maintains the correct post order in an affected thread. This PR makes the Discourse importer also maintain the original Drupal comment order by sorting comments in the source DB by their `cid`, which is sequential and never changes. More details from this post onward:
https://meta.discourse.org/t/large-drupal-forum-migration-importer-errors-and-limitations/246939/24?u=rahim123
While load testing our user creation code path in production, we
identified that executing the DB statement to update the `Group#user_count` column within a
transaction is creating a bottleneck for us. This is because the
creation of a user and addition of the user to the relevant groups are
done in a transaction. When we execute the DB statement to update
`Group#user_count` for the relevant group, a row level lock is held
until the transaction completes. This row level lock acts like a global
lock when the server is creating users that will be added to the same
group in quick succession.
Instead of updating the counter cache within a transaction which the
default ActiveRecord `counter_cache` option does, we simply update the
counter cache outside of the committing transaction.
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
* Allow taking table prefix from env var
* FIX: remove unused column references
The columns `filedata` and `extension` are not present in a v4.2.4
database, and they aren't used in the method anyways.
* FIX: report progress for tables without imported_id
* FIX: effectively check for AR validation errors
NOTE: other migration scripts also have this problem; see /t/58202
* FIX: properly count Posts when importing attachments
* FIX: improve logging
* Remove leftover comment
* FIX: show progress when exporting Permalink file
* PERF: stream Permalink file
The current way results in tons of memory usage; write once per line instead
* Document fixes needed
* WIP - deduplicate category names
* Ignore non alphanumeric chars for grouping
* FIX: properly deduplicate user emails by merging accounts
* FIX: don't merge empty UserEmails
* Improve logging
* Merge users AFTER fixing primary key sequences
* Parallelize user merging
* Save duplicated users structure for debugging purposes
* Add progress logging for the (multiple hour) user merging step
* handle polls with duplicate items
* handle polls with incorrect poll_option_total values
* handle group IDs in personal messages
* support for version 3.3
This commit migrates all bookmarks to be polymorphic (using the
bookmarkable_id and bookmarkable_type) columns. It also deletes
all the old code guarded behind the use_polymorphic_bookmarks setting
and changes that setting to true for all sites and by default for
the sake of plugins.
No data is deleted in the migrations, the old post_id and for_topic
columns for bookmarks will be dropped later on.
A bit of a mixed bag, this addresses several edge areas of bookmarks and makes them compatible with polymorphic bookmarks (hidden behind the `use_polymorphic_bookmarks` site setting). The main ones are:
* ExportUserArchive compatibility
* SyncTopicUserBookmarked job compatibility
* Sending different notifications for the bookmark reminders based on the bookmarkable type
* Import scripts compatibility
* BookmarkReminderNotificationHandler compatibility
This PR also refactors the `register_bookmarkable` API so it accepts a class descended from a `BaseBookmarkable` class instead. This was done because we kept having to add more and more lambdas/properties inline and it was very messy, so a factory pattern is cleaner. The classes can be tested independently as well.
Some later PRs will address some other areas like the discourse narrative bot, advanced search, reports, and the .ics endpoint for bookmarks.
This allows text editors to use correct syntax coloring for the heredoc sections.
Heredoc tag names we use:
languages: SQL, JS, RUBY, LUA, HTML, CSS, SCSS, SH, HBS, XML, YAML/YML, MF, ICS
other: MD, TEXT/TXT, RAW, EMAIL
We validate the *format* of email addresses in many places with a match against
a regex, often with very slightly different syntax.
Adding a separate EmailAddressValidator simplifies the code in a few spots and
feels cleaner.
Deprecated the old location in case someone is using it in a plugin.
No functionality change is in this commit.
Note: the regex used at the moment does not support using address literals, e.g.:
* localpart@[192.168.0.1]
* localpart@[2001:db8::1]
* Optional import of custom user fields from phpBB 3.1+
* Optional import of likes from phpBB3
Requires the phpBB "Thanks for posts" extension
* Fix import of bookmarks from phpBB3
* Update `created_at` of existing user
* Support mapping of phpBB forums to existing Discourse categories
This is in addition to the ability of merging phpBB forums and importing into newly created Discourse categories.
1. bbcode hashes don't always have exactly 8 characters.
2. colors aren't always hex values, it can be a color string ("red", "blue", etc).
3. The closing tag of smileys doesn't always include a `:` character (the start of the regex was already right for this particular issue)
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
Setting a random value in the interval 1 week ago ... now works better
because this spreads digest scheduling over a week because digests are
sent one week from the date of the last digest.