discourse-ai/lib/summarization/entry_point.rb

# frozen_string_literal: true

module DiscourseAi
  module Summarization
    class EntryPoint
      def inject_into(plugin)
        foldable_models = [
          Models::OpenAi.new("open_ai:gpt-4", max_tokens: 8192),
          Models::OpenAi.new("open_ai:gpt-4-32k", max_tokens: 32_768),
          Models::OpenAi.new("open_ai:gpt-4-turbo", max_tokens: 100_000),
          Models::OpenAi.new("open_ai:gpt-4o", max_tokens: 100_000),
          Models::OpenAi.new("open_ai:gpt-3.5-turbo", max_tokens: 4096),
          Models::OpenAi.new("open_ai:gpt-3.5-turbo-16k", max_tokens: 16_384),
          Models::Gemini.new("google:gemini-pro", max_tokens: 32_768),
          Models::Gemini.new("google:gemini-1.5-pro", max_tokens: 800_000),
        ]

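        # Prefer AWS Bedrock for Claude when it is configured; a site could have
        # both Bedrock and Anthropic credentials set up.
        claude_prov = "anthropic"
        if DiscourseAi::Completions::Endpoints::AwsBedrock.correctly_configured?("claude-2")
          claude_prov = "aws_bedrock"
        end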

        foldable_models << Models::Anthropic.new("#{claude_prov}:claude-2", max_tokens: 200_000)
        foldable_models << Models::Anthropic.new(
          "#{claude_prov}:claude-instant-1",
          max_tokens: 100_000,
        )
        foldable_models << Models::Anthropic.new(
          "#{claude_prov}:claude-3-haiku",
          max_tokens: 200_000,
        )
        foldable_models << Models::Anthropic.new(
          "#{claude_prov}:claude-3-sonnet",
          max_tokens: 200_000,
        )

        foldable_models << Models::Anthropic.new(
          "#{claude_prov}:claude-3-opus",
          max_tokens: 200_000,
        )

        # vLLM can serve the same models as Hugging Face, so prefer it when configured.
        mixtral_prov = "hugging_face"
        if DiscourseAi::Completions::Endpoints::Vllm.correctly_configured?(
             "mistralai/Mixtral-8x7B-Instruct-v0.1",
           )
          mixtral_prov = "vllm"
        end

        foldable_models << Models::Mixtral.new(
          "#{mixtral_prov}:mistralai/Mixtral-8x7B-Instruct-v0.1",
          max_tokens: 32_000,
        )

        # TODO: Roman, we need to de-register custom LLMs from the summarization
        # strategy on destroy and clear the cache.
        # It may be better to pull all of this code into Discourse AI because, as it stands,
        # the coupling makes it really hard to reason about summarization.
        #
        # Auto registration and de-registration need to be tested.

        #LlmModel.all.each do |model|
        #  foldable_models << Models::CustomLlm.new(
        #    "custom:#{model.id}",
        #    max_tokens: model.max_prompt_tokens,
        #  )
        #end

        foldable_models.each do |model|
          plugin.register_summarization_strategy(Strategies::FoldContent.new(model))
        end

        #plugin.add_model_callback(LlmModel, :after_create) do
        #  new_model = Models::CustomLlm.new("custom:#{self.id}", max_tokens: self.max_prompt_tokens)

        #  if ::Summarization::Base.find_strategy("custom:#{self.id}").nil?
        #    plugin.register_summarization_strategy(Strategies::FoldContent.new(new_model))
        #  end
        #end
      end
    end
  end
end
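
The registration flow in `inject_into` can be tried outside Discourse with a minimal sketch like the one below. Everything in it is a hypothetical stand-in: `Model`, `FoldStrategy`, `FakePlugin`, `pick_provider`, and `inject_models` are illustrative names, not the plugin's real classes, and the `bedrock_configured` flag replaces the real `correctly_configured?` check.

```ruby
# Hypothetical stand-ins; none of these are the plugin's real classes.
Model = Struct.new(:name, :max_tokens)
FoldStrategy = Struct.new(:model)

# Minimal fake of the plugin object: it only records registered strategies.
class FakePlugin
  attr_reader :strategies

  def initialize
    @strategies = []
  end

  def register_summarization_strategy(strategy)
    @strategies << strategy
  end
end

# Use the preferred provider when it reports itself configured,
# otherwise fall back to the default one.
def pick_provider(default:, preferred:, preferred_configured:)
  preferred_configured ? preferred : default
end

# Build provider-prefixed model names and register one strategy per model.
def inject_models(plugin, bedrock_configured:)
  provider = pick_provider(
    default: "anthropic",
    preferred: "aws_bedrock",
    preferred_configured: bedrock_configured,
  )

  models = [
    Model.new("open_ai:gpt-4", 8192),
    Model.new("#{provider}:claude-2", 200_000),
  ]

  models.each { |model| plugin.register_summarization_strategy(FoldStrategy.new(model)) }
  plugin
end

plugin = inject_models(FakePlugin.new, bedrock_configured: true)
puts plugin.strategies.map { |s| s.model.name }.inspect
# prints ["open_ai:gpt-4", "aws_bedrock:claude-2"]
```

Passing `bedrock_configured: false` yields `anthropic:claude-2` instead, which mirrors how the provider prefix ends up in the registered model name.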
Blame history (oldest first):

2023-04-04 10:24:09 -04:00  FEATURE: Chat channel summarization. (#32)
    * start summary module
    * chat channel summarization
    * FEATURE: modal for channel summarization
    Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
    Touches: the `# frozen_string_literal: true` line, the `DiscourseAi::Summarization::EntryPoint` skeleton, `inject_into(plugin)`, and the closing `end`s.

2023-06-27 11:26:33 -04:00  DEV: Better strategies for summarization (#88)
    The strategy responsibility needs to be "Given a collection of texts, I know how to summarize them most efficiently, using the minimum amount of requests and maximizing token usage".
    There are different token limits for each model, so it all boils down to two different strategies:
    * Fold all these texts into a single one, doing the summarization in chunks, and then build a summary from those.
    * Build it by combining texts in a single prompt, and truncate it according to your token limits.
    While the latter is less than ideal, we need it for "bart-large-cnn-samsum" and "flan-t5-base-samsum", both with low limits. The rest will rely on folding.
    * Expose summarized chunks to users
    Touches: the `foldable_models = [` ... `]` array delimiters and the `foldable_models.each` registration loop.

2024-01-29 14:04:25 -05:00  UX: Validations to LLM-backed features (except AI Bot) (#436)
    This change is part of an ongoing effort to prevent enabling a broken feature due to lack of configuration. We also want to explicit which provider we are going to use. For example, Claude models are available through AWS Bedrock and Anthropic, but the configuration differs.
    Validations are:
    * You must choose a model before enabling the feature.
    * You must turn off the feature before setting the model to blank.
    * You must configure each model settings before being able to select it.
    * Add provider name to summarization options
    * vLLM can technically support same models as HF
    * Check we can talk to the selected model
    * Check for Bedrock instead of anthropic as a site could have both creds setup
    Touches: the `open_ai:gpt-4`, `open_ai:gpt-4-32k`, `open_ai:gpt-3.5-turbo`, `open_ai:gpt-3.5-turbo-16k`, and `google:gemini-pro` entries; the `claude_prov` selection with the `claude-2` and `claude-instant-1` entries; the `mixtral_prov` selection with the Mixtral entry.

2024-03-19 05:15:12 -04:00  FEATURE: friendlier reply behavior in bot PMs (#535)
    - Stop replying as bot, when human replies to another human
    - Reply as correct persona when replying directly to a persona
    - Fix paper cut where suppressing notifications was not doing so
    Touches: the `claude-3-haiku` and `claude-3-sonnet` entries.

2024-04-10 08:53:20 -04:00  FEATURE: remove gpt-4-turbo-0125 preview swap with gpt-4-turbo (#568)
    Open AI just released gpt-4-turbo (with vision). This change stops using the old preview model and swaps with the officially released gpt-4-turbo. To come is an implementation of vision.
    Touches: the `open_ai:gpt-4-turbo` entry.

2024-04-17 01:37:19 -04:00  FEATURE: Gemini 1.5 pro support and Claude Opus bedrock support (#580)
    - Updated AI Bot to only support Gemini 1.5 (used to support 1.0) - 1.0 was removed cause it is not appropriate for Bot usage
    - Summaries and automation can now lean on Gemini 1.5 pro
    - Amazon added support for Claude 3 Opus, added internal support for it on bedrock
    Touches: the `google:gemini-1.5-pro` and `claude-3-opus` entries.

2024-05-13 14:54:42 -04:00  HACK: Llama3 support for summarization/AI helper. (#616)
    There are still some limitations to which models we can support with the `LlmModel` class. This will enable support for Llama3 while we sort those out.
    Touches: blank lines around the registration loop and the commented-out `add_model_callback` block.

2024-05-13 23:28:46 -04:00  FEATURE: GPT4o support and better auditing (#618)
    - Introduce new support for GPT4o (automation / bot / summary / helper)
    - Properly account for token counts on OpenAI models
    - Track feature that was used when generating AI completions
    - Remove custom llm support for summarization as we need better interfaces to control registration and de-registration
    Touches: the `open_ai:gpt-4o` entry, the `TODO` comment, and the commented-out `LlmModel` registration and `add_model_callback` code.
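
The folding strategy described in the #88 commit message (summarize in chunks, then build a summary from those) might look roughly like the sketch below. This is not the plugin's `Strategies::FoldContent` implementation: `token_count`, `chunk_texts`, `fold`, and the toy summarizer are hypothetical, tokens are approximated as whitespace-separated words, and only a single folding pass is performed.

```ruby
# Assumption: one "token" is one whitespace-separated word.
def token_count(text)
  text.split.size
end

# Greedily group texts into chunks whose combined size fits within max_tokens.
# A single text larger than the budget still gets a chunk of its own.
def chunk_texts(texts, max_tokens)
  chunks = [[]]
  used = 0

  texts.each do |text|
    size = token_count(text)
    if used + size > max_tokens && !chunks.last.empty?
      chunks << []
      used = 0
    end
    chunks.last << text
    used += size
  end

  chunks.reject(&:empty?)
end

# Summarize each chunk, then summarize the concatenated chunk summaries.
# `summarizer` is any callable taking a string and returning a string.
def fold(texts, max_tokens, summarizer)
  return "" if texts.empty?

  chunks = chunk_texts(texts, max_tokens)
  return summarizer.call(chunks.first.join(" ")) if chunks.size == 1

  partials = chunks.map { |chunk| summarizer.call(chunk.join(" ")) }
  summarizer.call(partials.join(" "))
end

# Toy summarizer standing in for an LLM call: keep the first two words.
first_two_words = ->(text) { text.split.first(2).join(" ") }

texts = ["a b c d", "e f g h", "i j k l"]
puts chunk_texts(texts, 8).map(&:size).inspect # prints [2, 1]
puts fold(texts, 8, first_two_words)           # prints a b
```

A larger `max_tokens` means fewer chunks and therefore fewer summarizer calls, which is why each model in `foldable_models` carries its own limit.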