Show HN: Paramount is an OSS package that captures expert feedback on LLM chats

https://github.com/ask-fini/paramount

paramount

Paramount lets your expert agents evaluate AI chats, enabling:

  • quality assurance
  • ground truth capturing
  • automated regression testing

Usage

[Animation: example usage (usage.gif)]

Getting Started

  1. Install the package:
  2. Decorate your AI function:
import paramount

@paramount.record()
def my_ai_function(message_history, new_question):  # Inputs
    # <LLM invocations happen here>
    new_message = {'role': 'user', 'content': new_question}
    updated_history = message_history + [new_message]
    return updated_history  # Outputs
  3. After my_ai_function(...) has run several times, launch the Paramount UI to evaluate results:

Your subject-matter experts (SMEs) can now evaluate recordings and track accuracy improvements over time.

Paramount runs completely offline in your private environment.
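
To illustrate the idea behind the decorator in step 2, here is a minimal sketch of how a recording decorator could capture inputs and outputs into a CSV file. It is not Paramount's actual implementation: `record_to_csv` is a hypothetical name, and storing values as JSON strings is an assumption. The column names (`recorded_at`, `args__<name>`, `output__<index>`) follow the naming used in the configuration example.

```python
import csv
import functools
import inspect
import json
from datetime import datetime, timezone
from pathlib import Path


def record_to_csv(path):
    """Hypothetical stand-in for a recording decorator (not Paramount's code)."""
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)

            # Map positional and keyword arguments to their parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()

            row = {"recorded_at": datetime.now(timezone.utc).isoformat()}
            for name, value in bound.arguments.items():
                row[f"args__{name}"] = json.dumps(value)

            # Treat a tuple as several outputs, anything else as a single output.
            outputs = result if isinstance(result, tuple) else (result,)
            for index, value in enumerate(outputs, start=1):
                row[f"output__{index}"] = json.dumps(value)

            # Append the row, writing the header only for a new file.
            file_exists = Path(path).exists()
            with open(path, "a", newline="") as handle:
                writer = csv.DictWriter(handle, fieldnames=list(row))
                if not file_exists:
                    writer.writeheader()
                writer.writerow(row)
            return result
        return wrapper
    return decorator
```

Decorating `my_ai_function` with `@record_to_csv("recordings.csv")` would append one row per call while leaving the function's return value unchanged.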

Usage

After installation, run python example.py for a minimal working example.

Configuration

To complete the setup, define which input and output parameters hold the chat list passed to the LLM.

This is done in the paramount.toml configuration file, which lives in your project's root directory.

If the file does not exist, it is generated with default values on first run.

[record]
enabled = true
function_url = "http://localhost:9000"  # URL of your LLM API Flask app, used for replay
[db]
type = "csv" # postgres also available
	[db.postgres]
	connection_string = ""
[api]
endpoint = "http://localhost" # url and port for paramount UI/API
port = 9001
split_by_id = false # In case you have several bots and want to split them by ID
identifier_colname = ""
[ui]  # These are display elements for the UI
# For the table display - define which columns should be shown
meta_cols = ['recorded_at']
input_cols = ['args__message_history', 'args__new_question']  # Matches my_ai_function() example
output_cols = ['1', '2']  # Indexes of the function's return values (e.g. llm_answer and llm_references if your function returns both)
# For the chat display - describe how your chat structure is set up. This example uses OpenAI format.
chat_list = "output__1"  # Matches output updated_history. Must be a list of dicts to display chat format
chat_list_role_param = "role"  # Key in list of dicts describing the role in the chat
chat_list_content_param = "content"  # Key in list of dicts describing the content

References can also be described in the config; this is omitted here for simplicity.

See paramount.toml.example for more info.

For Developers

Detailed configuration instructions for the client and server are in paramount/README.md.

Docker

By using Dockerfile.server, you can containerize and deploy the whole package (including the client).

With Docker, you will need to mount the paramount.toml file dynamically into the container for it to work.

docker build -t paramount-server -f Dockerfile.server . # or make docker-build-server
docker run -dp 9001:9001 paramount-server # or make docker-run-server

License

This project is licensed under the GPL.

{
"by": "hakimk",
"descendants": 0,
"id": 40247993,
"kids": [
40247994
],
"score": 9,
"text": "Hey HN, Hakim here from Fini (YC S22). We've seen firsthand how AI chat projects pan out, and so have released an OSS library to ensure the industry gets more tools for improving outcomes.<p>Many AI chat projects are scrapped due to persistent inaccuracies in LLM responses. Paramount is an open-source Python package designed to bridge the gap between LLM-generated and ideal responses by incorporating expert feedback directly into the evaluation process. It provides a robust framework for recording LLM function outputs (ground truth data) and facilitates agent evaluations, reducing the time to identify and correct errors.<p>Developers can integrate Paramount with a decorator that logs LLM interactions into a CSV or database, followed by a straightforward UI for expert review. This process accelerates the debugging and validation phase of your project and de-risks your launch.",
"time": 1714746157,
"title": "Show HN: Paramount is an OSS package that captures expert feedback on LLM chats",
"type": "story",
"url": "https://github.com/ask-fini/paramount"
}
{
"author": "ask-fini",
"date": null,
"description": "Agent accuracy measurements for LLMs. Contribute to ask-fini/paramount development by creating an account on GitHub.",
"image": "https://opengraph.githubassets.com/47674fb200f81102ce1fa878a042905da3f9d6902963cd5fcff199880fe4367b/ask-fini/paramount",
"logo": "https://logo.clearbit.com/github.com",
"publisher": "GitHub",
"title": "GitHub - ask-fini/paramount: Agent accuracy measurements for LLMs",
"url": "https://github.com/ask-fini/paramount"
}