<p>The comment section is full of spam, so if you need to reach me, use x.com/takagiwa_m.</p>

Wednesday, February 28 2024

Trying PrivateGPT on Windows

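I ran Chat with RTX at home, and it looks quite usable.

My work PC does not yet have a GPU board that supports Chat with RTX, and there are a few things I would like to see improved, so it looks like I will have to build something equivalent myself.

For setting up an environment on Windows, LM Studio looks easy. GPT4All looks easy too. However, neither seems to come with Chat with RTX-style functionality out of the box.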

The keyword seems to be "RAG (Retrieval-Augmented Generation)".
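
To make the term concrete, here is a toy sketch of the RAG idea in Python: retrieve the stored text most similar to the question, then prepend it to the prompt that would be sent to an LLM. This is not how PrivateGPT or Chat with RTX implement it. The bag-of-words similarity stands in for a real embedding model, and `DOCS`, `retrieve`, and `build_prompt` are names made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric word tokens (toy tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, documents: list[str]) -> str:
    """Augment the question with retrieved context before sending it to an LLM."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical local "documents" standing in for ingested files.
DOCS = [
    "Poetry is a dependency and package manager for Python.",
    "CMake is a build system generator maintained by Kitware.",
    "Uvicorn is an ASGI web server for Python.",
]

if __name__ == "__main__":
    print(build_prompt("What is Poetry?", DOCS))
```

A real system would replace `cosine` over word counts with an embedding model plus a vector store, but the retrieve-then-augment flow is the same.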

PrivateGPT, which is introduced in "How to create a private ChatGPT that interacts with your local documents", seems to be usable for this kind of thing.

First, I installed Anaconda.

Then I installed PrivateGPT by following its Installation guide.

Launch Anaconda Powershell Prompt.

(base) PS E:\Projects>python --version
Python 3.11.5

(base) PS E:\Projects>git clone https://github.com/imartinez/privateGPT
Cloning into 'privateGPT'...
remote: Enumerating objects: 1510, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 1510 (delta 2), reused 8 (delta 0), pack-reused 1487
Receiving objects: 100% (1510/1510), 1.69 MiB | 791.00 KiB/s, done.
Resolving deltas: 100% (819/819), done.

(base) PS E:\Projects\privateGPT> (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -

Retrieving Poetry metadata

# Welcome to Poetry!

This will download and install the latest version of Poetry,
a dependency and package manager for Python.

It will add the `poetry` command to Poetry's bin directory, located at:

C:\Users\xxxxxxxx\AppData\Roaming\Python\Scripts

You can uninstall at any time by executing this script with the --uninstall option,
and these changes will be reverted.

Installing Poetry (1.8.1)
Installing Poetry (1.8.1): Creating environment
Installing Poetry (1.8.1): Installing Poetry
Installing Poetry (1.8.1): Creating script
Installing Poetry (1.8.1): Done

Poetry (1.8.1) is installed now. Great!

To get started you need Poetry's bin directory (C:\Users\xxxxxxxx\AppData\Roaming\Python\Scripts) in your `PATH`
environment variable.

Alternatively, you can call Poetry explicitly with `C:\Users\xxxxxxxx\AppData\Roaming\Python\Scripts\poetry`.

You can test that everything is set up by executing:

`poetry --version`

I added it to the PATH and then checked the version.

(base) PS E:\Projects\privateGPT> $ENV:PATH += ";C:\Users\xxxxxxxx\AppData\Roaming\Python\Scripts"
(base) PS E:\Projects\privateGPT> poetry --version
Poetry (version 1.8.1)

For the MinGW installation, I ticked "mingw32-gcc-g++" and ran Installation → Apply. This was not on the PATH either, so I added it.
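
Since several tools need to be on the PATH before the install works, a small check like the following can save some back-and-forth. It is only a sketch: the tool list is my assumption based on the commands used in this post, and `cl` will normally only be found after the Visual Studio environment script has been run in the same session.

```python
import shutil

# Tool names assumed from the version checks in this post.
REQUIRED_TOOLS = ["python", "poetry", "gcc", "cmake", "cl"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH.")
```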

Setting up the paths for Visual Studio 2022 by hand was a hassle, so from here I went back to a Command Prompt.

Open Anaconda Prompt.

(base) E:\Projects\privateGPT>"D:\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.5.1
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'

(base) E:\Projects\privateGPT>python --version
Python 3.11.5

(base) E:\Projects\privateGPT>poetry --version
Poetry (version 1.8.1)

(base) E:\Projects\privateGPT>cl --version
Microsoft(R) C/C++ Optimizing Compiler Version 19.35.32215 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

cl : Command line warning D9002 : ignoring unknown option '--version'
cl : Command line error D8003 : missing source filename

(base) E:\Projects\privateGPT>cmake --version
cmake version 3.25.1-msvc1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

(base) E:\Projects\privateGPT>gcc --version
gcc (MinGW.org GCC-6.3.0-1) 6.3.0
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
(base) E:\Projects\privateGPT>poetry install --with ui
Creating virtualenv private-gpt--sQCGbRe-py3.11 in C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs
Installing dependencies from lock file

Package operations: 156 installs, 1 update, 0 removals
...
Installing the current project: private-gpt (0.2.0)

(base) E:\Projects\privateGPT>poetry run python -m private_gpt
16:34:10.531 [INFO    ] private_gpt.settings.settings_loader - Starting application with profiles=['default']
16:34:14.922 [INFO    ] private_gpt.components.llm.llm_component - Initializing the LLM in mode=local
Traceback (most recent call last):
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 798, in get
    return self._context[key]
           ~~~~~~~~~~~~~^^^^^
KeyError: <class 'private_gpt.ui.ui.PrivateGptUi'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 798, in get
    return self._context[key]
           ~~~~~~~~~~~~~^^^^^
KeyError: <class 'private_gpt.server.ingest.ingest_service.IngestService'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 798, in get
    return self._context[key]
           ~~~~~~~~~~~~~^^^^^
KeyError: <class 'private_gpt.components.llm.llm_component.LLMComponent'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\llms\llama_cpp.py", line 102, in __init__
    from llama_cpp import Llama
ModuleNotFoundError: No module named 'llama_cpp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "E:\Projects\privateGPT\private_gpt\__main__.py", line 5, in <module>
    from private_gpt.main import app
  File "E:\Projects\privateGPT\private_gpt\main.py", line 11, in <module>
    app = create_app(global_injector)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\launcher.py", line 50, in create_app
    ui = root_injector.get(PrivateGptUi)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 974, in get
    provider_instance = scope_instance.get(interface, binding.provider)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 800, in get
    instance = self._get_instance(key, provider, self.injector)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 811, in _get_instance
    return provider.get(injector)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 264, in get
    return injector.create_object(self._cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 998, in create_object
    self.call_with_injection(init, self_=instance, kwargs=additional_kwargs)
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1031, in call_with_injection
    dependencies = self.args_to_inject(
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1079, in args_to_inject
    instance: Any = self.get(interface)
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 974, in get
    provider_instance = scope_instance.get(interface, binding.provider)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 800, in get
    instance = self._get_instance(key, provider, self.injector)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 811, in _get_instance
    return provider.get(injector)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 264, in get
    return injector.create_object(self._cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 998, in create_object
    self.call_with_injection(init, self_=instance, kwargs=additional_kwargs)
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1031, in call_with_injection
    dependencies = self.args_to_inject(
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1079, in args_to_inject
    instance: Any = self.get(interface)
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 974, in get
    provider_instance = scope_instance.get(interface, binding.provider)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 800, in get
    instance = self._get_instance(key, provider, self.injector)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 811, in _get_instance
    return provider.get(injector)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 264, in get
    return injector.create_object(self._cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 998, in create_object
    self.call_with_injection(init, self_=instance, kwargs=additional_kwargs)
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1040, in call_with_injection
    return callable(*full_args, **dependencies)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\components\llm\llm_component.py", line 38, in __init__
    self.llm = LlamaCPP(
               ^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\llms\llama_cpp.py", line 104, in __init__
    raise ImportError(
ImportError: Could not import llama_cpp library.Please install llama_cpp with `pip install llama-cpp-python`.See the full installation guide for GPU support at `https://github.com/abetlen/llama-cpp-python`

It failed with an error. Also, port 8001 was already in use by something else on this machine, so I changed 8001 to 8002 in Makefile, settings-sagemaker.yaml, and settings.yaml.
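
Before editing the port in several files, it can be worth confirming which ports are actually taken. The following is a minimal sketch using only the standard library; the function name `port_in_use` is made up for illustration, and it only detects TCP listeners reachable on the loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. a listener exists.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for candidate in (8001, 8002):
        state = "in use" if port_in_use(candidate) else "free"
        print(f"port {candidate}: {state}")
```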

To deal with the error, I went on and installed the next group of dependencies.

(base) E:\Projects\privateGPT>poetry install --with local
Installing dependencies from lock file

Package operations: 7 installs, 0 updates, 0 removals

  - Installing scipy (1.11.4)
  - Installing threadpoolctl (3.2.0)
  - Installing diskcache (5.6.3)
  - Installing scikit-learn (1.3.2)
  - Installing torchvision (0.16.2)
  - Installing llama-cpp-python (0.2.23)
  - Installing sentence-transformers (2.2.2)

Installing the current project: private-gpt (0.2.0)

(base) E:\Projects\privateGPT>poetry run python scripts/setup

I ran it once more.

(base) E:\Projects\privateGPT>poetry run python -m private_gpt

The files were downloaded, and after that I could reach the UI by opening http://localhost:8002.

However, sending any message produced the error "AttributeError: 'NoneType' object has no attribute 'split'".

17:04:53.533 [INFO    ]            uvicorn.access - 127.0.0.1:52251 - "GET /queue/data?session_hash=nxrm6zcbks HTTP/1.1" 200
Traceback (most recent call last):
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\route_utils.py", line 231, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\blocks.py", line 1594, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\blocks.py", line 1188, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 513, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 639, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\chat_interface.py", line 487, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 513, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 506, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 489, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\ui\ui.py", line 127, in _chat
    all_messages = [*build_history(), new_message]
                     ^^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\ui\ui.py", line 109, in build_history
    *[
     ^
  File "E:\Projects\privateGPT\private_gpt\ui\ui.py", line 114, in <listcomp>
    content=interaction[1].split(SOURCES_SEPARATOR)[0],
            ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
17:04:53.589 [INFO    ]            uvicorn.access - 127.0.0.1:52251 - "POST /run/predict HTTP/1.1" 200
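
The traceback shows `interaction[1]` being `None` when the chat history is rebuilt, which would happen if an earlier turn never received a reply. The sketch below shows one way such a value could be guarded against. It is not the project's actual code or fix: `assistant_text` is a hypothetical helper, and the `SOURCES_SEPARATOR` value is an assumption that should be checked against private_gpt/ui/ui.py.

```python
from typing import Optional, Tuple

# Assumed separator between the answer text and its appended source list;
# check private_gpt/ui/ui.py for the actual value.
SOURCES_SEPARATOR = "\n\n Sources: \n"


def assistant_text(interaction: Tuple[str, Optional[str]]) -> str:
    """Return the assistant reply without its sources, tolerating a missing reply."""
    reply = interaction[1]
    if reply is None:
        # A turn that never produced an answer (for example, one that errored out).
        return ""
    return reply.split(SOURCES_SEPARATOR)[0]


if __name__ == "__main__":
    history = [
        ("hello", "Hi!" + SOURCES_SEPARATOR + "1. book.pdf"),
        ("broken turn", None),
    ]
    print([assistant_text(turn) for turn in history])
```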

After restarting PrivateGPT once, selecting LLM Chat as the Mode, and sending a message again, it replied. When I entered "hello":

Hello! How may I assist you today? I'm here to help answer any questions you have to the best of my ability. Please keep in mind that I cannot provide speculative or made-up information, and must always follow instructions given to me. Let me know if you have a specific question or topic you'd like me to help with.

When I entered "こんにちは" (Japanese for "hello"):

Hello there! I see you've greeted me in Japanese. That's great! If you have any questions or need assistance with something, feel free to ask. I'll do my best to help you in a respectful and honest manner. Let me know if you need anything specific. Is there a particular topic or question you have in mind?

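It apparently recognizes that the input is Japanese.

As a test, I loaded a PDF of a Japanese e-book and asked some rough questions about it. It answered in English, but it did not seem to analyze the content as well as Chat with RTX did.

I probably need to try other models and various Japanese-capable options. Needs investigation.

So... this apparently does not work by pointing it at a folder and having it learn the contents in advance. There is an upload form instead. That is tedious.

It seems to use a separate program as its vector store. Maybe I could just feed a large number of files into that? From a quick look, though, I could not figure out how to feed files in.

Responses are also very slow, but it is running on the CPU (Core i5 13500K), so adding a GPU should improve that.

After some searching, it turns out there is a way to ingest a whole folder.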

(base) E:\Projects\privateGPT>poetry run python scripts\ingest_folder.py D:\Books --watch --log-file ingestLog.txt

Traceback (most recent call last):
  File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 102, in <module>
    worker.ingest_folder(root_path, args.ignored)
  File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 38, in ingest_folder
    self._ingest_all(self._files_under_root_folder)
  File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 42, in _ingest_all
    self.ingest_service.bulk_ingest([(str(p.name), p) for p in files_to_ingest])
  File "E:\Projects\privateGPT\private_gpt\server\ingest\ingest_service.py", line 92, in bulk_ingest
    documents = self.ingest_component.bulk_ingest(files)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_component.py", line 127, in bulk_ingest
    documents = IngestionHelper.transform_file_into_documents(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 30, in transform_file_into_documents
    documents = IngestionHelper._load_file_to_documents(file_name, file_data)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 51, in _load_file_to_documents
    return reader_cls().load_data(file_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\readers\file\docs_reader.py", line 30, in load_data
    pdf = pypdf.PdfReader(fp)
          ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_reader.py", line 352, in __init__
    self._encryption.verify(pwd) == PasswordType.NOT_DECRYPTED
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_encryption.py", line 953, in verify
    key, rc = self.verify_v4(pwd) if self.V <= 4 else self.verify_v5(pwd)
                                                      ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_encryption.py", line 990, in verify_v5
    key = AlgV5.verify_owner_password(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_encryption.py", line 532, in verify_owner_password
    AlgV5.calculate_hash(R, password, o_value[32:40], u_value[:48])
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_encryption.py", line 577, in calculate_hash
    e = aes_cbc_encrypt(k[:16], k[16:32], k1 * 64)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_crypt_providers\_fallback.py", line 89, in aes_cbc_encrypt
    raise DependencyError(_DEPENDENCY_ERROR_STR)
pypdf.errors.DependencyError: cryptography>=3.1 is required for AES algorithm

It failed with an error. Trying to install cryptography with pip just reports "Requirement already satisfied". According to a page I found, installing pycryptodome might be the answer?

(base) E:\Projects\privateGPT>pip install pycryptodome

It stopped with the same error.

It turned out this had already been properly reported.

(base) E:\Projects\privateGPT>poetry add cryptography
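
A likely reason pip reported "Requirement already satisfied" while PrivateGPT still failed is that the `(base)` prompt's pip installs into the Anaconda base environment, whereas `poetry run` uses the separate virtualenv that Poetry created during `poetry install`. I have not confirmed that this was the cause here. The sketch below reports which interpreter is running and whether it can see a given module; the script name `check_env.py` used in the note after it is hypothetical.

```python
import importlib.util
import sys


def describe_environment(module_name: str) -> dict:
    """Report which interpreter is running and whether it can see a module."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # True inside a venv/virtualenv, False in a plain or conda base interpreter.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    # Module name is a command-line argument; defaults to the one from the error above.
    name = sys.argv[1] if len(sys.argv) > 1 else "cryptography"
    for key, value in describe_environment(name).items():
        print(f"{key}: {value}")
```

Running it once as `python check_env.py` and once as `poetry run python check_env.py` should show two different `executable` paths if the two environments are indeed separate.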

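Come to think of it, it recognized Japanese properly and answered by translating the PDF's contents into English, so I suspect there is an output-language setting somewhere.

Beyond that, if I match the model and the vectorization (embedding) that Chat with RTX uses, I might get answers of similar accuracy.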

Addendum, 2024/Feb/29

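Apparently PrivateGPT's output language is changed by switching the model, not through an output-language setting.

Come to think of it, Chat with RTX also garbled Japanese text. Apparently it still garbles it even with a Japanese-capable model, so maybe the vectorization (embeddings) also has to support Japanese?

And after leaving it running overnight, there was another error.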

Traceback (most recent call last):
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\readers\file\epub_reader.py", line 21, in load_data
    import ebooklib
ModuleNotFoundError: No module named 'ebooklib'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 102, in <module>
    worker.ingest_folder(root_path, args.ignored)
  File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 38, in ingest_folder
    self._ingest_all(self._files_under_root_folder)
  File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 42, in _ingest_all
    self.ingest_service.bulk_ingest([(str(p.name), p) for p in files_to_ingest])
  File "E:\Projects\privateGPT\private_gpt\server\ingest\ingest_service.py", line 92, in bulk_ingest
    documents = self.ingest_component.bulk_ingest(files)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_component.py", line 127, in bulk_ingest
    documents = IngestionHelper.transform_file_into_documents(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 30, in transform_file_into_documents
    documents = IngestionHelper._load_file_to_documents(file_name, file_data)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 51, in _load_file_to_documents
    return reader_cls().load_data(file_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\readers\file\epub_reader.py", line 25, in load_data
    raise ImportError(
ImportError: Please install extra dependencies that are required for the EpubReader: `pip install EbookLib html2text`

I installed these the same way.

(base) E:\Projects\privateGPT>poetry add EbookLib html2text
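
Because a missing reader dependency only surfaces when the ingester reaches a file of that type, it may help to survey the folder before starting a long run. This is a standard-library sketch and not part of PrivateGPT; `count_extensions` is a name made up for illustration, and mapping each extension to the package its reader needs is left to the reader.

```python
import sys
from collections import Counter
from pathlib import Path


def count_extensions(root: Path) -> Counter:
    """Count files under root (recursively) by lowercase file extension."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Folder to survey is a command-line argument; defaults to the current directory.
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for suffix, count in count_extensions(folder).most_common():
        print(f"{suffix or '(no extension)'}: {count}")
```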

That said, as it stands it seems it can only work from one file at a time, so I will stop here for now. I will look for a different system.