Trying PrivateGPT on Windows
By takagiwa on Wednesday, February 28 2024, 22:04 - LLM - Permalink
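I ran Chat with RTX at home and it looks quite usable.
My work PC doesn't yet have a GPU board that Chat with RTX supports, and there are a few things I'd like to see improved, so it looks like I'll have to build something equivalent myself.
For setting up an environment on Windows, LM Studio looks easy. GPT4All looks easy too. Neither of them, though, seems to ship with Chat with RTX-style features out of the box.
The keyword seems to be "RAG (Retrieval-Augmented Generation)".
PrivateGPT, which is introduced in "How to create a private ChatGPT that interacts with your local documents", seems to be usable for that kind of thing.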
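To make the RAG keyword a bit more concrete, here is a minimal sketch of the idea: pick the document chunk most similar to the question, then paste it into the prompt that goes to the LLM. This is not how PrivateGPT implements it. The bag-of-words similarity stands in for a real embedding model, and all function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric words (an English-only toy tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list, top_k: int = 1) -> list:
    """Retrieval step: return the top_k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list) -> str:
    """Augmentation step: prepend the retrieved context to the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "Poetry is a dependency manager for Python projects.",
        "CMake generates build files for C and C++ projects.",
    ]
    question = "What does Poetry manage?"
    # The generation step (sending this prompt to a model) is left out.
    print(build_prompt(question, retrieve(question, docs)))
```

A real setup would swap the word counts for embedding vectors and keep them in a vector store, but the retrieve-then-prompt shape stays the same.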
First, install Anaconda.
Then install by following the Installation section of the PrivateGPT docs.
Launch Anaconda Powershell Prompt.
(base) PS E:\Projects>python --version
Python 3.11.5
(base) PS E:\Projects>git clone https://github.com/imartinez/privateGPT
Cloning into 'privateGPT'...
remote: Enumerating objects: 1510, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 1510 (delta 2), reused 8 (delta 0), pack-reused 1487
Receiving objects: 100% (1510/1510), 1.69 MiB | 791.00 KiB/s, done.
Resolving deltas: 100% (819/819), done.
(base) PS E:\Projects\privateGPT> (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
Retrieving Poetry metadata
# Welcome to Poetry!
This will download and install the latest version of Poetry,
a dependency and package manager for Python.
It will add the `poetry` command to Poetry's bin directory, located at:
C:\Users\xxxxxxxx\AppData\Roaming\Python\Scripts
You can uninstall at any time by executing this script with the --uninstall option,
and these changes will be reverted.
Installing Poetry (1.8.1)
Installing Poetry (1.8.1): Creating environment
Installing Poetry (1.8.1): Installing Poetry
Installing Poetry (1.8.1): Creating script
Installing Poetry (1.8.1): Done
Poetry (1.8.1) is installed now. Great!
To get started you need Poetry's bin directory (C:\Users\xxxxxxxx\AppData\Roaming\Python\Scripts) in your `PATH`
environment variable.
Alternatively, you can call Poetry explicitly with `C:\Users\xxxxxxxx\AppData\Roaming\Python\Scripts\poetry`.
You can test that everything is set up by executing:
`poetry --version`
Add that directory to PATH, then check the version.
(base) PS E:\Projects\privateGPT> $ENV:PATH += ";C:\Users\xxxxxxxx\AppData\Roaming\Python\Scripts"
(base) PS E:\Projects\privateGPT> poetry --version
Poetry (version 1.8.1)
In the MinGW installer I checked "mingw32-gcc-g++" and ran Installation → Apply. It wasn't on PATH either, so I added it.
Setting up the paths for Visual Studio 2022 was a hassle, so from here I go back to a command prompt.
Open Anaconda Prompt.
(base) E:\Projects\privateGPT>"D:\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.5.1
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
(base) E:\Projects\privateGPT>python --version
Python 3.11.5
(base) E:\Projects\privateGPT>poetry --version
Poetry (version 1.8.1)
(base) E:\Projects\privateGPT>cl --version
Microsoft(R) C/C++ Optimizing Compiler Version 19.35.32215 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
cl : コマンド ライン warning D9002 : 不明なオプション '--version' を無視します。
cl : コマンド ライン error D8003 : ソース ファイル名がありません
(base) E:\Projects\privateGPT>cmake --version
cmake version 3.25.1-msvc1
CMake suite maintained and supported by Kitware (kitware.com/cmake).
(base) E:\Projects\privateGPT>gcc --version
gcc (MinGW.org GCC-6.3.0-1) 6.3.0
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
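Running each `--version` by hand works, but the same "is it on PATH?" check can be done in one go. A small sketch using only the standard library; the tool list is simply the set of commands checked above, and `find_tools` is a name I made up.

```python
import shutil
from typing import Dict, List, Optional


def find_tools(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Tool names taken from the version checks above.
    for name, path in find_tools(["python", "poetry", "cl", "cmake", "gcc"]).items():
        print(f"{name:8} {path or 'NOT FOUND'}")
```

Note that this only tells you whether the command resolves in the current prompt. For `cl` that still means running vcvars64.bat first in the same session.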
(base) E:\Projects\privateGPT>poetry install --with ui Creating virtualenv private-gpt--sQCGbRe-py3.11 in C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs Installing dependencies from lock file Package operations: 156 installs, 1 update, 0 removals ... Installing the current project: private-gpt (0.2.0) (base) E:\Projects\privateGPT>poetry run python -m private_gpt 16:34:10.531 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default'] 16:34:14.922 [INFO ] private_gpt.components.llm.llm_component - Initializing the LLM in mode=local Traceback (most recent call last): File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 798, in get return self._context[key] ~~~~~~~~~~~~~^^^^^ KeyError: <class 'private_gpt.ui.ui.PrivateGptUi'> During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 798, in get return self._context[key] ~~~~~~~~~~~~~^^^^^ KeyError: <class 'private_gpt.server.ingest.ingest_service.IngestService'> During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 798, in get return self._context[key] ~~~~~~~~~~~~~^^^^^ KeyError: <class 'private_gpt.components.llm.llm_component.LLMComponent'> During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\llms\llama_cpp.py", line 102, in __init__ from llama_cpp import Llama ModuleNotFoundError: No module named 'llama_cpp' During handling of the above exception, 
another exception occurred: Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "E:\Projects\privateGPT\private_gpt\__main__.py", line 5, in <module> from private_gpt.main import app File "E:\Projects\privateGPT\private_gpt\main.py", line 11, in <module> app = create_app(global_injector) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\launcher.py", line 50, in create_app ui = root_injector.get(PrivateGptUi) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper return function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 974, in get provider_instance = scope_instance.get(interface, binding.provider) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper return function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 800, in get instance = self._get_instance(key, provider, self.injector) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 811, in _get_instance return provider.get(injector) ^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 264, in get return injector.create_object(self._cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 998, in create_object self.call_with_injection(init, self_=instance, kwargs=additional_kwargs) File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1031, in call_with_injection dependencies = self.args_to_inject( ^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper return function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1079, in args_to_inject instance: Any = self.get(interface) ^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper return function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 974, in get provider_instance = scope_instance.get(interface, binding.provider) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper return function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 800, in get instance = self._get_instance(key, provider, self.injector) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 811, in _get_instance 
return provider.get(injector) ^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 264, in get return injector.create_object(self._cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 998, in create_object self.call_with_injection(init, self_=instance, kwargs=additional_kwargs) File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1031, in call_with_injection dependencies = self.args_to_inject( ^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper return function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1079, in args_to_inject instance: Any = self.get(interface) ^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper return function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 974, in get provider_instance = scope_instance.get(interface, binding.provider) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 91, in wrapper return function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", 
line 800, in get instance = self._get_instance(key, provider, self.injector) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 811, in _get_instance return provider.get(injector) ^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 264, in get return injector.create_object(self._cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 998, in create_object self.call_with_injection(init, self_=instance, kwargs=additional_kwargs) File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\injector\__init__.py", line 1040, in call_with_injection return callable(*full_args, **dependencies) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\components\llm\llm_component.py", line 38, in __init__ self.llm = LlamaCPP( ^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\llms\llama_cpp.py", line 104, in __init__ raise ImportError( ImportError: Could not import llama_cpp library.Please install llama_cpp with `pip install llama-cpp-python`.See the full installation guide for GPU support at `https://github.com/abetlen/llama-cpp-python`
It failed with an error. Also, port 8001 was already in use by something else on this machine, so I changed 8001 to 8002 in Makefile, settings-sagemaker.yaml, and settings.yaml.
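For the port clash, a quick way to see whether a port is already taken before editing the settings files is to try binding to it. This is a minimal sketch, not something PrivateGPT provides; `port_is_free` is a hypothetical helper, and 8001 and 8002 are just the two port numbers mentioned above.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if we can bind to (host, port), i.e. nothing else is holding it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    for candidate in (8001, 8002):
        print(candidate, "free" if port_is_free(candidate) else "in use")
```

It only checks the loopback address by default, so a service bound to a different interface could in principle slip past it.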
To deal with the error, I run the next step of the installation.
(base) E:\Projects\privateGPT>poetry install --with local
Installing dependencies from lock file
Package operations: 7 installs, 0 updates, 0 removals
  - Installing scipy (1.11.4)
  - Installing threadpoolctl (3.2.0)
  - Installing diskcache (5.6.3)
  - Installing scikit-learn (1.3.2)
  - Installing torchvision (0.16.2)
  - Installing llama-cpp-python (0.2.23)
  - Installing sentence-transformers (2.2.2)
Installing the current project: private-gpt (0.2.0)
(base) E:\Projects\privateGPT>poetry run python scripts/setup
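The earlier ImportError came down to `llama_cpp` not being importable from the interpreter that runs PrivateGPT. A small sketch to confirm that kind of thing before launching; `missing_modules` is a made-up helper, and the script name `check_modules.py` below is hypothetical too.

```python
import importlib.util
import sys
from typing import List


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which Python is answering, so a conda/Poetry mix-up is visible.
    print("interpreter:", sys.executable)
    # "llama_cpp" is the module named in the ImportError above.
    print("missing:", missing_modules(["llama_cpp"]))
```

Saved as, say, `check_modules.py`, it would need to be run as `poetry run python check_modules.py` so that it inspects the Poetry virtualenv rather than the conda base environment.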
Run it again.
(base) E:\Projects\privateGPT>poetry run python -m private_gpt
Some files were downloaded, and after that I could reach the UI by opening http://localhost:8002.
However, sending anything results in the error "AttributeError: 'NoneType' object has no attribute 'split'".
17:04:53.533 [INFO ] uvicorn.access - 127.0.0.1:52251 - "GET /queue/data?session_hash=nxrm6zcbks HTTP/1.1" 200 Traceback (most recent call last): File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\queueing.py", line 495, in call_prediction output = await route_utils.call_process_api( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\route_utils.py", line 231, in call_process_api output = await app.get_blocks().process_api( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\blocks.py", line 1594, in process_api result = await self.call_function( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\blocks.py", line 1188, in call_function prediction = await utils.async_iteration(iterator) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 513, in async_iteration return await iterator.__anext__() ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 639, in asyncgen_wrapper response = await iterator.__anext__() ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\chat_interface.py", line 487, in _stream_fn first_response = await async_iteration(generator) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 513, in async_iteration return await iterator.__anext__() 
^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 506, in __anext__ return await anyio.to_thread.run_sync( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync return await get_asynclib().run_sync_in_worker_thread( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread return await future ^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run result = context.run(func, *args) ^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\gradio\utils.py", line 489, in run_sync_iterator_async return next(iterator) ^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\ui\ui.py", line 127, in _chat all_messages = [*build_history(), new_message] ^^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\ui\ui.py", line 109, in build_history *[ ^ File "E:\Projects\privateGPT\private_gpt\ui\ui.py", line 114, in <listcomp> content=interaction[1].split(SOURCES_SEPARATOR)[0], ^^^^^^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'split' 17:04:53.589 [INFO ] uvicorn.access - 127.0.0.1:52251 - "POST /run/predict HTTP/1.1" 200
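The traceback points at `interaction[1].split(SOURCES_SEPARATOR)` in `build_history`, so some entry in the chat history had no assistant reply (`None`). I don't know why that happened here, and this is not PrivateGPT's actual fix, but a defensive version of that step could look like the sketch below. The value of `SOURCES_SEPARATOR` and the function name `assistant_texts` are assumptions; only the constant's name appears in the traceback.

```python
from typing import List, Optional

# Assumed value for illustration; the real constant lives in private_gpt/ui/ui.py.
SOURCES_SEPARATOR = "\n\n Sources: \n"


def assistant_texts(history: List[List[Optional[str]]]) -> List[str]:
    """Strip the sources suffix from each assistant reply, skipping turns with no reply."""
    texts = []
    for interaction in history:
        reply = interaction[1]
        if reply is None:
            # A turn without an assistant reply; calling .split() on it would raise.
            continue
        texts.append(reply.split(SOURCES_SEPARATOR)[0])
    return texts


if __name__ == "__main__":
    history = [
        ["first question", None],
        ["second question", "An answer." + SOURCES_SEPARATOR + "1. book.pdf (page 3)"],
    ]
    print(assistant_texts(history))
```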
After restarting PrivateGPT, selecting LLM Chat as the Mode, and entering the message again, it replied. When I entered "hello":
Hello! How may I assist you today? I'm here to help answer any questions you have to the best of my ability. Please keep in mind that I cannot provide speculative or made-up information, and must always follow instructions given to me. Let me know if you have a specific question or topic you'd like me to help with.
When I entered "こんにちは":
Hello there! I see you've greeted me in Japanese. That's great! If you have any questions or need assistance with something, feel free to ask. I'll do my best to help you in a respectful and honest manner. Let me know if you need anything specific. Is there a particular topic or question you have in mind?
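So it can apparently tell that the input is Japanese.
As a test, I loaded a PDF of a Japanese e-book and asked a few rough questions. It answered in English, but it didn't seem to analyze the content as well as Chat with RTX did.
I probably need to try other models and various Japanese-capable options. Needs investigation.
And... it seems this isn't the kind of setup where you point it at a folder and have it learn the contents in advance. There is an upload form. That's a hassle.
It seems to use other programs as Vector Stores. Maybe I could just pour a large number of files into one of those? From a quick look, though, I couldn't figure out how to feed files in.
The responses are also very slow, but it is running on the CPU (Core i5 13500K), so adding a GPU should improve that.
After some searching, it turns out there is a way to register a whole folder. LINK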
(base) E:\Projects\privateGPT>poetry run python scripts\ingest_folder.py D:\Books --watch --log-file ingestLog.txt Traceback (most recent call last): File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 102, in <module> worker.ingest_folder(root_path, args.ignored) File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 38, in ingest_folder self._ingest_all(self._files_under_root_folder) File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 42, in _ingest_all self.ingest_service.bulk_ingest([(str(p.name), p) for p in files_to_ingest]) File "E:\Projects\privateGPT\private_gpt\server\ingest\ingest_service.py", line 92, in bulk_ingest documents = self.ingest_component.bulk_ingest(files) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_component.py", line 127, in bulk_ingest documents = IngestionHelper.transform_file_into_documents( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 30, in transform_file_into_documents documents = IngestionHelper._load_file_to_documents(file_name, file_data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 51, in _load_file_to_documents return reader_cls().load_data(file_data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\readers\file\docs_reader.py", line 30, in load_data pdf = pypdf.PdfReader(fp) ^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_reader.py", line 352, in __init__ self._encryption.verify(pwd) == PasswordType.NOT_DECRYPTED ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_encryption.py", line 953, in verify key, rc = self.verify_v4(pwd) if self.V <= 4 else self.verify_v5(pwd) ^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_encryption.py", line 990, in verify_v5 key = AlgV5.verify_owner_password( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_encryption.py", line 532, in verify_owner_password AlgV5.calculate_hash(R, password, o_value[32:40], u_value[:48]) File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_encryption.py", line 577, in calculate_hash e = aes_cbc_encrypt(k[:16], k[16:32], k1 * 64) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\xxxxxxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\pypdf\_crypt_providers\_fallback.py", line 89, in aes_cbc_encrypt raise DependencyError(_DEPENDENCY_ERROR_STR) pypdf.errors.DependencyError: cryptography>=3.1 is required for AES algorithm
It failed with an error. Trying to install cryptography just reports "Requirement already satisfied". According to LINK, I should install pycryptodome instead?
(base) E:\Projects\privateGPT>pip install pycryptodome
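It stopped with the same error, probably because a bare pip in the (base) prompt installs into the conda base environment rather than the Poetry virtualenv that actually runs the script.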
(base) E:\Projects\privateGPT>poetry add cryptography
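Since the failure came from an AES-encrypted PDF, it could help to know in advance which files in the folder are encrypted. Below is a rough heuristic sketch using only the standard library: encrypted PDFs carry an `/Encrypt` entry in their trailer, so it just looks for those bytes. It can give false positives (a PDF whose content happens to contain that string), it reads each file fully into memory, and the function names are my own, so treat it as a quick triage tool rather than a proper check.

```python
from pathlib import Path
from typing import List


def looks_encrypted(pdf_path: Path) -> bool:
    """Heuristic: an encrypted PDF has an /Encrypt entry in its trailer dictionary."""
    return b"/Encrypt" in pdf_path.read_bytes()


def encrypted_pdfs(root: Path) -> List[Path]:
    """List PDFs under root (recursively) that look encrypted."""
    return sorted(p for p in root.rglob("*.pdf") if looks_encrypted(p))


if __name__ == "__main__":
    # D:\Books is the folder passed to ingest_folder.py above.
    for path in encrypted_pdfs(Path(r"D:\Books")):
        print(path)
```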
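Come to think of it, it recognized Japanese properly and answered with the PDF contents translated into English, so I suspect there is an output-language setting somewhere.
Beyond that, if I match the model and the vectorization (embedding?) step to what Chat with RTX uses, I might get answers of similar quality.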
Addendum, 2024/Feb/29
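In PrivateGPT, the output language is apparently changed by switching the model, not by an output-language setting.
Come to think of it, Japanese came out garbled in Chat with RTX too. It apparently still gets garbled even with a Japanese-capable model, so maybe the vectorization step, i.e. the embeddings, has to support Japanese as well?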
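As a toy illustration of why the retrieval side needs to handle Japanese too: a matcher that splits text on whitespace sees an entire Japanese sentence as a single token, so a query never overlaps with the document, while character bigrams do. Real embedding models do not work like this; it is only an analogy for "the text-to-vector step must understand the language", and every name in it is made up.

```python
from typing import List


def whitespace_tokens(text: str) -> List[str]:
    """Split on whitespace, which works for English but not for unsegmented Japanese."""
    return text.split()


def char_bigrams(text: str) -> List[str]:
    """Split into overlapping two-character pieces, ignoring whitespace."""
    s = "".join(text.split())
    return [s[i:i + 2] for i in range(len(s) - 1)]


def overlap(a: List[str], b: List[str]) -> int:
    """Number of distinct tokens shared by the two token lists."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    query = "日本語の本"
    document = "これは日本語の本です"
    print("whitespace:", overlap(whitespace_tokens(query), whitespace_tokens(document)))
    print("bigrams:   ", overlap(char_bigrams(query), char_bigrams(document)))
```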
And after leaving it running overnight, another error.
Traceback (most recent call last): File "C:\Users\takagiwa\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\readers\file\epub_reader.py", line 21, in load_data import ebooklib ModuleNotFoundError: No module named 'ebooklib' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 102, in <module> worker.ingest_folder(root_path, args.ignored) File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 38, in ingest_folder self._ingest_all(self._files_under_root_folder) File "E:\Projects\privateGPT\scripts\ingest_folder.py", line 42, in _ingest_all self.ingest_service.bulk_ingest([(str(p.name), p) for p in files_to_ingest]) File "E:\Projects\privateGPT\private_gpt\server\ingest\ingest_service.py", line 92, in bulk_ingest documents = self.ingest_component.bulk_ingest(files) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_component.py", line 127, in bulk_ingest documents = IngestionHelper.transform_file_into_documents( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 30, in transform_file_into_documents documents = IngestionHelper._load_file_to_documents(file_name, file_data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "E:\Projects\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 51, in _load_file_to_documents return reader_cls().load_data(file_data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\takagiwa\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt--sQCGbRe-py3.11\Lib\site-packages\llama_index\readers\file\epub_reader.py", line 25, in load_data raise ImportError( ImportError: Please install extra dependencies that are required for the EpubReader: `pip install EbookLib html2text`
I installed these the same way.
(base) E:\Projects\privateGPT>poetry add EbookLib html2text
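Finding out about a missing reader dependency only after an overnight run is painful, so it may be worth listing which file types the folder contains before starting. A minimal sketch with the standard library; `count_suffixes` is a made-up name, and which extensions need which extra packages still has to be looked up separately.

```python
from collections import Counter
from pathlib import Path


def count_suffixes(root: Path) -> Counter:
    """Count files under root by lowercase extension, recursing into subfolders."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # D:\Books is the folder passed to ingest_folder.py above.
    for suffix, count in count_suffixes(Path(r"D:\Books")).most_common():
        print(f"{suffix or '(no extension)':16} {count}")
```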
That said, as things stand it seems it can only work from a single file, so I'll stop here for now. Time to look for a different system.