{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "toc_visible": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# Large language models (LLMs) and retrieval augmented generation (RAG) for physics-specific queries using the OpenAI API" ], "metadata": { "id": "YWzxX9SZaBT0" } }, { "cell_type": "markdown", "source": [ "2025-01-15 - James Verbus" ], "metadata": { "id": "3O-U04PBaJ2C" } }, { "cell_type": "markdown", "source": [ "https://www.linkedin.com/in/jamesverbus/" ], "metadata": { "id": "TV6fB5zRxP_W" } }, { "cell_type": "markdown", "source": [ "# Prerequisites" ], "metadata": { "id": "YZZYMocmenT-" } }, { "cell_type": "markdown", "source": [ "1) You need to set up an OpenAI account. https://platform.openai.com/\n", "\n", "\n", "2) Create an OpenAI API key. https://platform.openai.com/settings/organization/api-keys\n", "\n", "3) Add your API key and organization ID as secret variables in your notebook: \"OPENAI_API_KEY\" and \"OPENAI_ORGANIZATION\"\n", "\n", "4) The first half of the notebook can be completed using a free account. 
If you want to complete the second half, you need to fund your OpenAI account for API calls ($1 should be enough).\n" ], "metadata": { "id": "OXGf7KMUeut2" } }, { "cell_type": "markdown", "source": [ "# Setup environment" ], "metadata": { "id": "D5gFhwT-sUqG" } }, { "cell_type": "code", "source": [ "from IPython.display import HTML, display\n", "\n", "def set_css():\n", " display(HTML('''\n", " \n", " '''))\n", "get_ipython().events.register('pre_run_cell', set_css)" ], "metadata": { "id": "Z7UKoMC0zWL3" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Install packages" ], "metadata": { "id": "wWc2FaiSkaqw" } }, { "cell_type": "code", "source": [ "!pip install llama-index==0.12.3 openai==1.54.5 pypdf==5.1.0 httpx==0.27.2" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "H8-xXp02abzy", "outputId": "b4bdf0e0-e89f-4466-e2d4-e4820e028f1c" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Requirement already satisfied: llama-index==0.12.3 in /usr/local/lib/python3.10/dist-packages (0.12.3)\n", "Requirement already satisfied: openai==1.54.5 in /usr/local/lib/python3.10/dist-packages (1.54.5)\n", "Requirement already satisfied: pypdf==5.1.0 in /usr/local/lib/python3.10/dist-packages (5.1.0)\n", "Requirement already satisfied: httpx==0.27.2 in /usr/local/lib/python3.10/dist-packages (0.27.2)\n", "Requirement already satisfied: llama-index-agent-openai<0.5.0,>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.4.1)\n", "Requirement already satisfied: llama-index-cli<0.5.0,>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.4.0)\n", "Requirement already satisfied: llama-index-core<0.13.0,>=0.12.3 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) 
(0.12.10.post1)\n", "Requirement already satisfied: llama-index-embeddings-openai<0.4.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.3.1)\n", "Requirement already satisfied: llama-index-indices-managed-llama-cloud>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.6.3)\n", "Requirement already satisfied: llama-index-legacy<0.10.0,>=0.9.48 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.9.48.post4)\n", "Requirement already satisfied: llama-index-llms-openai<0.4.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.3.3)\n", "Requirement already satisfied: llama-index-multi-modal-llms-openai<0.4.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.3.0)\n", "Requirement already satisfied: llama-index-program-openai<0.4.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.3.1)\n", "Requirement already satisfied: llama-index-question-gen-openai<0.4.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.3.0)\n", "Requirement already satisfied: llama-index-readers-file<0.5.0,>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.4.3)\n", "Requirement already satisfied: llama-index-readers-llama-parse>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (0.4.0)\n", "Requirement already satisfied: nltk>3.8.1 in /usr/local/lib/python3.10/dist-packages (from llama-index==0.12.3) (3.9.1)\n", "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai==1.54.5) (3.7.1)\n", "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from openai==1.54.5) (1.9.0)\n", "Requirement already satisfied: jiter<1,>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from openai==1.54.5) (0.8.2)\n", "Requirement already satisfied: pydantic<3,>=1.9.0 in 
/usr/local/lib/python3.10/dist-packages (from openai==1.54.5) (2.10.3)\n", "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai==1.54.5) (1.3.1)\n", "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai==1.54.5) (4.67.1)\n", "Requirement already satisfied: typing-extensions<5,>=4.11 in /usr/local/lib/python3.10/dist-packages (from openai==1.54.5) (4.12.2)\n", "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx==0.27.2) (2024.12.14)\n", "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx==0.27.2) (1.0.7)\n", "Requirement already satisfied: idna in /usr/local/lib/python3.10/dist-packages (from httpx==0.27.2) (3.10)\n", "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx==0.27.2) (0.14.0)\n", "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai==1.54.5) (1.2.2)\n", "Requirement already satisfied: PyYAML>=6.0.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (6.0.2)\n", "Requirement already satisfied: SQLAlchemy>=1.4.49 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (2.0.36)\n", "Requirement already satisfied: aiohttp<4.0.0,>=3.8.6 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (3.11.10)\n", "Requirement already satisfied: dataclasses-json in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (0.6.7)\n", "Requirement already satisfied: deprecated>=1.2.9.3 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.2.15)\n", "Requirement already satisfied: 
dirtyjson<2.0.0,>=1.0.8 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.0.8)\n", "Requirement already satisfied: filetype<2.0.0,>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.2.0)\n", "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (2024.10.0)\n", "Requirement already satisfied: nest-asyncio<2.0.0,>=1.5.8 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.6.0)\n", "Requirement already satisfied: networkx>=3.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (3.4.2)\n", "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.26.4)\n", "Requirement already satisfied: pillow>=9.0.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (11.0.0)\n", "Requirement already satisfied: requests>=2.31.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (2.32.3)\n", "Requirement already satisfied: tenacity!=8.4.0,<10.0.0,>=8.2.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (8.5.0)\n", "Requirement already satisfied: tiktoken>=0.3.3 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (0.8.0)\n", "Requirement already satisfied: typing-inspect>=0.8.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (0.9.0)\n", "Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.17.0)\n", "Requirement already 
satisfied: llama-cloud>=0.1.5 in /usr/local/lib/python3.10/dist-packages (from llama-index-indices-managed-llama-cloud>=0.4.0->llama-index==0.12.3) (0.1.8)\n", "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from llama-index-legacy<0.10.0,>=0.9.48->llama-index==0.12.3) (2.2.2)\n", "Requirement already satisfied: beautifulsoup4<5.0.0,>=4.12.3 in /usr/local/lib/python3.10/dist-packages (from llama-index-readers-file<0.5.0,>=0.4.0->llama-index==0.12.3) (4.12.3)\n", "Requirement already satisfied: striprtf<0.0.27,>=0.0.26 in /usr/local/lib/python3.10/dist-packages (from llama-index-readers-file<0.5.0,>=0.4.0->llama-index==0.12.3) (0.0.26)\n", "Requirement already satisfied: llama-parse>=0.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-readers-llama-parse>=0.4.0->llama-index==0.12.3) (0.5.19)\n", "Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk>3.8.1->llama-index==0.12.3) (8.1.7)\n", "Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk>3.8.1->llama-index==0.12.3) (1.4.2)\n", "Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk>3.8.1->llama-index==0.12.3) (2024.11.6)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai==1.54.5) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.27.1 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai==1.54.5) (2.27.1)\n", "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (2.4.4)\n", "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.3.2)\n", "Requirement already satisfied: 
async-timeout<6.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (4.0.3)\n", "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (24.3.0)\n", "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.5.0)\n", "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (6.1.0)\n", "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (0.2.1)\n", "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.18.3)\n", "Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4<5.0.0,>=4.12.3->llama-index-readers-file<0.5.0,>=0.4.0->llama-index==0.12.3) (2.6)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31.0->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (3.4.0)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31.0->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (2.2.3)\n", "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy>=1.4.49->SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (3.1.1)\n", "Requirement already satisfied: mypy-extensions>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from 
typing-inspect>=0.8.0->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (1.0.0)\n", "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (3.25.1)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-legacy<0.10.0,>=0.9.48->llama-index==0.12.3) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-legacy<0.10.0,>=0.9.48->llama-index==0.12.3) (2024.2)\n", "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-legacy<0.10.0,>=0.9.48->llama-index==0.12.3) (2024.2)\n", "Requirement already satisfied: packaging>=17.0 in /usr/local/lib/python3.10/dist-packages (from marshmallow<4.0.0,>=3.18.0->dataclasses-json->llama-index-core<0.13.0,>=0.12.3->llama-index==0.12.3) (24.2)\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->llama-index-legacy<0.10.0,>=0.9.48->llama-index==0.12.3) (1.17.0)\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Imports" ], "metadata": { "id": "qhW3YkZqxO_c" } }, { "cell_type": "code", "source": [ "import openai, os\n", "\n", "from google.colab import drive, userdata\n", "from llama_index.core import ServiceContext, Settings, SimpleDirectoryReader, VectorStoreIndex\n", "from llama_index.core.node_parser import SimpleNodeParser\n", "from llama_index.llms.openai import OpenAI" ], "metadata": { "id": "M791VaFJaa9E", "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "outputId": "5980a067-c9b2-4b46-ae28-597a2f8fe827" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## 
Set up OpenAI keys" ], "metadata": { "id": "V0afRtKfsGfW" } }, { "cell_type": "markdown", "source": [ "You need to set up an OpenAI account, fund it for API calls, and then add your API key and organization ID as secret variables in your notebook.\n", "\n", "---\n", "\n" ], "metadata": { "id": "feHvwLJBdiZQ" } }, { "cell_type": "code", "source": [ "openai.organization = userdata.get(\"OPENAI_ORGANIZATION\")\n", "openai.api_key = userdata.get(\"OPENAI_API_KEY\")" ], "metadata": { "id": "KlpX05kaa2Hv", "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "outputId": "cfb8bd31-b42a-4767-e590-a273e8c97ea2" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "# Define queries" ], "metadata": { "id": "WlCTWpKExdM1" } }, { "cell_type": "code", "source": [ "questions = {\n", " 1: \"In the LUX D-D analysis, what neutron source rate provided optimal match between the absolute number of single-scatter events in simulation and data\",\n", " 2: \"In the first results from the LUX detector, what was the average electric field used when measuring the charge and light yields?\",\n", " 3: \"What is the mean energy of neutrons produced by a DD108 fusion source?\",\n", " 4: \"What is the energy at the endpoint of the D-D neutron recoil energy spectrum in liquid xenon? What was the size of the S1 and S2 signals observed in LUX at this endpoint\",\n", " 5: \"How low in energy was the ER response measured using 127Xe? Where did the 127Xe come from?\",\n", " # Add your questions 6, 7, ... 
here\n", "}" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "id": "E7lDuXOTxhro", "outputId": "85e21017-4e36-4576-ab19-655b3b593e74" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "# Query OpenAI gpt-4o-mini directly (without RAG)\n", "\n" ], "metadata": { "id": "YgES5gVJO_bw" } }, { "cell_type": "markdown", "source": [ "Available OpenAI models and limits are listed here: https://platform.openai.com/settings/organization/limits" ], "metadata": { "id": "LRTiA_BOxh51" } }, { "cell_type": "code", "source": [ "client = openai.OpenAI(\n", " api_key=userdata.get(\"OPENAI_API_KEY\"),\n", ")\n", "\n", "def query_oai(question):\n", " chat_completion = client.chat.completions.create(\n", " messages=[\n", " {\n", " \"role\": \"user\",\n", " \"content\": question\n", " }\n", " ],\n", " model=\"gpt-4o-mini\",\n", " )\n", "\n", " return chat_completion" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "id": "07iLj_kui_fY", "outputId": "26d036c1-0786-4266-8e22-b55f26833fa0" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## Question 1" ], "metadata": { "id": "1RjXfaQDi3yl" } }, { "cell_type": "code", "source": [ "question = questions[1]\n", "response = query_oai(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.choices[0].message.content)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 104 }, "id": "4eJrHWlShqa1", "outputId": "2fefcefb-3d0f-4aa6-8a4f-645e88500ba9" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": 
{} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: In the LUX D-D analysis, what neutron source rate provided optimal match between the absolute number of single-scatter events in simulation and data\n", "\n", "Response: In the LUX (Large Underground Xenon) dark matter experiment, the optimal neutron source rate that provided an effective match between the absolute number of single-scatter events in the simulated data and the actual observational data was identified as **10 neutrons per day**. This value was crucial for aligning the simulation results with the measured data, thereby enhancing the overall accuracy and reliability of the experiment’s findings regarding dark matter interactions.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Question 2" ], "metadata": { "id": "dPav8F7Yjd06" } }, { "cell_type": "code", "source": [ "question = questions[2]\n", "response = query_oai(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.choices[0].message.content)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 86 }, "id": "FdUJ8BMojeta", "outputId": "1bc8db93-403f-43a8-cbff-0000351d528c" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: In the first results from the LUX detector, what was the average electric field used when measuring the charge and light yields?\n", "\n", "Response: In the first results from the LUX (Large Underground Xenon) detector, the average electric field used when measuring the charge and light yields was approximately 0.25 kV/cm. 
This value was employed in the experiments to optimize the detection of light and charge produced by potential dark matter interactions within the detector's xenon target.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Question 3" ], "metadata": { "id": "7fUNxZ-vhIaC" } }, { "cell_type": "code", "source": [ "question = questions[3]\n", "response = query_oai(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.choices[0].message.content)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 225 }, "id": "rMyZuzshhExd", "outputId": "80751741-0b2c-41dd-eae8-fc57af14cd73" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: What is the mean energy of neutrons produced by a DD108 fusion source?\n", "\n", "Response: DD108 refers to a specific neutron source that utilizes deuterium-deuterium (D-D) fusion reactions. In general, D-D fusion can produce different types of reactions, but the primary reactions relevant to neutron production are:\n", "\n", "1. \\( D + D \\rightarrow T + p \\) (produces a proton)\n", "2. \\( D + D \\rightarrow He + n \\) (produces a neutron)\n", "\n", "The mean energy of the neutrons produced by these fusion reactions typically ranges in the order of several MeV (Mega electron Volts). 
In the case of the D-D reaction producing a neutron, the neutron energy is commonly around \\( 2.5 \\) MeV.\n", "\n", "For the DD108 fusion source, it's important to look at specific manufacturer data or research publications for precise figures, but generally, the output neutron energy is often around 2.5 to 2.45 MeV.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Question 4" ], "metadata": { "id": "EI8Y_ny_hRcG" } }, { "cell_type": "code", "source": [ "question = questions[4]\n", "response = query_oai(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.choices[0].message.content)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 312 }, "id": "A-jpelMLhLIc", "outputId": "35ff10b0-872d-495d-d062-819067718b40" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: What is the energy at the endpoint of the D-D neutron recoil energy spectrum in liquid xenon? What was the size of the S1 and S2 signals observed in LUX at this endpoint\n", "\n", "Response: In the context of dark matter searches and the detection of nuclear recoils, energy spectra can provide important information about various processes. For deuterium-deuterium (D-D) fusion reactions, neutron production can result in nuclear recoils, and in liquid xenon, the energy of these recoils can be directly correlated to the light and charge signals detected in experiments like LUX (Large Underground Xenon).\n", "\n", "The recoil energy spectrum for neutron recoils in liquid xenon typically has a maximum energy, which corresponds to the endpoint of the spectrum. 
This maximum recoil energy is roughly in the range of a few MeV per neutron, often around 2-3 MeV depending on the kinematics of the particular reaction.\n", "\n", "For the LUX experiment, the S1 and S2 signals directly correspond to the scintillation light (S1) and the ionization charge (S2) produced by recoiling nuclei in the liquid xenon. At the endpoint of the D-D neutron recoil energy spectrum, the observed S1 and S2 signals can be related to the energy deposition from the recoiling neutrons.\n", "\n", "While specific numbers for the S1 and S2 signal sizes at the endpoint can vary between experiments and conditions, in LUX, the S1 signals are typically in the range of a few hundred photoelectrons for energies around 1 MeV–3 MeV of nuclear recoil energy. The S2 signal, on the other hand, is usually much larger because it is proportional to the number of ionization electrons produced and subsequently detected.\n", "\n", "For the exact values of the S1 and S2 signals observed at the endpoint in LUX and other experimental details, it would be best to refer to the specific analysis or publications released from the LUX collaboration or related experimental sources, as they contain the most accurate data and detailed measurements. 
If you are looking for very specific numerical values, please check the latest publications from LUX or other relevant literature.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Question 5" ], "metadata": { "id": "qP3PCsYLcMTk" } }, { "cell_type": "code", "source": [ "question = questions[5]\n", "response = query_oai(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.choices[0].message.content)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 208 }, "id": "jKM-p--DzvWI", "outputId": "ccec6664-1ee8-49fc-c56b-5e78988a51d0" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: How low in energy was the ER response measured using 127Xe? Where did the 127Xe come from?\n", "\n", "Response: The energy levels for the ER (excitation-resolved) response measured using \\(^{127}\\text{Xe}\\) typically refer to low-energy nuclear reactions or resonance states relevant to experimental nuclear physics studies. The specific low-energy threshold of the measured ER response would depend on the details of the experiment, including the method used and the reactions being investigated. Typically, these measurements might focus on energies below a few MeV, but for precise values, you would need to refer to specific experimental papers or results that detail the findings.\n", "\n", "As for the origin of \\(^{127}\\text{Xe}\\), this isotopes is stable and can be found naturally in trace amounts in the Earth's atmosphere and in some minerals. \\(^{127}\\text{Xe}\\) can also be produced artificially in nuclear reactors or accelerators, where xenon isotopes are created through neutron capture processes. 
The xenon gas might also be purified and separated from other isotopes through methods such as fractional distillation or gaseous diffusion for specific experimental use. \n", "\n", "If you need more specific information about the ER response or the origin of \\(^{127}\\text{Xe}\\) in a particular context, please provide additional details.\n" ] } ] }, { "cell_type": "markdown", "source": [ "# Set up a basic RAG system with Brown Particle Astrophysics Group theses in Google Drive" ], "metadata": { "id": "FYBe512NyW1h" } }, { "cell_type": "markdown", "source": [ "NOTE: You must fund your OpenAI account to proceed past this point. The OpenAI API calls below require a Tier 1 account." ], "metadata": { "id": "D9vxv59Flw0V" } }, { "cell_type": "markdown", "source": [ "## Fetch theses and add them to Google Drive" ], "metadata": { "id": "M92IwH1uym53" } }, { "cell_type": "markdown", "source": [ "Brown Particle Astrophysics Group Theses\n", "\n", "Copied from: https://particleastro.brown.edu/graduate-theses/" ], "metadata": { "id": "KUkJC4oZkqMZ" } }, { "cell_type": "code", "source": [ "drive.mount('/content/drive')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "id": "ElYLjY1Kkn_J", "outputId": "4c25f951-62b6-4b3e-c1ba-90d10c78c63a" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Mounted at /content/drive\n" ] } ] }, { "cell_type": "code", "source": [ "top_level_path = '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/'\n", "module_path = 'Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)'" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "id": "V2ZGZnmoHGfO", "outputId": "57399234-20a9-45b9-b799-e5b5a3975570" }, "execution_count": null, "outputs": [ { 
"output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "code", "source": [ "os.chdir(os.path.join(top_level_path, module_path, 'documents/lux_dd_papers'))" ], "metadata": { "id": "Pd9qfs2rl9VU", "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "outputId": "9ebdbcdc-3c40-4ec3-dc3e-ce88f60d0463" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "code", "source": [ "!ls" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "id": "-JpQSOVVl_32", "outputId": "96f108c1-83df-47bb-b08d-7a485bdea7a3" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "1608.05381v2.pdf 1-s2.0-S0168900217301158-am.pdf\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Set up LlamaIndex" ], "metadata": { "id": "cJnCMk-9yxb7" } }, { "cell_type": "code", "source": [ "# Set basic LlamaIndex parameters\n", "llama_index_data_path = os.path.join(top_level_path, module_path, 'documents/lux_dd_papers')\n", "Settings.chunk_size = 1000\n", "Settings.chunk_overlap = 100\n", "Settings.llm = OpenAI(model=\"gpt-4o-mini\")" ], "metadata": { "id": "NYeOd1NGmCOa", "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "outputId": "aedf4eaa-92d7-4169-d687-1b842ee57431" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "This next step takes a few minutes. Let's review what is happening while the code runs." 
], "metadata": { "id": "lzGf7Bi5Rx_E" } }, { "cell_type": "code", "source": [ "# Load every document under the mounted Google Drive folder with LlamaIndex\n", "documents = SimpleDirectoryReader(llama_index_data_path, recursive=True).load_data()\n", "\n", "# Chunk and embed the documents, then build a vector index over them\n", "index = VectorStoreIndex.from_documents(documents)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "collapsed": true, "id": "dvmUTf9PmjT0", "outputId": "e2a12c5a-43d5-4a5e-97c9-38fd7c0dda06" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "# Query OpenAI gpt-4o-mini with RAG" ], "metadata": { "id": "08N-6lqGzxb3" } }, { "cell_type": "code", "source": [ "def query_rag(question):\n", " # Retrieve the 5 most similar chunks and pass them to the LLM as context\n", " query_engine = index.as_query_engine(similarity_top_k=5)\n", " response = query_engine.query(question)\n", "\n", " return response" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "id": "PMCyrZuh1hCE", "outputId": "6735589d-89e7-447c-dcdb-74005bb4bbc5" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## Question 1" ], "metadata": { "id": "X6RTBrgciW8A" } }, { "cell_type": "code", "source": [ "question = questions[1]\n", "response = query_rag(question)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "id": "6Cv_rSvcunNm", "outputId": "1f2205ad-7f6b-4f3f-a1e6-fd4b407ef677" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "code", "source": [ "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.response)" ], "metadata": { "colab": { "base_uri": 
"https://localhost:8080/", "height": 86 }, "id": "OnoXI9ohic4Y", "outputId": "84be67b8-91b5-4928-cf42-f2ee3c196ac7" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: In the LUX D-D analysis, what neutron source rate provided optimal match between the absolute number of single-scatter events in simulation and data\n", "\n", "Response: The neutron source rate of 2.6×10^6 n/s was used for data normalization in the LUX D-D analysis, which is in agreement with the independently measured source rate of (2.5 ± 0.3) × 10^6 n/s. This agreement confirms the consistency between the data and simulation in both absolute rate and shape.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Question 2" ], "metadata": { "id": "-_bDopUCiZ8j" } }, { "cell_type": "code", "source": [ "question = questions[2]\n", "\n", "response = query_rag(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.response)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 69 }, "id": "wlFteCEfwrUm", "outputId": "31d0fda9-9064-4acf-fe1e-45b9557b7e09" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: In the first results from the LUX detector, what was the average electric field used when measuring the charge and light yields?\n", "\n", "Response: The average electric field used when measuring the charge and light yields in the first results from the LUX detector was 180 V/cm.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Question 3" ], "metadata": { "id": "IsNmK4xqbqAW" } }, { "cell_type": "code", "source": [ "question = questions[3]\n", "\n", "response = 
query_rag(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.response)" ], "metadata": { "id": "-UijfNqJkrDy", "colab": { "base_uri": "https://localhost:8080/", "height": 69 }, "outputId": "fbaf01e1-93c6-4e81-c5ec-9597b53e5b37" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: What is the mean energy of neutrons produced by a DD108 fusion source?\n", "\n", "Response: The mean energy of neutrons produced by a DD108 fusion source is approximately 2.45 MeV.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Question 4" ], "metadata": { "id": "tWI-Jp6FrQLX" } }, { "cell_type": "code", "source": [ "question = questions[4]\n", "\n", "response = query_rag(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.response)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 86 }, "id": "Rf1e0bfrbtFm", "outputId": "26aa242d-62d2-4a38-e6dc-e7e88018f102" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: What is the energy at the endpoint of the D-D neutron recoil energy spectrum in liquid xenon? What was the size of the S1 and S2 signals observed in LUX at this endpoint\n", "\n", "Response: The energy at the endpoint of the D-D neutron recoil energy spectrum in liquid xenon is 74 keV nr. 
At this endpoint, the mean S1 signal observed was 2500 phd, and a raw S2 analysis threshold of 164 phd was applied.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Question 5" ], "metadata": { "id": "gnZ9rZMc0A3Y" } }, { "cell_type": "code", "source": [ "question = questions[5]\n", "\n", "response = query_rag(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.response)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 86 }, "id": "uWBIsnbo0C2e", "outputId": "f79ecddc-b18e-42ee-b977-7b68f21b1b5d" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: How low in energy was the ER response measured using 127Xe? Where did the 127Xe come from?\n", "\n", "Response: The electron recoil (ER) response was measured using 131mXe, which is a metastable state of xenon resulting from cosmogenic activation. The measurements were conducted at energies close to 122 keVee. 
The specific low-energy threshold for the ER response using 127Xe is not detailed in the provided information.\n" ] } ] }, { "cell_type": "code", "source": [ "response.metadata" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 659 }, "id": "sGvs6MdB5Seq", "outputId": "cc8daa91-ccdf-4afb-f9c5-942280f3abfa" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "execute_result", "data": { "text/plain": [ "{'9e523246-bf4e-42b2-9f58-9b09af5ec736': {'page_label': '7',\n", " 'file_name': '1-s2.0-S0168900217301158-am.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1-s2.0-S0168900217301158-am.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 11051497,\n", " 'creation_date': '2025-01-13',\n", " 'last_modified_date': '2025-01-13'},\n", " 'f9ce9669-2ac8-4f15-92b5-a02ec09a6501': {'page_label': '21',\n", " 'file_name': '1608.05381v2.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 4399092,\n", " 'creation_date': '2025-01-13',\n", " 'last_modified_date': '2025-01-13'},\n", " '42451e0e-638d-4c21-979c-c3c28ea71eae': {'page_label': '3',\n", " 'file_name': '1-s2.0-S0168900217301158-am.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1-s2.0-S0168900217301158-am.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 11051497,\n", " 'creation_date': '2025-01-13',\n", " 
'last_modified_date': '2025-01-13'},\n", " 'e433fc20-e6e3-4084-8d2c-62f018a3cb07': {'page_label': '6',\n", " 'file_name': '1608.05381v2.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 4399092,\n", " 'creation_date': '2025-01-13',\n", " 'last_modified_date': '2025-01-13'},\n", " '97a91fbb-b7e7-47f4-8cca-54c65c3f65e3': {'page_label': '15',\n", " 'file_name': '1608.05381v2.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 4399092,\n", " 'creation_date': '2025-01-13',\n", " 'last_modified_date': '2025-01-13'}}" ] }, "metadata": {}, "execution_count": 26 } ] }, { "cell_type": "code", "source": [ "response.source_nodes" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "oP0SeqvZ5Ts8", "outputId": "249b4ace-6e26-44e6-c9e3-3220cbf9050c" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "execute_result", "data": { "text/plain": [ "[NodeWithScore(node=TextNode(id_='9e523246-bf4e-42b2-9f58-9b09af5ec736', embedding=None, metadata={'page_label': '7', 'file_name': '1-s2.0-S0168900217301158-am.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1-s2.0-S0168900217301158-am.pdf', 'file_type': 'application/pdf', 'file_size': 11051497, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, 
excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='2480a473-d79f-4a9c-848c-c0933595225e', node_type='4', metadata={'page_label': '7', 'file_name': '1-s2.0-S0168900217301158-am.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1-s2.0-S0168900217301158-am.pdf', 'file_type': 'application/pdf', 'file_size': 11051497, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, hash='1cf118df31232f37621df6eb03c97a23984baff943bacd6157aced3a2625bb99'), : RelatedNodeInfo(node_id='c2c6daea-0b50-4a91-907f-cf5ca4b58a1e', node_type='1', metadata={}, hash='2f00f0cf6e1e154da8095626e44d619dc0cb3189d5f0e326b73c061249b03272')}, metadata_template='{key}: {value}', metadata_separator='\\n', text='Table 2: The time-of-flight (ToF) dependence upon neutron\\nenergy. The corresponding nuclear recoil spectrum endpoint\\nenergy in argon and xenon is given in columns three and\\nfour, respectively.\\nEn [keV] ToF [ns/m] Maximum Recoil [keV nr]\\nAr Xe\\n1 2286 0 .1 0 .03\\n10 723 1 0 .3\\n100 229 10 3\\n272 139 26 8\\n1000 72 96 30\\n2450 46 235 74\\nthe expected coherent elastic neutrino-nucleus\\nscattering (CENNS) signal in upcoming large\\nliquid noble dark matter detectors [43, 44].\\nVariations in reflector material, geometry and\\npositioning can all change the peak energy and\\nenergy distribution, so discussion on the potential\\nadvantages of the technique will consider the most\\nideal case (272 keV peak energy) unless specified\\notherwise.\\n3.2.1. 
A monoenergetic 272 keV neutron source\\nA beam of quasi-monoenergetic 272 keV neutrons\\ncan be obtained by positioning a deuterium-loaded\\nmaterial (the “reflector”) behind the D-D neutron\\ngenerator, directly in line with the neutron colli-\\nmation conduit leading to the TPC (see Fig. 2).\\nThe limited solid angle presented by the neutron\\nconduit is used to collect neutrons that scatter\\nin the deuterium-loaded reflector with a scattering\\nangle of ∼180◦. Deuterium is an optimal reflector\\nmaterial; its low atomic mass provides the most\\nsignificant reduction in neutron energy possible for\\n∼180◦ elastic scatters [45]—larger energy reduc-\\ntions from non-backwards neutron scatters on 1H\\nare discussed below. These reflected neutrons have\\na minimum kinetic energy of 272 keV. In addition,\\na double-scatter (both scatters must be neutron-\\ndeuteron) elastic scattering event with a summed\\nscattering angle of 180 ◦ within the deuterium-\\nloaded reflector also provides an outgoing 272 keV\\nneutron.\\nAlthough neutron-hydrogen scattering can result\\nin neutron energies below 272 keV, all neutron\\nscatters when using hydrogen are in the forward\\ndirection with a scattering angle of 0–90 ◦ in the\\nlab frame. With a hydrogen reflector, small vari-\\nations in the neutron scattering angle produce\\nlarge fluctuations in reflected neutron energy. In\\ncontrast, using direct backscatters provided by\\ndeuterium’s significant differential scattering cross-\\nsection at 180 ◦ suppresses the effects of variations\\nin scattering angle, and provides a better defined\\nquasi-monoenergetic neutron beam. 
Deuterium has\\nthe largest cross-section for 180 ◦ scatters of all\\npotential reflector materials.\\nThe ×9 reduction in the neutron beam energy\\nprovided by the deuterium reflector has several\\nadvantages for low-energy nuclear recoil calibration.\\nThe use of 272 keV neutrons provides a reduction\\nin the uncertainty associated with kinematic energy\\nreconstruction for low-energy events. A 1 keV nr\\nnuclear recoil produced by a 2.45 MeV neutron in\\nliquid xenon corresponds to a neutron scattering\\nangle of 13 ◦, which is a 4.6 cm deflection over a\\nlength of 20 cm. By comparison, a 1 keV nr nuclear\\nrecoil produced by a 272 keV neutron in liquid\\nxenon has a scattering angle of 41 ◦, which is a\\n14 cm deflection over the same vertex separation.\\nIn large liquid xenon TPCs, the typical uncertainty\\nassociated with ( x, y) position reconstruction of\\neach vertex in events of this nuclear recoil is 1–\\n3 cm [37].', mimetype='text/plain', start_char_idx=0, end_char_idx=3189, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.8159171282856174),\n", " NodeWithScore(node=TextNode(id_='f9ce9669-2ac8-4f15-92b5-a02ec09a6501', embedding=None, metadata={'page_label': '21', 'file_name': '1608.05381v2.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf', 'file_type': 'application/pdf', 'file_size': 4399092, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='22769803-d5a8-4c2a-a4f2-0da074d51512', node_type='4', metadata={'page_label': '21', 'file_name': 
'1608.05381v2.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf', 'file_type': 'application/pdf', 'file_size': 4399092, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, hash='a010c046cd1c5c18aa3e36559fc4e47f1f9e9fea68c029adf4c1c0058721e982'), : RelatedNodeInfo(node_id='1c3e946e-064b-4fa8-a3dd-0f82eebdc711', node_type='1', metadata={}, hash='26e15fde96e0669ce4fd0b4e6a3da783deaa3e17f7dc6d0f53b575a484b0821c')}, metadata_template='{key}: {value}', metadata_separator='\\n', text='21\\nBelow we discuss the implications for the physics of liquid\\nxenon response at low energies.\\nIn contrast to electronic recoils, recoiling nuclei lose a\\nfraction of their energy to nuclear collisions, dissipating\\nenergy as heat rather than in processes leading to a\\ndetectable electronic signal. Reconstruction of nuclear\\nrecoil event energy, therefore, requires an understanding\\nof these processes as a function of recoil energy. The\\nformula for energy reconstruction can be written as\\nEnr = W(Ne + Nph)\\nL , (14)\\nwhere Lis the fraction of energy that goes into detectable\\nelectronic channels [44]. Here, W = 13 .7 eV is the\\naverage energy needed to create an exciton or electron-ion\\npair, Ne is the absolute number of ionization electrons,\\nand Nph is the absolute number of scintillation photons.\\nBoth Ne and Nph represent the number of signal carriers\\nafter recombination but before biexcitonic quenching\\neffects, in contrast to ne and np defined earlier in Secs. III\\nand IV, which are the measured number of signal carriers\\nthat escape the interaction site. A detailed description of\\nthe recombination and biexcitonic quenching components\\nof the model is reported in Ref. [6].\\nThe factor L is traditionally given by the Lindhard\\nmodel [21, 44]. 
It is described by the formula\\nL= kg(ϵ)\\n1 + kg(ϵ) . (15)\\nThe parameter k is a proportionality constant between\\nthe electronic stopping power and the velocity of the\\nrecoiling nucleus. The quantity g(ϵ) is proportional to\\nthe ratio of electronic stopping power to nuclear stopping\\npower, calculated using the Thomas-Fermi screening\\nfunction. It is a function of the energy deposited,\\nconverted to the dimensionless quantity ϵ using\\nϵ= 11.5(Enr/keVnr)Z−7/3 . (16)\\nIn these terms, g(ϵ) is given in Ref. [45] by\\ng(ϵ) = 3ϵ0.15 + 0.7ϵ0.6 + ϵ. (17)\\nA commonly accepted value for the proportionality\\nconstant is k = 0 .166, but this may range from 0.1\\nto 0.2 [44]. We utilize the Lindhard model in our\\nnuclear recoil response model, allowing k to float in the\\nfit to these data. The best-fit value from the global\\noptimization is k= 0.1735 ±0.0060.\\nIn addition to Lindhard’s model, we explored an\\nalternative model proposed in Ref. [22] with a larger\\nionization and scintillation yield at recoil energies below\\n2 keVnr. To do so, we begin with the generic form of L\\nin Eq. 14:\\nL= α se\\nse + sn\\n. (18)\\nHere, se and sn are the electronic and nuclear stopping\\npowers, respectively, and α is a scaling parameter used\\nto model the cascade of collisions in a nuclear recoil\\nevent (best-fit is α = 2 .31 in the global optimization).\\nThe ratio se/sn is analogous to g in Eq. 15. While the\\nLindhard model uses the Thomas-Fermi approximation\\nto calculate sn, we replace this with the empirical form\\nfrom Ziegler et al. [46]:\\nsn(ϵZ) = ln(1 + 1.1383 ϵZ)\\n2(ϵZ + 0.01321 ϵ0.21226\\nZ + 0.19593 ϵ0.5\\nZ ) , (19)\\nwhere ϵZ = 1 .068ϵ. 
The slight difference in energy\\nscales is due to different assumed screening lengths in\\nthe calculation of the dimensionless energy.\\nTo directly compare to data, we sum the measured\\nlight and charge to get a measured total quanta,\\nnq = ne + np.', mimetype='text/plain', start_char_idx=0, end_char_idx=3090, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.8145504709190874),\n", " NodeWithScore(node=TextNode(id_='42451e0e-638d-4c21-979c-c3c28ea71eae', embedding=None, metadata={'page_label': '3', 'file_name': '1-s2.0-S0168900217301158-am.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1-s2.0-S0168900217301158-am.pdf', 'file_type': 'application/pdf', 'file_size': 11051497, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='07fe6669-ba17-48b5-a1c0-d111f628635d', node_type='4', metadata={'page_label': '3', 'file_name': '1-s2.0-S0168900217301158-am.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1-s2.0-S0168900217301158-am.pdf', 'file_type': 'application/pdf', 'file_size': 11051497, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, hash='148469aaf2be878cf812109d8105bb5d713e956d4e0b737911d7da3bcb31e2e0'), : RelatedNodeInfo(node_id='5476d6bd-8658-4347-8396-c88527afc4d0', node_type='1', metadata={}, hash='2eecb944e13480a7b127846400c0f5c8cecf66ac4ccc455eef636d5d42760ab6')}, metadata_template='{key}: {value}', 
metadata_separator='\\n', text='and then subsequently complete the journey to the\\nfar detector. These neutrons lose an undetermined\\namount of energy during their scatters in passive\\nmaterial and have a poorly defined scattering angle\\nin the liquid noble test chamber. These effects make\\ninference of the deposited nuclear recoil energy in\\nthe target medium difficult. Neutrons that scatter\\nin passive materials during their journey between\\nthe liquid xenon cell and the far detector provide a\\nsimilar source of background events. Second, it is\\ndifficult to differentiate events consisting of multiple\\nelastic scatters in the liquid noble target during\\nsingle-phase operation as is typically used forex situ\\nLy studies. These multiple elastic scatter events\\nwill have a systematic increase in the observed\\nscintillation signal and a measured scattering angle\\nthat is no longer directly related to the path taken\\nthrough the liquid noble target. Finally, due to the\\nphysical size of the detectors, there is a systematic\\nuncertainty associated with the range of allowed\\nscattering angles. It is possible to attempt to\\naccommodate these effects on average and estimate\\nthe associated systematic uncertainties using a\\nneutron transport Monte Carlo simulation with\\na model of the experimental setup, but a more\\ndirect calibration technique can eliminate these\\nsystematic uncertainties entirely.\\nWe present a new scattering-angle-based tech-\\nnique for an in situ , absolute nuclear recoil cali-\\nbration in modern, large, liquid-noble-based TPCs\\nused for rare event searches [21–24]. In this\\ntechnique, neutrons of known energy and direction\\nare fired into a large liquid noble TPC [25]. The\\ndetector’s position reconstruction capabilities pro-\\nvide the ( x, y, z) coordinates of each interaction in\\nmultiple-scatter events. 
The calculated scattering\\nangle provides a direct measurement of the recoil\\nenergy at each scattering vertex according to Eq. 1.\\nAn ideal neutron source for this type of measure-\\nment should have the following characteristics:\\n•The neutron source should be compact and\\nportable to allow deployment in deep under-\\nground laboratory space.\\n•In order to precisely define En, the neutron\\nsource must produce a monoenergetic energy\\nspectrum, ideally with a width ( σ/µ) subdom-\\ninant to other systematic effects contributing\\nto spectrum broadening described in Sec. 2.\\n•To calibrate noble gas detectors in the nuclear\\nrecoil energy region of interest, the techniques\\ndescribed in this paper require an incident\\nneutron beam with a mean energy between\\n100 keV and several MeV.\\n•The neutron source must be of sufficient in-\\ntensity to achieve useful calibration rates using\\nthe technique described in Sec. 2. The required\\nneutron source intensity is discussed in greater\\ndetail in the next few paragraphs.\\n•The ability to pulse the neutron beam provides\\nseveral advantages. First, controlling the duty\\ncycle provides a precise tuning mechanism for\\nthe neutron yield. Second, the known “beam\\non” time during low duty cycle operation can\\nprovide a powerful reduction in calibration\\nbackgrounds. Third, if neutron bunch widths\\nof ≲10 µs are achievable, then more sensitive\\nmeasurement techniques described in Sec. 3\\nbecome feasible.\\nSeveral candidate monoenergetic neutron\\nsources are available that provide required\\nenergy, flux, and pulsing characteristics. The\\nendothermic 7Li(p, n)7Be reaction has a Q\\nvalue of −1.644 MeV [26]. This reaction can\\nprovide a source of monoenergetic neutrons\\nof tunable mean energy by accelerating the\\nincident protons to a fixed energy above the\\nreaction threshold. A dedicated proton accelerator\\nfacility is required to generate the ∼2 MeV\\nprotons used for this reaction. 
A number of\\nrecent ex situ nuclear recoil calibrations have\\nmade use of such facilities [20, 27, 28]. The\\nexothermic 2H(d, n)3He (D-D) and 3H(d, n)4He\\n(D-T) reactions have Q values of 3.269 MeV and\\n17.590 MeV, respectively [26].', mimetype='text/plain', start_char_idx=0, end_char_idx=3896, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.8121593914209859),\n", " NodeWithScore(node=TextNode(id_='e433fc20-e6e3-4084-8d2c-62f018a3cb07', embedding=None, metadata={'page_label': '6', 'file_name': '1608.05381v2.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf', 'file_type': 'application/pdf', 'file_size': 4399092, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='ea31bbb3-7ce2-4cf1-9f28-951327b91944', node_type='4', metadata={'page_label': '6', 'file_name': '1608.05381v2.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf', 'file_type': 'application/pdf', 'file_size': 4399092, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, hash='4f929d2a615a2b9d406a30665313d47e175ce52e4c73603c8c9a1a15e809d393'), : RelatedNodeInfo(node_id='37efbb1d-78a8-4144-a25d-23e0623f312e', node_type='1', metadata={}, hash='718dcf92321ca0cb9b46bccd8c7798fbdcad3096c35fab5ddc1d9681346c3d30')}, metadata_template='{key}: {value}', metadata_separator='\\n', text='6\\nusing a 9 inch diameter Bonner sphere 
[17]. Assuming an\\nisotropic source3, this corresponds to (2.5±0.3)×106 n/s\\ninto 4π solid angle. A total of 107.2 live-hours of D-D\\nneutron data was acquired and used for the analysis.\\nC. Beam energy purity cuts\\nMonte Carlo simulation studies using\\nLUXSim/GEANT4 [24, 25] indicate that after selecting\\nevents using a cylindrical analysis volume in line with\\nthe neutron beam in the TPC, 95% of accepted events\\nare due to neutrons with energies within 6% of the initial\\nenergy at the D-D source [26]. This position cut requires\\nthat the first scatter has a reconstructed location of\\ny′ > 15 cm and is within the 4.9 cm diameter of the\\nneutron beam projection in the detector active region.\\nThese position-based analysis cuts are referred to as the\\n“neutron energy purity cuts” in the following sections.\\nAny residual electron recoil contamination produced by\\nneutron capture or inelastic scatters in passive materials\\nwas identified and removed in the data analysis [17, 26].\\nThere are several xenon metastable states resulting\\nfrom inelastic neutron scatters that do not produce a\\nprompt electron recoil signal. Contamination due to\\nevents arising from this type of inelastic process was\\ncalculated to be <1% of the elastic nuclear recoil rate.\\nThe systematic uncertainty in the reconstructed energy\\ndue to the variation in the atomic mass and cross-section\\nover xenon isotopes with significant natural abundance\\nwas estimated to be <2% for all energies—subdominant\\nto other uncertainties in the following analyses.\\nIII. LOW-ENERGY IONIZATION YIELD\\nThe ionization yield was measured as a function of\\nnuclear recoil energy from 0.7 to 24.2 keV nr using\\nneutrons that scatter twice in the active liquid xenon\\nvolume.\\nA. 
Absolute measurement of nuclear recoil energy\\nusing double-scatter events\\nFor double-scatter neutron events, the scattering angle\\nbetween the first and second interaction sites was\\ncalculated based upon the reconstructed (x, y, z) position\\nof each site. The scattering angle in the center-of-mass\\nframe, θCM, is related to the recoil energy associated with\\nthe first interaction:\\n3 Actually, the D-D neutron flux varies by approximately a factor\\nof two as a function of angle [23], but the isotropic assumption\\nprovides a convenient normalization.\\nEnr = En\\n4mnmXe\\n(mn + mXe)2\\n1 −cos (θCM)\\n2 , (5)\\nwhere mXe is the average atomic mass of Xe, mn is\\nthe mass of the neutron, and En is the energy of the\\nincident neutron. The relationship between θCM and the\\nscattering angle in the laboratory frame, θlab, is given by\\ntan θlab = sin θCM\\nmn/mXe + cosθCM\\n. (6)\\nFor the measurement presented here, the relationship\\n1 −cos (θlab)\\n1 −cos (θCM) ≈1 (7)\\nis accurate to better than 2% for all scattering angles.\\nThis absolute determination of the recoil energy\\ncombined with the observed S2 from the first interaction\\nprovides a direct Qy calibration. A conceptual schematic\\nof this type of event is shown in Fig. 1. The ( x, y)\\npositions were determined using the algorithm described\\nin Ref. [27]. The z positions were measured using the\\nionization electron drift time. The variable θlab was\\nreconstructed using the measured 3D positions of the\\nfirst and second interaction sites. The ionization yield\\nmeasurement used individual events with a reconstructed\\nnuclear recoil energy between 0.3 and 30 keV nr, which\\ncorresponds to a measured neutron scattering angle range\\nof 7◦to 79◦. 
For comparison, the recoil energy spectrum\\nendpoint produced by 180◦neutron scatters corresponds\\nto a nuclear recoil energy of 74 keV nr.', mimetype='text/plain', start_char_idx=0, end_char_idx=3558, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.8115301278412063),\n", " NodeWithScore(node=TextNode(id_='97a91fbb-b7e7-47f4-8cca-54c65c3f65e3', embedding=None, metadata={'page_label': '15', 'file_name': '1608.05381v2.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf', 'file_type': 'application/pdf', 'file_size': 4399092, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='6bb48db4-4b9c-4dfd-8cc1-61141b88b6a3', node_type='4', metadata={'page_label': '15', 'file_name': '1608.05381v2.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf', 'file_type': 'application/pdf', 'file_size': 4399092, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, hash='190a33cf363bbb27f60402dd7adafb6276a08ec67cf6fe073ad24a04df3d7eac'), : RelatedNodeInfo(node_id='e20cbb62-80b5-46b3-ba5a-663d58039e25', node_type='1', metadata={'page_label': '15', 'file_name': '1608.05381v2.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/lux_dd_papers/1608.05381v2.pdf', 'file_type': 
'application/pdf', 'file_size': 4399092, 'creation_date': '2025-01-13', 'last_modified_date': '2025-01-13'}, hash='3652ddfc7960a0bb214ea6fba8409ebe4013a773147a8677943c7565a7a0d3b2')}, metadata_template='{key}: {value}', metadata_separator='\\n', text='5\\nPrevious measurements reported in terms of Leff were\\nconverted to Ly assuming a 57Co absolute yield of\\n63 photons/keV ee at 0 V/cm [33, 39]. Conveniently\\nin LUX, as was found in Ref. [8], the 83mKr yield at\\n32.1 keVee and the 57Co yield at 122 keV ee are in close\\nagreement allowing easy direct comparison to previous\\nLeff measurements using the right axis in Fig. 10.\\nThe electron recoil light yield was also measured\\nusing 131mXe remaining in the liquid xenon from\\ncosmogenic activation before the target media was\\ntransported underground. The 131mXe nuclei undergoes\\nan isomeric transition depositing 163.9 keV ee with a half\\nlife of 11.8 days and provides an internal, homogeneous\\ncalibration source close in energy to the 122 keV ee\\ngamma from 57Co that has been used to calibrate\\nsmaller liquid xenon TPCs in the past. The light yield\\nfor 163.9 keV ee electron recoils was measured to be\\n41.3 ±1.1 photons/keVee at 180 V/cm using the 131mXe\\nsource. We can then extrapolate the light yield from\\nthis data point to the commonly used standard candle\\nenergy of 122 keV ee using NEST v0.98. The light\\nyield due to a 122 keV ee electron recoil at 180 V/cm\\nis 1.12 +0.08\\n−0.06 times higher than the yield at 164 keV ee\\naccording to NEST v0.98. After accounting for this yield\\ntranslation factor and the expected See(E= 180 V/cm)\\nfield quenching factor for electron recoils of 0.74 [8, 10],\\nwe measure the electron recoil yield for a 122 keV ee\\ngamma ray to be 63 +5\\n−4 photons/keVee at 0 V/cm. 
This\\nmeasured light yield for 122 keVee electron recoils in LUX\\nis in agreement with the value of 63 photons/keV ee at\\n0 V/cm used to convert previous Leff results to Ly.\\nAvoiding any assumptions about Snr and See, the LUX\\nmeasured Ly in Fig. 10 is reported in absolute units at\\n180 V/cm. Previous results in the figure were measured\\nat 0 V/cm or were corrected to 0 V/cm assuming various\\nvalues of Snr for the operating field—all of which ranged\\nfrom 0.92–1.0. The agreement of results from liquid\\nxenon TPCs operating across a broad range of drift fields\\n(0–3.6 kV/cm) in Fig. 10 indicates that the nuclear recoil\\nlight yield in liquid xenon is a weak function of the drift\\nelectric field.\\n5 Unlike the 9.4 keVee component, the light yield of the 32.1 keVee\\ncomponent is constant as a function of the time separation\\nbetween the emission of conversion electrons and can be used\\nas a standard candle [38].', mimetype='text/plain', start_char_idx=3069, end_char_idx=5462, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.8114973606229501)]" ] }, "metadata": {}, "execution_count": 27 } ] }, { "cell_type": "markdown", "source": [ "# What if we want to add more documents?" 
], "metadata": { "id": "e73ArtQmwBr1" } }, { "cell_type": "markdown", "source": [ "## Load the new directory of documents from Google Drive, parse them, and then insert them into the existing index" ], "metadata": { "id": "V7cmDfIB1biA" } }, { "cell_type": "code", "source": [ "# Path to the new directory of documents on the mounted Google Drive\n", "new_llama_index_data_path = top_level_path + '/' + module_path + '/' + 'documents/other_brownpa_theses'\n", "\n", "# Read the new documents (including subdirectories) with LlamaIndex\n", "new_documents = SimpleDirectoryReader(new_llama_index_data_path, recursive=True).load_data()\n", "\n", "# Parse the new documents into nodes (text chunks)\n", "parser = SimpleNodeParser()\n", "new_nodes = parser.get_nodes_from_documents(new_documents)\n", "\n", "# Insert the new nodes into the existing index\n", "index.insert_nodes(new_nodes)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "id": "B6TNJEg1r-wG", "outputId": "8b2f259b-e031-4da0-ed16-c2a0c70dc700" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "code", "source": [ "question = questions[5]\n", "\n", "response = query_rag(question)\n", "\n", "print(\"Question: \" + question)\n", "print(\"\")\n", "print(\"Response: \" + response.response)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 86 }, "id": "3WlWRbFNv99D", "outputId": "a7976e41-79ae-4025-8278-7ddb07b7f664" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Question: How low in energy was the ER response measured using 127Xe? Where did the 127Xe come from?\n", "\n", "Response: The lowest-energy ER response measured using 127Xe reached down to 186 eV energy deposition. 
The 127Xe present in the LXe target is a result of cosmogenic activation during its time on the surface before being brought underground.\n" ] } ] }, { "cell_type": "code", "source": [ "response.metadata" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 711 }, "id": "PESmJpJv1oqX", "outputId": "a7e23d50-bafe-4eef-a28e-b0188d09ef05" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "execute_result", "data": { "text/plain": [ "{'8c34f555-164e-4485-a47f-31a97cad6079': {'page_label': '158',\n", " 'file_name': '20220429_Taylor_PhD_Thesis.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20220429_Taylor_PhD_Thesis.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 35637882,\n", " 'creation_date': '2025-01-15',\n", " 'last_modified_date': '2025-01-13'},\n", " 'bcace8db-4477-4c0e-ad4f-c8cbf9ea7503': {'page_label': '78',\n", " 'file_name': '20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 49934873,\n", " 'creation_date': '2025-01-15',\n", " 'last_modified_date': '2025-01-13'},\n", " '9ced5040-5fb5-4776-b8be-09c49fdfb9ba': {'page_label': '174',\n", " 'file_name': '20220429_Taylor_PhD_Thesis.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation 
(RAG)/documents/other_brownpa_theses/20220429_Taylor_PhD_Thesis.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 35637882,\n", " 'creation_date': '2025-01-15',\n", " 'last_modified_date': '2025-01-13'},\n", " '4d88bb3e-db02-413f-99e8-6e869a33ff83': {'page_label': '77',\n", " 'file_name': '20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 49934873,\n", " 'creation_date': '2025-01-15',\n", " 'last_modified_date': '2025-01-13'},\n", " '80cd4a00-7758-4033-8f1a-4e250951636c': {'page_label': '157',\n", " 'file_name': '20220429_Taylor_PhD_Thesis.pdf',\n", " 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20220429_Taylor_PhD_Thesis.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 35637882,\n", " 'creation_date': '2025-01-15',\n", " 'last_modified_date': '2025-01-13'}}" ] }, "metadata": {}, "execution_count": 30 } ] }, { "cell_type": "code", "source": [ "response.source_nodes" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "dsbhxpB_5fgo", "outputId": "1ae69279-bf6e-4dbf-cb33-e177bbff0130" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} }, { "output_type": "execute_result", "data": { "text/plain": [ "[NodeWithScore(node=TextNode(id_='8c34f555-164e-4485-a47f-31a97cad6079', embedding=None, metadata={'page_label': '158', 'file_name': '20220429_Taylor_PhD_Thesis.pdf', 'file_path': 
'/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20220429_Taylor_PhD_Thesis.pdf', 'file_type': 'application/pdf', 'file_size': 35637882, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='e473dd35-f541-4def-b17c-bd71464aceb2', node_type='4', metadata={'page_label': '158', 'file_name': '20220429_Taylor_PhD_Thesis.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20220429_Taylor_PhD_Thesis.pdf', 'file_type': 'application/pdf', 'file_size': 35637882, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, hash='71bb6d7f5545c9a6ea0c266680d73a2d807d9945eea9d59a41074ef64f33912d')}, metadata_template='{key}: {value}', metadata_separator='\\n', text='158\\nover the course of the experiment. The event energy region in which ER events may\\nleak into the WIMP search ROI is1.5 to 6.5 keVee, and many other, more exotic dark\\nmatter models also search for low-energy ER events between1.5 and 15 keVee. The\\nmajority of isotopes produced via neutron activation are incapable of generating these\\nlow-energy signals. A small fraction of decays from127Xe and133Xe may contribute\\nto the WIMP energy ROI via partial energy deposition if they occur near the edge\\nof the TPC volume. The decay of 127Xe always yields either203 keV, 375 keV, or\\n618 keV states of127I, as discussed further in Section6.4.1. 
If these states decay with\\na high energy gamma emission around the edge of the xenon volume, the gamma can\\nescape. Electron captures from the L-shell (5.2 keV) or M-shell (1.07 keV) that have\\nthe gamma escape can appear in the WIMP energy ROI. The resulting total activity\\nof 127Xe from a standard2 d direct calibration and4 d reflector calibration after30 d\\npeaks at 1 × 10−4 dru at 408 keV. The expected LZ background rate at this energy\\nis five times higher at5 × 10−4 dru. The actual contribution of127Xe to the WIMP\\nenergy ROI is expected to be even more subdominant, as only a fraction of events\\noccur near the edge of the TPC, only a fraction of those decay via L- or M-shell\\ncapture, and only a fraction of those might emit a gamma that escapes.\\nThe decays of 133Xe may result in partial energy depositions in a similar way\\nas 127Xe, though the effect of133Xe is even further suppressed. The decay of133Xe\\nmay yield the161 keV state of 133Cs (branching ratio of 1.4%) or the384 keV state\\n(branching ratio of 0.0087%). The half-life of133Xe is5.2 d. The slim odds of yielding\\ngammas which can escape undetected combined with the short half-life ensure that\\nthe effect of133Xe on the WIMP energy ROI is completely negligible after30 d.\\nThe impact of partial energy depositions should be mitigated by the presence of\\nthe xenon skin region and the outer detector. 
Future studies can focus on further\\nquantifying the rate of these partial energy depositions, including a full modeling of', mimetype='text/plain', start_char_idx=0, end_char_idx=2128, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.846536228904579),\n", " NodeWithScore(node=TextNode(id_='bcace8db-4477-4c0e-ad4f-c8cbf9ea7503', embedding=None, metadata={'page_label': '78', 'file_name': '20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf', 'file_type': 'application/pdf', 'file_size': 49934873, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='d9a69d50-e4a7-48ee-b10f-2a10dfbfa54f', node_type='4', metadata={'page_label': '78', 'file_name': '20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf', 'file_type': 'application/pdf', 'file_size': 49934873, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, hash='fb4af57eb61e97db1f3d7508a17aae2a07747cfcc02594220b533a5e43353200')}, metadata_template='{key}: {value}', metadata_separator='\\n', text='78\\nan appealing mono-energetic source for LUX ER energy calibration. 
This calibration covers the\\nentire signal region relevant to the WIMP search, reaching all the way down to the observation of\\n186 eV energy deposition. This represents the lowest-energy ERin situ measurements that have\\nbeen explored in LXe to date.\\n618.4\\n375.0\\n202.9\\n57.6\\n0.0\\n3/2+ \\n1/2+ \\n≤135 ps \\n3/2+ \\n0.39 ns \\n7/2+ \\n1.86 ns \\n5/2+ \\nstable 127I\\n127Xe 1/2+ \\n36.4 d\\nQEC = 662.3\\n53.0%\\n0.0143%0.0143%\\n \\n618.4\\n17.3%\\n 375.0\\n25.7%\\n 172.1\\n68.7%\\n 202.9\\n4.31%\\n 145.3\\n1.24%\\n 57.6\\n47.6%\\n≈\\nFigure 5.1: Decay scheme of127Xe [23] with units of keV. The127Xe decays via electron capture\\nto 127I. The percentage above the transition arrow is the gamma-ray intensity as fraction of parent\\n(127Xe) decay.\\nThe 127Xe decays via electron capture (EC), in which its nucleus absorbs one of the atomic\\nelectrons. Following this EC, the possible initial states and subsequent decays of the daughter\\nnucleus, 127I, are shown in Fig. 5.1. The127I is left in its 375 keV or 203 keV excited state with 47%\\nand 53% probability, respectively. There is a 17.3% probability of decay from the 375 keV state to\\nground state by a single gamma-ray emission and a 43.9% [23] probability of decay from the 203 keV\\nstate to ground state via a single gamma-ray emission. Nuclear de-excitation can also occur via\\ninternal conversion (IC) electron emission; however, this process occurs with a branching ratio of\\nless than 10% relative to the gamma-ray emission [1].\\nThe electron capture can occur from either the K, L, M, or N shell with 83.37%, 13.09%, 2.88%\\nand 0.66% probabilities (see Table 5.1), respectively, resulting in an atomic orbital vacancy [1]. The\\nvacancy is subsequently filled with an electron from a higher level via emission of cascade X-rays or\\nAuger electrons (Fig. 5.2), with total cascade energies of 33.2 keV, 5.2 keV, 1.1 keV, and 186 eV [82],\\nrespectively. 
Localized energy depositions associated with these processes are clearly observed by', mimetype='text/plain', start_char_idx=0, end_char_idx=2003, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.8391019966543145),\n", " NodeWithScore(node=TextNode(id_='9ced5040-5fb5-4776-b8be-09c49fdfb9ba', embedding=None, metadata={'page_label': '174', 'file_name': '20220429_Taylor_PhD_Thesis.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20220429_Taylor_PhD_Thesis.pdf', 'file_type': 'application/pdf', 'file_size': 35637882, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='d413b573-0638-417c-8fa4-081089b031eb', node_type='4', metadata={'page_label': '174', 'file_name': '20220429_Taylor_PhD_Thesis.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20220429_Taylor_PhD_Thesis.pdf', 'file_type': 'application/pdf', 'file_size': 35637882, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, hash='099d9b0ab7d5b1fe96d80d4f52a74d7bb6a96c3614d02d2f073cf043f7f4b33b')}, metadata_template='{key}: {value}', metadata_separator='\\n', text='174\\nplanned pre-SR1 DD calibration campaign in LZ is shown in Fig.6.5.e As the figure\\nshows, the combined rate of all xenon isotope products is <2 Hz, which is less than\\n10 % of the nominal LZ background rate. 
As such, the effects of xenon activation are\\nsubdominant to the intrinsic detector conditions. Furthermore, since the rates are\\nso low and the xenon is isolated from any human interaction, it does not present a\\nsafety risk.\\nThe other factor to consider is the differential energy spectrum. The compos-\\nite spectrum immediately following the DD calibrations is shown in Fig.6.6,f while\\nthe overall decay with time is shown in Fig.6.7.g The WIMP search region of1.5\\n- 6.5 keVee is clear immediately following the calibration.2 The most lasting contri-\\nbutions to the background spectrum are from127Xe (36.3 d half-life), 129mXe (8.88 d\\nhalf-life), 131mXe (11.8 d half-life), and133Xe (5.2 d half-life). The decay signatures of\\nand 129mXe and131mXe are essentially incapable of creating a low-energy background.\\nHowever, if127Xe or133Xe decay near the edge of the detector, it is possible for them\\nto generate a low-energy signal.\\nThe decay of127Xe (see Section6.6.3) is an electron capture decay which drops\\ninto the202.9 keV state of127I with a 52.7% branching ratio and the375 keV state of\\n127I with a 47.3% branching ratio. There are a multitude of possible gamma energies\\nthat can be produced in the de-excitation of these states, as they can either decay\\ndirectly to the ground state or to intermediate states. Gammas with energies between\\n200 keV and 500 keV have mean free paths between1 cm and 3 cm in liquid xenon,\\nwhich makes it feasible for them to escape the detector without depositing energy if\\nthey are generated near the edge of the detector. If the electron capture occurs in\\nthe s-orbital of the L- or M-shells, then the resulting cascade will have a total energy\\n2 It is possible for 137Xe to produce a naked beta via decay to the ground state of 137Cs (67%\\nbranching ratio). 
However, the activity of 137Xe following DD campaigns is approximately 1 ×\\n10−8 dru, and the half-life of 137Xe is only 3.8 minutes, so within one hour the WIMP ROI is clear.', mimetype='text/plain', start_char_idx=0, end_char_idx=2177, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.8375808031594311),\n", " NodeWithScore(node=TextNode(id_='4d88bb3e-db02-413f-99e8-6e869a33ff83', embedding=None, metadata={'page_label': '77', 'file_name': '20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf', 'file_type': 'application/pdf', 'file_size': 49934873, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='de14f5a1-47ea-49f0-bd31-c6454fd108d6', node_type='4', metadata={'page_label': '77', 'file_name': '20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20200501_Huang_PhD_thesis_Brown_physics_2019_submit_to_Brown_Repo_v5.pdf', 'file_type': 'application/pdf', 'file_size': 49934873, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, hash='8e01a9bcc758bceca8a7c8d542a96371ca81581ba011114d37a2cfc888524155')}, metadata_template='{key}: {value}', metadata_separator='\\n', text='77\\nto atoms in the medium is 
well described by the Lindhard model [47, 48] down to∼keV energies,\\nand has been experimentally measured by LUX for nuclear recoils in LXe over the range0.7 −74\\n[19]. The ER and NR events are typically discriminated by the logarithmic charge to light ratio, i.e.\\nlog10(S2/S1), thanks to the different ionization/excitation ratios for ER and NR interactions [24, 56].\\nWe expect WIMPs to interact with LXe via nuclear recoil, depositing up toO(100) keV in a single\\nscatter. LUX has reported world-leading dark matter search results on both spin-independent and\\nspin-dependent WIMP-nucleon scattering in [21, 22, 42, 73].\\nIn the context of a WIMP search experiment using a LXe target, it is important to understand\\nLXe scintillation and ionization yield responses over the WIMP search energy range for both ER\\nand NR because of their non-linear energy dependence [53, 74]. Many efforts have been devoted to\\nunderstanding the scintillation and ionization response in LXe in the past few years using various\\ntechniques [75–79]. LUX has independently developed and deployed a number of novel in situ\\ninternal and external sources to calibrate detector ER and NR response in the energy region that is\\nrelevant to WIMP searches. Two such sources are tritiated methane (CH3T) for ER calibration [16]\\nand deuterium-deuterium (D-D) neutrons for NR calibration [19]. While tritium is an ideal source\\nto calibrate detector ER response in the low energy region, its application is limited by it being\\na continuum-energy source which affects the sensitivities at low energies, and the detector light\\ncollection efficiency. As a result, the tritium calibration currently reaches a lowest-energy calibration\\npoint of 1.3 keV [16]. A source that is capable of studying calibrations in the sub-keV energy range\\nin LXe is desirable. 
For example, this small signal regime is directly relevant to the signal and\\nbackgrounds for low-mass WIMP searches and for coherent neutrino-nucleus scattering (CNNS) [80,\\n81].\\n5.3 Xenon-127 in LUX Detector\\nLUX background measurements with WS2013 data revealed an initial 127Xe activity of 490 ±\\n95 µBq/kg in the active region [12]. From this, we infer approximately 0.8 million127Xe decay\\nevents during the WS2013 3-month run period, given the 36.4 day half-life of the isotope. The127Xe\\nradioisotope is present in the LXe target due to cosmogenic activation of the Xe during its time\\non the surface before being brought one mile underground. The surface production rate is modeled\\nand estimated using ACTIVIA and described in [12]. The decay characteristics of127Xe make it', mimetype='text/plain', start_char_idx=0, end_char_idx=2613, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.8369325214168621),\n", " NodeWithScore(node=TextNode(id_='80cd4a00-7758-4033-8f1a-4e250951636c', embedding=None, metadata={'page_label': '157', 'file_name': '20220429_Taylor_PhD_Thesis.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20220429_Taylor_PhD_Thesis.pdf', 'file_type': 'application/pdf', 'file_size': 35637882, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={: RelatedNodeInfo(node_id='a049c034-a3bc-4ed1-90da-07a50f9c812f', node_type='4', metadata={'page_label': '157', 'file_name': '20220429_Taylor_PhD_Thesis.pdf', 'file_path': '/content/drive/Shared drives/AI Winter School (Brown Physics CFPU 
2025)/Module 5 -- Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)/documents/other_brownpa_theses/20220429_Taylor_PhD_Thesis.pdf', 'file_type': 'application/pdf', 'file_size': 35637882, 'creation_date': '2025-01-15', 'last_modified_date': '2025-01-13'}, hash='a62b14e365a64c18bcb894ec3f80de28de1f53cb024a02522cc5e464f3d6b810')}, metadata_template='{key}: {value}', metadata_separator='\\n', text='157\\n6.1 Conclusions from Neutron Calibration Activa-\\ntion Studies\\nThe radioactive products from neutron activation during DD-Direct calibrations (in-\\ntensities shown in Table6.1) with run times of order 10 days, or H- or D-reflector run\\ntimes of up to 50 days, are not expected to present a background concern to stan-\\ndard WIMP searches or most other exotic physics searches. Additional calibrations\\ncan also be carried out once the previously generated radioisotopes have decayed (for\\nxenon isotopes) or been removed by xenon target circulation (for non-xenon isotopes,\\nsuch as125I).\\nIn these studies it was determined that the activation of any possible detector\\nmaterials outside the xenon target itself was found to be subdominant and incapable\\nof generating new backgrounds with WIMP-like event signatures. The most relevant\\nactivation processes for these studies all occurred within the xenon target itself.\\nThe most significant long-lived xenon isotopes that are created produce energy\\nsignatures that are well outside the WIMP energy ROI, such as129mXe (236 keV)\\nand 131mXe (164 keV). This work suggests that 80% of the129mXe and 131mXe are\\nproduced via inelastic scattering of high-energy neutrons, while the remaining 20%\\ncome from neutron captures. Lower energy sources like the D- and H-Reflectors\\ngenerate proportionally less129mXe and131mXe as a fraction of total neutron flux into\\nthe xenon.\\nThe only isotopes that produce low-energy particles without high-energy coin-\\ncident tags are 137Xe, 135Cs, and 137Cs. 
However, none of these are a background\\nconcern: 137Xe decays extremely rapidly with a half-life of3.8 minutes and so can be\\ndiscounted, while135Cs and137Cs are produced in such low quantities and have such\\nlong half-lives (2.3 × 106 yr and 30.07 yr, respectively) that no decays are expected', mimetype='text/plain', start_char_idx=0, end_char_idx=1819, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'), score=0.8343724834919047)]" ] }, "metadata": {}, "execution_count": 31 } ] }, { "cell_type": "markdown", "source": [ "# Exercise: Google Form question" ], "metadata": { "id": "QqVCCw4ezv76" } }, { "cell_type": "markdown", "source": [ "Please run the following query and submit your answer in the Google Form for Module 5.\n", "\n", "Module 5 - https://docs.google.com/forms/d/e/1FAIpQLSfR1Pu7hQcaax-gS4UpOUp5fpg4PQET9njdOuLfWSfwwkR7Aw/viewform?usp=sharing" ], "metadata": { "id": "JNgk8CzszzjC" } }, { "cell_type": "code", "source": [ "final_query = \"What was the energy of the lowest Qy measurement achieved with DD2016 calibration data? How much lower in energy was this than the 2013 LUX Qy result?\"\n", "response = query_rag(final_query)" ], "metadata": { "id": "stMHnf6czvOQ", "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "outputId": "05089d12-4cac-4981-9fdf-38beb9803fed" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {} } ] }, { "cell_type": "code", "source": [ "print(\"Question: \" + final_query)\n", "print(\"\")\n", "print(\"Response: \" + response.response)" ], "metadata": { "id": "SEOsrJxQzuKD" }, "execution_count": null, "outputs": [] } ] }