losalamos.tools.search_references

Batch-extract bibliographic references from PDFs using a Gemini model.

This tool scans a folder of PDF files, builds a prompt from the first N pages of each file (N is presumably the "pages" value in the specification), and queries a generative model to produce a BibTeX entry. Each result is saved as a .bib file, and the corresponding PDF is renamed to match the generated BibTeX key.

Usage

Module execution (recommended):

Shell (bash/zsh):

python -m losalamos.tools.search_references --file specs.json

PowerShell (ps1):

$SPEC="C:\path\to\specs.json"

python -m losalamos.tools.search_references `
    --file $SPEC

Specification (JSON)

The input file passed with --file is a JSON object that must define the following keys:

{
    "folder": "/path/to/pdf_dir",
    "api": "YOUR_API_KEY",
    "model": "gemini-3-flash-preview",
    "pages": 2,
    "prompt": "path/to/prompt.txt or inline string"
}
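As an illustration of how a specification like the one above could be read and checked, here is a minimal sketch. The helper names `load_spec` and `resolve_prompt` are hypothetical and are not part of the module. The rule that "prompt" is treated as a file when the path exists, and as inline text otherwise, is an assumption based on the placeholder value shown above.

```python
import json
from pathlib import Path

# Keys taken from the specification example above.
REQUIRED_KEYS = ("folder", "api", "model", "pages", "prompt")


def load_spec(spec_path):
    """Hypothetical helper: read the JSON spec and check the documented keys."""
    spec = json.loads(Path(spec_path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in spec]
    if missing:
        raise KeyError(f"spec is missing keys: {', '.join(missing)}")
    return spec


def resolve_prompt(value):
    """Assumption: use the file's contents if `value` is an existing file,
    otherwise treat `value` itself as the inline prompt text."""
    try:
        candidate = Path(value)
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    except OSError:
        # e.g. a long inline prompt that is not a valid file name
        pass
    return value
```

Validating the keys up front lets a malformed specification fail before any PDF is read or any API call is made.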

Side effects

  • Creates .bib files alongside each PDF

  • Renames PDFs based on the generated BibTeX key

  • Skips files that already have a corresponding .bib
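The skip rule above can be sketched as a simple filter. This is not the module's code: `pending_pdfs` is a hypothetical helper, and it assumes that "corresponding" means a .bib file with the same stem in the same folder.

```python
from pathlib import Path


def pending_pdfs(folder):
    """Hypothetical helper: PDFs in `folder` that have no same-stem .bib file."""
    return [
        pdf
        for pdf in sorted(Path(folder).glob("*.pdf"))
        # Assumption: paper.pdf is considered done when paper.bib exists.
        if not pdf.with_suffix(".bib").exists()
    ]
```

A check of this kind would make repeated runs over the same folder safe to resume, since finished files are not sent to the model again. Note that the `*.pdf` pattern is case-sensitive on most Linux file systems.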

Functions

ask_gemini(prompt, model, client[, ...])

get_arguments()

get_auxiliary_context(pdf_path)

Look for auxiliary .txt and .ris files alongside the PDF.

main()

next_available_path(base_path)

Return a non-colliding path by appending alphabetical suffixes.

losalamos.tools.search_references.get_auxiliary_context(pdf_path: Path) -> str

Look for auxiliary .txt and .ris files alongside the PDF.

Reads the contents of any matching auxiliary files (sharing the same stem as the PDF) and returns them as a formatted string to be injected into the LLM prompt context.

Parameters:

pdf_path (pathlib.Path) – The file path to the target PDF document.

Returns:

A formatted string containing the text from the auxiliary files, or an empty string if no such files exist.

Return type:

str
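A function with this behavior might look roughly like the sketch below. The name `auxiliary_context_sketch` is hypothetical, and the header line used to label each file is an assumption; the documentation only states that the result is "a formatted string".

```python
from pathlib import Path


def auxiliary_context_sketch(pdf_path):
    """Collect same-stem .txt and .ris files into one labeled string."""
    sections = []
    for suffix in (".txt", ".ris"):
        aux = pdf_path.with_suffix(suffix)
        if aux.is_file():
            text = aux.read_text(encoding="utf-8", errors="replace").strip()
            # Assumed layout: a header naming the file, then its contents.
            sections.append(f"--- {aux.name} ---\n{text}")
    # With no auxiliary files the list is empty and the result is "".
    return "\n\n".join(sections)
```

For a file named paper.pdf, this would pick up paper.txt and paper.ris from the same folder if they exist.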

losalamos.tools.search_references.get_arguments()

losalamos.tools.search_references.next_available_path(base_path: Path) -> Path

Return a non-colliding path by appending alphabetical suffixes.

Example:

file.pdf -> file_a.pdf -> file_b.pdf -> … -> file_aa.pdf
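The suffix sequence in the example (a to z, then aa, ab, and so on) can be produced with bijective base-26 numbering. The sketch below is one way to do it under two assumptions: the base path is returned unchanged when it is free, and the suffix is joined to the stem with an underscore, as in the example. The names `alpha_suffix` and `next_available_path_sketch` are hypothetical.

```python
from itertools import count
from pathlib import Path


def alpha_suffix(n):
    """Map 1 -> 'a', 26 -> 'z', 27 -> 'aa' (bijective base 26)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


def next_available_path_sketch(base_path):
    """Return base_path if unused, else the first free stem_<suffix> variant."""
    if not base_path.exists():
        return base_path
    for n in count(1):
        name = f"{base_path.stem}_{alpha_suffix(n)}{base_path.suffix}"
        candidate = base_path.with_name(name)
        if not candidate.exists():
            return candidate
```

Subtracting 1 before each `divmod` is what makes the numbering roll over from z to aa rather than to ba, since there is no zero digit in this scheme.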

losalamos.tools.search_references.ask_gemini(prompt, model, client, max_retries=5, base_delay=2)

losalamos.tools.search_references.main() -> None
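The `max_retries` and `base_delay` parameters of `ask_gemini` suggest a retry loop around the model call. The source does not document the retry schedule, so the sketch below assumes exponential backoff (the delay doubles after each failure) and uses a generic callable in place of the real client call. The name `call_with_retries` and the `sleep` parameter are hypothetical.

```python
import time


def call_with_retries(request, max_retries=5, base_delay=2, sleep=time.sleep):
    """Call `request()` until it succeeds or `max_retries` attempts are used."""
    for attempt in range(max_retries):
        try:
            return request()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Assumed schedule: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

With the defaults shown in the signature, this schedule would wait 2, 4, 8, and 16 seconds between the five attempts. Passing `sleep` as a parameter keeps the loop testable without real delays.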