losalamos.tools.search_references
Batch-extract bibliographic references from PDFs using a Gemini model.
This tool scans a folder of PDF files, builds a prompt from the first N pages,
and queries a generative model to produce BibTeX entries. Results are saved
as .bib files, and each PDF is renamed to match its generated BibTeX key.
Usage
Module execution (recommended):
Shell (bash/zsh):
python -m losalamos.tools.search_references --file specs.json
PowerShell (ps1):
$SPEC="C:\path\to\specs.json"
python -m losalamos.tools.search_references `
--file $SPEC
Specification (JSON)
The input file must define:
{
    "folder": "/path/to/pdf_dir",
    "api": "YOUR_API_KEY",
    "model": "gemini-3-flash-preview",
    "pages": 2,
    "prompt": "path/to/prompt.txt or inline string"
}
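A minimal sketch of how such a specification could be read and checked before a run. The helper names `load_spec` and `resolve_prompt` are hypothetical and not part of the module's documented API. The rule for deciding whether "prompt" is a file path or an inline string (treat it as a path only if that file exists) is also an assumption.

```python
import json
from pathlib import Path

# Keys shown in the specification above.
REQUIRED_KEYS = ("folder", "api", "model", "pages", "prompt")


def resolve_prompt(value):
    """Hypothetical helper: return file contents if `value` names an
    existing file, otherwise return `value` unchanged as an inline prompt."""
    try:
        path = Path(value)
        if path.is_file():
            return path.read_text(encoding="utf-8")
    except (OSError, ValueError):
        # Long inline prompts can be invalid as file names; treat as inline.
        pass
    return value


def load_spec(spec_path):
    """Hypothetical helper: parse the JSON spec and verify required keys."""
    spec = json.loads(Path(spec_path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in spec]
    if missing:
        raise KeyError("missing spec keys: " + ", ".join(missing))
    spec["prompt"] = resolve_prompt(spec["prompt"])
    return spec
```

Checking the keys up front means a malformed spec fails before any request is sent to the model.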
Side effects
- Creates .bib files alongside each PDF
- Renames PDFs based on the generated BibTeX key
- Skips files that already have a corresponding .bib
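The side effects listed above can be sketched roughly as follows. Both helpers (`pending_pdfs`, `save_result`) are hypothetical names, and the sketch assumes that the .bib file shares the stem of the renamed PDF, which the documentation does not state explicitly. Name collisions are not handled here.

```python
from pathlib import Path


def pending_pdfs(folder):
    """Hypothetical helper: PDFs in `folder` with no sibling .bib file."""
    folder = Path(folder)
    return sorted(
        pdf for pdf in folder.glob("*.pdf")
        if not pdf.with_suffix(".bib").exists()
    )


def save_result(pdf_path, bibtex_key, bibtex_text):
    """Hypothetical helper: write the .bib file and rename the PDF.

    Assumes both files are named after the BibTeX key.
    """
    pdf_path = Path(pdf_path)
    target_pdf = pdf_path.with_name(f"{bibtex_key}.pdf")
    target_pdf.with_suffix(".bib").write_text(bibtex_text, encoding="utf-8")
    pdf_path.rename(target_pdf)
    return target_pdf
```

Because a processed PDF ends up next to a .bib file with the same stem, a second run over the same folder would skip it.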
Functions
| get_auxiliary_context(pdf_path) | Look for auxiliary .txt and .ris files alongside the PDF. |
| next_available_path(base_path) | Return a non-colliding path by appending alphabetical suffixes. |
- losalamos.tools.search_references.get_auxiliary_context(pdf_path: Path) -> str
Look for auxiliary .txt and .ris files alongside the PDF.
Reads the contents of any matching auxiliary files (sharing the same stem as the PDF) and returns them as a formatted string to be injected into the LLM prompt context.
- Parameters:
pdf_path (pathlib.Path) – The file path to the target PDF document.
- Returns:
A formatted string containing the text from the auxiliary files, or an empty string if no such files exist.
- Return type:
str
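As an illustration of the behaviour described above, a minimal sketch is given below. It is not the module's implementation: the function name `auxiliary_context_sketch` and the output layout (a header line naming each file, followed by its text) are assumptions, since the documentation only says the result is "a formatted string".

```python
from pathlib import Path


def auxiliary_context_sketch(pdf_path):
    """Collect text from sibling .txt and .ris files sharing the PDF's stem."""
    pdf_path = Path(pdf_path)
    sections = []
    for suffix in (".txt", ".ris"):
        aux = pdf_path.with_suffix(suffix)
        if aux.is_file():
            text = aux.read_text(encoding="utf-8", errors="replace").strip()
            # Assumed layout: one header line per file, then its contents.
            sections.append(f"--- {aux.name} ---\n{text}")
    # Empty string when no auxiliary files exist, as documented.
    return "\n\n".join(sections)
```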
- losalamos.tools.search_references.next_available_path(base_path: Path) -> Path
Return a non-colliding path by appending alphabetical suffixes.
- Example:
file.pdf -> file_a.pdf -> file_b.pdf -> … -> file_aa.pdf
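One way to produce the suffix sequence in the example is sketched below. The names `alphabetical_suffixes` and `next_available_path_sketch` are hypothetical, and the underscore separator is inferred from the example. The actual implementation may differ.

```python
from itertools import count, product
from pathlib import Path
from string import ascii_lowercase


def alphabetical_suffixes():
    """Yield a, b, ..., z, aa, ab, ... indefinitely."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def next_available_path_sketch(base_path):
    """Return `base_path` if it is free, else the first free suffixed variant."""
    base_path = Path(base_path)
    if not base_path.exists():
        return base_path
    for suffix in alphabetical_suffixes():
        candidate = base_path.with_name(
            f"{base_path.stem}_{suffix}{base_path.suffix}"
        )
        if not candidate.exists():
            return candidate
```

Generating suffixes by increasing length keeps the ordering in the example: all single letters are tried before any two-letter suffix.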