LLMjacking: The New Frontier of Resource Hijacking

Author: CyberDudeBivash
Powered by: CyberDudeBivash Brand | cyberdudebivash.com
Related: cyberbivash.blogspot.com

Daily Threat Intel by CyberDudeBivash
Zero-days, exploit breakdowns, IOCs, detection rules & mitigation playbooks.

The era of “Cryptojacking” has evolved. While hackers once scrambled for your CPU to mine Bitcoin, they are now hunting your GPU to run Large Language Models. This is LLMjacking.

In this guide, we’ll break down how this exploit works and, more importantly, how you can build a fortress around your Ollama or local AI instance.


1. What is LLMjacking?

LLMjacking occurs when an attacker gains unauthorized access to a local AI server (like Ollama) to steal its “inference power.”

The Exploit Mechanism

  1. Scanning: Attackers use automated tools to scan the internet for port 11434 (Ollama’s default).
  2. Infiltration: Because most users don’t set up an authentication layer, the attacker finds an open API.
  3. The Theft: The attacker sends complex prompts to your server. Your GPU works at 100% capacity to generate responses for their application.
  4. The Cost: You pay the electricity bill and suffer massive system lag; the attacker gets a free, high-performance AI API.

2. The CyberDudeBivash “Steel Wall” Defense

To stop LLMjacking, we must move from a “Public” state to a “Hardened” state. Follow these five steps to secure your server.

Step 1: Bind to Localhost (The Foundation)

Never allow Ollama to listen to the open web directly. Ensure your environment variables are set so Ollama only talks to your own machine.

  • Linux/Systemd: Set OLLAMA_HOST=127.0.0.1 in your service file.
  • Docker: Do not map port 11434:11434. Instead, use internal container networking.
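For the systemd case, the cleanest way to pin the bind address is a drop-in override. A minimal sketch, assuming Ollama runs as a systemd service named ollama (adjust the unit name if yours differs):

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"
```

Apply it with sudo systemctl daemon-reload && sudo systemctl restart ollama, then confirm with ss -tlnp | grep 11434 that Ollama is listening on 127.0.0.1 only.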

Step 2: Deploy the Nginx “Bouncer”

Since Ollama has no built-in password, we put a “Bouncer” (Nginx) in front of it. This requires every visitor to show an ID card (Username/Password).

Refer to our previous guide on Nginx Basic Auth for the configuration details.
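For convenience, here is a minimal sketch of that Nginx configuration. The domain, certificate paths, and htpasswd location are placeholders; the htpasswd file is created with htpasswd -c /etc/nginx/.htpasswd youruser:

```
# /etc/nginx/sites-available/ollama (sketch; adjust paths and server_name)
server {
    listen 443 ssl;
    server_name yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # The "Bouncer": every request must present valid credentials
    auth_basic           "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
    }
}
```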

Step 3: Encrypt with SSL (The Secret Code)

Without SSL (HTTPS), your password is sent in plain text. Using Let’s Encrypt ensures that even if someone intercepts the traffic, they can’t read your credentials.

Step 4: Rate Limiting (The Anti-Spam)

LLM queries are resource-heavy. By setting a rate limit in Nginx (e.g., 2 requests per second), you prevent an attacker from flooding your GPU with thousands of tokens, even if they somehow bypass your password.
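In Nginx terms, that limit is a shared-memory zone keyed by client IP plus a limit_req directive. A minimal sketch (zone name and burst value are illustrative choices):

```
# In the http {} block: track each client IP, allow 2 requests/second
limit_req_zone $binary_remote_addr zone=ollama_limit:10m rate=2r/s;

# In the server {} block from the proxy config:
location / {
    limit_req zone=ollama_limit burst=5 nodelay;
    proxy_pass http://127.0.0.1:11434;
}
```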

Step 5: Fail2Ban (The Ban Hammer)

Automate your defense. If an IP address tries to guess your password three times and fails, Fail2Ban should block that IP at the firewall level for 24 hours.
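Fail2Ban ships with an nginx-http-auth filter that matches failed Basic Auth attempts in the Nginx error log, so the jail is short. A sketch, assuming default log paths:

```
# /etc/fail2ban/jail.local (sketch; adjust logpath to your setup)
[nginx-http-auth]
enabled  = true
port     = http,https
logpath  = /var/log/nginx/error.log
# 3 failed logins -> 24-hour ban
maxretry = 3
bantime  = 86400
```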


3. Verification Checklist

Run these tests to ensure you are safe:

  •  Can I access http://[Your-IP]:11434? (Answer should be NO).
  •  Does https://yourdomain.com ask for a password? (Answer should be YES).
  •  Does my GPU usage spike when I’m not using it? (Check via nvidia-smi or htop).
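The first check above can also be scripted so you can run it from a cron job or a second machine. A minimal sketch; the hostname and port in the commented call are placeholders:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After hardening, this should be False from any machine other than the server:
# port_open("your-public-ip", 11434)
```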

The Bottom Line

AI is the most expensive computing resource you own. Leaving an Ollama server unsecured in 2026 is the digital equivalent of leaving a gold bar on your front porch. Lock it down.

CyberDudeBivash Final Word: “Don’t let your hardware work for the enemy. Encrypt, Authenticate, and Monitor.”

The GPU “Tripwire” Alert System

To combat LLMjacking, we don’t just want a passive firewall; we want an active alarm system. This script acts as a “tripwire”: if your GPU utilization stays above a certain threshold (e.g., 80%) for too long while you aren’t using it, it sends an emergency alert to your phone via Telegram.


Step 1: Get Your Telegram Credentials

  1. Bot Token: Message @BotFather on Telegram. Use /newbot and follow the prompts to get your API Token.
  2. Chat ID: Message @userinfobot to get your unique Chat ID.

Step 2: Install the Python Dependencies

We will use nvitop (or pynvml) to pull real-time NVIDIA data.

Bash

pip install nvitop requests

Step 3: The “CyberDudeBivash” Tripwire Script

Create a file named gpu_shield.py and paste the following:

Python

import time
import requests
from nvitop import Device

# --- CONFIGURATION ---
TELEGRAM_TOKEN = "YOUR_BOT_TOKEN"
CHAT_ID = "YOUR_CHAT_ID"
THRESHOLD_PERCENT = 80.0  # Alert if GPU > 80%
CHECK_INTERVAL = 30       # Check every 30 seconds
STRIKE_LIMIT = 2          # Alert after 2 consecutive high readings (60 seconds)

def send_telegram_alert(message):
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    payload = {"chat_id": CHAT_ID, "text": message, "parse_mode": "Markdown"}
    try:
        requests.post(url, json=payload)
    except requests.RequestException as e:
        print(f"Error sending alert: {e}")

def monitor_gpu():
    strikes = 0
    print("CyberDudeBivash GPU Shield Active...")
    while True:
        devices = Device.all()
        for device in devices:
            utilization = device.gpu_utilization()
            if utilization > THRESHOLD_PERCENT:
                strikes += 1
                print(f"Warning: GPU {device.index} at {utilization}% (Strike {strikes})")
            else:
                strikes = 0  # Reset if usage drops
            if strikes >= STRIKE_LIMIT:
                alert_msg = (
                    f"*LLMjacking Alert!*\n"
                    f"High GPU activity detected on {device.name()}.\n"
                    f"Current Load: {utilization}%\n"
                    f"Check your Ollama logs immediately!"
                )
                send_telegram_alert(alert_msg)
                strikes = 0  # Reset after sending alert
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    monitor_gpu()

Step 4: Running it as a Background Service

To ensure this script runs 24/7 even after you close your terminal, use PM2 or a systemd service.

Using PM2 (easiest):

Bash

sudo npm install -g pm2
pm2 start gpu_shield.py --interpreter python3
pm2 save
pm2 startup
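If you prefer to stay native to systemd, the same effect can be achieved with a small unit file. A sketch; the User, script path, and unit name are placeholders:

```
# /etc/systemd/system/gpu-shield.service (sketch; adjust User and paths)
[Unit]
Description=CyberDudeBivash GPU Shield
After=network.target

[Service]
ExecStart=/usr/bin/python3 /opt/gpu_shield.py
Restart=always
User=your_username

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now gpu-shield.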

Why this works against LLMjacking

Attackers don’t just run one small query; they flood your server with high-token-count requests to maximize their “theft.” This causes your GPU to stay at high utilization for minutes or hours.

  • Legitimate Use: You usually know when you are running a model.
  • LLMjacking: You get a notification while you’re away or asleep.

Active Countermeasures: The “Emergency Kill” Upgrade

This is the final tier of the CyberDudeBivash defense strategy: Active Countermeasures. If the “Tripwire” script detects that your GPU is being pinned for a sustained period, indicating a high-token LLMjacking attack, it will automatically execute an “Emergency Shutdown” of the Ollama service and alert you.

We will update the previous script to include a Strike System with a hard-kill command.

Updated gpu_shield.py

Python

import time
import requests
import subprocess
from nvitop import Device

# --- CONFIGURATION ---
TELEGRAM_TOKEN = "YOUR_BOT_TOKEN"
CHAT_ID = "YOUR_CHAT_ID"
THRESHOLD_PERCENT = 85.0  # High usage threshold
STRIKE_LIMIT = 10         # 10 strikes at 30-second intervals = 5 mins of constant high usage
CHECK_INTERVAL = 30

def send_telegram_alert(message):
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    payload = {"chat_id": CHAT_ID, "text": message, "parse_mode": "Markdown"}
    try:
        requests.post(url, json=payload)
    except requests.RequestException as e:
        print(f"Error sending alert: {e}")

def emergency_shutdown():
    """Shuts down the Ollama service to protect hardware and stop the theft."""
    print("CRITICAL: Sustained attack detected. Shutting down Ollama...")
    try:
        # Stop the systemd service
        subprocess.run(["sudo", "systemctl", "stop", "ollama"], check=True)
        # Force-kill any lingering processes
        subprocess.run(["sudo", "pkill", "-9", "ollama"], check=False)
        return True
    except Exception as e:
        print(f"Failed to stop service: {e}")
        return False

def monitor_gpu():
    strikes = 0
    print("CyberDudeBivash Active Defense System Engaged...")
    while True:
        devices = Device.all()
        for device in devices:
            utilization = device.gpu_utilization()
            if utilization > THRESHOLD_PERCENT:
                strikes += 1
                print(f"High usage: {utilization}% (Strike {strikes}/{STRIKE_LIMIT})")
            else:
                if strikes > 0:
                    print("Usage normalized. Resetting strikes.")
                strikes = 0
            if strikes >= STRIKE_LIMIT:
                if emergency_shutdown():
                    msg = (
                        "*EMERGENCY SHUTDOWN EXECUTED*\n"
                        "Sustained high GPU load (5+ mins) detected. "
                        "Ollama has been killed to prevent further theft."
                    )
                else:
                    msg = (
                        "*SHUTDOWN FAILED*\n"
                        "Sustained attack detected but could not stop Ollama. "
                        "Check server immediately!"
                    )
                send_telegram_alert(msg)
                strikes = 0  # Reset and wait for manual restart
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    monitor_gpu()

Important: Granting “Kill” Permissions

Since the script needs sudo to stop a system service, you must allow your user to run systemctl stop ollama without a password. Otherwise, the script will hang at the sudo password prompt.
  • Run: sudo visudo
  • Add this line at the bottom (replace your_username with your Linux user): your_username ALL=(ALL) NOPASSWD: /usr/bin/systemctl stop ollama, /usr/bin/pkill -9 ollama

The “CyberDudeBivash” Hardened Stack Recap

  • Infrastructure: Ollama on Localhost.
  • Gateway: Nginx + SSL + Basic Auth.
  • Traffic Control: Rate Limiting (Nginx).
  • Intrusion Detection: Fail2Ban (Bans failed logins).
  • Active Countermeasure: gpu_shield.py (Kills service if theft occurs).

CyberDudeBivash Final Note: “Authentication keeps out the honest hackers; automation stops the smart ones. You’ve officially turned your server from a victim into a fortress.”

Unlike a standard web hack, a compromised AI server involves unique risks like Model Poisoning (corrupting your AI’s logic) and Resource Hijacking. Here is the definitive recovery checklist.


Post-Incident Recovery Checklist

Immediate Containment

  •  Kill the Service: Stop the Ollama process immediately (sudo systemctl stop ollama) to sever any active attacker connections.
  •  Sever Network Exposure: Bind Ollama to 127.0.0.1 and close port 11434 on your firewall.
  •  Isolate GPU/NPU: In high-security environments, restart the machine to clear the GPU’s VRAM, ensuring no malicious resident code remains in memory.

Eradication & Malware Hunting

  •  Audit Model Integrity: Attackers can upload “poisoned” models. Delete all models in your ~/.ollama/models folder and re-download them from official sources (ollama pull).
  •  Scan for RCE Footprints: Check /tmp and %TEMP% directories for suspicious executables. Exploits like CVE-2024-37032 can leave behind reverse shells or miners.
  •  Check for Persistence: Review your crontab and systemd services for any new, unrecognized entries that might restart a miner or a backdoor.

Forensics & Investigation

  •  Analyze Ollama Logs: Look for high-volume requests in journalctl -u ollama. Note the IP addresses—these are your primary attackers.
  •  Audit Tool-Calling: If you had “tools” or “functions” enabled, check your system logs for unauthorized API calls or database queries executed by the AI.
  •  Monitor for Data Exfiltration: Review outbound network traffic for spikes. Attackers may have used your model to process and “leak” local files.

Hardening & Restoration

  • Update to Version 0.7.0+: Ensure you are on the latest version to patch the Out-Of-Bounds Write and Path Traversal vulnerabilities.
  •  Reset API Keys: If your Ollama server was connected to other apps (like LangChain or an Nginx proxy), rotate all associated API keys and passwords immediately.
  • Enable Logging: Configure Nginx to log not just the access, but the specific headers to better track future attempts.
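One way to capture that extra detail is a custom log_format. A sketch, assuming the proxy config from earlier (the format name and fields are illustrative):

```
# In the http {} block: log request size and user agent to spot scripted abuse
log_format ollama_audit '$remote_addr [$time_local] "$request" '
                        '$status $request_length "$http_user_agent"';

# In the server {} block:
access_log /var/log/nginx/ollama_access.log ollama_audit;
```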

The Clean Slate Strategy

If you suspect deep compromise (RCE), the safest path is to reimage the OS.

CyberDudeBivash Warning: “AI models are data, but they execute like code. If a model was swapped, your entire application’s logic is now untrustworthy. When in doubt, wipe and rebuild.”


Final Summary 

  • Detection: GPU usage spikes + Port 11434 exposure.
  • Protection: Nginx Reverse Proxy + SSL + Basic Auth.
  • Monitoring: gpu_shield.py + Fail2Ban.
  • Recovery: Delete local models, update version, and rotate keys.

#AISecurity #LLMSecurity #Ollama #GenerativeAI #ModelInversion #AdversarialAI #AIInfrastructure #CyberDudeBivash
