<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Job-Search on hippotion</title><link>https://blog.hippotion.com/tags/job-search/</link><description>Recent content in Job-Search on hippotion</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 13 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.hippotion.com/tags/job-search/index.xml" rel="self" type="application/rss+xml"/><item><title>🎯 Know the Market Without Job-Hunting: An LLM-Scored Job Poller in n8n</title><link>https://blog.hippotion.com/posts/ats-job-poller/</link><pubDate>Fri, 13 Feb 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/ats-job-poller/</guid><description>You don&amp;rsquo;t have to be job-hunting to want to know your market — what&amp;rsquo;s out there, what it pays, where you&amp;rsquo;d fit. So I built an n8n workflow: it polls the public ATS APIs (Greenhouse/Lever/Ashby) plus a broad remote-jobs feed, filters for remote-EU infra roles, scores each posting against my CV with an LLM, and emails me only the 80%+ matches. No database, no scraping.</description><content:encoded><![CDATA[<p>You don&rsquo;t have to be about to change jobs to want to know the landscape. What&rsquo;s being built, what it
pays, where you&rsquo;d actually fit — staying current on the market (and your own worth) is just good
professional hygiene. The trouble is that <em>checking</em> is tedious, so most of us don&rsquo;t, until we&rsquo;re
already job-hunting and starting cold.</p>
<p>So I automated mine. An <a href="https://n8n.io">n8n</a> workflow on my homelab polls job boards every six hours,
scores each new posting against my profile with an LLM, and emails me only the strong matches — the
ones scoring 80%+. When nothing fits, it stays silent. When something genuinely fits, I know the same day.
Here&rsquo;s what I learned building it. Repo at the bottom.</p>
<h2 id="three-apis-cover-most-of-the-market">Three APIs cover most of the market</h2>
<p>Company career pages look bespoke, but underneath, a large share run on one of three applicant tracking
systems (ATS), and all three serve their postings as unauthenticated JSON:</p>
<ul>
<li><strong>Greenhouse</strong> — <code>boards-api.greenhouse.io/v1/boards/{token}/jobs?content=true</code></li>
<li><strong>Lever</strong> — <code>api.lever.co/v0/postings/{token}?mode=json</code></li>
<li><strong>Ashby</strong> — <code>api.ashbyhq.com/posting-api/job-board/{token}?includeCompensation=true</code></li>
</ul>
<p>No scraping, no headless browser. You poll the API the page itself calls, normalize the three
shapes into one <code>{ company, title, location, remote, url, posted_at, description, external_id }</code>, and
you&rsquo;re done with the hard part.</p>
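<p>As a rough illustration of that normalizer, here is a minimal Python sketch. The target shape comes from the post; the source field names (<code>absolute_url</code>, <code>hostedUrl</code>, <code>jobUrl</code> and so on) are written from memory of the public API docs and should be checked against a live response, and the way <code>remote</code> is derived is purely an assumption. The actual workflow does this inside n8n nodes, not in Python.</p>

```python
from datetime import datetime, timezone


def normalize_greenhouse(company: str, job: dict) -> dict:
    location = (job.get("location") or {}).get("name") or ""
    return {
        "company": company,
        "title": job.get("title", ""),
        "location": location,
        # Assumption: no dedicated remote flag, so guess from the location text.
        "remote": "remote" in location.lower(),
        "url": job.get("absolute_url", ""),
        "posted_at": job.get("updated_at", ""),
        "description": job.get("content", ""),
        "external_id": f"greenhouse:{company}:{job.get('id')}",
    }


def normalize_lever(company: str, job: dict) -> dict:
    created_ms = job.get("createdAt")  # assumed to be epoch milliseconds
    posted = ""
    if created_ms:
        posted = datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc).isoformat()
    return {
        "company": company,
        "title": job.get("text", ""),
        "location": (job.get("categories") or {}).get("location") or "",
        "remote": job.get("workplaceType") == "remote",
        "url": job.get("hostedUrl", ""),
        "posted_at": posted,
        "description": job.get("descriptionPlain", ""),
        "external_id": f"lever:{company}:{job.get('id')}",
    }


def normalize_ashby(company: str, job: dict) -> dict:
    return {
        "company": company,
        "title": job.get("title", ""),
        "location": job.get("location") or "",
        "remote": bool(job.get("isRemote")),
        "url": job.get("jobUrl", ""),
        "posted_at": job.get("publishedAt", ""),
        "description": job.get("descriptionPlain", ""),
        "external_id": f"ashby:{company}:{job.get('id')}",
    }
```

<p>Every downstream step (dedup, location filter, scoring) then only has to understand one dictionary shape.</p>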
<h2 id="resolve-the-token-is-half-the-battle">&ldquo;Resolve the token&rdquo; is half the battle</h2>
<p>The naive assumption — <em>the token is the company name, and everyone&rsquo;s on one of the three</em> — is half
right. When I probed my initial wishlist, <strong>roughly half 404&rsquo;d everywhere</strong>: HashiCorp (now under
IBM → Workday), SUSE (SuccessFactors), Aiven (Teamtailor), Hugging Face. They&rsquo;re on a fourth or fifth
system entirely. The honest move was to ship the ~33 that actually resolve and leave the rest as
disabled config stubs. Verify before you trust a slug.</p>
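<p>Verifying a slug can be as simple as probing each endpoint and seeing which one answers. The sketch below is one way to do it, not the workflow&rsquo;s actual code: the URL templates are the three from the list above with an <code>https://</code> scheme added, while <code>resolve_ats</code> and <code>http_status</code> are hypothetical helper names, and treating only HTTP 200 as a hit is an assumption.</p>

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

ENDPOINTS = {
    "greenhouse": "https://boards-api.greenhouse.io/v1/boards/{token}/jobs?content=true",
    "lever": "https://api.lever.co/v0/postings/{token}?mode=json",
    "ashby": "https://api.ashbyhq.com/posting-api/job-board/{token}?includeCompensation=true",
}


def http_status(url: str) -> int:
    """Probe a URL with the standard library; network errors count as a miss (0)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for an unknown token
    except OSError:
        return 0


def resolve_ats(token: str, get_status: Callable[[str], int] = http_status) -> Optional[str]:
    """Return the first ATS whose endpoint answers 200 for this token, else None."""
    for ats, template in ENDPOINTS.items():
        if get_status(template.format(token=token)) == 200:
            return ats
    return None
```

<p>A token that comes back as <code>None</code> is a candidate for a disabled config stub. Passing <code>get_status</code> in as a parameter keeps the lookup testable without touching the network.</p>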
<h2 id="dedup-without-a-database">Dedup without a database</h2>
<p>I didn&rsquo;t want to stand up Postgres just to remember which jobs I&rsquo;d already seen. n8n&rsquo;s <strong>Data Tables</strong>
handle it natively: a <code>seen_jobs</code> table, an <code>external_id</code> namespaced <code>{ats}:{company}:{id}</code>, and the
<code>rowNotExists</code> operation drops anything already recorded. State lives inside n8n, backed up with it.
Zero extra infrastructure.</p>
<p>The ordering matters: <strong>notify first, mark seen second.</strong> The insert only happens after the email
sends, so a failed send retries next run instead of silently swallowing a posting.</p>
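<p>The ordering is easier to see in code. In this sketch a plain Python <code>set</code> stands in for the <code>seen_jobs</code> Data Table, and <code>send_email</code> is a hypothetical callable that raises when delivery fails; neither is the real n8n API.</p>

```python
from typing import Callable


def external_id(ats: str, company: str, job_id: object) -> str:
    """Namespaced key, so the same numeric id on two boards never collides."""
    return f"{ats}:{company}:{job_id}"


def process_new_jobs(jobs: list, seen: set, send_email: Callable[[list], None]) -> list:
    # Equivalent of the rowNotExists check: keep only postings not recorded yet.
    fresh = [job for job in jobs if job["external_id"] not in seen]
    if not fresh:
        return []
    # Notify first. If this raises, nothing below runs and nothing is marked seen,
    # so the same postings are picked up again on the next run.
    send_email(fresh)
    # Mark seen second, only after the send succeeded.
    seen.update(job["external_id"] for job in fresh)
    return fresh
```

<p>Swapping the two steps would still work on a good day, but a failed send would leave the postings marked as seen and never reported.</p>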
<h2 id="the-location-filter-is-a-trap">The location filter is a trap</h2>
<p>My first version kept everything that wasn&rsquo;t explicitly US-based. The inbox filled with <em>&ldquo;Senior
Platform Engineer — Spain (Remote)&rdquo;</em> and <em>&ldquo;… — United Kingdom (Remote)&rdquo;</em>. Those aren&rsquo;t remote-for-me
— they&rsquo;re remote <em>if you live in Spain</em>. Useless from where I sit.</p>
<p>The fix was to invert the logic. Keep only three things:</p>
<ul>
<li>globally-remote / worldwide / anywhere,</li>
<li>pan-EU (EMEA / Europe / EU / EEA),</li>
<li>my own country.</li>
</ul>
<p>…and <strong>drop single-country remote</strong>, even EU ones. Region and home matches win over the country
deny-list, and ambiguous locations are kept (a missed match is worse than one extra line to skim). That
one change cut the noise more than anything else.</p>
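<p>Those rules boil down to an ordered check. The sketch below is a minimal Python version under stated assumptions: the term lists are illustrative rather than the ones in the real Config node, <code>netherlands</code> is a placeholder home country, and whole-word regex matching is my choice so that short terms like <code>us</code> or <code>eu</code> do not fire inside longer words.</p>

```python
import re

# Illustrative lists only; the real ones live in the workflow config.
GLOBAL_TERMS = ("worldwide", "anywhere", "global")
REGION_TERMS = ("emea", "europe", "eu", "eea")
HOME_TERMS = ("netherlands",)  # placeholder home country
DENY_COUNTRIES = ("united states", "usa", "us", "spain", "united kingdom", "uk", "germany")


def _has_any(text: str, terms: tuple) -> bool:
    """True if any term appears as a whole word in the text."""
    return any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms)


def keep_location(location: str) -> bool:
    text = location.lower()
    # 1. Global, pan-EU and home matches win over everything else.
    if _has_any(text, GLOBAL_TERMS + REGION_TERMS + HOME_TERMS):
        return True
    # 2. Otherwise a single named country means "remote if you live there": drop it.
    if _has_any(text, DENY_COUNTRIES):
        return False
    # 3. Ambiguous locations are kept; a missed match costs more than one extra line.
    return True
```

<p>With these lists, <code>"Spain (Remote)"</code> is dropped while <code>"Remote - Spain or EMEA"</code> is kept, because the region check runs before the deny-list.</p>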
<h2 id="let-an-llm-read-the-actual-job">Let an LLM read the actual job</h2>
<p>Keyword + location filtering gets you a candidate list, but it can&rsquo;t tell a &ldquo;Platform Engineer&rdquo; who
herds Kubernetes from a &ldquo;Platform Engineer&rdquo; who owns a Figma design system. The job description can.</p>
<p>So the last step scores each new posting against my CV. My first version batched all of them into
<strong>one</strong> big LLM call — which promptly timed out on the free tier. The fix was the opposite: <strong>one
small call per job</strong>, which also means a single slow or rate-limited job never sinks the batch. Each
call asks an <a href="https://build.nvidia.com">NVIDIA NIM</a> model (Llama 3.1 8B, OpenAI-compatible) for one
number and a reason:</p>
<blockquote>
<p>Score this job 0–100 for fit against my profile. Return <code>{score, reason}</code>.</p>
</blockquote>
<p>That score is what lets me <strong>widen the net instead of narrowing it.</strong> On top of the curated company
list I pull a broad remote-jobs feed (every company, all categories); the cheap keyword + location
filters do the first pass, then I <strong>only email the roles scoring 80%+.</strong> Casting wide is fine when a
model is the bar at the door. A line ends up looking like:</p>
<blockquote>
<p><strong>92%</strong> — <em>Grafana Labs</em> — Senior Platform Engineer (Remote, EMEA) — <em>strong k8s/GitOps overlap</em> — link</p>
</blockquote>
<p>Scoring is fail-safe: if a call hiccups, that job is just skipped, and every posting gets marked seen
either way — so nothing re-scores forever, and a rare bad run never floods or stalls the inbox.</p>
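<p>A sketch of the per-job loop, to make the fail-safe concrete. The model call is abstracted as a hypothetical <code>complete(prompt)</code> callable that returns the model&rsquo;s text reply, so no endpoint or client library is assumed here. The prompt wording is adapted from the one above, and asking for strict JSON and parsing it with <code>json.loads</code> is my assumption about how the reply is read.</p>

```python
import json
from typing import Callable


def build_prompt(profile: str, job: dict) -> str:
    return (
        "Score this job 0-100 for fit against my profile. "
        'Return JSON like {"score": 0, "reason": "..."}.\n\n'
        "PROFILE:\n" + profile + "\n\n"
        "JOB:\n" + job["title"] + "\n" + job["description"]
    )


def score_jobs(jobs: list, profile: str, complete: Callable[[str], str], threshold: int = 80) -> list:
    matches = []
    for job in jobs:
        try:
            # One small call per job, so a slow or failing call affects only this job.
            reply = json.loads(complete(build_prompt(profile, job)))
            score = int(reply["score"])
            reason = str(reply.get("reason", ""))
        except Exception:
            continue  # fail-safe: skip this job and move on to the next one
        if score >= threshold:
            matches.append({**job, "score": score, "reason": reason})
    return matches
```

<p>Timeouts, rate limits and malformed replies all land in the same <code>except</code> branch, which is what keeps one bad response from sinking the run.</p>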
<h2 id="the-unglamorous-bits-that-make-it-trustworthy">The unglamorous bits that make it trustworthy</h2>
<ul>
<li><strong>One bad source can&rsquo;t kill the run</strong>: every fetch is wrapped, and failures become a <code>⚠️ N sources failing</code> footer, so a company quietly changing ATS shows up in the email instead of silently vanishing.</li>
<li><strong>A prime run</strong> seeds the table silently the first time, so I&rsquo;m not buried under every currently-open
role on day one.</li>
<li><strong>Everything tunable lives in one Config node</strong> — companies, keywords, location lists, the profile,
the model — so adding a company is a one-line edit, not a graph safari.</li>
</ul>
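<p>The first bullet, wrapped fetches plus a failure footer, might look like this in Python. It is a sketch only: <code>sources</code> is a hypothetical mapping from company name to a zero-argument fetch function, and listing the failing names after the count is an addition of mine on top of the <code>⚠️ N sources failing</code> text.</p>

```python
def fetch_all(sources: dict) -> tuple:
    """Run every fetcher; collect jobs from the ones that work and names of the ones that fail."""
    jobs, failed = [], []
    for name, fetch in sources.items():
        try:
            jobs.extend(fetch())
        except Exception:
            failed.append(name)  # one broken source never aborts the loop
    return jobs, failed


def failure_footer(failed: list) -> str:
    """Footer line for the email; empty when every source answered."""
    if not failed:
        return ""
    return f"⚠️ {len(failed)} sources failing: " + ", ".join(sorted(failed))
```

<p>A company that moves to another ATS then shows up as a name in the footer on the next run.</p>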
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>The &ldquo;scrape job boards&rdquo; problem mostly isn&rsquo;t a scraping problem — it&rsquo;s three public APIs and a
normalizer.</li>
<li>For personal automation, reach for the boring-but-correct primitive: native dedup state beats a
database you have to operate.</li>
<li>An LLM works best here as the <strong>bar at the door</strong>: cheap deterministic filters keep the candidate
set (and the cost) small, then the model gates on real fit — which is what lets you cast a wide net
without drowning in it.</li>
</ul>
<p>Workflow JSON, the full node-by-node breakdown, and setup notes:
<strong><a href="https://github.com/janos-gyorgy/ats-job-poller">github.com/janos-gyorgy/ats-job-poller</a></strong>.</p>
]]></content:encoded></item></channel></rss>