Migrating Images and Assets to Sanity

Asset migration is the part of every CMS migration that takes twice as long as planned. Documents are easy — they’re structured JSON, the schema is yours to define, and you control the import. Assets are the messy half: hotlinked URLs, missing alt text, broken redirects, files that pretend to be images but are actually PDFs, and references buried inside HTML you’ll need to rewrite later.

This guide walks through a repeatable process that works whether you’re coming from WordPress, Contentful, Drupal, a folder of Markdown files, or a custom CMS. The patterns are the same: inventory, model, upload, rewrite, validate, clean up.

Why Assets Need Their Own Playbook

Sanity stores assets in a content-addressable pool keyed by SHA1 hash. Uploading the same file twice returns the same asset ID — no duplicate, no extra storage. That property is the foundation of a safe migration: the upload step is naturally idempotent, so you can re-run it after a partial failure without creating duplicates.

The catch is that references to those assets are not idempotent. A migrated document with a broken image reference stays broken until you fix it. The whole job is keeping the upload step and the reference-rewriting step in lockstep, with a persisted map between them.

Note

Sanity deduplicates by file content, not filename. Two hero.jpg files with different bytes become two separate assets. The same bytes uploaded under two different filenames become one asset. Lean on this — never try to deduplicate yourself before uploading.
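To make the content-addressing idea concrete, here is a minimal sketch that hashes file bytes locally with SHA-1. The contentKey helper is hypothetical and only useful for reporting, for example estimating how many distinct files a manifest really contains. It does not reproduce how Sanity derives asset IDs, and it is not a substitute for the server-side deduplication.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex SHA-1 digest of a file's bytes.
function contentKey(bytes: Uint8Array): string {
  return createHash("sha1").update(bytes).digest("hex");
}

const first = Buffer.from("same bytes");
const second = Buffer.from("same bytes"); // e.g. the same file under another name
const other = Buffer.from("different bytes");

console.log(contentKey(first) === contentKey(second)); // true
console.log(contentKey(first) === contentKey(other)); // false
```

The filename never enters the hash, which is why two names for the same bytes collapse into one key.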

Step 1: Inventory Source Assets

Before uploading anything, build a manifest. You’re trying to answer: for every asset I need to migrate, where does it live now and what document references it?

// scripts/build-asset-manifest.ts
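import { readFileSync, writeFileSync } from "node:fs";

// Assumption: the source CMS export was saved as JSON with this shape.
// Adjust the path and field names to match your export.
interface SourcePost {
  id: string;
  content: string;
  featured_media_url?: string;
  featured_media_alt?: string;
}

const sourcePosts: SourcePost[] = JSON.parse(readFileSync("export/posts.json", "utf-8"));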

interface AssetRecord {
  sourceUrl: string;
  filename: string;
  alt?: string;
  caption?: string;
  mimeType?: string;
  parentDocumentId: string;
  field: string; // e.g. "featuredImage", "body[3].image"
}

const manifest: AssetRecord[] = [];

for (const post of sourcePosts) {
  // Featured image
  if (post.featured_media_url) {
    manifest.push({
      sourceUrl: post.featured_media_url,
      filename: post.featured_media_url.split("/").pop() ?? "image",
      alt: post.featured_media_alt,
      parentDocumentId: post.id,
      field: "featuredImage",
    });
  }

  // Inline images in body HTML
  const inlineMatches = [...post.content.matchAll(/<img[^>]+src="([^"]+)"[^>]*>/g)];
  for (const [, src] of inlineMatches) {
    manifest.push({
      sourceUrl: src,
      filename: src.split("/").pop()?.split("?")[0] ?? "image",
      parentDocumentId: post.id,
      field: "body",
    });
  }
}

writeFileSync("export/asset-manifest.json", JSON.stringify(manifest, null, 2));
console.log(`Manifest: ${manifest.length} asset records, ${new Set(manifest.map((r) => r.sourceUrl)).size} unique URLs`);

The manifest is the source of truth for everything that follows. Spend extra time on it — you want to discover that 12% of your inline images are hotlinked to external sites now, not after you’ve started uploading.
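One way to surface hotlinked images early is to group manifest URLs by hostname. The sketch below assumes you can list the hostnames you consider your own; findHotlinked and ownHosts are hypothetical names, not part of any Sanity tooling.

```typescript
// Hypothetical helper: split manifest URLs into first-party and hotlinked.
// `ownHosts` is an assumption: the hostnames you treat as your own.
function findHotlinked(
  urls: string[],
  ownHosts: string[],
): { hotlinked: string[]; share: number } {
  const own = new Set(ownHosts.map((host) => host.toLowerCase()));
  const unique = Array.from(new Set(urls));
  const hotlinked = unique.filter((url) => {
    try {
      return !own.has(new URL(url).hostname);
    } catch {
      return true; // relative or malformed URLs need manual review too
    }
  });
  // share is the fraction of unique URLs that are not first-party
  return { hotlinked, share: unique.length ? hotlinked.length / unique.length : 0 };
}
```

Running it over manifest.map((r) => r.sourceUrl) gives you both the list to review and the percentage to report before any upload starts.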

Tip

Strip query strings off image URLs when building filenames, but keep them on the source URL for fetching. CDNs often append cache-busting params that aren’t part of the file identity.
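A minimal sketch of that tip, using the built-in URL parser instead of string splitting. The helper name filenameFromUrl and the "asset" fallback are assumptions; the source URL itself is left untouched for fetching.

```typescript
// Hypothetical helper: derive a filename from a URL, ignoring query string and fragment.
function filenameFromUrl(sourceUrl: string, fallback = "asset"): string {
  const { pathname } = new URL(sourceUrl); // pathname excludes ?query and #fragment
  const last = pathname.split("/").pop() ?? "";
  try {
    return decodeURIComponent(last) || fallback;
  } catch {
    return last || fallback; // malformed percent-encoding: keep the raw segment
  }
}

console.log(filenameFromUrl("https://cdn.example.com/uploads/hero.jpg?v=3&w=1200")); // hero.jpg
```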

Step 2: Model Your Image Schema

Don’t use a bare image field. Wrap it in a reusable object so every image in your dataset has alt text, optional captions, and hotspot enabled.

// schemas/objects/figure.ts
import { defineType, defineField } from "sanity";

export const figure = defineType({
  name: "figure",
  title: "Figure",
  type: "image",
  options: {
    hotspot: true, // CRITICAL — lets editors set focal points
  },
  fields: [
    defineField({
      name: "alt",
      title: "Alt text",
      type: "string",
      description: "Describe the image for screen readers and search engines.",
      validation: (rule) =>
        rule.custom((value, context) => {
          // Decorative images can have empty alt, but require a flag
          const parent = context.parent as { decorative?: boolean } | undefined;
          if (parent?.decorative) return true;
          if (!value) return "Alt text is required for non-decorative images";
          return true;
        }),
    }),
    defineField({
      name: "decorative",
      title: "Decorative (no alt text needed)",
      type: "boolean",
      initialValue: false,
    }),
    defineField({
      name: "caption",
      type: "string",
    }),
  ],
});

Use it in document schemas instead of type: "image":

defineField({
  name: "featuredImage",
  type: "figure",
})

Warning

Always enable hotspot: true on image fields. Without it, editors can’t control cropping and the URL builder will center-crop blindly — faces, logos, and product details get sliced off in responsive layouts.

For non-image files (PDFs, video, audio), use a separate file type with the same wrapper pattern. Sanity’s asset pipeline handles both, but the schema field type and upload call are different.
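As a rough sketch of the same wrapper pattern for files, the object below defines a reusable file type. The name attachment and its title and description fields are hypothetical. It is written as a plain object so the snippet has no imports; in a Studio you would likely wrap it in defineType, as the figure schema does.

```typescript
// schemas/objects/attachment.ts (hypothetical name and fields)
const attachment = {
  name: "attachment",
  title: "Attachment",
  type: "file", // file assets, not image assets
  fields: [
    { name: "title", title: "Title", type: "string" },
    { name: "description", title: "Description", type: "text" },
  ],
};
```

Document schemas would then use type: "attachment" wherever they need a PDF, video, or audio file.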

Step 3: Upload Assets to Sanity

Now fetch each source URL and upload it. Two things matter here: concurrency control so you don’t get rate-limited, and a persistent asset map so you can resume after a crash.

// scripts/upload-assets.ts
import { createClient } from "@sanity/client";
import { readFileSync, writeFileSync, existsSync } from "node:fs";
import pLimit from "p-limit";

const client = createClient({
  projectId: "your-project-id",
  dataset: "production",
  apiVersion: "2024-01-01",
  token: process.env.SANITY_TOKEN,
  useCdn: false,
});

const manifest = JSON.parse(readFileSync("export/asset-manifest.json", "utf-8"));

// Resume from a previous run if the map already exists
const assetMap: Record<string, string> = existsSync("export/asset-map.json")
  ? JSON.parse(readFileSync("export/asset-map.json", "utf-8"))
  : {};

const uniqueUrls = [...new Set(manifest.map((r: any) => r.sourceUrl))] as string[];
const limit = pLimit(5); // 5 concurrent uploads

let completed = 0;

await Promise.all(
  uniqueUrls.map((sourceUrl) =>
    limit(async () => {
      if (assetMap[sourceUrl]) {
        completed++;
        return; // Already uploaded
      }

      try {
        const response = await fetch(sourceUrl);
        if (!response.ok) {
          console.warn(`Skip ${sourceUrl}: HTTP ${response.status}`);
          return;
        }

        const contentType = response.headers.get("content-type") ?? "";
        const assetType = contentType.startsWith("image/") ? "image" : "file";

        const buffer = Buffer.from(await response.arrayBuffer());
        const filename = sourceUrl.split("/").pop()?.split("?")[0] ?? "asset";

        const record = manifest.find((r: any) => r.sourceUrl === sourceUrl);

        const asset = await client.assets.upload(assetType, buffer, {
          filename,
          title: record?.filename,
          description: record?.alt,
        });

        assetMap[sourceUrl] = asset._id;
        completed++;

        // Persist after every upload so a crash doesn't lose progress
        writeFileSync("export/asset-map.json", JSON.stringify(assetMap, null, 2));

        console.log(`[${completed}/${uniqueUrls.length}] ${filename} -> ${asset._id}`);
      } catch (err) {
        console.error(`Failed ${sourceUrl}: ${err}`);
      }
    }),
  ),
);

console.log(`Done. ${Object.keys(assetMap).length} of ${uniqueUrls.length} assets uploaded.`);

Three details that save real time:

  • pLimit(5) caps the number of in-flight requests, which helps you stay under Sanity’s API rate limits and bounds memory use, since each download is buffered in full. Tune it based on file sizes.
  • Writing the map after every success means a crash at asset #4,200 of 5,000 doesn’t restart from zero.
  • Detecting MIME type from the response header routes PDFs and video to client.assets.upload("file", ...) automatically.
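Response headers can be wrong, as the introduction notes about files that pretend to be images. A hedged complement to the header check is to look at the first bytes of the download. The sniffMime helper below is hypothetical and covers only a handful of common formats.

```typescript
// Hypothetical helper: guess a MIME type from the first bytes of a file.
// Covers only a few common formats; returns undefined for anything else.
function sniffMime(bytes: Uint8Array): string | undefined {
  const startsWith = (signature: number[], offset = 0): boolean =>
    signature.every((byte, i) => bytes[offset + i] === byte);
  const ascii = (text: string): number[] =>
    Array.from(text, (ch) => ch.charCodeAt(0));

  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (startsWith(ascii("GIF8"))) return "image/gif";
  // WebP: "RIFF", four size bytes, then "WEBP"
  if (startsWith(ascii("RIFF")) && startsWith(ascii("WEBP"), 8)) return "image/webp";
  if (startsWith(ascii("%PDF-"))) return "application/pdf";
  return undefined;
}
```

In the upload script, the sniffed type could take precedence over the header, for example (sniffMime(buffer) ?? contentType).startsWith("image/"), falling back to the header for formats the helper doesn't recognize.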

Tip

Sanity’s asset deduplication runs server-side on the SHA1 of the file contents. You can re-run this script as many times as you need: URLs already in the asset map are skipped, and re-uploading identical bytes returns the existing asset ID instead of creating a duplicate, so no extra storage is charged.

Step 4: Rewrite Document References

The asset map is now the bridge between source URLs and Sanity asset IDs. The reference-rewriting step uses it to fix every document that pointed at the old URLs.

For top-level image fields, this is straightforward:

function buildImageField(sourceUrl: string, alt: string | undefined) {
  const assetId = assetMap[sourceUrl];
  if (!assetId) return undefined;

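  return {
    _type: "figure",
    asset: { _type: "reference", _ref: assetId },
    // Leave alt unset when the source had none so the validation queries can flag it.
    // Auto-marking the image as decorative would hide the gap from editors.
    ...(alt ? { alt } : {}),
  };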
}

Inline images inside Portable Text are trickier because they live as blocks inside the body array. If you converted HTML with @portabletext/block-tools, your custom rules may have stashed the source URL on a marker field — now’s when you replace it with a real asset reference and add an _key.

import { randomUUID } from "node:crypto";

function rewriteImagesInBody(blocks: any[]) {
  return blocks.map((block) => {
    if (block._type === "figure" && block._sanity?.source) {
      const assetId = assetMap[block._sanity.source];
      if (!assetId) {
        console.warn(`No asset for ${block._sanity.source} — leaving marker`);
        return block;
      }
      return {
        _type: "figure",
        _key: block._key ?? randomUUID().slice(0, 12),
        asset: { _type: "reference", _ref: assetId },
        alt: block.alt ?? "",
      };
    }
    return block;
  });
}

Warning

Every array member in Sanity needs a stable _key. If you forget it on Portable Text image blocks, the Studio will throw “missing keys” errors and patches against those documents will fail in confusing ways. Generate one during the rewrite if the source didn’t supply one.
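A small sketch of key generation that can run over any array before it is written. ensureKeys is a hypothetical helper; it keeps existing keys and only fills in missing ones. It handles one array level, so nested arrays (such as the children of a Portable Text block) would need the same treatment.

```typescript
import { randomUUID } from "node:crypto";

type Keyed = { _key?: string; [field: string]: unknown };

// Hypothetical helper: return a copy of an array where every member has a _key.
// Existing keys are kept so re-running the rewrite doesn't churn documents.
function ensureKeys<T extends Keyed>(items: T[]): (T & { _key: string })[] {
  return items.map((item) => ({
    ...item,
    // 12 hex characters taken from a random UUID with the hyphens removed
    _key: item._key ?? randomUUID().replace(/-/g, "").slice(0, 12),
  }));
}
```

Calling ensureKeys(rewriteImagesInBody(blocks)) as the last step before the write is one way to make sure nothing reaches the dataset without a key.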

Step 5: Validate the Asset Migration

After running the upload and rewrite scripts, run these GROQ queries against your staging dataset to catch problems before they reach production.

// Documents with image fields that didn't resolve
*[_type == "article" && defined(featuredImage) && !defined(featuredImage.asset)]{
  _id, title
}

// Documents missing alt text on featured images (unset or empty string)
*[_type == "article" && defined(featuredImage.asset) && (!defined(featuredImage.alt) || featuredImage.alt == "") && featuredImage.decorative != true]{
  _id, title
}

// Portable Text blocks that still have unresolved source markers
*[_type == "article" && count(body[_type == "figure" && defined(_sanity.source)]) > 0]{
  _id, title,
  "unresolved": body[_type == "figure" && defined(_sanity.source)]._sanity.source
}

// Asset counts by asset type, returned as one object
{
  "images": count(*[_type == "sanity.imageAsset"]),
  "files": count(*[_type == "sanity.fileAsset"])
}

Compare counts against the manifest. Discrepancies tell you exactly what slipped through — usually hotlinked images that returned 404, redirects you didn’t follow, or content types you didn’t expect.
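The comparison can also be done locally from the two JSON files the earlier scripts produce. A minimal sketch, where findMissing is a hypothetical helper name:

```typescript
// Hypothetical helper: list manifest URLs that never made it into the asset map.
function findMissing(
  manifest: { sourceUrl: string }[],
  assetMap: Record<string, string>,
): string[] {
  const unique = Array.from(new Set(manifest.map((record) => record.sourceUrl)));
  return unique.filter((url) => assetMap[url] === undefined);
}
```

Feeding it the parsed export/asset-manifest.json and export/asset-map.json yields the exact URLs to investigate, rather than just a count mismatch.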

Querying Images on the Frontend

This is the part that bites people most often: LQIP and dimensions are not included automatically. You have to ask for them explicitly, or blur-up placeholders won’t work and you’ll get layout shift.

*[_type == "article" && slug.current == $slug][0]{
  title,
  featuredImage {
    asset->{
      _id,
      url,
      metadata {
        lqip,                          // base64 blur placeholder
        dimensions { width, height }   // for aspect ratio + width/height
      }
    },
    alt,
    hotspot,
    crop
  },
  body[]{
    ...,
    _type == "figure" => {
      asset->{
        _id,
        url,
        metadata { lqip, dimensions { width, height } }
      },
      alt,
      hotspot,
      crop
    }
  }
}

Project hotspot and crop whenever you’re going to crop or resize on the frontend — without them, @sanity/image-url falls back to a center crop and ignores the focal point editors set.
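To show why the projected dimensions matter, here is a hedged sketch that turns the query result into image attributes with an explicit width and height, which is what prevents layout shift. imgAttributes and ProjectedAsset are hypothetical names. The sketch uses only the w and auto=format URL parameters and deliberately does not apply hotspot or crop; that remains the job of @sanity/image-url.

```typescript
interface ProjectedAsset {
  url: string;
  metadata: { lqip?: string; dimensions: { width: number; height: number } };
}

// Hypothetical helper: derive <img> attributes for a given render width.
function imgAttributes(asset: ProjectedAsset, renderWidth: number) {
  const { width, height } = asset.metadata.dimensions;
  const w = Math.min(renderWidth, width); // never request more pixels than the original has
  return {
    src: `${asset.url}?w=${w}&auto=format`,
    width: w,
    height: Math.round((w * height) / width), // preserve the aspect ratio
    placeholder: asset.metadata.lqip, // base64 data URI, present only if projected
  };
}
```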

Step 6: Clean Up Orphans

Sanity does not garbage collect orphaned assets. An asset uploaded during a failed dry run, or referenced by a document you later deleted, will sit in your project forever and count against your storage quota.

// Find image assets with no incoming references
*[_type == "sanity.imageAsset" && count(*[references(^._id)]) == 0]{
  _id, originalFilename, size
}

Inspect the list. Anything that’s clearly migration debris can be deleted with the client:

const orphans = await client.fetch<{ _id: string }[]>(
  `*[_type == "sanity.imageAsset" && count(*[references(^._id)]) == 0]{ _id }`,
);

for (const { _id } of orphans) {
  await client.delete(_id);
}

Warning

Run this against a staging dataset first and review the list manually. Some “orphans” may be assets referenced from drafts, scheduled documents, or fields that aren’t part of your main schema — deletion is permanent and won’t ask twice.
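One way to narrow the review is to consider only orphans created after the migration began. The sketch assumes the orphan query also projects _createdAt and that you recorded when the migration started; selectMigrationDebris is a hypothetical helper.

```typescript
interface OrphanAsset {
  _id: string;
  _createdAt: string; // ISO timestamp, must be added to the query projection
}

// Hypothetical helper: keep only orphans created at or after the migration start,
// so assets that predate the migration are never candidates for deletion.
function selectMigrationDebris(orphans: OrphanAsset[], migrationStartedAt: string): string[] {
  const cutoff = Date.parse(migrationStartedAt);
  if (Number.isNaN(cutoff)) throw new Error(`Invalid timestamp: ${migrationStartedAt}`);
  return orphans
    .filter((asset) => Date.parse(asset._createdAt) >= cutoff)
    .map((asset) => asset._id);
}
```

The returned IDs are still only candidates; the manual review described above applies to them as well.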

Common Pitfalls

Hotlinked images on dead domains. Old WordPress posts often reference images on third-party servers that no longer exist. Decide upfront: do you skip them, replace with a placeholder, or fail the migration? Don’t let your script silently leave broken markers everywhere.

Redirects. Some CDNs return 301s instead of serving the file directly. Make sure your fetch follows redirects (Node’s built-in fetch does by default), and log the final URL from response.url, since it sometimes differs from the manifest.

Filenames with no extension. Cloud storage URLs occasionally serve images from extension-less paths. Sanity will accept them, but the asset’s originalFilename will look wrong in the Studio. Sniff the MIME type and append the right extension before uploading if presentation matters.
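A minimal sketch of the extension fix, assuming you already know the content type from the response header or from sniffing. withExtension and the small EXTENSIONS table are hypothetical and intentionally incomplete.

```typescript
// Hypothetical lookup table; extend it with the types your manifest contains.
const EXTENSIONS: Record<string, string> = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "image/gif": "gif",
  "image/webp": "webp",
  "image/svg+xml": "svg",
  "application/pdf": "pdf",
};

// Hypothetical helper: append an extension only when the filename lacks one.
function withExtension(filename: string, contentType: string): string {
  if (/\.[A-Za-z0-9]{2,5}$/.test(filename)) return filename; // already has an extension
  // Drop parameters such as "; charset=..." before the lookup
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  const ext = EXTENSIONS[mime];
  return ext ? `${filename}.${ext}` : filename;
}
```

Unknown types are passed through unchanged, so the helper never invents an extension it can't justify.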

EXIF stripping and rotation. Sanity reads orientation EXIF tags and rotates images automatically — but only when uploading via the API. Images that look correct in the source CMS may appear rotated after migration if the source CMS was also doing client-side rotation but stripping EXIF. Spot-check portrait photos.

SVGs with embedded scripts. SVG is an XML document format and can carry embedded scripts, so treat files from untrusted sources with care. Sanity accepts SVGs as image assets; if you would rather store them as plain files, use client.assets.upload("file", ...) instead of "image". They’ll go in as file assets and won’t be processed as images, but you also lose the image transformation pipeline.

Migration Checklist

  • Build a manifest of every asset source URL, with parent document and field
  • Define a figure object type with hotspot: true and required alt text
  • Switch document schemas from type: "image" to type: "figure"
  • Upload assets with controlled concurrency, persisting the asset map after each success
  • Detect MIME type from response headers and route files vs images correctly
  • Rewrite top-level image fields with the asset map
  • Rewrite Portable Text image blocks, generating _key values where missing
  • Validate with GROQ — broken refs, missing alt, unresolved markers
  • Update frontend queries to project metadata.lqip, dimensions, hotspot, crop
  • Delete orphaned assets after a manual review on staging
  • Run the full pipeline against a cloned dataset before production