*In this post, you’ll learn how to build an automatic image offloader with FastAPI and S3 that rewrites image links with temporary presigned URLs to prevent link rot, improve security, and speed up your blog.*
There are a few reasons you might want your very own automatic image offloader. The first that comes to mind is protection from broken image links: links that worked when you were composing your precious blog post, but break later when the original author decides to take the images down. There's also the possibility that an image that was once a cute little teddy bear gets replaced with something NSFW. These are things you have no control over.
It's also possible the image is hosted on a slow server, or that the original host noticed excessive hotlink traffic coming from your site and decided to cut you off.
Slow, broken, or swapped images may be minor cosmetic concerns, but there are more serious security risks. Linking directly to an image hosted on another website opens you up to malicious attacks. A well-intentioned website hosting royalty-free images for the public good can get compromised, and suddenly every image it serves has malicious code embedded in it. You might not be affected, but your readers would be.
## Manual Image Offloading
It's true: an automatic image offloader is not the only solution. You can always do all the work yourself, and it's pretty simple. When you're writing a post and find a royalty-free image you want to use, rather than linking to it directly, you upload it to storage you control and use that link instead. If you've been doing that and have no problem remembering all those steps, this post is not for you.
I can never be trusted to remember so many extra steps. It's hard enough to think of content for your blog; extra steps just get in the way of creativity. My background is in visual effects pipeline engineering, and my whole career has been about automating mundane tasks for artists so they can spend their time doing creative things. In other words, I cannot in good conscience let a writer spend five minutes jumping through hoops, downloading and uploading images, and copy-pasting links. I'd rather the writer spend those five minutes thinking and breathing writing. So this post is really for them.
## The Setup
An automatic image offloader can be set up in many ways. Most of the time you'll be limited by how your website is built. If you're using WordPress, you'll need to find or write a plugin that performs the same task. If you're using `fastapi` as your backend and `react` as your frontend, you're in luck: you might be able to just copy and paste my code.
The way I decided to implement my automatic image offloader is two-fold. The first part is an endpoint on my fastapi backend that automatically uploads images to an S3 bucket, where they stay until I decide to remove them, and returns a presigned URL that expires after a set time. The second part is a React image component that acts like any other `img` tag, except that it replaces the `img` src with the temporary presigned URL.
Let me just note that I could have easily skipped the fastapi endpoint and had the React component perform the image upload as well as the link swap. I opted for the fastapi endpoint because at some point I want to be able to sanitize the image and upload different resolutions.
## The FastAPI endpoint
I'm assuming you already have a `fastapi` application set up, so you'll want to extend it by adding an additional router.
```python
from fastapi import APIRouter
router = APIRouter()
```
Then you'll need to add that router to your app like this:
```python
from fastapi import FastAPI

from yourmodule import image

app = FastAPI()  # pass lifespan=... here if your app defines one
app.include_router(image.router, tags=["Image"])
```
I've decided to use S3 to host my images. I've set up my bucket and generated a secret key. In my environment, or in my `.env`, I have:
```bash
AWS_ACCESS_KEY_ID=ABCDEFGSECRET
AWS_SECRET_ACCESS_KEY=ABCDEFGSECRET12345428888888888888
AWS_BUCKET_NAME=secretbucketname
AWS_REGION_NAME=myregion
AWS_ENDPOINT_URL=endpoint
```
The values above are fake; you can't copy and paste them. You'll need to generate your own access key and ID for your own S3 bucket.
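Because the endpoint will fail in confusing ways if any of these variables are unset, a small fail-fast check at startup can save some debugging. This is just a sketch; the `require_env` helper is mine, not part of `fastapi` or `boto3`:

```python
import os

REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_BUCKET_NAME",
    "AWS_REGION_NAME",
    "AWS_ENDPOINT_URL",
]


def require_env(names):
    """Return the named environment variables, raising if any is unset."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env(REQUIRED_VARS)` once at import time means a missing key shows up immediately instead of mid-request.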
Then you'll need to install `boto3`. If you're using a virtual environment, you can use:
```sh
pip3 install boto3
```
I put my requirements in my `setup.py`, as I feel it's neater that way.
Then you'll need a method you can use to get an S3 client.
```python
import boto3
import os


def get_s3_client():
    """Get an S3 client."""
    session = boto3.Session(
        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    )
    client = session.client(
        "s3",
        region_name=os.getenv("AWS_REGION_NAME"),
        endpoint_url=os.getenv("AWS_ENDPOINT_URL"),
    )
    return client
```
Now every time you want to use your S3 service in an endpoint, you do so by wrapping the method with FastAPI's dependency injection. For example:
```python
from typing import Annotated

import botocore.client
from fastapi import Depends


@router.get("/foo")
def get_foo(
    s3_client: Annotated[botocore.client.BaseClient, Depends(get_s3_client)],
):
    return {"result": "Yes I can use an s3 client"}
```
Before uploading anything to your S3 instance, you need to decide what filename and folder structure you'll use. I decided I don't want to deal with folders, because folders add extra meaning to objects in my bucket that I really don't want to worry or think about. I also decided to use a hash so that two files with the same filename but different sources won't stomp on each other. I'm using `hashlib.md5` because I need the hash to be consistent across any machine I run this on.
```python
import hashlib


def get_pathname_hash(pathname: str) -> str:
    """Get an md5 hash of pathname."""
    return hashlib.md5(bytes(pathname, encoding="utf8")).hexdigest()
```
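Hashing the full URL is what keeps two images that happen to share a filename apart. A quick illustration (the URLs here are made up):

```python
import hashlib

# Two different sources with the same filename produce different keys,
# while hashing the same URL twice always gives the same 32-character key.
url_a = "https://example.com/teddy.png"
url_b = "https://other.example.com/teddy.png"

key_a = hashlib.md5(bytes(url_a, encoding="utf8")).hexdigest()
key_b = hashlib.md5(bytes(url_b, encoding="utf8")).hexdigest()

print(key_a != key_b)  # True: different sources, different keys
print(len(key_a))      # 32: flat, fixed-length keys in the bucket
```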
You'll also need a method to download files before you can upload them to your S3 bucket. This implementation keeps the files in memory rather than writing them to disk. That could be dangerous: if you accidentally link to an image bigger than the available memory on the VM your fastapi service is running on, your service will go down. A safer solution might be to save files to disk in a temporary folder, but then you'll need to clean up afterward so you don't fill up the disk your fastapi instance is running on. The two approaches have their pros and cons; you'll need to weigh the trade-offs and choose the right one for your use case.
```python
import requests


def get_url_content(url: str) -> bytes:
    """Get url content."""
    response = requests.get(url, stream=True)
    response.raise_for_status()
    return response.content
```
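A middle ground I'll sketch here is to stream the response in chunks and give up once it grows past a cap. Note that `read_limited` and the 20 MB default are my own invention, not part of `requests`:

```python
def read_limited(response, max_bytes: int = 20 * 1024 * 1024) -> bytes:
    """Read a streamed response into memory, refusing anything over max_bytes."""
    chunks = []
    total = 0
    # iter_content yields the body in pieces, so we can stop before an
    # oversized image exhausts the memory on the VM.
    for chunk in response.iter_content(chunk_size=8192):
        total += len(chunk)
        if total > max_bytes:
            raise ValueError(f"Response exceeds {max_bytes} byte limit")
        chunks.append(chunk)
    return b"".join(chunks)
```

You'd call it with the same `requests.get(url, stream=True)` response, in place of reading `response.content` directly.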
Finally, you can set up your endpoint. I'm using `/static` for mine.
```python
@router.get("/static")
def get_presigned_url(
    path: str,
    s3_client: Annotated[botocore.client.BaseClient, Depends(get_s3_client)],
):
    """Get presigned s3 url for image."""
    key = get_pathname_hash(path)
    # Check if the hash already exists in the bucket.
    if not s3_client.list_objects_v2(
        Bucket=os.getenv("AWS_BUCKET_NAME"),
        Prefix=key,
    ).get("Contents"):
        content = get_url_content(path)
        s3_client.put_object(
            Body=content,
            Key=key,
            Bucket=os.getenv("AWS_BUCKET_NAME"),
        )
    # Finally, generate a presigned url.
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": os.getenv("AWS_BUCKET_NAME"), "Key": key},
        ExpiresIn=3600,  # 3600 seconds or 1 hour
    )
```
This will return a `str` link to the image on your S3 instance, and the link will only work for 1 hour.
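Since the endpoint regenerates the presigned URL on every request, one optional refinement is to memoize it for a little less than its lifetime, so repeat requests within the hour skip the S3 round trip. This is only a sketch; `make_url_cache` is a helper I made up, not part of `boto3`:

```python
import time


def make_url_cache(generate, ttl: int = 3600, margin: int = 300):
    """Wrap a URL generator, caching results until shortly before they expire."""
    cache = {}

    def get(path: str) -> str:
        now = time.time()
        hit = cache.get(path)
        if hit is not None and hit[1] > now:
            return hit[0]  # cached URL still has time left on it
        url = generate(path)
        # Expire the cache entry `margin` seconds before the URL itself does.
        cache[path] = (url, now + ttl - margin)
        return url

    return get
```

You'd wrap the `generate_presigned_url` call in a function of `path` and pass it in; the margin keeps you from handing out a URL that dies mid-page-load.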
Altogether, your endpoint will look like this:
```python
import hashlib
import os
from typing import Annotated

import boto3
import botocore.client
import requests
from fastapi import APIRouter, Depends

router = APIRouter()


def get_s3_client():
    """Get an S3 client."""
    session = boto3.Session(
        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    )
    client = session.client(
        "s3",
        region_name=os.getenv("AWS_REGION_NAME"),
        endpoint_url=os.getenv("AWS_ENDPOINT_URL"),
    )
    return client


def get_pathname_hash(pathname: str) -> str:
    """Get an md5 hash of pathname."""
    return hashlib.md5(bytes(pathname, encoding="utf8")).hexdigest()


def get_url_content(url: str) -> bytes:
    """Get url content."""
    response = requests.get(url, stream=True)
    response.raise_for_status()
    return response.content


@router.get("/static")
def get_presigned_url(
    path: str,
    s3_client: Annotated[botocore.client.BaseClient, Depends(get_s3_client)],
):
    """Get presigned s3 url for image."""
    key = get_pathname_hash(path)
    # Check if the hash already exists in the bucket.
    if not s3_client.list_objects_v2(
        Bucket=os.getenv("AWS_BUCKET_NAME"),
        Prefix=key,
    ).get("Contents"):
        content = get_url_content(path)
        s3_client.put_object(
            Body=content,
            Key=key,
            Bucket=os.getenv("AWS_BUCKET_NAME"),
        )
    # Finally, generate a presigned url.
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": os.getenv("AWS_BUCKET_NAME"), "Key": key},
        ExpiresIn=3600,  # 3600 seconds or 1 hour
    )
```
## React Component
The React component is pretty simple. As mentioned, all it needs to do is take a `src` and replace it with the temporary presigned link.
```jsx
import { useEffect, useState } from "react";
import axios from "axios";

function Image(props) {
  const { node, src, ...imgprops } = props;
  const [imageUrl, setImageUrl] = useState(src);

  useEffect(() => {
    // Passing the source URL via `params` lets axios URL-encode it for us.
    axios
      .get(`${process.env.API_ROOT}/static`, { params: { path: src } })
      .then((res) => setImageUrl(res.data));
  }, [src]);

  return <img src={imageUrl} {...imgprops} />;
}

export default Image;
export default Image;
```