# WebDAV Cross-Server COPY Integration

> **Active Project** — design and implementation in progress.

**Created:** 2026-03-13
**Status:** Design
**Source:** [webdav-cross-server-copy.md](https://simtable.acequia.io/acequia/docs/webdav-cross-server-copy.md) — test report from gsd.acequia.io → waldo.acequia.io
**Test vehicle:** WebDavSync app (`acequia/stigmergic/WebDavSync/`)

## Problem

When a user on a slow connection needs to copy a file between two Acequia nodes on the same LAN (or same datacenter), every byte currently crosses the internet twice: download to client, upload to destination. WebDAV COPY with a foreign Destination URI should let the source server push directly to the destination server. The client sends two tokens and a URL; the bytes flow server-to-server.

## Current State

**Nephele already attempts cross-server COPY** — it parses foreign Destination URIs and makes outbound requests, returning 207 Multi-Status. But it fails because:

1. **No credential forwarding** — the outbound request to the destination is unauthenticated
2. **`pathsHaveSameAdapter()` guard** — `COPY.js:49` and `MOVE.js:48` reject cross-adapter destinations with `ForbiddenError`

The test report confirmed: nephele returns 207 with a 404 from the destination (auth rejection), and ignores `TransferHeaderAuthorization` entirely.

## Protocol: TPC TransferHeader Convention

From CERN's grid storage community (dCache, StoRM, EOS). Simple prefix-stripping:

```
Client sends:
  Authorization: Bearer <source-token>
  TransferHeaderAuthorization: Bearer <destination-token>
  Destination: https://dest-server.com/path/file.txt

Source server strips "TransferHeader" prefix, forwards to destination:
  Authorization: Bearer <destination-token>
  PUT /path/file.txt
```

Any `TransferHeader*` request header gets the prefix stripped and relayed. Transport-agnostic — works over HTTP, WebSocket, WebRTC.

## Implementation Strategy

### Approach: Express Middleware Before Nephele

Intercept COPY/MOVE in `server.mjs` before they reach nephele. This avoids patching nephele and keeps the logic in our own code.

**Location:** `server.mjs`, before the final `server(req, res, next)` call (currently ~line 736).

```
Middleware stack:
  ... existing auth, subdomain, static file middleware ...
  → NEW: crossServerCopy middleware
  → nephele server (handles same-server COPY/MOVE as before)
```
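In `server.mjs` the mount itself is small. A sketch, assuming the Express app variable is named `app` (the nephele handler is `server` per the source; the `app` name is an assumption):

```js
import { crossServerCopyMiddleware } from './src/crossServerCopy.mjs'

// ... existing auth, subdomain, and static-file middleware mounted above ...

// Intercept cross-server COPY/MOVE after auth but before nephele.
app.use(crossServerCopyMiddleware)

// Existing final nephele call; same-server COPY/MOVE is unchanged.
app.use((req, res, next) => server(req, res, next))
```

Ordering is the whole point: the middleware must sit after `jwtAuth.mjs` (so the source request is already authenticated) and before nephele (so the `pathsHaveSameAdapter()` guard never sees the cross-server request).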

### Implementation: `src/crossServerCopy.mjs`

```js
// Pseudo-code for the middleware

export function crossServerCopyMiddleware(req, res, next) {
  if (req.method !== 'COPY' && req.method !== 'MOVE') return next()

  const dest = req.headers['destination']
  if (!dest) return next()  // let nephele handle missing Destination

  // A relative Destination path throws here, which means same-server:
  // fall through to nephele
  let destUrl
  try { destUrl = new URL(dest) } catch { return next() }

  // Same-host → let nephele handle it. URL lowercases destUrl.host,
  // so normalize the Host header to match.
  const sourceHost = (req.headers.host || '').toLowerCase()
  if (destUrl.host === sourceHost) return next()

  // Cross-server: handle here
  handleCrossServerTransfer(req, res, destUrl)
}
```

### Core Logic

```
1. Authenticate source request (already done by upstream middleware)
2. Read source file content via local filesystem adapter
3. Collect TransferHeader* headers, strip prefix
4. PUT content to destination URL with forwarded headers
5. If MOVE and PUT succeeded, DELETE source
6. Return 201 (Created) or 207 (Multi-Status) with per-resource results
```
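Step 6's Multi-Status body can be sketched with a small builder. The `buildMultiStatus` name and shape are illustrative, not from the source; a production version would follow RFC 4918's `DAV:multistatus` schema fully and XML-escape hrefs:

```js
// Build a minimal RFC 4918 multistatus body from per-resource results.
// results: [{ href: '/path/file.txt', status: 201 }, ...]
// Hypothetical helper: names and XML shape are illustrative.
function buildMultiStatus(results) {
  const statusLine = (code) => {
    const reasons = { 200: 'OK', 201: 'Created', 401: 'Unauthorized', 403: 'Forbidden', 502: 'Bad Gateway' }
    return `HTTP/1.1 ${code} ${reasons[code] || ''}`.trim()
  }
  const responses = results.map(({ href, status }) =>
    `  <D:response>\n` +
    `    <D:href>${href}</D:href>\n` +
    `    <D:status>${statusLine(status)}</D:status>\n` +
    `  </D:response>`
  ).join('\n')
  return `<?xml version="1.0" encoding="utf-8"?>\n` +
    `<D:multistatus xmlns:D="DAV:">\n${responses}\n</D:multistatus>`
}
```

The single-file success path can skip this entirely and return a plain 201.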

### Key Decisions

**Read source locally, not via HTTP.** The middleware runs on the source server with filesystem access. No need to self-request — read directly from the adapter or filesystem path.

**Stream, don't buffer.** Large files must be piped from the local read stream to the outbound PUT request. See "Large File Considerations" below for details.

**Recursive COPY for collections.** If the source is a directory, PROPFIND it locally, then iterate:
- MKCOL each directory on the destination
- PUT each file
- Collect per-resource status for 207 response

**MOVE = COPY + DELETE.** Only delete the source after successful destination write. If any file in a recursive MOVE fails to copy, don't delete the source (partial move is worse than no move).
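The recursive COPY decision can be sketched as a traversal with an injected transport, so the HTTP layer (and forwarded auth headers) stays out of the walk logic. Names (`copyTreeToRemote`, `mkcol`, `putFile`) are illustrative; per-resource error handling (e.g., recording 502 for a failed PUT) is omitted for brevity:

```js
import fsp from 'node:fs/promises'
import path from 'node:path'

// Mirror a local directory to a remote WebDAV base URL.
// transport = { mkcol(url), putFile(localPath, url) }, each resolving
// to an HTTP status code. Hypothetical sketch, not the implementation.
async function copyTreeToRemote(localDir, destBase, transport) {
  const results = []
  async function walk(local, remote) {
    const entries = await fsp.readdir(local, { withFileTypes: true })
    for (const entry of entries) {
      const localPath = path.join(local, entry.name)
      const remoteUrl = `${remote}/${encodeURIComponent(entry.name)}`
      if (entry.isDirectory()) {
        // MKCOL the directory first (depth-first), then recurse into it
        results.push({ href: remoteUrl, status: await transport.mkcol(remoteUrl) })
        await walk(localPath, remoteUrl)
      } else {
        results.push({ href: remoteUrl, status: await transport.putFile(localPath, remoteUrl) })
      }
    }
  }
  await walk(localDir, destBase)
  return results  // per-resource statuses for the 207 Multi-Status response
}
```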

### Header Forwarding

```js
function extractTransferHeaders(reqHeaders) {
  const forwarded = {}
  for (const [key, value] of Object.entries(reqHeaders)) {
    const lower = key.toLowerCase()  // Node lowercases incoming header names
    if (lower.startsWith('transferheader')) {
      // transferheaderauthorization → authorization; header names are
      // case-insensitive, so no re-capitalization is needed
      forwarded[lower.slice('transferheader'.length)] = value
    }
  }
  return forwarded
}
```

### Auth Considerations

**Source auth:** Already verified by jwtAuth.mjs upstream. The middleware knows the authenticated user and their scoped paths. COPY requires read access to the source path.

**Destination auth:** Entirely the client's responsibility. The client provides the destination token via `TransferHeaderAuthorization`. The source server forwards it blindly — it doesn't verify or inspect the destination token.

**Chain tokens work naturally.** A client with a delegation chain for server A and a separate chain for server B provides both as Bearer tokens. Each server verifies its own.

**Scope check:** The existing `ScopedFileSystemAdapter.isAuthorized()` already classifies COPY as a read method (`serverFactories.mjs:50`). The middleware should respect this — the source token needs read access to the source path, nothing more.

### Error Handling

| Scenario | Response |
|----------|----------|
| Destination unreachable | 502 Bad Gateway |
| Destination auth rejected | 207 with per-resource 401/403 |
| Source file not found | 404, returned by the middleware when the source stat fails |
| Partial recursive failure | 207 Multi-Status with per-resource results |
| Destination already exists + Overwrite: F | 412 Precondition Failed |

## Large File Considerations

### How Nephele Handles Large Files Today

Nephele is fully streaming with no in-memory buffering at any stage:

- **GET:** `resource.getStream()` returns a Node.js Readable, piped directly to the HTTP response with backpressure handling (pause/resume on drain). Supports `Range` headers for partial content.
- **PUT:** Request body stream piped directly to `resource.setStream()`, which opens a file write stream and pipes input to disk. No buffering.
- **Local COPY:** Uses `fsp.copyFile()` — a kernel-level operation (copy-on-write where the OS supports it). The file never passes through JS memory.
- **ETags:** By default (`contentEtagMaxBytes: -1`), ETags use metadata only (size + timestamps). No file content read for ETag calculation on large files.
- **No body size limits** on WebDAV routes. Express JSON middleware (1MB limit) only applies to `/auth` API endpoints.

### Cross-Server COPY: The Streaming Challenge

Local COPY uses `fsp.copyFile()` (zero-copy). Cross-server COPY cannot — it must read the source file and PUT it to the remote server over HTTP. This is fundamentally different:

| Aspect | Local COPY | Cross-Server COPY |
|--------|-----------|-------------------|
| Data path | Kernel copy-on-write | Read stream → HTTP PUT |
| Memory | Zero (kernel handles it) | Stream buffer (~64KB chunks) |
| Bottleneck | Disk I/O | Network bandwidth |
| Failure mode | Atomic (succeeds or fails) | Partial transfer possible |
| Duration | Milliseconds for most files | Seconds to minutes for large files |

### Middleware Implementation for Large Files

**Stream with the built-in `fetch` and a Readable body.** Node 18+ `fetch` accepts a Node.js Readable as the request body and handles backpressure internally, so no manual `pipeline()` wiring is required:

```js
import fsp from 'node:fs/promises'
import { createReadStream } from 'node:fs'

async function copyFileToRemote(localPath, destUrl, destHeaders) {
  const stat = await fsp.stat(localPath)
  const readStream = createReadStream(localPath)

  const resp = await fetch(destUrl, {
    method: 'PUT',
    headers: {
      ...destHeaders,
      'Content-Length': String(stat.size),
      'Content-Type': 'application/octet-stream',
    },
    body: readStream,       // Node 18+ supports Readable as fetch body
    duplex: 'half',         // Required for streaming request bodies
  })

  if (!resp.ok) throw new Error(`Destination returned ${resp.status}`)
}
```

**Why `fetch` with a Readable body, not `http.request`:**
- `fetch` (backed by undici) streams the Readable and handles backpressure natively
- `Content-Length` is set from `stat.size`, so the destination knows the full size upfront (no chunked encoding needed)
- Same API shape as browser `fetch`, making the logic portable to BrowserDAV

**Why not `node-fetch` or `undici` directly:** Built-in `fetch` (backed by undici) is sufficient and avoids an extra dependency.

### Failure Modes at Scale

| Failure | Effect | Mitigation |
|---------|--------|------------|
| Network drop mid-transfer | Partial file on destination | Destination server should reject incomplete PUT (Content-Length mismatch). Return 502. |
| Destination disk full | PUT fails with 507 | Report in 207 Multi-Status per-resource result |
| Source server restart mid-stream | Read stream destroyed | fetch body errors, destination rejects. Return 502. |
| Client disconnects | Middleware still running | Check `req.socket.destroyed` periodically; abort outbound fetch if client gone. Use `AbortController`. |
| Timeout on slow network | Transfer stalls | Set per-file timeout proportional to file size (e.g., 30s + 1s per MB). Use `AbortSignal.timeout()`. |
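The size-proportional timeout rule from the table (30s base plus 1s per MB) can be written as a small helper and combined with the client-disconnect `AbortController` via `AbortSignal.any()` (Node 20.3+). The `transferTimeoutMs` name is illustrative:

```js
// Timeout proportional to file size: 30s base + 1s per MB,
// per the mitigation table above. Hypothetical helper name.
function transferTimeoutMs(sizeBytes) {
  const MB = 1_000_000
  return 30_000 + Math.ceil(sizeBytes / MB) * 1_000
}

// Usage sketch: combine the size-based timeout with a disconnect signal.
// AbortSignal.any() requires Node 20.3+; on older Node, wire up both
// signals to one AbortController manually.
// const signal = AbortSignal.any([
//   controller.signal,
//   AbortSignal.timeout(transferTimeoutMs(stat.size)),
// ])
```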

### Client Disconnect Handling

From the client's perspective the transfer is delegated: the client sends a single COPY request and waits for the result while the servers move the bytes. But if the client disconnects mid-transfer, the server should not keep pushing bytes indefinitely:

```js
// localPath added as a parameter (it was previously undefined here)
async function handleCrossServerCopy(req, res, destUrl, localPath) {
  const controller = new AbortController()

  // Abort the outbound transfer if the client disconnects before we respond
  req.on('close', () => {
    if (!res.writableEnded) controller.abort()
  })

  const readStream = createReadStream(localPath)
  const resp = await fetch(destUrl, {
    method: 'PUT',
    body: readStream,
    signal: controller.signal,
    duplex: 'half',
  })
  // ...
}
```

### Recursive Directory COPY: Concurrency

For directory trees, files should be transferred with bounded concurrency — not all at once (memory/socket exhaustion) and not sequentially (slow):

```
- MKCOL directories depth-first (sequential, fast)
- PUT files with concurrency limit (e.g., 4 parallel transfers)
- Collect per-resource status for 207 response
```

A simple semaphore or `p-limit` pattern works. The concurrency limit should be configurable (default 4, higher for LAN transfers).
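A dependency-free version of that pattern, in the spirit of `p-limit` (the `createLimiter` name is illustrative):

```js
// Bounded-concurrency limiter: at most `max` wrapped tasks run at once.
// Returns a function that queues a task and resolves with its result.
function createLimiter(max) {
  let active = 0
  const queue = []
  const runNext = () => {
    if (active >= max || queue.length === 0) return
    active++
    const { task, resolve, reject } = queue.shift()
    task().then(resolve, reject).finally(() => { active--; runNext() })
  }
  return (task) => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject })
    runNext()
  })
}
```

The per-server global semaphore suggested in Open Question 4 is the same primitive held at module scope and shared across requests.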

### Progress and Resumption (Future)

For initial implementation: no progress, no resumption. A failed transfer returns an error and the client retries.

For future consideration:
- **TPC Perf-Markers:** The CERN convention sends periodic progress headers during long transfers. Complex to implement over standard HTTP responses.
- **Resumable uploads:** If the destination supports `Content-Range` on PUT, partial transfers could be resumed. This requires coordination between source and destination that goes beyond the basic TPC model.
- **WebSocket progress channel:** The client could open a WebSocket to the source server for real-time progress on a long-running COPY. This fits the Acequia model better than HTTP polling.

## Browser-Side (BrowserDAV)

The BrowserDAV app at `acequia/stigmergic/BrowserDAV/` advertises COPY/MOVE support but doesn't implement them (`WebDAVFileSystem.js` returns 405).

For browser-to-server or browser-to-browser cross-server COPY:

1. Add COPY/MOVE handlers to `WebDAVFileSystem.js`
2. Same `TransferHeader*` prefix-stripping logic
3. For browser-to-browser: the "outbound PUT" goes over WebRTC data channel via the acequia routing layer

The JavaScript implementation can be shared between Node.js (nephele middleware) and browser (BrowserDAV handler).

## Testing with WebDavSync

The WebDavSync app (`acequia/stigmergic/WebDavSync/`) currently does client-mediated copy: `sourceClient.get()` → `destClient.put()`. It's the ideal test vehicle because:

1. It already has dual-server auth (separate tokens for source and destination)
2. It has a sync engine that compares directory trees and copies diffs
3. It has progress tracking and per-file error handling

### Test Plan

**Phase 1 — Server middleware (nephele → nephele):**
1. Implement `crossServerCopy.mjs` middleware
2. Deploy to two test nodes (e.g., stigmergic.acequia.live and simtable.acequia.io)
3. curl test: `COPY` with `TransferHeaderAuthorization` between them
4. Verify 201 response and file appears on destination

**Phase 2 — WebDavSync integration:**
1. Add a "server-side copy" mode to SyncEngine
2. When both endpoints are nephele servers, use COPY with TransferHeader instead of get+put
3. Compare transfer times: client-mediated vs server-to-server
4. Test with large files to validate streaming

**Phase 3 — BrowserDAV:**
1. Implement COPY/MOVE in WebDAVFileSystem.js
2. Test browser-as-source → nephele destination
3. Test nephele-as-source → browser destination (browser needs to handle inbound COPY)

## Files to Create/Modify

| File | Action | Description |
|------|--------|-------------|
| `src/crossServerCopy.mjs` | Create | Express middleware for cross-server COPY/MOVE |
| `server.mjs` | Modify | Mount middleware before nephele |
| `acequia/stigmergic/BrowserDAV/webdav/WebDAVFileSystem.js` | Modify | Add COPY/MOVE handlers |
| `acequia/stigmergic/WebDavSync/src/webdav/sync-engine.js` | Modify | Add server-side copy mode |
| `test/cross-server-copy.test.mjs` | Create | Integration tests |

## Open Questions

1. ~~**Streaming large files:**~~ **Resolved.** Use `createReadStream()` + built-in `fetch` with `duplex: 'half'`. Set `Content-Length` from stat. Same API shape as browser fetch for portability.

2. **Progress reporting:** Deferred to future work. Initial implementation: no progress, just success/failure. See "Progress and Resumption" section for future options (TPC Perf-Markers, WebSocket progress channel).

3. **MOVE across servers:** MOVE = COPY + DELETE. But what if the DELETE fails after successful COPY? The file exists in both places. Recommendation: return 207 with the COPY success and DELETE failure as separate resource entries. The client can retry the DELETE or treat it as an acceptable state.

4. **Rate limiting:** Should the middleware limit concurrent outbound transfers to prevent a single client from saturating the server's upload bandwidth? Recommendation: yes, a global semaphore (e.g., max 4 concurrent outbound transfers per server) prevents one user's recursive COPY from starving others.

5. **Same-subdomain cross-server:** When two nephele instances serve the same subdomain (e.g., replicas), should COPY between them be treated as local or cross-server? Answer: always cross-server if the Destination host differs from `req.headers.host`. The middleware doesn't need to know about replica topology.

6. **Destination `Content-Length` validation:** If the destination server validates that the received body matches `Content-Length` and the connection drops mid-transfer, the destination should reject with 400. But not all servers do this. Should the source middleware verify completion by doing a HEAD on the destination after PUT?

7. **Transfer size limits:** Should there be a configurable max file size for cross-server COPY? A 100GB cross-server COPY could tie up a server for hours. A size limit (e.g., 10GB default) with an override header could prevent accidental resource exhaustion.
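If the HEAD verification from Open Question 6 were adopted, a minimal check might look like this. The `verifyRemoteSize` helper is hypothetical (whether to do this at all is the open question); the injected `fetchImpl` keeps it testable:

```js
// After a PUT, HEAD the destination and compare Content-Length to the
// source file size. Returns false on any non-2xx response or mismatch.
// Hypothetical sketch, not a committed design.
async function verifyRemoteSize(destUrl, expectedSize, headers, fetchImpl = fetch) {
  const resp = await fetchImpl(destUrl, { method: 'HEAD', headers })
  if (!resp.ok) return false
  return Number(resp.headers.get('content-length')) === expectedSize
}
```

Note this only catches truncation, not corruption; a content hash comparison would require destination support beyond the basic TPC model.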
