How can I get Unicode output from robocopy in a PowerShell script?

Summary

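Robocopy writes UTF-16LE only when it is called with /UNICODE; without that switch it uses the console's OEM code page. PowerShell decodes the output of native programs with [Console]::OutputEncoding, which also defaults to the OEM code page, so UTF-16LE output that is piped or captured gets mis-decoded. Switching the console to UTF-8 or saving with Set-Content -Encoding utf8 still produces gibberish, because neither setting matches the UTF-16LE bytes robocopy emits.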

Root Cause

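  • Robocopy outputs UTF-16LE when /UNICODE is specified, regardless of the console's code page.
  • PowerShell decodes captured native-program output (pipes, variable assignment and, in Windows PowerShell 5.1, > redirection) with [Console]::OutputEncoding, which defaults to the OEM code page rather than UTF-16LE, so the bytes are misinterpreted.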

Why This Happens in Real Systems

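  • Encoding mismatch: UTF-16LE bytes decoded with a single-byte code page come out as characters interleaved with NULs, or as outright gibberish for non-ASCII text.
  • Redirection limitations: In Windows PowerShell, the > and | operators work on decoded strings rather than raw bytes, so robocopy's original UTF-16LE byte stream is not preserved.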
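
The mismatch described above can be reproduced without robocopy at all. The following is a minimal sketch that uses only .NET encoding classes; ISO-8859-1 (code page 28591) is an assumption chosen as a portable stand-in for the console's OEM code page, and the sample string is arbitrary.

```powershell
# Bytes as a UTF-16LE producer (such as robocopy /UNICODE) would emit them.
$original = 'Total Copied'
$bytes = [System.Text.Encoding]::Unicode.GetBytes($original)

# Stand-in for a single-byte console code page (assumption: ISO-8859-1).
$singleByte = [System.Text.Encoding]::GetEncoding(28591)

# Wrong decoder: each 2-byte code unit turns into two characters,
# the second of which is NUL for ASCII text.
$misdecoded = $singleByte.GetString($bytes)

# Right decoder: the original text comes back unchanged.
$decoded = [System.Text.Encoding]::Unicode.GetString($bytes)

'{0} chars in, {1} chars after mis-decoding' -f $original.Length, $misdecoded.Length
[string]::Equals($decoded, $original, [System.StringComparison]::Ordinal)
```

The comparison uses an ordinal check on purpose: PowerShell's -eq operator compares strings linguistically and may treat embedded NUL characters as ignorable, which would hide the damage.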

Real-World Impact

  • Data corruption: Log files or displayed output appear as gibberish.
  • Pipeline failures: Filtering or processing output fails due to invalid characters.
  • Debugging challenges: Real-time logging and monitoring become unreliable.

Example or Code

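$logPath = "$env:USERPROFILE\robocopy-log.txt"
# Start-Process has no -Encoding parameter; the redirected file receives robocopy's raw bytes (UTF-16LE because of /UNICODE).
Start-Process -FilePath "robocopy.exe" -ArgumentList @("$env:PUBLIC", "$env:TEMP", "/E", "/L", "/UNICODE") -RedirectStandardOutput $logPath -NoNewWindow -Wait
Get-Content -Path $logPath -Encoding Unicode   # read the log back as UTF-16LE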
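
Another option that avoids standard output entirely is to let robocopy write its own Unicode log with the /UNILOG switch and read that file back afterwards. This is a sketch rather than a definitive recipe: the log file name is hypothetical, and it assumes the source and destination paths from the example above.

```powershell
# Hypothetical log location; any writable path works.
$logPath = Join-Path $env:TEMP 'robocopy-unilog.txt'

# /L lists only (nothing is copied); /UNILOG: makes robocopy write the log as Unicode.
# Adding /TEE would also echo the output to the console.
robocopy.exe $env:PUBLIC $env:TEMP /E /L "/UNILOG:$logPath" | Out-Null

# Robocopy exit codes of 8 or higher indicate that something failed.
if ($LASTEXITCODE -ge 8) {
    Write-Warning "robocopy reported a failure (exit code $LASTEXITCODE)"
}

# Read the log back as UTF-16LE and filter it like any other text.
Get-Content -LiteralPath $logPath -Encoding Unicode |
    Select-String -Pattern 'Dirs|Files'
```

Because the log never passes through PowerShell's native-output decoding, [Console]::OutputEncoding does not matter for this variant.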

How Senior Engineers Fix It

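  • Use Start-Process with -RedirectStandardOutput together with robocopy's /UNICODE switch, so the file receives the raw UTF-16LE bytes. Start-Process has no -Encoding parameter.
  • Avoid redirecting (>) or piping (|) robocopy's /UNICODE output unless [Console]::OutputEncoding has first been set to [System.Text.Encoding]::Unicode.
  • Explicitly state the encoding when reading or writing the log:
    Get-Content -Path $logPath -Encoding Unicode
    Out-File -FilePath $logPath -Encoding Unicode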
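
When the output does need to flow through the pipeline, for example to filter it, one commonly described approach is to switch [Console]::OutputEncoding to UTF-16LE for the duration of the call. The sketch below assumes the script runs in a regular console host (setting the property can fail in hosts without a console) and that robocopy's /UNICODE output is well-formed on the Windows build in use; the output file name is hypothetical.

```powershell
# Remember the current decoder so it can be restored afterwards.
$previousEncoding = [Console]::OutputEncoding
try {
    # Tell PowerShell to decode native-program output as UTF-16LE.
    [Console]::OutputEncoding = [System.Text.Encoding]::Unicode

    # /L lists only; /UNICODE makes robocopy emit UTF-16LE on stdout.
    $lines = robocopy.exe $env:PUBLIC $env:TEMP /E /L /UNICODE

    # $lines now holds ordinary .NET strings, so filtering and re-encoding are safe.
    $lines | Where-Object { $_ -match '\S' } |
        Set-Content -LiteralPath (Join-Path $env:TEMP 'robocopy-filtered.txt') -Encoding utf8
}
finally {
    # Always restore the original setting, even if robocopy or the pipeline fails.
    [Console]::OutputEncoding = $previousEncoding
}
```

Writing the filtered result as UTF-8 works here because the text was decoded correctly first; the output encoding is then a free choice.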

Why Juniors Miss It

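  • Assumption of default encoding: Juniors often assume PowerShell detects the encoding of robocopy's output automatically, when it always decodes with [Console]::OutputEncoding.
  • Overlooking PowerShell limitations: Lack of awareness that PowerShell decodes native output into strings before any redirection or piping takes place.
  • Improper use of Set-Content: Believing that Set-Content -Encoding utf8 can repair UTF-16LE data; it only controls how already-decoded strings are written.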
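
The last point can be illustrated in isolation. In this sketch the UTF-8 round trip stands in for what Set-Content -Encoding utf8 followed by a later read would do, and ISO-8859-1 (code page 28591) is again an assumed stand-in for the console's code page.

```powershell
$original = 'robocopy'

# Simulate UTF-16LE output that was decoded with a single-byte code page.
$bytes = [System.Text.Encoding]::Unicode.GetBytes($original)
$misdecoded = [System.Text.Encoding]::GetEncoding(28591).GetString($bytes)

# Saving as UTF-8 and reading it back: the string is encoded exactly as given.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($misdecoded)
$reread = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

# Ordinal comparisons, because -eq may ignore embedded NUL characters.
$ordinal = [System.StringComparison]::Ordinal
$garblingPreserved = [string]::Equals($reread, $misdecoded, $ordinal)
$originalRecovered = [string]::Equals($reread, $original, $ordinal)

"Garbling preserved: $garblingPreserved; original recovered: $originalRecovered"
```

The output encoding faithfully stores whatever string it receives, so the fix has to happen at decoding time, not at writing time.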
