Julia;for/map関数/.演算の速度比較(光線行列の計算アルゴリズムの改善)

はじめに
光線行列計算の最適化
for/map関数/.演算の速度比較
1. 計算結果の比較
考察
コード全体
1. 関連

はじめに

今回は以前制作したモンテカルロ法による光線行列計算(高次ガウシアンビーム(M²>1)の光線追跡-Juliaによるモンテカルロ法の実装-)を最適化した。基本的には光学よりJuliaの計算システムを確認するといった側面が強い。結論から言うと今回の計算の場合においてfor<map<.演算の順ではやかった。

光線行列計算の最適化

using PyPlot
using Distributions
using PyCall
using Random
using BenchmarkTools
plt = PyPlot
PyCall.PyDict(matplotlib["rcParams"])["font.family"] = "Meiryo";

今回利用するmoduleは以上のものになるのであらかじめ実行しておいてほしい。
コード全体は最後に乗せるが改善部分をピックアップして議論をしていく。

#~省略~
    r_results = zeros(length(ri_distibution), length(z_list_MC))
    theta_results = zeros(length(ri_distibution), length(z_list_MC))

    for (i,( rin_sample, theta_sample )) in enumerate(zip(ri_distibution, theta_distrbution))
        input_matrix = [rin_sample ; theta_sample] # input ray matrix
        lens_matrix = [1 0; -1/f 1] # lens matrix

        for (j, z) in enumerate(z_list_MC)
            if z <= f + dz  # before lens
                M = [1 z; 0 1]
            else  # after lens
                before_lens = [1 (f + dz); 0 1]
                z_matrix = [1 z-(f + dz); 0 1]
                M = z_matrix * lens_matrix * before_lens
            end
            output_matrix = M * input_matrix

            r_out = output_matrix[1]
            theta_out = output_matrix[2]

            r_results[i, j] = r_out
            theta_results[i, j] = rad2deg(theta_out)
        end
    end
#~省略~

今回改善を行っていくアルゴリズムは以上のアルゴリズムである。長さ\(z\)×光線数\(N\)をresultリストに格納していくといったのが今回のアルゴリズムとなる。

このアルゴリズムは行列計算をfor文内で計算する、変数の定義をfor文ないで行うなどの問題があった。まずは数式上でこれらの改善を行う。

今回目的としているのは各位置でのビーム径である。任意の位置でのビーム径は以下のように表せる。
\begin{equation}
\left[
\begin{array}{c}
r_{out} \\
\theta_{out}
\end{array}
\right]
= Z ⋅ M ⋅
\left[
\begin{array}{c}
r_{in} \\
\theta_{in}
\end{array}
\right]
\end{equation}
ここで\(Z\)は全光学素子通過後の距離の行列である。
また、
\begin{equation}
Z =
\left[
\begin{array}{c c}
1 & z \\
0 & 1
\end{array}
\right] \\
M = \left[ \begin{array}{c c} A & B\\ C & D \end{array} \right]
\end{equation}
とすると
\begin{equation}
\left[
\begin{array}{c}
r_{out} \\
\theta_{out}
\end{array}
\right]
=
\left[
\begin{array}{c}
(A+Cz)r_{in} + (B+Dz)\theta_{in} \\
C r_{in} + D \theta_{in}
\end{array}
\right]
\end{equation}

と展開できる。よって\(r_{out}\)は\(z\)の関数に書き下せ、
\begin{equation}
r_{out}(z) = (C r_{in} + D \theta_{in})z + (A r_{in} + B \theta_{in})
\end{equation}

よって、計算アルゴリズムには行列計算を行わなくてもよいことがわかる。

for/map関数/.演算の速度比較

今回は3つの手法(for/map関数/.演算)で計算速度がどの程度異なるのかを試してみた。肝心の計算部分を以下に示す。

for文

 ####################for文 ####################
    #予め行列計算
    lens_matrix = [1 0; -1/f 1] # lens matrix
    before_lens = [1 (f + dz); 0 1]
    M_lens_before = lens_matrix * before_lens
    A, B, C, D = M_lens_before[1,1], M_lens_before[1,2], M_lens_before[2,1], M_lens_before[2,2]

    #for文使って計算
    for (i, (rin_sample, theta_sample)) in enumerate(zip(ri_distibution, theta_distrbution))
        for (j, z) in enumerate(distance_before_lens)
            r_results[i, j] = rin_sample + z*theta_sample
        end
        for (j, z) in enumerate(distance_after_lens)
            idx = j + length(distance_before_lens)
            z_rel = z - (f + dz)
            r_results[i, idx] = rin_sample*(A+C*z_rel) + theta_sample*(B+D*z_rel)
        end
    end

map関数

 ####################map関数 ####################
    #予め行列計算
    lens_matrix = [1 0; -1/f 1] # lens matrix
    before_lens = [1 (f + dz); 0 1]
    M_lens_before = lens_matrix * before_lens
    A, B, C, D = M_lens_before[1,1], M_lens_before[1,2], M_lens_before[2,1], M_lens_before[2,2]

    b = length(distance_before_lens)
    a = length(distance_after_lens)

    for (i, (rin_sample, theta_sample)) in enumerate(zip(ri_distibution, theta_distrbution))
        r_results[i,1:b] = map(z -> rin_sample + z*theta_sample, distance_before_lens)
        r_results[i,b+1:b+a] = map(z -> rin_sample*(A+C*(z - (f + dz))) + theta_sample*(B+D*(z - (f + dz))),distance_after_lens)
    end

.演算子

 ####################.演算子####################
    #予め行列計算
    lens_matrix = [1 0; -1/f 1] # lens matrix
    before_lens = [1 (f + dz); 0 1]
    M_lens_before = lens_matrix * before_lens
    A, B, C, D = M_lens_before[1,1], M_lens_before[1,2], M_lens_before[2,1], M_lens_before[2,2]

    b = length(distance_before_lens)
    a = length(distance_after_lens)

    for (i, (rin_sample, theta_sample)) in enumerate(zip(ri_distibution, theta_distrbution))
        r_results[i, 1:b] = rin_sample .+ distance_before_lens .* theta_sample

        z_rel = distance_after_lens .- (f + dz)
        r_results[i, (b+1):(a+b)] = rin_sample .* (A .+ C .* z_rel) .+ theta_sample .* (B .+ D .* z_rel)
    end

計算結果の比較

using BenchmarkTools

N = 5000
M² = 1.0
dz = 0

println("MC_Ray_tracing:")
@btime MC_Ray_tracing(N=$N, M²=$M², dz=$dz);

println("for文:")
@btime fast_MC_Ray_tracing(N=$N, M²=$M², dz=$dz);

println("map関数:")
@btime fast2_MC_Ray_tracing(N=$N, M²=$M², dz=$dz);

println(".演算子:")
@btime fast3_MC_Ray_tracing(N=$N, M²=$M², dz=$dz);

ベンチマーク関数を使って比較した。
その結果は以下のようになった。

MC_Ray_tracing:
  259.137 ms (11515045 allocations: 500.84 MiB)
for文:
  1.674 ms (68 allocations: 7.79 MiB)
map関数:
  1.930 ms (20068 allocations: 12.52 MiB)
.演算子:
  2.238 ms (30068 allocations: 16.56 MiB)

考察

関数	実行時間	アロケーション数	メモリ使用量
`MC_Ray_tracing`（元の方法）	259.1 ms	11,515,045	500.84 MiB
`fast_MC_Ray_tracing`（for文）	1.67 ms	68	7.79 MiB
`fast2_MC_Ray_tracing`（`map`使用）	1.93 ms	20,068	12.52 MiB
`fast3_MC_Ray_tracing`（ドット演算）	2.24 ms	30,068	16.56 MiB

元の計算に比べて非常に計算時間を短縮することができた、計算時間はfor<map<.演算の順となった。メモリ割り当てを見てみるとfor文が最も少ないことがわかる。
どうやらJuliaではfor文を使え。ベクトル化してはいけない。の記事を見てみると新たなる配列を作ってしまうのが原因となっている可能性がある。また、chatGPTに聞いてみると

手法	処理内容・特徴	遅くなる理由
`for`文	シンプルで効率よく逐次代入。中間配列などを作らない。	✅ 最小限のメモリ割当とオーバーヘッド
`map`	各要素に関数を適用。配列を返すため中間配列が増える	⚠️ 生成された中間配列がメモリを消費
`.`演算子	ブロードキャストで配列演算。式の中に複数の配列演算が混在	⚠️ 一時配列が最も多く発生

との回答が返ってきた。

コード全体

今回の関数と確認、ベンチマーク比較のプログラムを以下に示す。

using PyPlot
using Distributions
using PyCall
using Random
using BenchmarkTools
plt = PyPlot
PyCall.PyDict(matplotlib["rcParams"])["font.family"] = "Meiryo";


function MC_Ray_tracing(; N ,M²,dz)
    max_distance = 10
    Random.seed!(123)

    f = 0.9  # レンズの焦点距離 (mm)
    rin = 6.3e-3  # 初期ビームウエスト半径 (mm) : w0
    λ = 940e-6  # 波長 (mm)
    θ = 4λ*M² / (2π * rin)  # ビームの発散角度 (rad)

    σ_d = rin * sqrt(2) # 1/e²幅=rinのときの標準偏差
    dist = Normal(0, σ_d)
    ri_distibution = rand(dist, N)

    σ_theta = θ / sqrt(2)  # ビームの発散角度の標準偏差
    dist2 = Normal(0,σ_theta)  # ビームの発散角度の分布
    theta_distrbution = rand(dist2, N)

    z_list_MC = range(0.0,max_distance,step = 0.1)


    r_results = zeros(length(ri_distibution), length(z_list_MC))
    theta_results = zeros(length(ri_distibution), length(z_list_MC))

    for (i,( rin_sample, theta_sample )) in enumerate(zip(ri_distibution, theta_distrbution))
        input_matrix = [rin_sample ; theta_sample] # input ray matrix
        lens_matrix = [1 0; -1/f 1] # lens matrix

        for (j, z) in enumerate(z_list_MC)
            if z <= f + dz  # before lens
                M = [1 z; 0 1]
            else  # after lens
                before_lens = [1 (f + dz); 0 1]
                z_matrix = [1 z-(f + dz); 0 1]
                M = z_matrix * lens_matrix * before_lens
            end
            output_matrix = M * input_matrix

            r_out = output_matrix[1]
            theta_out = output_matrix[2]

            r_results[i, j] = r_out
            theta_results[i, j] = rad2deg(theta_out)
        end
    end
    # calculation of 1/e^2 radius
    mean_rms = sqrt.(mean(r_results.^2, dims=1))[:]
    mean_r_1e2 = mean_rms / sqrt(2) 

    z_indices = findall(z -> z > max_distance*0.8, z_list_MC)
    z_subset = z_list_MC[z_indices]
    w_subset_M1 = mean_r_1e2[z_indices]
    slope_M1 = (w_subset_M1[end] - w_subset_M1[1]) / (z_subset[end] - z_subset[1])
    angle_M1 = atand(slope_M1)  # 角度（度）に変換

    return z_list_MC, mean_r_1e2 , angle_M1
end

#for分
function fast_MC_Ray_tracing(; N ,M²,dz)
    Random.seed!(123)

    ###### define parameter ######

    f = 0.9  # レンズの焦点距離 (mm)
    rin = 6.3e-3  # 初期ビームウエスト半径 (mm) : w0
    λ = 940e-6  # 波長 (mm)
    θ = 4λ*M² / (2π * rin)  # ビームの発散角度 (rad)

    σ_d = rin * sqrt(2) # 1/e²幅=rinのときの標準偏差
    dist = Normal(0, σ_d)
    ri_distibution = rand(dist, N)

    σ_theta = θ / sqrt(2)  # ビームの発散角度の標準偏差
    dist2 = Normal(0,σ_theta)  # ビームの発散角度の分布
    theta_distrbution = rand(dist2, N)

    #距離設定
    max_distance = 10
    z_list_MC = range(0.0,max_distance,step = 0.1)
    distance_before_lens = filter(x -> x <= f + dz, z_list_MC)
    distance_after_lens = filter(x -> x > f + dz, z_list_MC)

    ###### define parameter ######

    #resultのリストを製作
    r_results = zeros(length(ri_distibution), length(z_list_MC))

    """
    行列計算を展開
    任意の位置でのビーム径は以下のようにあらわせる
    [r_out theta_out] = Z ⋅ M ⋅ [r_in theta_in]
    ここでZは全光学素子あとの距離である。
    Z = [1 z,0 1]
    M = [A B,C D]とすると
    r_out = (A+Cz)r_in + (B+Dz)theta_in
    theta_out = C r_in + D theta_in
    と展開できる。
    よってroutはzの関数に書き下せる 
    """ 

    #予め行列計算
    lens_matrix = [1 0; -1/f 1] # lens matrix
    before_lens = [1 (f + dz); 0 1]
    M_lens_before = lens_matrix * before_lens
    A, B, C, D = M_lens_before[1,1], M_lens_before[1,2], M_lens_before[2,1], M_lens_before[2,2]

    #for文使って計算
    for (i, (rin_sample, theta_sample)) in enumerate(zip(ri_distibution, theta_distrbution))
        for (j, z) in enumerate(distance_before_lens)
            r_results[i, j] = rin_sample + z*theta_sample
        end
        for (j, z) in enumerate(distance_after_lens)
            idx = j + length(distance_before_lens)
            z_rel = z - (f + dz)
            r_results[i, idx] = rin_sample*(A+C*z_rel) + theta_sample*(B+D*z_rel)
        end
    end

    # calculation of 1/e^2 radius
    mean_rms = sqrt.(mean(r_results.^2, dims=1))[:]
    mean_r_1e2 = mean_rms / sqrt(2) 

    #十分遠方での広がり角を計算
    z_indices = findall(z -> z > max_distance*0.8, z_list_MC)
    z_subset = z_list_MC[z_indices]
    w_subset_M1 = mean_r_1e2[z_indices]
    slope_M1 = (w_subset_M1[end] - w_subset_M1[1]) / (z_subset[end] - z_subset[1])
    angle_M1 = atand(slope_M1)  # 角度（度）に変換

    return z_list_MC, mean_r_1e2 , angle_M1
end

#map関数
function fast2_MC_Ray_tracing(; N ,M²,dz)
    Random.seed!(123)

    ###### define parameter ######

    f = 0.9  # レンズの焦点距離 (mm)
    rin = 6.3e-3  # 初期ビームウエスト半径 (mm) : w0
    λ = 940e-6  # 波長 (mm)
    θ = 4λ*M² / (2π * rin)  # ビームの発散角度 (rad)

    σ_d = rin * sqrt(2) # 1/e²幅=rinのときの標準偏差
    dist = Normal(0, σ_d)
    ri_distibution = rand(dist, N)

    σ_theta = θ / sqrt(2)  # ビームの発散角度の標準偏差
    dist2 = Normal(0,σ_theta)  # ビームの発散角度の分布
    theta_distrbution = rand(dist2, N)

    #距離設定
    max_distance = 10
    z_list_MC = range(0.0,max_distance,step = 0.1)
    distance_before_lens = filter(x -> x <= f + dz, z_list_MC)
    distance_after_lens = filter(x -> x > f + dz, z_list_MC)

    ###### define parameter ######
    
    #resultのリストを製作
    r_results = zeros(length(ri_distibution), length(z_list_MC))

    #予め行列計算
    lens_matrix = [1 0; -1/f 1] # lens matrix
    before_lens = [1 (f + dz); 0 1]
    M_lens_before = lens_matrix * before_lens
    A, B, C, D = M_lens_before[1,1], M_lens_before[1,2], M_lens_before[2,1], M_lens_before[2,2]

    b = length(distance_before_lens)
    a = length(distance_after_lens)

    for (i, (rin_sample, theta_sample)) in enumerate(zip(ri_distibution, theta_distrbution))
        r_results[i,1:b] = map(z -> rin_sample + z*theta_sample, distance_before_lens)
        r_results[i,b+1:b+a] = map(z -> rin_sample*(A+C*(z - (f + dz))) + theta_sample*(B+D*(z - (f + dz))),distance_after_lens)
    end

    # calculation of 1/e^2 radius
    mean_rms = sqrt.(mean(r_results.^2, dims=1))[:]
    mean_r_1e2 = mean_rms / sqrt(2) 

    z_indices = findall(z -> z > max_distance*0.8, z_list_MC)
    z_subset = z_list_MC[z_indices]
    w_subset_M1 = mean_r_1e2[z_indices]
    slope_M1 = (w_subset_M1[end] - w_subset_M1[1]) / (z_subset[end] - z_subset[1])
    angle_M1 = atand(slope_M1)  # 角度（度）に変換

    return z_list_MC, mean_r_1e2 , angle_M1
end

function fast3_MC_Ray_tracing(; N ,M²,dz)
    Random.seed!(123)

    ###### define parameter ######

    f = 0.9  # レンズの焦点距離 (mm)
    rin = 6.3e-3  # 初期ビームウエスト半径 (mm) : w0
    λ = 940e-6  # 波長 (mm)
    θ = 4λ*M² / (2π * rin)  # ビームの発散角度 (rad)

    σ_d = rin * sqrt(2) # 1/e²幅=rinのときの標準偏差
    dist = Normal(0, σ_d)
    ri_distibution = rand(dist, N)

    σ_theta = θ / sqrt(2)  # ビームの発散角度の標準偏差
    dist2 = Normal(0,σ_theta)  # ビームの発散角度の分布
    theta_distrbution = rand(dist2, N)

    #距離設定
    max_distance = 10
    z_list_MC = range(0.0,max_distance,step = 0.1)
    distance_before_lens = filter(x -> x <= f + dz, z_list_MC)
    distance_after_lens = filter(x -> x > f + dz, z_list_MC)

    ###### define parameter ######
    
    #resultのリストを製作
    r_results = zeros(length(ri_distibution), length(z_list_MC))

    #予め行列計算
    lens_matrix = [1 0; -1/f 1] # lens matrix
    before_lens = [1 (f + dz); 0 1]
    M_lens_before = lens_matrix * before_lens
    A, B, C, D = M_lens_before[1,1], M_lens_before[1,2], M_lens_before[2,1], M_lens_before[2,2]

    b = length(distance_before_lens)
    a = length(distance_after_lens)

    for (i, (rin_sample, theta_sample)) in enumerate(zip(ri_distibution, theta_distrbution))
        r_results[i, 1:b] = rin_sample .+ distance_before_lens .* theta_sample

        z_rel = distance_after_lens .- (f + dz)
        r_results[i, (b+1):(a+b)] = rin_sample .* (A .+ C .* z_rel) .+ theta_sample .* (B .+ D .* z_rel)
    end

    # calculation of 1/e^2 radius
    mean_rms = sqrt.(mean(r_results.^2, dims=1))[:]
    mean_r_1e2 = mean_rms / sqrt(2) 

    z_indices = findall(z -> z > max_distance*0.8, z_list_MC)
    z_subset = z_list_MC[z_indices]
    w_subset_M1 = mean_r_1e2[z_indices]
    slope_M1 = (w_subset_M1[end] - w_subset_M1[1]) / (z_subset[end] - z_subset[1])
    angle_M1 = atand(slope_M1)  # 角度（度）に変換

    return z_list_MC, mean_r_1e2 , angle_M1
end

# 各実装の結果を比較
z_MC, w_original, _ = MC_Ray_tracing(N=5000, M²=1.0, dz=0)
_, w_for, _ = fast_MC_Ray_tracing(N=5000, M²=1.0, dz=0)
_, w_map, _ = fast2_MC_Ray_tracing(N=5000, M²=1.0, dz=0)
_, w_broadcast, _ = fast3_MC_Ray_tracing(N=5000, M²=1.0, dz=0)

# 結果の差分を計算
diff_for = abs.(w_original - w_for)
diff_map = abs.(w_original - w_map)
diff_broadcast = abs.(w_original - w_broadcast)

# 結果をプロット
plt.figure(figsize=(15, 10))

# 上段: ビーム半径の比較
plt.subplot(2, 1, 1)
plt.plot(z_MC, w_original, label="元の実装")
plt.plot(z_MC, w_for, label="for文版", linestyle="--")
plt.plot(z_MC, w_map, label="map関数版", linestyle="-.")
plt.plot(z_MC, w_broadcast, label="ブロードキャスト版", linestyle=":")
plt.xlabel("z (mm)")
plt.ylabel("ビーム半径 (mm)")
plt.title("各実装によるビーム半径の比較")
plt.legend()
plt.grid(true)

# 下段: 元の実装との差分
plt.subplot(2, 1, 2)
plt.semilogy(z_MC, diff_for, label="for文版との差分")
plt.semilogy(z_MC, diff_map, label="map関数版との差分")
plt.semilogy(z_MC, diff_broadcast, label="ブロードキャスト版との差分")
plt.xlabel("z (mm)")
plt.ylabel("差分 (mm)")
plt.title("元の実装との差分")
plt.legend()
plt.grid(true)

plt.tight_layout()
display(plt.gcf())

using BenchmarkTools

N = 5000
M² = 1.0
dz = 0

println("MC_Ray_tracing:")
@btime MC_Ray_tracing(N=$N, M²=$M², dz=$dz);

println("for文:")
@btime fast_MC_Ray_tracing(N=$N, M²=$M², dz=$dz);

println("map関数:")
@btime fast2_MC_Ray_tracing(N=$N, M²=$M², dz=$dz);

println(".演算子:")
@btime fast3_MC_Ray_tracing(N=$N, M²=$M², dz=$dz);