c++ - multiple pthreads are called multiple times; slower than serial code
I am solving 4 equations on a 2D discrete domain (Lx, Ly). These 4 eqns need to be solved for 1000 time steps. Each eqn needs the parameters a, b and c at the (i, j) locations. I have dynamically allocated a_m, b_m and c_m in main() and I pass their addresses to each thread.
I created 4 functions eq1, eq2, eq3, eq4.
I am creating Lx * Ly threads for each eqn. Each thread ID represents a unique (i, j) in the domain Lx, Ly. Therefore, each thread works only on a specific data location (i, j) of the vectors a_m, b_m, c_m. I am calling the 4 functions sequentially in main(). They are called in sequence because the parameters a, b and c are updated at each (i, j) by each of the 4 equations in turn.
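To make the "one thread ID per (i, j)" mapping concrete, here is a minimal sketch assuming the arrays a_m, b_m, c_m are stored row-major with Ly entries per row. The post does not state the layout, so that is an assumption, and the names Cell, id_to_cell and cell_to_id are hypothetical helpers, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type for illustration; not from the original post.
struct Cell {
    std::size_t i;  // row index, 0 <= i < Lx
    std::size_t j;  // column index, 0 <= j < Ly
};

// Map a flat work-item id in [0, Lx*Ly) to a grid location (i, j).
// Assumes row-major storage (an assumption; the post does not say).
inline Cell id_to_cell(std::size_t id, std::size_t Lx, std::size_t Ly) {
    assert(id < Lx * Ly);
    return Cell{id / Ly, id % Ly};
}

// Inverse mapping: flat offset of (i, j) inside arrays such as a_m, b_m, c_m.
inline std::size_t cell_to_id(std::size_t i, std::size_t j,
                              std::size_t Lx, std::size_t Ly) {
    assert(i < Lx && j < Ly);
    return i * Ly + j;
}
```

With this mapping a worker does not need one thread per cell: any loop over id from 0 to Lx * Ly - 1 visits every (i, j) exactly once, so the same range can be split among a handful of threads.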
My program runs slower than the serial code. Can you suggest some optimization tips to speed it up?
I think that creating just 4 threads, each solving one equation over all Lx * Ly points, will not give much speedup.
    #define N Lx*Ly

    struct thread_data {
        int thread_id, t;
        double *a, *b, *c;
    };
    struct thread_data thread_data_array[N];

    // Function declarations

    void *eqn1(void *threadarg) {
        // eqn1
        pthread_exit(NULL);
        return 0;
    }

    void *eqn2(void *threadarg) {
        // eqn2
        pthread_exit(NULL);
        return 0;
    }

    void *eqn3(void *threadarg) {
        // eqn3
        pthread_barrier_wait(&barrier);
        pthread_exit(NULL);
        return 0;
    }

    void *eqn4(void *threadarg) {
        // eqn4
        pthread_exit(NULL);
        return 0;
    }

    // main
    int main() {
        pthread_t threads[N];
        // Dynamically allocate a_m, b_m and c_m, each of size N

        for (time = 0; time < 10000; time++) {
            for (i = 0; i < N; i++) {
                // thread_data_array[i] is initialized here; *a, *b, *c in
                // thread_data_array[i] store the addresses of a_m, b_m, c_m,
                // which are dynamically allocated in main()
                pthread_create(&threads[i], NULL, eq1, (void *)&thread_data_array[i]);
            }
            for (i = 0; i < N; i++) {
                // thread_data_array[i] is initialized as above
                pthread_create(&threads[i], NULL, eq2, (void *)&thread_data_array[i]);
            }
            for (i = 0; i < N; i++) {
                // thread_data_array[i] is initialized as above
                pthread_create(&threads[i], NULL, eq3, (void *)&thread_data_array[i]);
            }
            for (i = 0; i < N; i++) {
                // thread_data_array[i] is initialized as above
                pthread_create(&threads[i], NULL, eq4, (void *)&thread_data_array[i]);
            }
            for (j = 0; j < N; j++) {
                pthread_join(threads[j], NULL);
            }
        }

        free(a_m);
        free(b_m);
        free(c_m);
    }

First of all, there are some errors in the code: your eq2, eq3 and eq4 loops overwrite the handles of the previously created threads, which are then orphaned and never joined. I think your code has undefined behavior. Fix the code first, then you can proceed with optimization.

One major problem is the thread creation overhead: you basically create and destroy the threads 1000 times. If the work associated with eq1, eq2, ... is simple, then the overhead of thread creation and destruction is high compared to the computation itself.

If you are not forced to use pthreads, I would go for OpenMP. If you want to stay with pthreads, you probably need to implement a "thread pool". If you go for OpenMP, the runtime will know how many threads to launch depending on your machine, and you would do something like this:
    for (time = 0; time < 10000; time++) {
        #pragma omp parallel for
        for (i = 0; i ...
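For a complete picture of the OpenMP approach suggested above, here is a minimal sketch. The four per-cell updates are made-up placeholders standing in for eq1..eq4 (the real equations are not shown in the post), and run_time_steps is a hypothetical name. It assumes each equation touches only its own cell k, so a barrier between equations is enough to keep the updates ordered. The thread team is created once outside the time loop, which avoids the repeated create/destroy cost. Compile with OpenMP enabled (for example -fopenmp with GCC or Clang); without it the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder per-cell updates standing in for eq1..eq4 (hypothetical;
// the real equations are not given in the post).
inline void eq1(double& a)                       { a += 1.0; }
inline void eq2(double a, double& b)             { b = 2.0 * a; }
inline void eq3(double a, double b, double& c)   { c = a + b; }
inline void eq4(double& a, double b, double c)   { a = c - b; }

// Advance all cells by `steps` time steps. The arrays play the role of
// a_m, b_m, c_m, each of size N = Lx * Ly.
void run_time_steps(std::vector<double>& a, std::vector<double>& b,
                    std::vector<double>& c, int steps) {
    assert(a.size() == b.size() && b.size() == c.size());
    const long n = static_cast<long>(a.size());

    // One parallel region for the whole run: threads are created once.
    #pragma omp parallel
    {
        for (int t = 0; t < steps; ++t) {
            // Each "omp for" splits the cells among the existing threads and
            // ends with an implicit barrier, so eq2 starts only after eq1
            // has finished on every cell, and so on.
            #pragma omp for
            for (long k = 0; k < n; ++k) eq1(a[k]);

            #pragma omp for
            for (long k = 0; k < n; ++k) eq2(a[k], b[k]);

            #pragma omp for
            for (long k = 0; k < n; ++k) eq3(a[k], b[k], c[k]);

            #pragma omp for
            for (long k = 0; k < n; ++k) eq4(a[k], b[k], c[k]);
        }
    }
}
```

The point of the design is that the number of threads is tied to the number of cores rather than to Lx * Ly, and each thread processes a contiguous chunk of cells per equation. If the real equations read neighbouring cells, the barriers between stages still apply, but reads and writes within a single stage would need to be checked for races.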