performance - How to optimize PROC SQL's GROUP BY with respect to the required size of WORK?
As suggested, here is working sample code:
    data test;
      format id 5. group1 1. group2 1. group3 1. data $1.
             valid_from ddmmyy10. valid_to ddmmyy10.;
      input id group1 group2 group3 data valid_from yymmdd8. valid_to yymmdd8.;
      datalines;
    10001 01 0a 2013010120131231
    10001 1 1B 2013010120130701
    10001 1 1C 2013070120131231
    10002 1 1D 2013010120131231
    10002 1 1E 2013010120130101
    ;
    run;

For each group I want a flag that indicates whether, among the entries that share the ID and belong to that group, the minimum of valid_from equals the maximum of valid_to. Example: for ID 10002 there is only one entry in group 3, namely the 5th. Therefore, in group 3, the minimum valid_from of all entries (here only one) equals the maximum valid_to, so I want the flag set there. The following code works on small data sets, but not on large ones (insufficient disk space):
    proc sql;
      create table test2 as
      select id, group1, group2, group3, data, valid_from, valid_to,
             case when min(valid_from + 100000000*(1-group1))
                     = max(valid_to   - 100000000*(1-group1))
                  then 1 else 0 end as flag_1 format 1.,
             case when min(valid_from + 100000000*(1-group2))
                     = max(valid_to   - 100000000*(1-group2))
                  then 1 else 0 end as flag_2 format 1.,
             case when min(valid_from + 100000000*(1-group3))
                     = max(valid_to   - 100000000*(1-group3))
                  then 1 else 0 end as flag_3 format 1.
      from test
      group by id;
    quit;

The question is: how much free disk space is required in the WORK library? (Ideally expressed as a multiple of the dataset size, since the actual dataset is very large.) Does it depend on the number of groups? (I do not know why it would, but this was suggested.)
Bonus question: how can runtime and disk usage be improved? (One way to reduce disk usage would be to store only the group, ID and date variables (not the actual data) in a separate dataset, compute the flags there, and merge them back to the original.)
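The keys-only idea from the bonus question can be sketched roughly as follows. This is a minimal sketch, assuming the large table is LIB.A with the variables of the example and group1-group3 coded as 0/1; LIB.A, LIB.B and WORK.FLAGS are hypothetical names. The PROC SQL step may still sort, but only the narrow KEEP= columns, and because every non-aggregated column is listed in the GROUP BY there is no remerge. The flags are then attached with a DATA step hash lookup, which assumes the one-row-per-ID flag table fits in memory.

```sas
/* Hypothetical names: LIB.A = large input, LIB.B = output table.      */
/* Step 1: summarise only the key and date columns, one row per ID.    */
/* All non-aggregated columns are in GROUP BY, so there is no remerge. */
proc sql;
  create table work.flags as
  select id,
         case when min(valid_from + 100000000*(1-group1))
                 = max(valid_to   - 100000000*(1-group1))
              then 1 else 0 end as flag_1 format 1.,
         case when min(valid_from + 100000000*(1-group2))
                 = max(valid_to   - 100000000*(1-group2))
              then 1 else 0 end as flag_2 format 1.,
         case when min(valid_from + 100000000*(1-group3))
                 = max(valid_to   - 100000000*(1-group3))
              then 1 else 0 end as flag_3 format 1.
  from lib.a(keep=id group1-group3 valid_from valid_to)
  group by id;
quit;

/* Step 2: look the flags up per row from an in-memory hash table,     */
/* so LIB.A does not have to be sorted or indexed by ID.               */
data lib.b;
  if 0 then set work.flags;      /* defines FLAG_1-FLAG_3 in the PDV */
  if _n_ = 1 then do;
    declare hash h(dataset: "work.flags");
    h.defineKey("id");
    h.defineData("flag_1", "flag_2", "flag_3");
    h.defineDone();
  end;
  set lib.a;
  /* Clear retained values if an ID is not found in the lookup table */
  if h.find() ne 0 then call missing(flag_1, flag_2, flag_3);
run;
```

The hash lookup trades WORK disk space for memory: its size grows with the number of distinct IDs, not with the number of rows or the width of the data.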
Your WORK space is used by the SQL step because the GROUP BY needs a sort (utility file), and additional space is needed to remerge the summary results with the original data.
I would do the following:
    proc summary data=lib.a min max nway;   /* NWAY: keep only the per-ID rows, no overall _TYPE_=0 row */
      class id;
      var var1 var2 ... var10;
      output out=work.min_max(drop=_freq_ _type_)
             min(var1)=minvar1 max(var1)=maxvar1
             ....;
    run;

    data work.min_max;
      set work.min_max;
      flag_1 = (minvar1 = maxvar1);
      ....
      flag_10 = (minvar10 = maxvar10);
    run;

    proc sql;
      create index id on lib.a(id);          /* or sort the data */
      create index id on work.min_max(id);
    quit;

    data lib.b;
      merge lib.a work.min_max;
      by id;
    run;
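If LIB.A is already sorted by ID, a different technique, the so-called double DoW loop, avoids both the sort and the separate summary table. Each DATA step iteration reads one ID group twice: first to compute, per group, the minimum valid_from and maximum valid_to, then to write the rows out with the flags. This is a sketch, not the method above; it assumes LIB.A is sorted by ID, has the variables of the example, and codes group1-group3 as 0/1. LIB.B and the helper variables k, min1-min3 and max1-max3 are hypothetical.

```sas
/* Assumes LIB.A is sorted by ID; names are hypothetical. */
data lib.b;
  array grp{3}  group1-group3;
  array mn{3}   min1-min3;      /* per-group minimum of VALID_FROM */
  array mx{3}   max1-max3;      /* per-group maximum of VALID_TO   */
  array flag{3} flag_1-flag_3;
  format flag_1-flag_3 1.;

  /* Pass 1: read one ID group and accumulate min/max per group.   */
  /* MN/MX are not retained, so they start missing for each ID.    */
  do until (last.id);
    set lib.a;
    by id;
    do k = 1 to 3;
      if grp{k} = 1 then do;
        mn{k} = min(mn{k}, valid_from);
        mx{k} = max(mx{k}, valid_to);
      end;
    end;
  end;

  /* Flag = 1 only if the group has entries and min(from) = max(to) */
  do k = 1 to 3;
    flag{k} = (n(mn{k}) and mn{k} = mx{k});
  end;

  /* Pass 2: reread the same ID group and output every row */
  do until (last.id);
    set lib.a;
    by id;
    output;
  end;

  drop k min1-min3 max1-max3;
run;
```

This needs no sort and no utility file, at the cost of reading the input twice.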