regex - R: Fast string split on first delimiter occurrence

January 15, 2015

I have a file with ~40 million lines that I need to split on the first comma delimiter. The following code, which uses the stringr function str_split_fixed, does the job, but it is very slow.

    library(data.table)
    library(stringr)

    df1 <- data.frame(id = 1:1000, letter1 = rep(letters[sample(1:25, 1000, replace = TRUE)], 40))
    df1$combCol1 <- paste(df1$id, ',', df1$letter1, sep = '')
    df1$combCol2 <- paste(df1$combCol1, ',', df1$combCol1, sep = '')

    st1 <- str_split_fixed(df1$combCol2, ',', 2)

Is there a faster way to do this?
Update

The stri_split_fixed function in more recent versions of "stringi" has a simplify argument that can be set to TRUE to return a matrix. Thus, the updated solution would be:

    stri_split_fixed(df1$combCol2, ",", 2, simplify = TRUE)

Original answer (with updated benchmarks)

If you are comfortable with the "stringr" syntax and don't want to stray too far from it, but you also want to benefit from a speed boost, try the "stringi" package instead:
    library(stringr)
    library(stringi)

    system.time(temp1 <- str_split_fixed(df1$combCol2, ',', 2))
    #    user  system elapsed
    #    3.25    0.00    3.25

    system.time(temp2a <- do.call(rbind, stri_split_fixed(df1$combCol2, ",", 2)))
    #    user  system elapsed
    #    0.04    0.00    0.05

    system.time(temp2b <- stri_split_fixed(df1$combCol2, ",", 2, simplify = TRUE))
    #    user  system elapsed
    #    0.01    0.00    0.01

Most "stringr" functions have "stringi" counterparts, but as this example shows, the default "stringi" output (without simplify = TRUE) is a list, so it needs one extra step of binding the data to produce a matrix.
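To make the "extra binding step" concrete, here is a minimal sketch on a made-up three-element vector. It uses only base R (the regmatches/regexpr approach that appears in the comparison in this answer), so it runs without "stringi" installed; the input values are assumptions chosen for illustration, not data from the question.

```r
# Toy input (made up for illustration); each string contains several commas.
x <- c("1,a,1,a", "2,b,2,b", "3,c,3,c")

# regexpr() locates only the FIRST comma in each string. With invert = TRUE,
# regmatches() returns the text before and after that match, as a list with
# one character vector per input string.
parts <- regmatches(x, regexpr(",", x), invert = TRUE)

# The extra step: bind the list elements row-wise into a two-column matrix.
m <- do.call(rbind, parts)
print(m)
```

Each row of m holds the text before the first comma in column 1 and everything after it, including any later commas, in column 2.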
Here's how it compares with @RichardScriven's suggestion in the comments:
    fun1a <- function() do.call(rbind, stri_split_fixed(df1$combCol2, ",", 2))
    fun1b <- function() stri_split_fixed(df1$combCol2, ",", 2, simplify = TRUE)
    fun2 <- function() {
      do.call(rbind, regmatches(df1$combCol2, regexpr(",", df1$combCol2),
                                invert = TRUE))
    }

    library(microbenchmark)
    microbenchmark(fun1a(), fun1b(), fun2(), times = 10)
    # Unit: milliseconds
    #     expr       min        lq      mean    median        uq       max neval
    #  fun1a()  42.72647  46.35848  59.56948  51.94796  62.92920  98.46330    10
    #  fun1b()  17.55183  18.59337  20.09049  18.84907  22.09419  26.85343    10
    #   fun2() 370.82055 404.23115 434.62582 439.54923 476.02889 480.97912    10
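If installing a package is not an option, the same first-delimiter split can also be written with vectorised base R calls and no intermediate list. This is only a sketch: split_first is a hypothetical helper name that does not come from the answer above, and it has not been benchmarked against the "stringi" version, so no speed claim is made for it.

```r
# Hypothetical helper (not from the original answer): split each string at the
# first occurrence of a literal delimiter and return a two-column matrix.
split_first <- function(x, delim = ",") {
  # Position of the first delimiter in each string; -1 where there is none.
  # as.integer() drops the extra attributes that regexpr() attaches.
  pos <- as.integer(regexpr(delim, x, fixed = TRUE))
  has <- pos > 0

  # Text before the delimiter, or the whole string if no delimiter was found.
  left <- ifelse(has, substr(x, 1, pos - 1), x)
  # Text after the delimiter, or an empty string if no delimiter was found.
  right <- ifelse(has, substr(x, pos + nchar(delim), nchar(x)), "")

  cbind(left, right, deparse.level = 0)
}

# Example inputs are made up, including two edge cases:
# a string with no comma and a string that starts with one.
res <- split_first(c("1,a,1,a", "nocomma", ",x"))
print(res)
```

The sketch makes one design choice: strings without a delimiter keep their full text in the first column and get an empty second column. Adjust that if your data needs different handling.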

 




  


















