Python: split a large text file by many headers

- September 15, 2010

I have a large text file, which looks like this:

    latitude longitude height pressure
    3 Line data group BSAS
    2.3 4.5 45.0 875
    5.6 6.5 46.2 676
    3.4 3.4 48.2 565
    6 Line data group SDAD
    3.4 4.5 56.1 535
    5.6 6.5 46.2 676
    3.4 4.5 56.1 535
    2.3 4.5 45.0 875
    5.6 6.5 46.2 676
    3.4 3.4 48.2 565
    50 Line data group asdasd
    5.5 6.6 44.5 343
    ...
    3.7 8.4 56.5 456
    ...
    and so on

I want to split the whole text file into its data groups, with every data group stored in a 2D array or an object. So far, I have tried two ways to do this.
The first approach goes through the file line by line and collects the data as it goes:

    # Wave is an object class defined elsewhere.
    # Each object has 4 attributes: lat, lon, altitude, pressure
    wave_list = []
    with open(filename, 'r') as f:
        next(f)  # skip the column-name line
        wave = Wave()
        for line in f:
            if 'data' in line:
                # A group header: store the previous group if it has rows
                if wave.lat:
                    wave_list.append(wave)
                wave = Wave()
            else:
                fields = line.split()
                wave.lat.append(fields[0])
                wave.lon.append(fields[1])
                wave.altitude.append(fields[2])
                wave.pressure.append(fields[3])
        wave_list.append(wave)  # store the last group
    return wave_list

The second approach uses numpy loadtxt:
    import numpy as np
    from io import StringIO

    f = open(filename, 'r')
    txt = f.read()
    # Split on "data" and drop the first element
    raw_chunks = txt.split("data")[1:]
    # Define a new list to store the results
    wave_list = []
    # Go through each part of raw_chunks
    for rc in raw_chunks:
        # Index of the first "\n"
        first_id = rc.find("\n")
        # Index of the last "\n"
        last_id = rc.rfind("\n")
        # Parse the rows between the two newlines into a 2D array
        data = np.loadtxt(StringIO(rc[first_id + 1:last_id]))
        wave = Wave()
        wave.lat = data[:, 0]
        wave.lon = data[:, 1]
        wave.altitude = data[:, 2]
        wave.pressure = data[:, 3]
        wave_list.append(wave)
    return wave_list

However, both approaches are very slow. I have looked at the pandas documentation but could not find a way to handle headers in the middle of the file. I also looked at various questions, for example:
  
  
  
But none of them solves my problem. Is there a faster way to read such a text file? Thank you in advance.
 
Match each group header of the form <number> Line ... <something>, capture the group name (<something>) and the number of lines to read (<number>), then read exactly that many lines from the file:
  
 
 
  
    import re
    from itertools import islice
    from collections import defaultdict

    data = defaultdict(list)
    with open(filename) as fin:
        # Read (and skip) the column-name line
        header = next(fin, '').split()
        for line in fin:
            # Group header: "<number> Line ... <group name>"
            m = re.match(r'(\d+) Line.*(\b\w+)$', line)
            if m:
                # Take the next <number> lines as this group's rows
                data[m.group(2)].extend(islice(fin, int(m.group(1))))
Given this input:

    latitude longitude height pressure
    3 Line data group BSAS
    2.3 4.5 45.0 875
    5.6 6.5 46.2 676
    3.4 3.4 48.2 565
    6 Line data group SDAD
    3.4 4.5 56.1 535
    5.6 6.5 46.2 676
    3.4 4.5 56.1 535
    2.3 4.5 45.0 875
    5.6 6.5 46.2 676
    3.4 3.4 48.2 565
this gives you data like this:

    {'BSAS': ['2.3 4.5 45.0 875\n', '5.6 6.5 46.2 676\n', '3.4 3.4 48.2 565\n'],
     'SDAD': ['3.4 4.5 56.1 535\n', '5.6 6.5 46.2 676\n', '3.4 4.5 56.1 535\n',
              '2.3 4.5 45.0 875\n', '5.6 6.5 46.2 676\n', '3.4 3.4 48.2 565\n']}

Following your comments, if the group name is unimportant, then:
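    import re
    from itertools import islice

    data = []
    with open(filename) as fin:
        # Read (and skip) the column-name line
        header = next(fin, '').split()
        for line in fin:
            # Only the line count is needed from each group header
            m = re.match(r'(\d+) Line', line)
            if m:
                # One list of raw lines per group
                data.append(list(islice(fin, int(m.group(1)))))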
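The snippets above leave each row as a raw string, while the question asks for a 2D structure per group. The following is a minimal sketch of one way to get there with the same regex and islice approach, using only the standard library. The `read_groups` function, the `SAMPLE` text, and its column-name line are illustrative assumptions, not part of the original answer, and the header pattern assumes every group line reads "<number> Line ... <name>".

```python
import re
from io import StringIO
from itertools import islice

# Hypothetical sample mirroring the layout shown in the question.
SAMPLE = """\
latitude longitude height pressure
3 Line data group BSAS
2.3 4.5 45.0 875
5.6 6.5 46.2 676
3.4 3.4 48.2 565
2 Line data group SDAD
3.4 4.5 56.1 535
5.6 6.5 46.2 676
"""

# Same pattern as above: line count first, group name as the last word.
HEADER_RE = re.compile(r'(\d+) Line.*(\b\w+)$')


def read_groups(fin):
    """Return {group name: list of rows}, each row a list of floats."""
    next(fin, '')  # skip the column-name line
    groups = {}
    for line in fin:
        m = HEADER_RE.match(line)
        if m:
            count, name = int(m.group(1)), m.group(2)
            # Consume exactly `count` lines and convert each to floats.
            rows = [[float(x) for x in row.split()]
                    for row in islice(fin, count)]
            groups.setdefault(name, []).extend(rows)
    return groups


groups = read_groups(StringIO(SAMPLE))
print(groups['SDAD'][0])  # [3.4, 4.5, 56.1, 535.0]
```

For a real file, pass the object returned by `open(filename)` instead of the `StringIO` wrapper. If numpy is available, each list of rows could then be converted to an array with `numpy.array(rows)`, at the cost of an extra dependency.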

 




  


















