Python: split a large text file by many headers
I have a large text file, which looks like this:
    latitude longitude height pressure
    3 lines data group BSAS
    2.3 4.5 45.0 875
    5.6 6.5 46.2 676
    3.4 3.4 48.2 565
    6 lines data group SDAD
    3.4 4.5 56.1 535
    5.6 6.5 46.2 676
    3.4 4.5 56.1 535
    2.3 4.5 45.0 875
    5.6 6.5 46.2 676
    3.4 3.4 48.2 565
    50 lines data group asdasd
    5.5 6.6 44.5 343
    ...
    3.7 8.4 56.5 456
    ...

and so on. I want to split the whole text file into its separate data groups, with each data group stored in a 2D list or array. So far, I have tried two ways to do this.

The first approach goes through the file line by line and collects the data as it is read:
    # The class Wave is defined elsewhere.
    # Each object has 4 attributes: lat, lon, altitude, pressure
    # (this snippet is the body of a function, hence the return)
    wave_list = []
    with open(filename, 'r') as f:
        next(f)  # skip the header line
        wave = Wave()
        for i, line in enumerate(f):
            if 'data' in line:
                # a new group starts: store the previous one if it has rows
                if wave.lat:
                    wave_list.append(wave)
                wave = Wave()
            else:
                wave.lat.append(line.split()[0])
                wave.lon.append(line.split()[1])
                wave.altitude.append(line.split()[2])
                wave.pressure.append(line.split()[3])
        wave_list.append(wave)
    return wave_list

The second approach uses numpy's loadtxt:
    import numpy as np
    from io import StringIO

    with open(filename, 'r') as f:
        txt = f.read()
    # split on "data"; the first element is the header, so drop it
    raw_chunks = txt.split("data")[1:]
    # define a new list to store the results
    wave_list = []
    # go through each part of raw_chunks
    for rc in raw_chunks:
        # index of the first "\n" (end of the group header)
        first_id = rc.find("\n")
        # index of the last "\n" (start of the next group header)
        last_id = rc.rfind("\n")
        # parse only the numeric rows between the two
        data = np.loadtxt(StringIO(rc[first_id + 1:last_id]))
        wave = Wave()
        wave.lat = data[:, 0]
        wave.lon = data[:, 1]
        wave.altitude = data[:, 2]
        wave.pressure = data[:, 3]
        wave_list.append(wave)
    return wave_list

However, both approaches are very slow. I had a look at the pandas documentation but could not find a way to skip headers in the middle of the file. I also looked at various related questions, but none of them solves my problem. Is there a faster way to read such a text file? Thank you in advance.
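You can match each header line of the form <number> lines ... <something> with a regular expression, capture the group name (<something>) and the number of lines to read (<number>), and then consume exactly that many lines from the file: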
    from itertools import islice
    from collections import defaultdict
    import re

    data = defaultdict(list)
    with open(filename) as fin:
        header = next(fin, '').split()  # column names from the first line
        for line in fin:
            # header line: "<count> lines ... <group name>"
            m = re.match(r'(\d+) line.*(\b\w+)$', line, re.IGNORECASE)
            if m:
                # read exactly <count> rows into this group's list
                data[m.group(2)].extend(islice(fin, int(m.group(1))))
Given the input:

    latitude longitude height pressure
    3 lines data group BSAS
    2.3 4.5 45.0 875
    5.6 6.5 46.2 676
    3.4 3.4 48.2 565
    6 lines data group SDAD
    3.4 4.5 56.1 535
    5.6 6.5 46.2 676
    3.4 4.5 56.1 535
    2.3 4.5 45.0 875
    5.6 6.5 46.2 676
    3.4 3.4 48.2 565

this gives you data like this:

    {'BSAS': ['2.3 4.5 45.0 875\n', '5.6 6.5 46.2 676\n', '3.4 3.4 48.2 565\n'],
     'SDAD': ['3.4 4.5 56.1 535\n', '5.6 6.5 46.2 676\n', '3.4 4.5 56.1 535\n',
              '2.3 4.5 45.0 875\n', '5.6 6.5 46.2 676\n', '3.4 3.4 48.2 565\n']}

Per your comment, if the group name is unimportant, then:
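    import re
    from itertools import islice

    data = []
    with open(filename) as fin:
        header = next(fin, '').split()  # column names from the first line
        for line in fin:
            # only the row count is needed from each header line
            m = re.match(r'(\d+) line', line, re.IGNORECASE)
            if m:
                # one list of raw rows per group, in file order
                data.append(list(islice(fin, int(m.group(1)))))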
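
Since the question asks for each group as a 2D list or array, the raw lines collected this way still have to be converted to numbers. Below is a minimal sketch of that step. It assumes every data row holds whitespace-separated numeric fields and that header lines look like `<number> lines ... <name>`. The helper name `read_groups` and the in-memory sample are illustrative only and do not come from the original post.

```python
import io
import re
from collections import defaultdict
from itertools import islice

# Header line: "<count> lines ... <group name>" (assumed format)
HEADER_RE = re.compile(r'(\d+) line.*(\b\w+)$', re.IGNORECASE)


def read_groups(fin):
    """Return {group name: list of rows}, each row a list of floats."""
    groups = defaultdict(list)
    next(fin, '')  # skip the column-name line
    for line in fin:
        m = HEADER_RE.match(line)
        if m:
            count, name = int(m.group(1)), m.group(2)
            # consume exactly `count` rows and convert each field to float
            rows = islice(fin, count)
            groups[name].extend([float(x) for x in row.split()] for row in rows)
    return dict(groups)


# Hypothetical sample standing in for the real file
sample = io.StringIO(
    "latitude longitude height pressure\n"
    "3 lines data group BSAS\n"
    "2.3 4.5 45.0 875\n"
    "5.6 6.5 46.2 676\n"
    "3.4 3.4 48.2 565\n"
    "2 lines data group SDAD\n"
    "3.4 4.5 56.1 535\n"
    "5.6 6.5 46.2 676\n"
)
groups = read_groups(sample)
print(groups["BSAS"][0])  # [2.3, 4.5, 45.0, 875.0]
```

For a real file, pass the object returned by `open(filename)` instead of the `StringIO` sample. If numpy is available, `numpy.array(groups["BSAS"])` should turn one group into a 2D array.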