On 2023-04-25 04:58, Roger Mason wrote:
> Hello,
>
> I have files like this (Si.in):
>
>  'Si' : spsymb
>  'silicon' : spname
>   -14.0000 : spzn
>    51196.73454 : spmass
>   0.534522E-06 2.2000 47.8169 400 : rminsp, rmt, rmaxsp, nrmt
>    7 : nstsp
>    1 0 1 2.00000 T : nsp, lsp, ksp, occsp, spcore
>    2 0 1 2.00000 T
>    2 1 1 2.00000 T
>    2 1 2 4.00000 T
>    3 0 1 2.00000 F
>    3 1 1 1.00000 F
>    3 1 2 1.00000 F
>    1 : apword
>     0.1500 0 F : apwe0, apwdm, apwve
>    0 : nlx
>    2 : nlorb
>    0 2 : lorbl, lorbord
>     0.1500 0 F : lorbe0, lorbdm, lorbve
>     0.1500 1 F
>    1 2 : lorbl, lorbord
>     0.1500 0 F : lorbe0, lorbdm, lorbve
>     0.1500 1 F

I have the impression that the indentation of the data indicates a
nesting level, so that there is a hierarchy.

A general approach to parsing the whole file is possible along these
lines. We define a simple data structure to represent a frame.

- A frame consists of headings, rows and children.

- The headings are a list of strings like ("lorbl" "lorbord").

- The rows are a vector of lists of items, which we can tokenize into
  strings and floating-point numbers (or possibly more finely: we could
  have T and F be t and nil Lisp objects or whatever).

- Children are other frames, listed below a certain frame, if they are
  indented by one from that frame.

According to this, I wrote a prototype program:

  ;; A frame: column headings, data rows, and nested child frames.
  (defstruct frame ()
    headings
    rows
    children)

  ;; Split a data field into tokens. Numbers become floats, 'quoted'
  ;; strings lose their quotes, anything else (e.g. T, F) stays a string.
  (defun tokenize-data (str)
    (let ((toks (tok #/'.*'|[^ ]+/ str)))
      (collect-each ((tok toks))
        (match-case tok
          (@(@f (tofloat)) f)
          (@(and @(starts-with "'") @(ends-with "'")) [tok 1..-1])
          (@else tok)))))

  ;; [stack level] holds the most recent frame seen at that indentation.
  (defun table-data-read (: (stream *stdin*))
    (let ((stack (vector 32))
          (prev-level 0))
      (build
        (whilet ((line (get-line stream)))
          ;; Indentation level = number of leading spaces.
          (let ((level (match-regex line #/ */)))
            (if (< level 32)
              (match-case line
                ;; Line with headings: starts a new frame.
                (`@data : @headings`
                 (let ((fr (new frame
                                headings (spl ", " headings)
                                rows (vec (tokenize-data data))
                                children (vec))))
                   (set [stack level] fr)
                   (if (eql 1 level)
                     (add fr)
                     (iflet ((parent [stack (pred level)]))
                       (vec-push parent.children fr)))))
                ;; Line without headings: another row of the current frame.
                (`@data`
                 (iflet ((current [stack level]))
                   (vec-push current.rows (tokenize-data data)))))))))))

  (prinl (table-data-read))

Note that this contains a hack: the root level is 1 rather than 0.
This is because the sample data's root node is indented by one.
See the expression (eql 1 level).

The program produces the following data (which I reformatted manually).
Is this barking up the right tree?

  (#S(frame headings ("spsymb") rows #(("Si")) children #())
   #S(frame headings ("spname") rows #(("silicon"))
      children
      #(#S(frame headings ("spzn") rows #((-14.0))
           children
           #(#S(frame headings ("spmass") rows #((51196.73454)) children #())))
        #S(frame headings ("rminsp" "rmt" "rmaxsp" "nrmt")
           rows #((5.34522e-7 2.2 47.8169 400.0))
           children
           #(#S(frame headings ("nstsp") rows #((7.0)) children #())
             #S(frame headings ("nsp" "lsp" "ksp" "occsp" "spcore")
                rows #((1.0 0.0 1.0 2.0 "T")
                       (2.0 0.0 1.0 2.0 "T")
                       (2.0 1.0 1.0 2.0 "T")
                       (2.0 1.0 2.0 4.0 "T")
                       (3.0 0.0 1.0 2.0 "F")
                       (3.0 1.0 1.0 1.0 "F")
                       (3.0 1.0 2.0 1.0 "F"))
                children #())
             #S(frame headings ("apword") rows #((1.0))
                children
                #(#S(frame headings ("apwe0" "apwdm" "apwve")
                     rows #((0.15 0.0 "F"))
                     children #())))
             #S(frame headings ("nlx") rows #((0.0)) children #())
             #S(frame headings ("nlorb") rows #((2.0)) children #())
             #S(frame headings ("lorbl" "lorbord") rows #((0.0 2.0))
                children
                #(#S(frame headings ("lorbe0" "lorbdm" "lorbve")
                     rows #((0.15 0.0 "F") (0.15 1.0 "F"))
                     children #())))
             #S(frame headings ("lorbl" "lorbord") rows #((1.0 2.0))
                children
                #(#S(frame headings ("lorbe0" "lorbdm" "lorbve")
                     rows #((0.15 0.0 "F") (0.15 1.0 "F"))
                     children #()))))))))
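
To judge whether the nesting came out sensibly without reading the full printed structure, a small walker over the frames might help. The following is only a sketch: frame-outline is a name made up here, it relies on the frame struct and table-data-read defined above, and it would be used in place of the final prinl call, since that call already consumes standard input. It should print one line per frame, indented by depth, giving the headings and the number of rows.

```txr
;; Hypothetical helper, not part of the prototype above: print an
;; indented outline of a frame and, recursively, of its children.
(defun frame-outline (fr : (depth 0))
  (let ((indent (mkstring (* 2 depth) #\space))
        (names fr.headings))
    ;; @{names ", "} joins the list of heading strings with ", ".
    (put-line `@{indent}@{names ", "} (@(len fr.rows) rows)`)
    (each ((child fr.children))
      (frame-outline child (succ depth)))))

;; Use instead of (prinl (table-data-read)).
(each ((fr (table-data-read)))
  (frame-outline fr))
```

An outline like this makes it easy to spot places where the indentation heuristic groups fields in a way the file format may not intend, such as spzn ending up as a child of spname.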