In my last post, I was playing around with methods to serialize Clojure data structures, especially a complex record that contains a number of other records and refs. Chas Emerick and others mentioned in the comments there that putting a ref inside a record is probably a bad idea, and I agree in principle. But this brings me to a dilemma.
Let's assume I have a complex record that contains a number of "sub" records that need to be modified while the program runs. One scenario where this could happen is a record called "Table" that contains a "Row" which is updated (think database tables and rows). This can be implemented in two ways:
- Mutable data structures - In this case, I would put each row inside a table as a ref, and when the need to update happens, just find the row by its ID and use alter inside a dosync to make any modifications needed.
- The advantage is that each update is written in place to the affected row's ref only, which should be rather efficient.
- The disadvantage however, is that when serializing such a record full of refs, I would have to build a function that would traverse the entire data structure and then serialize each ref by dereferencing it and then writing to a file. Similarly, I'd have to reconstruct the data structure when de-serializing from a file.
    {:filename "tab1name",
     :tuples
     #<Ref@511d89f8:
       #{{:recordid nil,
          :tupdesc
          {:x
           #<Ref@59a683e6:
             [{:type "int", :field "colid"}
              {:type "string", :field "name"}]>},
          :tup #<Ref@411a9435: {:colid 1, :name "akriti"}>}
         {:recordid nil,
          :tupdesc
          {:x
           #<Ref@59a683e6:
             [{:type "int", :field "colid"}
              {:type "string", :field "name"}]>},
          :tup #<Ref@424f8ad5: {:colid 2, :name "viksit"}>}}>,
     :tupledesc
     {:x
      #<Ref@59a683e6:
        [{:type "int", :field "colid"} {:type "string", :field "name"}]>}}
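As a rough sketch of this first option, the code below keeps each row in its own ref, updates one row with alter inside a dosync, and then dereferences every ref before serializing. The names make-table, update-row! and snapshot are hypothetical, the rows are simplified to plain maps with :colid and :name, and pr-str with clojure.edn is only one assumed way of getting the data to and from a file.

```clojure
(require '[clojure.walk :as walk]
         '[clojure.edn :as edn])

;; Hypothetical, simplified table: each row sits in its own ref, and the
;; set of row refs is itself held in a ref.
(defn make-table [filename rows]
  {:filename filename
   :tuples   (ref (set (map ref rows)))})

(defn update-row!
  "Finds the row whose :colid equals id and alters it in place.
  Returns the new row value, or nil when no row matches."
  [table id f & args]
  (dosync
    (when-let [row (first (filter #(= id (:colid @%)) @(:tuples table)))]
      (apply alter row f args))))

(defn snapshot
  "Walks the table and replaces every ref with its current value.
  Running inside dosync gives a consistent view across all refs."
  [table]
  (dosync
    (walk/prewalk #(if (instance? clojure.lang.Ref %) @% %) table)))

;; Usage
(def ref-table
  (make-table "tab1name" [{:colid 1 :name "akriti"}
                          {:colid 2 :name "viksit"}]))

(update-row! ref-table 2 assoc :name "vik")

;; Text that could be written to a file.
(def on-disk (pr-str (snapshot ref-table)))

;; On load, the refs have to be rebuilt around the plain data.
(def reloaded
  (let [data (edn/read-string on-disk)]
    (make-table (:filename data) (:tuples data))))
```

Note that both directions need extra code: snapshot to strip the refs on the way out, and make-table to put them back on the way in.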
- Immutable data structures - This case involves putting a single ref around the entire table data structure, so that all data within the table remains immutable. To update any row within the table, a function would return a new version of the table data structure with the only change being the modification. This new value would then replace the one held in the ref, and be propagated to the disk as and when changes are committed.
- The advantage here is that having just one ref makes it very simple to serialize: simply deref the table, and then write the entire thing to a binary file.
- The disadvantage here is that each row change would make it necessary to return a new "table", and writing just the "diff" of the data to disk would be hard to do.
    #<Ref@4a3e7799:
      {:filename "tab1name",
       :tuples
       #{{:recordid nil,
          :tupdesc
          {:x
           [{:type "int", :field "colid"}
            {:type "string", :field "name"}]},
          :tup {:colid 1, :name "viksit"}}
         {:recordid nil,
          :tupdesc
          {:x
           [{:type "int", :field "colid"}
            {:type "string", :field "name"}]},
          :tup {:colid 1, :name "akriti"}}},
       :tupledesc
       {:x
        [{:type "int", :field "colid"} {:type "string", :field "name"}]}}>
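The second option might look roughly like the sketch below: one ref holds a plain immutable table, a pure function returns the updated table value, and serialization is just a deref followed by a write. The names table and update-row are hypothetical, the rows are again simplified to maps with :colid and :name, and pr-str with clojure.edn stands in for whatever file format is actually used.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical table: plain immutable data, with a single ref around it.
(def table
  (ref {:filename "tab1name"
        :tuples   #{{:colid 1 :name "akriti"}
                    {:colid 2 :name "viksit"}}}))

(defn update-row
  "Pure function: returns a new table value in which f has been applied
  to the row whose :colid equals id. The old value is left untouched."
  [table-value id f & args]
  (update-in table-value [:tuples]
             (fn [rows]
               (set (map #(if (= id (:colid %)) (apply f % args) %) rows)))))

;; Swap the new table value into the single ref.
(dosync (alter table update-row 2 assoc :name "vik"))

;; Serializing needs no traversal: deref once and print.
(def serialized (pr-str @table))

;; Loading is the reverse: read the data and wrap it in one ref.
(def restored (ref (edn/read-string serialized)))
```

Here the whole table value is written out each time; nothing in this sketch attempts to write only the changed row.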
So at this point, which method would you recommend?