Table public.in_vote format

Michael Allan mike at zelea.com
Thu May 23 16:10:10 EDT 2013


You're right C, we should fix this.  We're losing information.

> Change the table format from its current form: 
> 
> servicename | voteremail | xml
> 
> where the xml contains at least a timestamp and the candidate-email (can be 
> empty), as well as a dart-sector.
> 
> to:
> 
> timestamp | servicename | voteremail | candidateemail | ds

I agree we need a timestamp column.  But I also like the flexibility
of the XML column, especially for prototyping.  So maybe like this:

  timestamp | servicename | voteremail | xml

> I propose a primary key over the tuple (timestamp, servicename,
> voteremail) for now, as these should be unique (you shouldn't cast
> different votes for one voter on the same poll at once). ...

Yes.

> ... We also should have a B-Tree index on timestamp and any index
> (probably hash-map) on servicename and voteremail to make querying
> fast ...

I think we automatically get fast indeces for the primary keys.

> ... You can still get the same query behaviour, by just selecting
> the newest vote. ...

I guess the most common query (vocount's) will be for the latest votes
of a given servicename.

> What do you think? I can help to adjust the Java code, if you don't
> like to put time into the effort.

Okay.  If we agree on the columns, then please go ahead.

Mike


conseo said:
> Hi Mike,
> 
> Problems:
> 1) The voting table does not store all vote events, but rather updates them in 
> place. Hence we have no complete history of vote events, but only compiled 
> snapshots of certain intervals. 
> 2) The current xml string makes it fairly difficult to compile a historical 
> vote count, because I need to query by timestamp as well.
> 
> Proposal:
> 
> Change the table format from its current form: 
> 
> servicename | voteremail | xml
> 
> where the xml contains at least a timestamp and the candidate-email (can be 
> empty), as well as a dart-sector.
> 
> to:
> 
> timestamp | servicename | voteremail | candidateemail | ds
> 
> which basically destructures the xml in plain SQL to make it queryable. If you 
> would like to have a collection in a field to flexibly save attribtues like ds 
> (dartsector) without schema changes in the future, I'd propose to use the new 
> JSON support in Postgres. This would still allow to get some of SQL's query 
> functionality for data in this field (e.g. filtering by JSON attributes) as 
> well as validation. (1)
> 
> I propose a primary key over the tuple (timestamp, servicename, voteremail) 
> for now, as these should be unique (you shouldn't cast different votes for one 
> voter on the same poll at once). We also should have a B-Tree index on 
> timestamp and any index (probably hash-map) on servicename and voteremail to 
> make querying fast, but this is not necessary to pin down the data format, 
> just taking the benefits of SQL at runtime.
> 
> Votes are not updated in place then, but added as new events (rows) with a 
> newer timestamp. You can still get the same query behaviour, by just selecting 
> the newest vote. You can also trivially compose the current xml form from the 
> SQL query, so I can't see any big breaks by this change. We need to recreate 
> the table from current data once, I can do so for you.
> 
> What do you think? I can help to adjust the Java code, if you don't like to 
> put time into the effort.
> 
> conseo
> 
> (1) http://www.postgresql.org/docs/current/static/functions-json.html
> There is also a xml type:
> http://www.postgresql.org/docs/current/static/functions-xml.html
> but since we use JSON for most communication, I think it is the best fit.
> Accessing JSON: 
> http://people.planetpostgresql.org/andrew/index.php?/archives/249-Using-PLV8-to-index-JSON.html
> >From my side we can also keep the field as XML-String as long as we pull 
> timestamp and candidate out of it.



More information about the Votorola mailing list