Posts Tagged ‘Python’

Python HL7 Parser Released

October 31, 2012

I have finished the message creation portion of our Python HL7 Parser. The readme has most of the good info.


If you end up using this package, let me know what you think and how to improve/fix it!


What’s that on the computer screen?

February 24, 2012

Just found this short but interesting article about an experience an engineer had when going out to dinner. The gist is that for a particular restaurant, their software did not fit well enough or was not faster than using a marker on a whiteboard.

This is the sort of war we are fighting. We ARE in the office, watching users interact with the system every day. It IS working, but I feel there is so much more to do. We have even put up signs to remind people about our quest for faster and more accurate methods of data capture.


Legacy Health IT: Case of the MUMPS

June 3, 2010

I recently recalled an article at The Daily WTF I read a while back on legacy health IT software, specifically regarding the language MUMPS, or “M”. It was developed by Mass. General Hospital and was pretty innovative in its time.

But that was the 1960s. Programming languages have diversified and evolved (I like Python). Looking at a sample of code from this language makes it easy to understand why these large health IT companies find it hard to take an agile approach to software changes.

Here is some example code from Wikipedia:

;;19.0;VA FileMan;;Jul 14, 1992
D I 'X1!'X2 S X="" Q
S X=X1 D H S X1=%H,X=X2,X2=%Y+1 D H S X=X1-%H,%Y=%Y+1&X2
K %H,X1,X2 Q
C S X=X1 Q:'X D H S %H=%H+X2 D YMD S:$P(X1,".",2) X=X_"."_$P(X1,".",2) K X1,X2 Q
S S %=%#60/100+(%#3600\60)/100+(%\3600)/100 Q
H I X<1410000 S %H=0,%Y=-1 Q
S %Y=$E(X,1,3),%M=$E(X,4,5),%D=$E(X,6,7)
S %T=$E(X_0,9,10)*60+$E(X_"000",11,12)*60+$E(X_"00000",13,14)
TOH S %H=%M>2&'(%Y#4)+$P("^31^59^90^120^151^181^212^243^273^304^334","^",%M)+%D
S %='%M!'%D,%Y=%Y-141,%H=%H+(%Y*365)+(%Y\4)-(%Y>59)+%,%Y=$S(%:-1,1:%H+4#7)
K %M,%D,% Q

Compared to some example Python code:

my_list = ['john', 'pat', 'gary', 'michael']
for i, name in enumerate(my_list):
    print("iteration %i is %s" % (i, name))

I’m being a tad unfair here, as the above MUMPS code does not simply print a list of names (I think), but what should be clear is the degree to which some of this legacy health IT code is completely unmanageable. It might be worth asking your large EMR vendor which platforms they are using and what they think of their ability to add features and fix bugs in a timely manner.

Two more weeks of testing

May 2, 2010

The past two weeks have been very busy. I have been on-site at the office of Carolina Oncology Specialists working with the doctors, nurses and staff who will be using Ankhos, and there is a lot of work to be done. Two of the major improvements which have come from this most recent week have been:

Regimen creation UI: While the regimen framework has been fully able to represent and create the complicated chemotherapy regimens designed by the doctors, the UI is not yet ‘easy to use’ in this regard. The doctors and I spent some more time hashing out ideas and hopefully the next iteration will be better.

We are trying to focus more on the vocabulary of the user. For instance, being able to program “once a week for three weeks, skip a week and then a fourth treatment” is more natural to an MD than specifying “Days 1,8,15,29” for a treatment. We need to figure out how to fit that into a UI.
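To make the vocabulary gap concrete, here is a minimal sketch of how a per-week "treat or skip" pattern could be expanded into regimen day numbers. The function name and the list-of-booleans representation are assumptions made up for illustration; this is not how Ankhos stores regimens.

```python
def treatment_days(weekly_pattern, days_per_week=7):
    """Convert a per-week treat/skip pattern into 1-based regimen day numbers.

    weekly_pattern is a hypothetical representation: one boolean per week,
    True meaning "treat at the start of this week", False meaning "skip".
    """
    return [
        week * days_per_week + 1
        for week, treat in enumerate(weekly_pattern)
        if treat
    ]


# "Once a week for three weeks, skip a week and then a fourth treatment"
pattern = [True, True, True, False, True]
print(treatment_days(pattern))  # [1, 8, 15, 29]
```

The doctor thinks in terms of the pattern on the left; the day list on the right is what falls out of it.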

FTP file dump ingest: I was able to set up the web server and get the automatic file ingest working. The lab machines are currently set up to dump a file each time a patient has their blood drawn. This folder has never been emptied. Ever. This means around 64 thousand files are in each of these folders (multiple machines * multiple office sites = multiple folders).
My original script simply looped through the files in the directory in Python, using each file's modification date to decide which files to ingest. I quickly learned that this was entirely too slow. What I ended up having to do was run an ‘ls’ or ‘dir’ command on the directory using the Python subprocess package and let the filesystem do the sorting for me. I’m sure there is a log(n) process in there somewhere, because it certainly made the ingest time acceptable.
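A rough sketch of that approach, assuming a Unix-like system where `ls -t` lists the newest files first. The function name and the idea of a "last ingest" cutoff timestamp are assumptions for illustration, not the actual ingest script, and file names containing newlines would break it.

```python
import os
import subprocess


def files_modified_since(directory, cutoff_timestamp):
    """Return names of files modified after cutoff_timestamp, newest first."""
    # `ls -1t` prints one name per line, sorted by modification time
    # (newest first), so the sorting happens outside of Python.
    result = subprocess.run(
        ["ls", "-1t", directory],
        capture_output=True,
        text=True,
        check=True,
    )
    recent = []
    for name in result.stdout.splitlines():
        path = os.path.join(directory, name)
        if os.path.getmtime(path) <= cutoff_timestamp:
            # The listing is newest-first, so everything after this is older.
            break
        recent.append(name)
    return recent
```

Because the loop stops at the first file that is too old, only the handful of new files get inspected rather than all 64 thousand.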

Open source HL7 parser

March 19, 2010

I’ve spent the last few days creating the data ingest engine for our application so we can automatically retrieve data from our in-house lab machines. In doing so, I decided to start my own HL7 parser. I did this because I wanted access to fields by field name and not just by list index.  I also wanted to learn the HL7 spec a bit more.
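As an illustration of what "access by field name" can look like, here is a minimal sketch for a single PID segment. The function, the dictionary of names and the sample line are all made up for this example and are not the API of our parser; only the PID field positions come from the HL7 v2 spec.

```python
# A few PID field positions from the HL7 v2 spec, with illustrative names.
PID_FIELD_NAMES = {
    3: "patient_identifier_list",
    5: "patient_name",
    7: "date_of_birth",
    8: "sex",
}


def parse_segment(line, field_names):
    """Split one HL7 segment on '|' and expose selected fields by name."""
    fields = line.split("|")
    named = {"segment_id": fields[0]}
    for index, name in field_names.items():
        # Trailing empty fields are often omitted, so guard the index.
        named[name] = fields[index] if index < len(fields) else ""
    return named


# Made-up sample data, not from a real lab feed.
pid = parse_segment("PID|1||12345^^^HOSP||DOE^JOHN||19700101|M", PID_FIELD_NAMES)
print(pid["patient_name"])   # DOE^JOHN
print(pid["date_of_birth"])  # 19700101
```

Note that this index-to-field mapping only works for ordinary segments; MSH counts its fields differently because the field separator itself is MSH-1.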

And learn I did! I learned that there are all sorts of flavors of HL7, each home-grown by lab or EMR vendors to suit their own needs. We have also decided to make this an open-source aspect of our project. I wanted to wait a while to do this, but this video about the pitfalls of trying to make code ‘perfect’ before releasing it helped push this open source move.

The code can be found here:


This code is far from complete and we will be adding features as we find a need for them. The system is currently sufficient to ingest blood work data from our LabCorp machines.



As noted in this more recent post, our newly released Python HL7 parser can be found here:

HL7 fields, interoperability

March 9, 2010

I thought I’d quickly share a few of the lessons I’ve learned about parsing HL7 data. I’m using the Python hl7 parser by John Paulett.

The first problem I had was with the line endings on my sample file. They must be \r (carriage return only) for this parser to work. This is in the HL7 standard, by the way.
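If a sample file has been saved with Windows or Unix line endings, a small helper like the one below can convert it before parsing. This is only a sketch; the function name is made up and it is not part of any HL7 library.

```python
def normalize_hl7_line_endings(text):
    """Convert \\r\\n and \\n line endings to the bare \\r that HL7 expects."""
    # Replace \r\n first so it does not turn into a double \r.
    return text.replace("\r\n", "\r").replace("\n", "\r")


raw = "MSH|^~\\&|LAB\r\nPID|1||12345\nAL1|1||^PENICILLIN\r"
print(repr(normalize_hl7_line_endings(raw)))
```

When reading the file, open it with `newline=""` so Python does not translate the line endings on its own first.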

HL7 may be a bit confounding at first glance, but it’s actually pretty straightforward once you know the data type labels. Here is a link that explains some of the fields and their names.

HL7 is basically just a list of information vectors (called segments in the spec). Each of these vectors has a specific role. For example: a PID vector denotes patient identity information, while AL1 denotes a vector containing information about a patient allergy.
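To show the "list of vectors" idea in code, here is a small sketch that splits a message on \r and groups the resulting vectors by their three-letter label. The function name and the sample message are invented for illustration; a real parser also has to handle components, repetitions and escape sequences.

```python
from collections import defaultdict


def group_segments(message):
    """Group the segments of an HL7 v2 message by their segment ID."""
    grouped = defaultdict(list)
    for segment in message.split("\r"):
        if not segment:
            continue  # ignore the empty string after a trailing \r
        fields = segment.split("|")
        grouped[fields[0]].append(fields)
    return dict(grouped)


# Made-up sample message, not real patient data.
sample = "\r".join([
    "MSH|^~\\&|LAB|SITE|EMR|SITE|201003091200||ORU^R01|0001|P|2.3",
    "PID|1||12345||DOE^JOHN",
    "AL1|1||^PENICILLIN",
    "AL1|2||^LATEX",
])
segments = group_segments(sample)
print(sorted(segments))      # ['AL1', 'MSH', 'PID']
print(len(segments["AL1"]))  # 2
```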

I have heard stories of vendors charging upwards of $6,000 to implement different interfaces in their EMR… something that would take half a day. Atrocious.