Medscape And Frontier

In honor of Dave Winer's (c2000) NYC visit, some notes on my history with UserLand Frontier...

I started working for ScpCommunications in 1992. They were in medical journal publishing (and related businesses), and were an all-Mac shop. Initially, I was (a) leading the transition to a new/horrible accounting system, (b) helping start a process of doing post-audits on client projects (from mainly a financial perspective), and (c) occasionally trying to herd the in-house development team, which was building internal apps using 4th Dimension.

Somewhere along the line I discovered UserLand Frontier, which at the time was used mainly for scripting and gluing together 3rd-party apps that supported Apple Events. I played with it a bit, but didn't really find a use for it.

Then the Web came. Specifically, the release of Mosaic for the Mac in 1994. After some research I discovered the free predecessor to the WebStar Mac web server. By the time we decided to enter the business and launched the MedScape site in early 1995, WebStar had come out and we'd bought it.

Our initial plan was to (a) require registration so we could identify high-value readers (e.g. doctors who write lots of prescriptions), (b) make the site free for everyone, at least for a while, (c) sell ads/sponsorships to pharmcos, and (d) maybe charge for the site a year or two down the road. So we needed to put up a registration form and lock anonymous users out of articles (but not navigation pages).

We started with Apple Script, since it was "official". Someone else wrote the code to handle the form posting (it basically dumped most of the form data out to a text file, but also added an ID/password to the WebStar ACL system). We were surprised at how slow the Apple Script ran (e.g. 20 seconds with no load).

I somehow discovered that Frontier had been re-positioned as a CGI tool. So I rewrote the Apple Script code in Frontier (I think even then I should have been able to run Apple Script from inside Frontier, but I hit some speed bumps, so I decided to do a rewrite since it wasn't hard anyway). Latency dropped to around 3-4 seconds. So we switched to that architecture at the last minute (like 2 days before we went live).

As we started to grow in membership, I noticed performance getting worse, and traced it to the registration/login process. I discovered that WebStar's user control data was stored in the resource fork of the app, which meant (a) it was unindexed, making performance suck as the list got bigger, and (b) there was a shockingly low hard ceiling on how much data could go into a resource. So I started writing a new version of the code that stored the user IDs and passwords in Frontier's ODB. By amazing coincidence, I had it working in time to rush into production just as we started hitting that hard ceiling (new IDs would get inserted, but with no password, so the logins would fail).
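(In today's terms the difference was roughly this, a minimal Python sketch with invented names and passwords; the real code was Frontier UserTalk against the ODB, and the "before" case was WebStar scanning its own resource-based list:)

    # Before: a flat, unindexed list -- every login walks the whole membership.
    users_list = [("docsmith", "secret1"), ("drjones", "secret2")]  # keeps growing

    def check_login_slow(user_id, password):
        for uid, pw in users_list:
            if uid == user_id:
                return pw == password
        return False

    # After: a keyed table (ODB-style) -- lookup cost doesn't grow with membership.
    users_table = {"docsmith": "secret1", "drjones": "secret2"}

    def check_login_fast(user_id, password):
        return users_table.get(user_id) == password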

The next phase was Apple Search integration. We ran Apple Search on a separate server against a copy of the content tree. We needed to get WebStar to talk to it. Luckily, someone had already written the lower-level Frontier code to wrap the Apple Events supported by Apple Search. I worked with that person (whose name I forget) to greatly expand the higher-level code (search results page structure/format).

We were putting banner ads at the top of every page from the day we launched (initially most of them were "house ads" to promote various editorial features). How could we swap various ads in and out without touching all the static articles? This one didn't use Frontier. We made ~10 file aliases (shortcuts) for ad images, and another set for click-through pages (no clients had websites yet, so any promo info they might want to put up we would have to produce and host). Then we just had our manual production people pseudo-randomly use a matching set of alias names in a given article file. And if we wanted to change an ad, we just changed which files alias1.gif and alias2.html pointed to.
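(The same trick in modern terms, a rough Python sketch using symlinks in place of the Mac file aliases we actually used; the paths and file names are made up:)

    import os

    os.makedirs("ads", exist_ok=True)

    # Articles always reference fixed names like ads/alias1.gif and ads/alias1.html.
    # To change what's running in a slot, repoint the alias; no articles get touched.
    def repoint_ad(slot, image_target, click_target):
        for alias, target in ((f"ads/alias{slot}.gif", image_target),
                              (f"ads/alias{slot}.html", click_target)):
            if os.path.lexists(alias):
                os.remove(alias)
            os.symlink(target, alias)

    repoint_ad(1, "campaigns/spring_promo.gif", "campaigns/spring_promo.html")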

To do ad view/click analysis, we'd run a Frontier script against the WebStar log. We'd count hits against each of the images, and divide the total ad hits into the total page hits to come up with a crude cache ratio (since browsers don't keep re-requesting image files), which we'd apply back to all the individual ads. Then we'd do some crude demographic summaries of our registration data (e.g. 1/3 were doctors), and apply those ratios against ad hits. I've forgotten the name of the Mac-only relational database we were using, but it was amazingly horrible, at least for batch imports, so we didn't want to try to load detailed traffic data into it for nicer analysis.
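(The arithmetic was about this crude, shown as a toy Python example with invented numbers:)

    # Toy numbers -- illustrating the correction, not real Medscape traffic.
    total_page_hits = 120_000      # pages served, per the WebStar log
    total_ad_image_hits = 80_000   # ad GIF requests actually logged
    cache_ratio = total_page_hits / total_ad_image_hits   # ~1.5 views per logged image hit

    ad_image_hits = {"alias1.gif": 20_000, "alias2.gif": 12_000}
    estimated_views = {ad: hits * cache_ratio for ad, hits in ad_image_hits.items()}

    # Apply crude registration demographics (e.g. "1/3 of members are doctors").
    doctor_share = 1 / 3
    estimated_doctor_views = {ad: v * doctor_share for ad, v in estimated_views.items()}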

(I'm sure there were other features we did in this gap, but I can't think of them at the moment, except for a basic form emailer we'd use to handle Feedback.)

At some point, I realized (a) a huge portion of our traffic came from ~20 key navigation pages, (b) it would be nice to speed those up, and (c) it would be nice to do real ad (Banner Ad) rotation on those pages, even if we kept the static pseudo-rotation on the thousands of content pages. By that time, Dave had implemented a page-caching system that was pretty cool, so I set up a third server to handle those pages (taking the load off the primary content server). (And since navigation pages didn't require sign-in, only the primary server needed the overhead of handling registration/login.)

I built an ad rotation system. It had multiple "runs", each containing any number of ad slots, one of which would be randomly pulled. The run was chosen by a keyword in the cached page's macro code. If you wanted uneven weights across sponsors, you'd just make redundant slots holding the ads you wanted shown more frequently. I built a web-based administrative interface so that production people could manage the ad slots without my having to go into Frontier interactively.

I used that same interface to manage the dynamic/cached pages. Production people would work with static files and save them with a special suffix; Dave's code would automatically load a page into the cache if it wasn't there already; and I built an interface to browse what was in the cache for review, and to delete pages that needed to be changed/refreshed.
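(The rotation logic itself was about this simple, sketched in Python with made-up run and file names; weighting came purely from duplicating slots, and the real tables lived in Frontier's ODB, edited through that web admin interface:)

    import random

    # Each "run" is just a list of ad slots; a sponsor who wants double exposure
    # simply gets two slots in the run.
    runs = {
        "homepage": ["pharmco_a.gif", "pharmco_a.gif", "pharmco_b.gif", "house_cme.gif"],
        "specialty_nav": ["pharmco_b.gif", "house_feedback.gif"],
    }

    def pick_ad(run_keyword):
        # Called from the page macro: the cached page names its run via a keyword.
        return random.choice(runs[run_keyword])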

Then we made a deal with a separate company to run a MedLine search engine for us (MedLine being a huge structured-text database of summaries of medical journal articles). I'd serve the user a form, get the search parameters, open an HTTP socket to that other company, pass them the query, get back the results page, do some parsing on it to change some URLs and design details (and add an ad), and then pass the results on to the user. Where was that damn XML-RPC when I needed it?!? :)
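(In today's terms the wrapper was doing something like this Python sketch, where the partner hostname, parameter handling, and URL rewriting are all invented; the real code spoke raw TCP via TCPCMD from Frontier:)

    from urllib.parse import urlencode
    from urllib.request import urlopen

    # Hypothetical partner URL -- the real service and its query format are long gone.
    PARTNER = "http://medline-partner.example.com/search"

    def wrapped_search(form_fields, ad_banner_html):
        query = urlencode(form_fields)          # pass the user's query straight through
        with urlopen(f"{PARTNER}?{query}", timeout=30) as resp:
            results = resp.read().decode("latin-1", errors="replace")
        # Rewrite the partner's links so they come back through us, then drop our
        # ad banner in at the top before returning the page to the user.
        results = results.replace("http://medline-partner.example.com/",
                                  "http://www.medscape.com/medline/")
        return ad_banner_html + results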

Unfortunately, this was the straw that started breaking the camel's back. Since these searches needed to be behind the registration wall, this code had to run on the primary server. (I hadn't figured out how to do cookies yet, so I couldn't spread the load to different hosts.) The TCPCMD code was not stable at high load and high latency (e.g. a hiccup from the server at the other company would not be caught well). And this searching was a popular service, so it got called a lot. So it started bringing our servers down. We had all the auto-reboot stuff in place, but crashes were often severe enough that even that wouldn't work. We ended up just letting the remote site serve pages itself for a chunk of time, forgoing the ad exposures. We put the search form behind registration, so short of some form hacking, we were still forcing people to register to get to this service. At some point we even ended up paying the TCPCMD author (Leonard Rosenthal? maybe someone else?) to dedicate some time to trying to solve the problem. No dice. Not necessarily his fault; the Mac TCP stack was still a bit dicey in those days, I believe. Even when we dumped the search-wrapper feature we saw only temporary stability gains. I think even the form-contents-emailer code could have been causing problems.

So, between this problem and general Mac-platform unease (limits of hardware scale, lack of 3rd-party apps like ad servers, etc.), we did some analysis and moved to Windows NT (and Netscape Enterprise), where Frontier did not yet exist.

We wrote code in Cold Fusion, server-side Java (pre-servlets), and C-NSAPI. Stability was never what it should have been on this platform either (a couple of years later we paid Netscape chunks of money to send people onsite to try to solve the problem, and they failed)! But we could still handle a lot more traffic and load-balance with routers (since the registration data was handled by a separate MsSqlServer).

As I left, they were aiming toward an all-Microsoft infrastructure, for what I consider political reasons. Then a merger changed the political winds, and they've since been migrating to Solaris/J2EE.

Meanwhile, at my new gig, I'm probably stuck with Microsoft for low-level services (e.g. MsExchange, Active Directory), since we're mainly focused on Intranet stuff for our business. So I use IIS as the web server. But I write code in Python. Which is cool.

But I still owe my net-career (and my super-renovated DotCom Kitchen, paid for with Medscape stock) to Frontier and Dave.

