Corona-Chan Project, Part 5: Final Model-Fitting Program

The linear relationship indicated by the graph shown in Part 4 of this series has shed light on something I overlooked when I was doing that part of the research. Specifically, it indicates that we can actually reduce the system of equations being graphed to a system of two linear equations in two variables. For reference, this was the system of equations:

equations4

It turns out I didn’t even have to do all that coding, and that post could have been a lot shorter if I had simply manipulated the equations as follows:

Computing a linear system for parameters a and b

The two equations below the horizontal rules make up the linear system in question. This system has the following solutions:

Solution set for parameters a and b

Since x1 and x2 are both constants in this context, the expressions above are constant expressions. The solution set yields four data points, of which we only need one, and though I haven’t tested this, I’m assuming that they are all equivalent in terms of how they transform the curve. I’m also assuming (for obvious reasons) that the two expressions in the second equation work out to the same value for appropriate values of x1 and x2. If they don’t, this would mean that either the data is imperfect (they may be slightly off because we’re dealing with real-world data and not perfect mathematical models) or I’ve made some mistake in my calculations. When I write my upcoming book on my Coronavirus research, I will go back and refine my research, fixing any mistakes I may have made along the way.

Now that we have formulas for both a and b, we can plug in values for x1 and x2 to get numerical values for a and b.
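Here is a minimal sketch of that plugging-in step. Judging from the program further down in this post, the linear system appears to have the form a*x1 - b = k1 and a*x2 - b = k2, where k1 and k2 are the two arcsinh constants; that reading is an assumption. The names `ab_params` and `solve_ab` are made up for illustration and are not part of the original program.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical container for the two inner parameters. */
typedef struct { double a, b; } ab_params;

/* Solve  a*x1 - b = k1  and  a*x2 - b = k2  for a and b.
   The form of the system is inferred from the program in this post. */
ab_params solve_ab( double x1, double x2, double k1, double k2 ){
        ab_params p;
        assert( x1 != x2 ); /* otherwise the system is singular */
        p.a = (k1 - k2) / (x1 - x2);
        p.b = p.a * x1 - k1;
        return p;
}
```

Calling `solve_ab( c3, c4, arcsinh1, arcsinh3 )` with the critical points and constants from the program should give the same expressions the program uses for a and b.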

The next step is to determine c and d for the full equation y = c*tanh(ax-b)+d. To do this, we take two sample points (x3,y3) and (x4,y4) on the sigmoid curve and solve the following system of equations:

Computing a linear system for parameters c and d

Since tanh(axn-b) is a constant in both expressions, it is readily apparent that this is also a linear system and its solution is:

Solution set for parameters c and d

Let’s see how we would apply that to a sample smoothed data set generated from the Coronavirus data we saw earlier. The way we would go about automating this is equally apparent, and basically consists of two steps: First we extract the critical points from the output of the derivatives program developed in Parts 1-3 of this series (with just one additional layer of differentiation, for which the necessary modification to the program is simply a trivial replacement of 3 with 4). Second, we apply the above formulas for the solutions based on the critical points and based on arbitrary sample points taken from the first column of the data.

The flowchart for this program looks like this:

Flowchart for program that fits a mathematical model to the Coronavirus data

Here is the code:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <errno.h>

int threshold = 10;

// Negated values of arcsinh(1) and arcsinh(3)
const float arcsinh1 = -0.8813736;
const float arcsinh3 = -1.8184465;

int critical_point( int *, int size );

int main( int argc, char **argv ){
        FILE *fp;         // Pointer for input file
        char buf[200];    // String buffer for input file
        int size = 0;     // Size of integer arrays
        int i;            // Loop counter
        char *token;      // Token used for string parsing
        int *f0;          // Data points for sigmoid function
        int *f3;          // Data points for third derivative
        int *f4;          // Data points for fourth derivative
        int c3, c4;       // Critical points for f3 and f4
        int s1, s2;       // Sample points from f0
        float a, b, c, d; // Parameters to be determined
        if( argc < 2 ){
                fprintf( stderr, "Usage: %s <file>\n", argv[0] );
                exit( -1 );
        }
        if( !(fp = fopen( argv[1], "r" )) ){
                fprintf( stderr, "%s: %s: %s\n", argv[0], argv[1], strerror( errno ) );
                exit( errno );
        }
        // Count lines in file:
        while( fgetc( fp ) != EOF ){
                fgets( buf, 200, fp );
                size++;
        }
        rewind( fp );
        // Set up integer arrays:
        f0 = (int *) malloc( sizeof( int ) * size );
        f3 = (int *) malloc( sizeof( int ) * size );
        f4 = (int *) malloc( sizeof( int ) * size );
        // Read integer data from file, filling the arrays back to front.
        // Columns are assumed to be separated by spaces or tabs.
        for( i = size-1; i >= 0; i-- ){
                fgets( buf, 200, fp );
                strtok( buf, " \t\r\n" ); // Skip date
                token = strtok( NULL, " \t\r\n" );
                f0[i] = atoi( token );
                strtok( NULL, " \t\r\n" ); // Skip f1
                strtok( NULL, " \t\r\n" ); // Skip f2
                token = strtok( NULL, " \t\r\n" );
                f3[i] = atoi( token );
                token = strtok( NULL, " \t\r\n" );
                f4[i] = atoi( token );
        }
        fclose( fp );
        // Calculate parameters a and b:
        c3 = critical_point( f3, size );
        c4 = critical_point( f4, size );
        if( c3 == c4 ){
                // Avoid dividing by zero below
                fprintf( stderr, "Critical points coincide.\n" );
                exit( -1 );
        }
        a = (arcsinh1 - arcsinh3)/(c3 - c4);
        b = (c3 * (arcsinh1 - arcsinh3))/(c3 - c4) - arcsinh1;
        // Calculate parameters c and d:
        s1 = size/3;
        s2 = 2*s1;
        c = (f0[s1] - f0[s2])/(tanh(a*s1-b) - tanh(a*s2-b));
        d = f0[s1] - (f0[s1] - f0[s2])*tanh(a*s1-b)/(tanh(a*s1-b) - tanh(a*s2-b));
        // Print output:
        printf( "f = %.3f*tanh(%.3ft-%.3f)+%.3f\n", c, a, b, d );
        free( f0 );
        free( f3 );
        free( f4 );
        return 0;
}

// Find the first point where the curve hits zero
int critical_point( int *f, int size ){
        int i = 0;
        while( i < size && f[i] < threshold ){
        // This part accounts for anomalies where the
        // data may dip below zero very early on
                i++;
        }
        while( i < size && f[i] > 0 ){
                i++;
        }
        if( i == size ){
                fprintf( stderr, "Critical point not found.\n" );
                exit( -1 );
        }
        // Return whichever point is closer to zero:
        if( abs( f[i] ) < abs( f[i-1] ) ) return i;
        return i-1;
}

Aaaaand here at long last is the Holy Grail of my research: the mathematical function to fit the Coronavirus data:

solution

I might have to tweak this a bit, because the values for c and d look a little bit large, but maybe they work out to the right function anyway. I chose negative values for my two arcsinh (inverse hyperbolic sine) constants because if I used the positive values I got negative parameters, which weren’t as aesthetically pleasing. I’m sure they figure out to the same curve in the end. At this point I don’t really care anymore. I’m kinda just trying to get this project over with because I’ve been working on it for three weeks and it’s getting kinda stale now. It’s no longer exciting or interesting to me, and I really just want to get it out of the way so I can move on to the next project. As I said, I’ll go back and fix any mistakes I made when I write the book.

There’s still the question of what to do with the numerical calculation program that I’ve now discarded from my research. I think I’m going to keep it. The reason for this is that even though it’s no longer important to the Corona-Chan project, it still represents an interesting coding excursion that may provide inspiration for future projects. And that’s really what this blog is all about – exploring new and interesting possibilities. So I really don’t care if that part of the research has been rendered useless by the present part; it’s still pretty neat.

So yeah, I think I’m done now. Well, except for the next part where I go back and analyze the data some more and figure out what sort of time frame we’re looking at. But that’s just a matter of applying the research I’ve already done. The actual meat of this project – the calculations and computations, the code, the flowcharts, the graphs, the theories – I’m done with all that.

So now I’ll have to find something else to occupy myself during the quarantine, which I’m guessing will go on for at least another month. Currently I’m in double quarantine, because I’m under the same worldwide quarantine everyone else is under, and I’m also under a quarantine ordered by my doctor as I’ve started having Coronavirus symptoms myself. I don’t know, maybe it’s bad karma for not taking this disease seriously. It’s pretty lame karma at any rate, because all I have is a mild cough. Thank Cthulhu for my top-of-the-line immune system. 😛
