{"id":248,"date":"2016-06-20T11:54:30","date_gmt":"2016-06-20T11:54:30","guid":{"rendered":"http:\/\/deberker.com\/archy\/?p=248"},"modified":"2023-11-26T19:25:53","modified_gmt":"2023-11-26T19:25:53","slug":"deep-learning-book-chapter-3-numerical-computation","status":"publish","type":"post","link":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/","title":{"rendered":"Deep Learning Book Chapter 3: Numerical Computation"},"content":{"rendered":"<p>This chapter was a bit of pot-pourri of things the authors wanted to tell us before we got on to machine\u00a0learning proper. Much of it was straightforward, and then there were a few stingers, such as the bits about directional gradients.<\/p>\n<p><strong>Typical problems we need to be aware of in using a digital computer to do analogue stuff<\/strong><\/p>\n<p><em>Underflow &amp; overflow<\/em><\/p>\n<p>Underflow: where very small numbers &#8211;&gt; zero, and overflow: big ones &#8211;&gt; zero.<\/p>\n<p>For instance,\u00a0for a softmax:<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D\" alt=\"softmax(x_i)= \\frac{exp(x_i)}{\\sum_{j=1}^{n} exp(x_j) }\" align=\"absmiddle\" \/><\/p>\n<p>We have an issue if our x&#8217;s are very negative, because by exponentiating them we push them very close to zero, possibly resulting in underflow. Similarly, with huge x, we might get overflow. We can do a bit of\u00a0<b>normalization<\/b> to overcome this: by subtracting max(x), we don&#8217;t change the computation, but we ensure that in the numerator everything is &lt;0, and in the denominator, at least one of the entries is exp(0)=1, which means we&#8217;re not going to get underflow!<\/p>\n<p><em>Poor conditioning<\/em><\/p>\n<p>This is where we have a matrix -remember, we can usefully think of\u00a0<a href=\"http:\/\/deberker.com\/archy\/?p=156\">matrices as transformations<\/a>&#8211; which responds very drastically to small changes in input &#8211; precisely the kind of small changes we might induce by digitizing an analogue variable.<\/p>\n<p>Conditioning is defined by a condition number, which are calculated from the ratio of the maximum to the minimum eigenvalue. High condition numbers are problematic.<\/p>\n<p><strong>Gradient descent (vanilla)<\/strong><\/p>\n<p>Gradient descent is the most obvious weapon in our arsenal for minimising some function, which we want to do a lot. Typically this function is a\u00a0<em>cost<\/em> function of some kind &#8211; and by minimizing it, we maximise some measure of &#8216;goodness&#8217; of our model.<\/p>\n<p>In gradient descent, we work out the derivative of our function to be minimized, then take small steps in the direction\u00a0<em>opposite<\/em> to the gradient.<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=Cost%3D%20y(x)\" alt=\"Cost= y(x)\" align=\"absmiddle\" \/><\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=Gradient%20%3D%20%5Cfrac%7Bd(y)%7D%7Bd(x)%7D\" alt=\"Gradient = \\frac{d(y)}{d(x)}\" align=\"absmiddle\" \/><\/p>\n<p>Why do this? Well, if we&#8217;re in a region where the gradient is positive, this means that increasing X with increase Y. So we don&#8217;t want to do this (we&#8217;re minimising Y, remember). So we should decrease X.<\/p>\n<p>Conversely, if dy\/dx is negative, then if we increase X, we&#8217;ll decrease Y. So that&#8217;s exactly what we should do: increase X.<\/p>\n<p>Nicely illustrated by Figure 4.1 from the book:<a href=\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/graddescent.jpg\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-255\" src=\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/graddescent.jpg?resize=660%2C519\" alt=\"graddescent\" width=\"660\" height=\"519\" srcset=\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/graddescent.jpg?w=700&amp;ssl=1 700w, https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/graddescent.jpg?resize=580%2C456&amp;ssl=1 580w, https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/graddescent.jpg?resize=624%2C490&amp;ssl=1 624w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=%5B%5Cfrac%7B%5Cdelta(y)%7D%7B%5Cdelta(x)%7D%20%5C%5C%0A%5Cfrac%7B%5Cdelta(y)%7D%7B%5Cdelta(x)%7D%20%20%5C%5C%0A%5Cfrac%7B%5Cdelta(y)%7D%7B%5Cdelta(x)%7D%20%0A%5D\" alt=\"[\\frac{\\delta(y)}{\\delta(x)} \\\\ \\frac{\\delta(y)}{\\delta(x)} \\\\ \\frac{\\delta(y)}{\\delta(x)} ]\" align=\"absmiddle\" \/><\/a><\/p>\n<p><strong>Gradient descent\u00a0(partial derivatives)<\/strong><\/p>\n<p>What if we have multiple independent variables, and our cost is a function of lots of them:<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=Cost%3D%20y(x_1%2Cx_2%2Cx_3)\" alt=\"Cost= y(x_1,x_2,x_3)\" align=\"absmiddle\" \/><\/p>\n<p>Well, we can basically do the same trick, but use partial derivatives. What results is a gradient vector:<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=%5B%5Cfrac%7B%5Cdelta(y)%7D%7B%5Cdelta(x_1)%7D%20%5C%5C%0A%5C%5C%0A%5Cfrac%7B%5Cdelta(y)%7D%7B%5Cdelta(x_2)%7D%20%20%5C%5C%0A%5C%5C%0A%5Cfrac%7B%5Cdelta(y)%7D%7B%5Cdelta(x_3)%7D%20%0A%5D\" alt=\"[\\frac{\\delta(y)}{\\delta(x_1)} \\\\ \\\\ \\frac{\\delta(y)}{\\delta(x_2)} \\\\ \\\\ \\frac{\\delta(y)}{\\delta(x_3)} ]\" align=\"absmiddle\" \/><\/p>\n<p>The gradient vector is denoted as :\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=%5Cbigtriangledown_xf(x)\" alt=\"\\bigtriangledown_xf(x)\" align=\"absmiddle\" \/><\/p>\n<p>In both the single and multiple gradient case, how much we move in the direction of the gradient is a matter of choice: we define a\u00a0<strong>learning rate or step size<\/strong> which describes how much to jump at each step. Too small and everything will be super slow, and too large, and you risk missing the minimum.<\/p>\n<p><strong>Stationary points<\/strong><\/p>\n<p>Stationary points are where the gradient = 0. They can be maxima, minima, or saddle points, best illustrated graphically:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/function-min-max.gif\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-256\" src=\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/function-min-max.gif?resize=370%2C180\" alt=\"function-min-max\" width=\"370\" height=\"180\" \/><\/a>s.<\/p>\n<p>Credit <a href=\"https:\/\/www.mathsisfun.com\/calculus\/images\/function-min-max.gif\">here<\/a><\/p>\n<p>We can discriminate between these with the second derivative; positive second derivatives imply a minimum, negative second derivatives a maximum, and zeros a saddle point. This is straightforward in two dimensions, but for higher dimensional problems, we need to turn to the Jacobian.<\/p>\n<p><strong>Jacobians and Hessians<\/strong><\/p>\n<p>Confusing terms, these. A Jacobian is a matrix filled with differentials. The gradient is itself the Jacobian of the original function.<\/p>\n<p>A Hessian is a certain kind of a Jacobian: it&#8217;s the Jacobian of the gradient, when we have multiple variables and the gradient is a vector of partial derivatives.<\/p>\n<p>So:<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=y%3Df(x_1%2Cx_2%2Cx_3)%20%20%20%20%3C--%20original%20function%20%5C%5C%0A%0A%5B%5Cfrac%7B%5Cdelta(y)%7D%7B%5Cdelta(x)%7D%20%5C%5C%20%0A%5Cfrac%7B%5Cdelta(y)%7D%7B%5Cdelta(x)%7D%20%5C%5C%20%0A%5Cfrac%7B%5Cdelta(y)%7D%7B%5Cdelta(x)%7D%20%5D%0A%0A%20%20%20%20%3C--%20Jacobian(y)%20%3D%20gradient%20%5C%5C%0A\" alt=\"y=f(x_1,x_2,x_3) &lt;-- original function \\\\ [\\frac{\\delta(y)}{\\delta(x)} \\\\ \\frac{\\delta(y)}{\\delta(x)} \\\\ \\frac{\\delta(y)}{\\delta(x)} ] &lt;-- Jacobian(y) = gradient \\\\ \" align=\"absmiddle\" \/><\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=%0A%5B%5Cfrac%7B%5Cdelta%5E2(y)%7D%7B%5Cdelta(x_1)%5E2%7D%20%20%20%20%5C%20%5C%20%5C%20%5C%20%0A%5Cfrac%7B%5Cdelta%5E2(y)%7D%7B%5Cdelta(x_1)%20%5Cdelta(x_2)%7D%0A%5Cfrac%7B%5Cdelta%5E2(y)%7D%7B%5Cdelta(x_1)%20%5Cdelta(x_3)%5C%5C%0A......%0A%5D\" alt=\" [\\frac{\\delta^2(y)}{\\delta(x_1)^2} \\ \\ \\ \\ \\frac{\\delta^2(y)}{\\delta(x_1) \\delta(x_2)} \\frac{\\delta^2(y)}{\\delta(x_1) \\delta(x_3)\\\\ ...... ]\" align=\"absmiddle\" \/>\u00a0&lt;&#8212; Jacobian (<img decoding=\"async\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=%5Cbigtriangledown_xf(x)\" alt=\"\\bigtriangledown_xf(x)\" \/>) = the\u00a0Hessian (incomplete; my Latex editor started playing up when I went to multiple lines.).<\/p>\n<p><em>Why do we care about the Hessian?<\/em><\/p>\n<p>Two reasons, so far as I can make out:<\/p>\n<ol>\n<li>We can use it to figure out the nature of a multidimensional stationary point, by referring to the eigenvectors:\n<ul>\n<li>All positive \u00a0 \u00a0= minima \u00a0 \u00a0[Hessian = positive definite]<\/li>\n<li>All negative \u00a0 = maxima \u00a0\u00a0[Hessian = negative\u00a0definite]<\/li>\n<li>Mix = saddle point<\/li>\n<\/ul>\n<\/li>\n<li>We can use Newton&#8217;s method to do\u00a0<strong>second-order gradient descent<\/strong><\/li>\n<\/ol>\n<p><strong>Newton&#8217;s Method<\/strong><\/p>\n<p>Traditional gradient descent allows us to chug down our gradient, but it can be very slow.<\/p>\n<p>Newton, widely held to be a pretty clever chap, described a neat method for doing zero-finding for a function &#8211; that is, finding the value of x\u00a0where f(x) = 0. This means that if we use f&#8217; instead of f, and find the value of x where f'(x) =0, <em>we&#8217;ve done minimisation<\/em>. To do this, we need the Hessian, because this provides us with the &#8216;gradient of the gradient&#8217;, so to speak.<\/p>\n<p><em>Using Newton&#8217;s method for zero-finding<\/em><\/p>\n<p>Newton&#8217;s method gives us an iterative way of finding better and better approximations to the roots of y(x).<\/p>\n<p>Graphically, here&#8217;s how it works:<\/p>\n<ol>\n<li>take a value of x, let&#8217;s call it\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=x%7B_n%7D\" alt=\"x{_n}\" align=\"absmiddle\" \/><\/li>\n<li>take the\u00a0<strong>tangent<\/strong> to the line at this point<\/li>\n<li>find the point where the tangent crosses the x-axis<\/li>\n<li>this is your new x: <img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=x_%7Bn%2B1%7D\" alt=\"x_{n+1}\" align=\"absmiddle\" \/><\/li>\n<\/ol>\n<p>And repeat. This is illustrated below (from <a href=\"https:\/\/en.wikipedia.org\/wiki\/Newton%27s_method\">Wikipedia<\/a>):<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/300px-NewtonIteration_Ani.gif\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-269\" src=\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/300px-NewtonIteration_Ani.gif?resize=300%2C214\" alt=\"300px-NewtonIteration_Ani\" width=\"300\" height=\"214\" \/><\/a><\/p>\n<p>Numerically, the update formula is:<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=x_%7Bn%2B1%7D%20%3D%20x_n%20-%20%5Cfrac%7Bf(x_n)%7D%7Bf'(x_n)%7D%7D\" alt=\"x_{n+1} = x_n - \\frac{f(x_n)}{f'(x_n)}}\" align=\"absmiddle\" \/><\/p>\n<p>How does this relate to the\u00a0graphical intuition above? Well, the formula for a tangent\u00a0is<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=y%3D%20f'(x)%20(x-x_n)%2Bf(x_n)\" alt=\"y= f'(x) (x-x_n)+f(x_n)\" align=\"absmiddle\" \/><\/p>\n<p>And\u00a0we&#8217;re going to find the point where y=0,\u00a0and solve for x=\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=x_%7Bn%2B1%7D\" alt=\"x_{n+1}\" align=\"absmiddle\" \/>:<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=0%20%3D%20f'(x_n)(x_%7Bn%2B1%7D-x_n)%20%2B%20f(x_n)%5C%5C%0A%5Cfrac%7B-f(x_n)%7D%7Bf'(x_n)%7D%20%3D%20x-x_n%5C%5C%0Ax_%7Bn%2B1%7D%20%3D%20%20x-%5Cfrac%7Bf(x_n)%7D%7Bf'(x_n)%7D%20%0A\" alt=\"0 = f'(x_n)(x_{n+1}-x_n) + f(x_n)\\\\ \\frac{-f(x_n)}{f'(x_n)} = x-x_n\\\\ x_{n+1} = x-\\frac{f(x_n)}{f'(x_n)} \" align=\"absmiddle\" \/><\/p>\n<p>Now clearly, if we want to do\u00a0<strong>minimisation rather than zero-finding, all we have to do is substitute f for f&#8217;.<\/strong><\/p>\n<p>So we end up with<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=x_%7Bn%2B1%7D%20%3D%20x_n%20-%20%5Cfrac%7Bf'(x_n)%7D%7Bf''(x_n)%7D%7D\" alt=\"x_{n+1} = x_n - \\frac{f'(x_n)}{f''(x_n)}}\" align=\"absmiddle\" \/><\/p>\n<p>And\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=f''(x_n)\" alt=\"f''(x_n)\" align=\"absmiddle\" \/>\u00a0is the Hessian.<\/p>\n<p><em>Cool properties of Newton&#8217;s method<\/em><\/p>\n<p>Newton&#8217;s method allows us to\u00a0<strong>jump<\/strong> down the gradient rather than rolling. In particular, if our function is second-order, that means the differential is first-order, and Newton&#8217;s method will\u00a0<strong>find the local minima in a single jump!<\/strong><\/p>\n<p><strong>Constrained optimisation<\/strong><\/p>\n<p><span style=\"color: #ff0000;\">Disclaimer: I&#8217;m not sure I understand this very well and I don&#8217;t think it&#8217;s terribly important (right now), so apologies if it&#8217;s not very clear<\/span><\/p>\n<p>The\u00a0Krush-Kahn-Tucker (KKT) approach is useful for adding constraint to optimizations. It&#8217;s a development of the Lagrangian method, which allows you to specify some equality; KKT allows you to specify inequalities too.<\/p>\n<p>We can use it when we want to minimise\/maximise some function <img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=f(x%2Cy%2C...n)\" alt=\"f(x,y,...n)\" align=\"absmiddle\" \/>, with some constraints <img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=g(x%2Cy%2C...n)%20%3D%20c\" alt=\"g(x,y,...n) = c\" align=\"absmiddle\" \/>, where <img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=c\" alt=\"c\" align=\"absmiddle\" \/>\u00a0is some constant. Note that we can only use it when the constraining function is also a function of the same input variables i.e.\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=x%2Cy%2C...n\" alt=\"x,y,...n\" align=\"absmiddle\" \/>\u00a0match for\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=f\" alt=\"f\" align=\"absmiddle\" \/>\u00a0and\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=g\" alt=\"g\" align=\"absmiddle\" \/>.<\/p>\n<p>Concrete example &#8211; from <a href=\"https:\/\/www.khanacademy.org\/math\/multivariable-calculus\/applications-of-multivariable-derivatives\/constrained-optimization\/a\/lagrange-multipliers-single-constraint\">Khan Academy&#8217;s peerless summary<\/a>:<\/p>\n<p>Let&#8217;s say we have a simple function:<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=f(x%2Cy)%20%3D%202x%20%2B%20y.\" alt=\"f(x,y) = 2x + y.\" align=\"absmiddle\" \/><\/p>\n<p>We want to maximise it, but we&#8217;re only interested in points on a circle, where:<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=x%5E2%20%2B%20y%5E2%20%3D%201.%0A\" alt=\"x^2 + y^2 = 1. \" align=\"absmiddle\" \/><\/p>\n<p>To rephrase our question: we want to ask &#8216;where, on my circle, is the function f greatest&#8217;?<\/p>\n<p>This is equivalent to projecting our circle onto a linear plane (defined by the function<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=f\" alt=\"f\" align=\"absmiddle\" \/> ):<\/p>\n<p><span class=\"embed-youtube\" style=\"text-align:center; display: block;\"><iframe loading=\"lazy\" class=\"youtube-player\" width=\"660\" height=\"372\" src=\"https:\/\/www.youtube.com\/embed\/KiR7dPaBFm0?version=3&#038;rel=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;fs=1&#038;hl=en-US&#038;autohide=2&#038;wmode=transparent\" allowfullscreen=\"true\" style=\"border:0;\" sandbox=\"allow-scripts allow-same-origin allow-popups allow-presentation allow-popups-to-escape-sandbox\"><\/iframe><\/span><\/p>\n<p>Ok. now for the clever bit.<\/p>\n<p>Solving our constrained optimisation involves finding the maximum of function<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=f\" alt=\"f\" align=\"absmiddle\" \/>\u00a0whilst still meeting our constraining criterion,\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=g\" alt=\"g\" align=\"absmiddle\" \/>. It turns out that the values of\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=f(x%2Cy)\" alt=\"f(x,y)\" align=\"absmiddle\" \/>\u00a0that we&#8217;re looking on are\u00a0<strong>contour\u00a0lines of\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=f\" alt=\"f\" align=\"absmiddle\" \/>\u00a0which are tangential to the function\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=g\" alt=\"g\" align=\"absmiddle\" \/>.<\/strong><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/tangent.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-251\" src=\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2016\/06\/tangent.png?resize=481%2C464\" alt=\"https:\/\/www.khanacademy.org\/math\/multivariable-calculus\/applications-of-multivariable-derivatives\/constrained-optimization\/a\/lagrange-multipliers-single-constraint\" width=\"481\" height=\"464\" \/><\/a><\/p>\n<p>Credit: <a href=\"https:\/\/www.khanacademy.org\/math\/multivariable-calculus\/applications-of-multivariable-derivatives\/constrained-optimization\/a\/lagrange-multipliers-single-constraint\">Khan Academy.<\/a><\/p>\n<p>Now: if two contour lines are\u00a0<strong>tangent<\/strong>, this means that their gradients are\u00a0<strong>parallel\u00a0<\/strong>i.e. they share a direction (the vector might have different lengths or directions, but that doesn&#8217;t matter).<\/p>\n<p>Let\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=x_0\" alt=\"x_0\" align=\"absmiddle\" \/><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=y_0\" alt=\"y_0\" align=\"absmiddle\" \/>\u00a0be a point where the two are tangent. In that case:<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=%5Cbigtriangledown%20f(x_0%2Cy_0)%20%3D%20%5Clambda%5Cbigtriangledown%20g(x_0%2Cy_0)%20\" alt=\"\\bigtriangledown f(x_0,y_0) = \\lambda\\bigtriangledown g(x_0,y_0) \" align=\"absmiddle\" \/><\/p>\n<p>I.e. the gradient of\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=f\" alt=\"f\" align=\"absmiddle\" \/>\u00a0can be obtained from that of\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=g\" alt=\"g\" align=\"absmiddle\" \/>\u00a0by multiplication by some constant\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=%5Clambda\" alt=\"\\lambda\" align=\"absmiddle\" \/>.\u00a0This is the<strong> Lagrange multiplier.<\/strong><\/p>\n<p>Lagrange&#8217;s formulation thus provides a helper function which helps us crack open our constrained optimisation.<\/p>\n<p><img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=L(x%2Cy%2C%5Clambda)%20%3D%20f(x%2Cy)%20%2B%20%5Clambda%20(g(x%2Cy)-c)\" alt=\"L(x,y,\\lambda) = f(x,y) + \\lambda (g(x,y)-c)\" align=\"absmiddle\" \/><\/p>\n<p>Where\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=L\" alt=\"L\" align=\"absmiddle\" \/>\u00a0is a Lagrangian,\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=%5Clambda\" alt=\"\\lambda\" align=\"absmiddle\" \/>\u00a0is our Lagrange multiplier, and\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=c\" alt=\"c\" align=\"absmiddle\" \/>\u00a0is the constraint upon\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=g\" alt=\"g\" align=\"absmiddle\" \/>\u00a0(i.e.\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=g(x%2Cy)%20%3D%20c\" alt=\"g(x,y) = c\" align=\"absmiddle\" \/>).<\/p>\n<p>I think that by bringing\u00a0<img decoding=\"async\" class=\"mathtex-equation-editor\" src=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=%5Clambda\" alt=\"\\lambda\" align=\"absmiddle\" \/>\u00a0inside our optimisation, we elevate it from being a &#8216;soft constraint&#8217;, for which we have to describe some penalty function, to a &#8216;hard constraint&#8217; &#8211; we directly optimise for lambda.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This chapter was a bit of pot-pourri of things the authors wanted to tell us before we got on to machine\u00a0learning proper. Much of it was straightforward, and then there were a few stingers, such as the bits about directional gradients. Typical problems we need to be aware of in using a digital computer to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"hide_page_title":"","_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[6],"tags":[],"class_list":["post-248","post","type-post","status-publish","format-standard","hentry","category-neural-networks"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Deep Learning Book Chapter 3: Numerical Computation - Archy de Berker<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Deep Learning Book Chapter 3: Numerical Computation - Archy de Berker\" \/>\n<meta property=\"og:description\" content=\"This chapter was a bit of pot-pourri of things the authors wanted to tell us before we got on to machine\u00a0learning proper. Much of it was straightforward, and then there were a few stingers, such as the bits about directional gradients. Typical problems we need to be aware of in using a digital computer to [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/\" \/>\n<meta property=\"og:site_name\" content=\"Archy de Berker\" \/>\n<meta property=\"article:published_time\" content=\"2016-06-20T11:54:30+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-11-26T19:25:53+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D\" \/>\n<meta name=\"author\" content=\"archy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@archydeb\" \/>\n<meta name=\"twitter:site\" content=\"@archydeb\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"archy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"16 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/\"},\"author\":{\"name\":\"archy\",\"@id\":\"https:\/\/deberker.com\/archy\/#\/schema\/person\/01cf8dd0f94a4ba124b26eeeeb59e67d\"},\"headline\":\"Deep Learning Book Chapter 3: Numerical Computation\",\"datePublished\":\"2016-06-20T11:54:30+00:00\",\"dateModified\":\"2023-11-26T19:25:53+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/\"},\"wordCount\":1320,\"publisher\":{\"@id\":\"https:\/\/deberker.com\/archy\/#\/schema\/person\/01cf8dd0f94a4ba124b26eeeeb59e67d\"},\"image\":{\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D\",\"articleSection\":[\"Neural networks\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/\",\"url\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/\",\"name\":\"Deep Learning Book Chapter 3: Numerical Computation - Archy de Berker\",\"isPartOf\":{\"@id\":\"https:\/\/deberker.com\/archy\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D\",\"datePublished\":\"2016-06-20T11:54:30+00:00\",\"dateModified\":\"2023-11-26T19:25:53+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#primaryimage\",\"url\":\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D\",\"contentUrl\":\"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/deberker.com\/archy\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Deep Learning Book Chapter 3: Numerical Computation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/deberker.com\/archy\/#website\",\"url\":\"https:\/\/deberker.com\/archy\/\",\"name\":\"Archy de Berker\",\"description\":\"Building things with data\",\"publisher\":{\"@id\":\"https:\/\/deberker.com\/archy\/#\/schema\/person\/01cf8dd0f94a4ba124b26eeeeb59e67d\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/deberker.com\/archy\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/deberker.com\/archy\/#\/schema\/person\/01cf8dd0f94a4ba124b26eeeeb59e67d\",\"name\":\"archy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/deberker.com\/archy\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2021\/09\/freelance-logo.png?fit=359%2C311&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2021\/09\/freelance-logo.png?fit=359%2C311&ssl=1\",\"width\":359,\"height\":311,\"caption\":\"archy\"},\"logo\":{\"@id\":\"https:\/\/deberker.com\/archy\/#\/schema\/person\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/archydeb\"],\"url\":\"https:\/\/deberker.com\/archy\/author\/archy\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Deep Learning Book Chapter 3: Numerical Computation - Archy de Berker","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/","og_locale":"en_US","og_type":"article","og_title":"Deep Learning Book Chapter 3: Numerical Computation - Archy de Berker","og_description":"This chapter was a bit of pot-pourri of things the authors wanted to tell us before we got on to machine\u00a0learning proper. Much of it was straightforward, and then there were a few stingers, such as the bits about directional gradients. Typical problems we need to be aware of in using a digital computer to [&hellip;]","og_url":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/","og_site_name":"Archy de Berker","article_published_time":"2016-06-20T11:54:30+00:00","article_modified_time":"2023-11-26T19:25:53+00:00","og_image":[{"url":"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D","type":"","width":"","height":""}],"author":"archy","twitter_card":"summary_large_image","twitter_creator":"@archydeb","twitter_site":"@archydeb","twitter_misc":{"Written by":"archy","Est. reading time":"16 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#article","isPartOf":{"@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/"},"author":{"name":"archy","@id":"https:\/\/deberker.com\/archy\/#\/schema\/person\/01cf8dd0f94a4ba124b26eeeeb59e67d"},"headline":"Deep Learning Book Chapter 3: Numerical Computation","datePublished":"2016-06-20T11:54:30+00:00","dateModified":"2023-11-26T19:25:53+00:00","mainEntityOfPage":{"@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/"},"wordCount":1320,"publisher":{"@id":"https:\/\/deberker.com\/archy\/#\/schema\/person\/01cf8dd0f94a4ba124b26eeeeb59e67d"},"image":{"@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#primaryimage"},"thumbnailUrl":"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D","articleSection":["Neural networks"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/","url":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/","name":"Deep Learning Book Chapter 3: Numerical Computation - Archy de Berker","isPartOf":{"@id":"https:\/\/deberker.com\/archy\/#website"},"primaryImageOfPage":{"@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#primaryimage"},"image":{"@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#primaryimage"},"thumbnailUrl":"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D","datePublished":"2016-06-20T11:54:30+00:00","dateModified":"2023-11-26T19:25:53+00:00","breadcrumb":{"@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#primaryimage","url":"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D","contentUrl":"http:\/\/chart.apis.google.com\/chart?cht=tx&amp;chl=softmax(x_i)%3D%20%5Cfrac%7Bexp(x_i)%7D%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%20exp(x_j)%20%7D"},{"@type":"BreadcrumbList","@id":"https:\/\/deberker.com\/archy\/deep-learning-book-chapter-3-numerical-computation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/deberker.com\/archy\/"},{"@type":"ListItem","position":2,"name":"Deep Learning Book Chapter 3: Numerical Computation"}]},{"@type":"WebSite","@id":"https:\/\/deberker.com\/archy\/#website","url":"https:\/\/deberker.com\/archy\/","name":"Archy de Berker","description":"Building things with data","publisher":{"@id":"https:\/\/deberker.com\/archy\/#\/schema\/person\/01cf8dd0f94a4ba124b26eeeeb59e67d"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/deberker.com\/archy\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/deberker.com\/archy\/#\/schema\/person\/01cf8dd0f94a4ba124b26eeeeb59e67d","name":"archy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/deberker.com\/archy\/#\/schema\/person\/image\/","url":"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2021\/09\/freelance-logo.png?fit=359%2C311&ssl=1","contentUrl":"https:\/\/i0.wp.com\/deberker.com\/archy\/wp-content\/uploads\/2021\/09\/freelance-logo.png?fit=359%2C311&ssl=1","width":359,"height":311,"caption":"archy"},"logo":{"@id":"https:\/\/deberker.com\/archy\/#\/schema\/person\/image\/"},"sameAs":["https:\/\/x.com\/archydeb"],"url":"https:\/\/deberker.com\/archy\/author\/archy\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p4cGwe-40","jetpack-related-posts":[],"post_mailing_queue_ids":[],"_links":{"self":[{"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/posts\/248","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/comments?post=248"}],"version-history":[{"count":8,"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/posts\/248\/revisions"}],"predecessor-version":[{"id":1160,"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/posts\/248\/revisions\/1160"}],"wp:attachment":[{"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/media?parent=248"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/categories?post=248"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/deberker.com\/archy\/wp-json\/wp\/v2\/tags?post=248"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}