January 16, 2023 09:04 pm GMT

Combining GPT and Wolfram Alpha

An interesting essay appeared on Wolfram|Alpha's blog today:
Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT. In it, the author (Stephen Wolfram?) argues that ChatGPT and Wolfram|Alpha complement each other, with the latter being particularly good at the numerical, mathematical, and computational tasks where ChatGPT is weak.

So I decided to try this for myself! Free API keys are available for both GPT-3.5 and Wolfram|Alpha, making them accessible in environments like Mini Micro via http.get and http.post. So my basic idea was: let's make a chat room with both of these powerful AIs on hand, and let either one jump in as it can.
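
To give a sense of how little plumbing this takes, here's about the smallest possible Wolfram|Alpha query from Mini Micro. This sketch uses the one-shot Short Answers API rather than the conversational endpoint the finished program uses; the App ID and query are placeholders, and the crude space-encoding is just for illustration.

appid = "YOUR-APPID-HERE"  // placeholder; get a free App ID from Wolfram
query = "distance to the Moon"
// quick-and-dirty encoding; the full program below does this properly
url = "http://api.wolframalpha.com/v1/result?appid=" + appid +
    "&i=" + query.replace(" ", "%20")
print http.get(url)   // prints a one-line answer (or an error message)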

My first thought was to use Wolfram|Alpha's Fast Query Recognizer API, which sounds like it's intended for exactly this purpose. But I couldn't get it to work; no matter what I tried, it always replied "invalid appid" (using the same App ID that works just fine with other endpoints).

So in the end, I just pass every query to Wolfram|Alpha first; if it can't deal with the query, it returns an error code, and I then hand the query off to ChatGPT. As a refinement, I allow the user to direct a query to either AI by prefacing their input with "Wolfram" or "GPT" (or any of several synonyms). And if the input is just one of these names plus a question mark, the program repeats the last input, but directed at the AI of interest.
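
In MiniScript terms, the heart of that routing boils down to the following (condensed from the full program at the end of this post):

// give Wolfram|Alpha first crack at the user's input...
resp = wolframResponse(inp)
if resp.startsWith("Error:") then
    // ...and if it couldn't cope, let GPT answer instead,
    // using the whole chat history as its prompt
    resp = gptResponse(context.join(char(13)))
end if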

Example

You didn't come here to hear me talk; you came to see what the AIs have to say! So let's dive right in with an example session.

  User: What's the third largest country in South America?
+ WolframAlpha: The 3rd largest country in South America by total area is Peru. The total area of Peru is about 496225 square miles.
  User: How many cows do they have?
- GPT: According to the Food and Agriculture Organization of the United Nations, Peru had an estimated 11.7 million cattle as of 2019.
  User: Wolfram, how many cattle are there in Peru?
+ WolframAlpha: In 2020, Peru had about 5.64 million cattle as livestock.
  User: GPT?
- GPT: According to the Food and Agriculture Organization of the United Nations, Peru had an estimated 11.7 million cattle as of 2019.

This example illustrates the key benefits of this team-AI system. The first question, a factual geography question, is the sort of thing Wolfram|Alpha is good at. It speaks up and answers my question directly, and you can trust its answer. But my follow-up question, "How many cows do they have?", was not understood by Wolfram|Alpha, whose natural language processing (NLP) is much more limited than GPT's. So it declines to answer, and GPT jumps in, giving me a plausible-sounding answer of 11.7 million cows.

Plausible, but is it right? I rephrased my question and directed it to Wolfram|Alpha. This time it understood me, and gave me an accurate answer of 5.64 million cows. GPT was off by more than a factor of two. So I asked GPT to reconsider (saying simply "GPT?"). But GPT stuck to its guns, repeating its previous incorrect answer.

Wolfram|Alpha Knows Its Facts

As the example above demonstrates, when asked a factual question, GPT just makes up an answer. Sometimes it's in the right ballpark, and other times it's wildly off.

Having them both in the room is helpful because if your question is one that Wolfram can answer, it will answer first (and correctly). If not, GPT will make something up, but then you can ask follow-up questions of Wolfram to check its facts.

Wolfram|Alpha Is Really Stupid

Despite being a mathematical genius, Wolfram|Alpha often falls down and fails to understand even simple queries. (This happens so often, one might even suspect that the examples in the essay were cherry-picked!) For example, the first thing I tried today seemed like a question that would be right up Wolfram|Alpha's alley: how heavy are 30 elephants?

Check out this exchange:

  User: How much would 30 elephants weigh?
- GPT: According to WolframAlpha, a group of 30 elephants would weigh approximately 1,800,000 pounds.
  User: Wolfram?
+ WolframAlpha: Error: Wolfram|Alpha did not understand your input
  User: Wolfram, how much does an elephant weigh?
+ WolframAlpha: The typical weight of an African bush elephant is 4400 to 13000 pounds.
  User: Wolfram, what is 30 times that?
+ WolframAlpha: 30 times Interval of the list Rational of 200000000000 and 45359237, Rational of 610000000000 and 45359237 pounds is equivalent to a mass of 60 million to 183 million grams.
  User: What is that in pounds?
+ WolframAlpha: 6000000000000 over 45359237 to 18300000000000 over 45359237 pounds is equivalent to 160754 to 490299 troy pounds.
  User: GPT, what does one elephant weigh?
- GPT: According to WolframAlpha, the typical weight of an African bush elephant is 4400 to 13000 pounds.
  User: What is 30 times 13000 pounds?
+ WolframAlpha: 30 times 13000 troy pounds is equivalent to a mass of about 320914 pounds.
  User: GPT?
- GPT: According to WolframAlpha, 30 times 13000 pounds is equivalent to about 320914 pounds.

Though I expected Wolfram|Alpha to take this one, it did not, so GPT confidently responded that, "According to WolframAlpha," the answer was 1.8 million pounds. This is wildly wrong; as previously noted, GPT is just making stuff up. And when I directed the same question to Wolfram|Alpha, I got "Error: Wolfram|Alpha did not understand your input."

So I backed up and asked it "how much does an elephant weigh?" It gave me a good answer. Then I asked it "what is 30 times that?" (While Wolfram|Alpha does not take the entire chat history as context, as GPT does, it does have some limited internal context tracking.) It gave me an answer but in terms of "Interval of the list Rational" and grams, making it very hard to understand. And asking it to convert that to pounds made it hardly any more comprehensible.

Incidentally, the correct answer (which I finally computed myself, after busting out a calculator) is 132,000 to 390,000 pounds. ChatGPT's answer was off by almost a factor of 5. Wolfram's answer was correct, I guess, except it insisted on giving it to me in "troy pounds" for some reason. A troy pound works out to only about 82% of a normal pound (its 12 ounces are troy ounces, which are heavier than ordinary ounces), as I learned from this later exchange:

  User: What is a "troy pound"?
+ WolframAlpha: The definition of the word "troy pound" is: an apothecary weight equal to 12 ounces or 373.242 grams.
  User: How many ounces in a normal pound?
- GPT: According to WolframAlpha, a normal pound is equal to 16 ounces.

Notice that GPT often says "According to WolframAlpha". This was an unexpected side-effect of my initial context prompt, which includes "Also present is WolframAlpha, which is very good at math and numerical facts." I think GPT is trying to play along by pretending to consult WolframAlpha even when that bot has not actually chimed in.
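
For reference, here's how that context is seeded in the code (these strings are later joined with newlines and prepended to the chat history that gets sent to GPT):

context = []
context.push "Assistant is a large language model capable of "
context.push "helping the user in many ways. Also present is "
context.push "WolframAlpha, which is very good at math and "
context.push "numerical facts."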

GPT Pays Attention

GPT does get the entire chat history as context, and so any fact that Wolfram|Alpha says first, it will accept and use.

  User: How far is it to Ceres right now?
+ WolframAlpha: The distance from Earth to 1 Ceres at 10:51 A.M. Mountain Standard Time, Monday, January 16, 2023 is about 2.06 astronomical units.
  User: How far is that in km?
+ WolframAlpha: The distance from Earth to 1 Ceres at 10:52 A.M. Mountain Standard Time, Monday, January 16, 2023 is about 308 million kilometers.
  User: How long would it take to get there at 100 kph?
- GPT: At a speed of 100 kilometers per hour, it would take approximately 3,080,000 hours or 128,333.33 days to reach 1 Ceres from Earth.

This is a good example of the two working together; Wolfram|Alpha understood my first two queries, and gave me a reliable result, but failed to understand my third one. Stepping in, GPT did the math, using the numbers Wolfram|Alpha had already given us. I checked this result by hand, and it was correct. And in other experiments, it seems like GPT is usually reliable at this sort of thing, if the facts it needs are already in the conversation.

If it doesn't have the facts, of course, it just makes something up. In a fresh session:

  User: GPT, how long would it take to get to Ceres at 100 kph?
- GPT: It would take approximately 7.5 years to reach the dwarf planet Ceres at a speed of 100 kilometers per hour.

7.5 years? No. The correct answer (128,333 days) is over 351 years.
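
You can verify that with one line of arithmetic, using the 308-million-km figure Wolfram|Alpha gave in the earlier session:

// 308 million km at 100 kph: hours -> days -> years
print 308000000 / 100 / 24 / 365.25   // roughly 351.4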

GPT Won't Back Down

Once GPT gives a wrong answer, it sticks to its guns. Even if you try to draw its attention to its error! Watch GPT steadfastly arguing that Ceres (radius 292 miles) is bigger than Mercury (radius 1516 miles).

  User: What moons in the solar system are bigger than Mercury?
+ WolframAlpha: The planetary moons with average radius bigger than that of Mercury are Ganymede and Titan.
  User: Are there any dwarf planets that big?
- GPT: Yes, there are five known dwarf planets in the Solar System that are larger than Mercury, including Ceres, Haumea, Makemake, Eris, and Pluto.
  User: what are the five biggest dwarf planets?
+ WolframAlpha: The five biggest dwarf planets by average radius are Pluto, 136199 Eris (2003 UB313), and 1 Ceres. The average radius of Pluto is about 739.37 miles. The average radius of 136199 Eris (2003 UB313) is about 722.7 miles. The average radius of 1 Ceres is about 292 miles.
  User: What is the radius of Mercury?
+ WolframAlpha: The atomic radius of mercury is 150 picometers.
  User: No I mean the planet Mercury.
- GPT: The average radius of the planet Mercury is about 1516 miles.
  User: Wolfram, what is the radius of the planet Mercury?
+ WolframAlpha: The average radius of Mercury is about 1520 miles.
  User: Is Ceres bigger than Mercury?
- GPT: Yes, Ceres is larger than Mercury. The average radius of Ceres is about 292 miles, whereas the average radius of Mercury is about 1516 miles.
  User: 292 is not larger than 1516.
- GPT: You are correct, 292 is not larger than 1516. However, the average radius of Ceres is about 292 miles, whereas the average radius of Mercury is about 1516 miles, making Ceres larger than Mercury.

Now we know what it feels like to be gaslit by an AI!

How it works

I described in a previous post how to get set up with Mini Micro and interface with GPT-3. For this project, I just added a bit more code to interface with Wolfram|Alpha.

The key bit is a new wolframResponse function:

wolframResponse = function(query)
    // build the request URL for Wolfram|Alpha's conversational API
    url = "http://api.wolframalpha.com/v1/conversation.jsp" +
        "?appid=" + wolframAppId +
        "&i=" + urlEncode(query)
    if conversationID then
        url = url + "&conversationid=" + conversationID
    end if
    if conversationS then
        url = url + "&s=" + conversationS
    end if
    rawResult = http.get(url)
    result = json.parse(rawResult)
    if result.hasIndex("error") and result.error then
        return "Error: " + result.error
    end if
    // save the context tokens for any follow-up query
    globals.conversationID = result.conversationID
    if result.hasIndex("s") then
        globals.conversationS = result.s
    end if
    return result.result
end function

In addition to returning the response, this updates a couple of global variables, conversationID and conversationS, to store small tokens that the API uses to keep track of context for follow-up questions.
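
So follow-up questions work much as they do on the Wolfram|Alpha website. Here's a hypothetical exchange (the exact wording of the replies will vary):

print wolframResponse("How far away is the Moon?")
// thanks to the saved tokens, "that" resolves to the previous answer:
print wolframResponse("What is that in miles?")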

The main program is then elaborated with a bit of logic to (1) check whether the input directly addresses either AI, and (2) if not, give WolframAlpha a first crack at it, falling back on GPT if the WolframAlpha API returns an error.

Here's the complete program, in case you're curious or want to try it yourself.

import "json"
import "stringUtil"
import "listUtil"
import "dateTime"

wolframAppId = "V4Q9R5-UEKX2X6XAV"

urlSafeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
    "abcdefghijklmnopqrstuvwxyz0123456789-_.~"

urlEncode = function(s)
    // percent-encode any characters not safe to include in a URL
    bytes = new RawData
    bytes.resize s.len * 3
    len = bytes.setUtf8(0, s)
    result = []
    for i in range(0, len-1)
        b = bytes.byte(i)
        c = char(b)
        if urlSafeChars.contains(c) then
            result.push c
        else
            result.push "%" + hex2(b)
        end if
    end for
    return result.join("")
end function

wolframCheckQuery = function(query)
    // ask the Fast Query Recognizer whether Alpha can handle this query
    // (note: this endpoint always replies "invalid appid" for me)
    url = "http://www.wolframalpha.com/queryrecognizer/query.jsp" +
        "?appid=" + wolframAppId +
        "&mode=Default" +
        "&output=json" +
        "&i=" + urlEncode(query)
    globals.rawResult = http.get(url)
    globals.result = json.parse(rawResult)
    if result.hasIndex("error") and result.error then
        print "Error: " + result.msg
        return false
    end if
    return result.query.accepted
end function

// tokens Alpha uses to maintain conversational context:
conversationID = ""
conversationS = ""

wolframResponse = function(query)
    url = "http://api.wolframalpha.com/v1/conversation.jsp" +
        "?appid=" + wolframAppId +
        "&i=" + urlEncode(query)
    if conversationID then
        url = url + "&conversationid=" + conversationID
    end if
    if conversationS then
        url = url + "&s=" + conversationS
    end if
    globals.rawResult = http.get(url)
    globals.result = json.parse(rawResult)
    if result.hasIndex("error") and result.error then
        return "Error: " + result.error
    end if
    globals.conversationID = result.conversationID
    if result.hasIndex("s") then
        globals.conversationS = result.s
    end if
    return result.result
end function

// initial context (prompt) for GPT:
context = []
context.push "Assistant is a large language model capable of "
context.push "helping the user in many ways. Also present is "
context.push "WolframAlpha, which is very good at math and "
context.push "numerical facts."
context.push "Knowledge cutoff: 2022-09"
context.push "Current date: " + dateTime.now("yyyy-MM-dd")

apiKey = function
    // read the OpenAI API key from a file, and cache it
    if outer.hasIndex("_apiKey") then return _apiKey
    data = file.readLines("/usr/API-key.txt")
    if data == null then
        print "API-key.txt file not found."
        exit
    end if
    outer._apiKey = data[0]
    return _apiKey
end function

gptResponse = function(prompt, temperature=0.5)
    url = "https://api.openai.com/v1/completions"
    headers = {}
    headers["Content-Type"] = "application/json"
    headers["Authorization"] = "Bearer " + apiKey
    data = {}
    data.model = "text-davinci-003"
    data.prompt = prompt
    data.temperature = temperature
    data.max_tokens = 2048
    globals.rawResult = http.post(url, json.toJSON(data), headers)
    globals.result = json.parse(rawResult)
    if result == null or not result.hasIndex("choices") then
        return rawResult
    end if
    return result.choices[0].text.trim
end function

splitAddress = function(s, possibleNames)
    // if s starts with any of possibleNames, return [name, rest of s];
    // otherwise, return false
    slower = s.lower
    for name in possibleNames
        if slower.startsWith(name) then
            result = [name]
            s = s[name.len:]
            if s != "" and s[0] == "," then s = s[1:]
            s = s.trim
            result.push s
            return result
        end if
    end for
    return false
end function

wolframNames = ["wolframalpha", "wolfram alpha", "wolfram", "alpha", "wa"]
gptNames = ["gpt", "chatgpt", "openai", "assistant"]

clear
print "AI Wonder Twins, unite!"
_printMark "(Enter `quit` to exit.)"
lastInput = ""
while true
    inp = input(">")
    if inp.lower == "quit" or inp.lower == "exit" then break
    // figure out who should take this query (and strip off any direct address)
    resp = ""
    wolframQuery = splitAddress(inp, wolframNames)
    if wolframQuery then
        inp = wolframQuery[1]
        responder = "WolframAlpha"
        if inp == "?" then inp = lastInput
    end if
    gptQuery = splitAddress(inp, gptNames)
    if gptQuery then
        inp = gptQuery[1]
        responder = "Assistant"
        if inp == "?" then inp = lastInput
    end if
    if not wolframQuery and not gptQuery then
        // no direct address: give Wolfram|Alpha first crack at it
        resp = wolframResponse(inp)
        if resp.startsWith("Error:") then
            resp = ""
            responder = "Assistant"
        else
            responder = "WolframAlpha"
        end if
    end if
    lastInput = inp
    context.push "User: " + inp
    oldColor = text.color
    if responder == "WolframAlpha" then
        if not resp then resp = wolframResponse(inp)
        context.push responder + ": " + resp
        text.color = "#66CC66"
    else
        context.push responder + ": "
        resp = gptResponse(context.join(char(13)))
        context[-1] = context[-1] + resp
        text.color = color.aqua
        responder = "GPT"
    end if
    for line in (responder + ": " + resp).wrap
        print line
    end for
    text.color = oldColor
end while

Conclusions

This was a fun and interesting exercise, and is actually useful: it's nice not to have to think "which AI should I go to for this answer," but instead to just ask your question and let your team tackle it. It's also sometimes helpful that GPT can see the whole conversation, and can often fill in where Wolfram|Alpha fails to understand your question.

Wolfram|Alpha is absolutely brilliant at actually understanding numerical and computational concepts, and manipulating these in accurate ways. However, it is still very dumb when it comes to comprehending English. I hope that their R&D team is feverishly working to bolt a large language model like GPT to their Wolfram Language back-end, because right now it fails more often than it succeeds.

One thing I haven't tried yet would be conjoining these bots in a different way: asking GPT to write Wolfram Language code representing my query, and then passing that on to Wolfram's API. That would be essentially doing what I just hoped their R&D team is up to, but without waiting for them to do it.

I'll be sure to post again if I try that, so if this sort of thing interests you, be sure to follow! And if you have any other ideas or feedback, please leave it in the comments below.


Original Link: https://dev.to/joestrout/combining-gpt-and-wolfram-alpha-ma2
