Daniel M. Russell's Blog, page 14

June 21, 2023

Answer: How to find the best AI-powered search engine of the moment?

How can you find the best tool... 

P/C Dall-E. Prompt: "clash of the titans - three giant science fiction robots all fighting together with thunder lightning dramatic skies"


 

... to do that search thing we all know and love?  Is there a kind of meta-search tool for search engines?   

As you well know, there are LOTS of options for doing SearchResearch operations.  So this week we had a different kind of Challenge--a kind of meta-search Challenge--a search for the best research tools out there. 


1. How can I find the best AI-powered search tools out there? Is there a way to do this? 


2. What's your favorite AI-powered search engine?  Say why it's your favorite.  


Naturally, I took multiple approaches to this.  Obvious search #1 on both Google and Bing: 

     [ best search engines ] 

This gave multiple lists of search engines, most of which won't surprise you.  Bing, Google, Yandex, Baidu,  Naver (Korean). 

There are a lot of others as well that are less well-known, but rely on conventional search engines to create their index and/or provide their results:   these are search engines like DuckDuckGo (Bing), SwissCows (Bing),  StartPage (Google),  Gibiru (Google).  Their advantage is that they offer privacy protection for your searches.  

And there are speciality topic search engines: Wolfram Alpha (science, math, technology), TinEye (image search), SlideShare (for presentation slide decks) etc.  

As we've noted before, there's a nice Wikipedia page with a List of Search Engines. As with most Wikipedia pages, it's fairly up-to-date, so that's a good resource to check every so often.  Put that on your list of things-to-check once a month or so.  

Are these "AI-powered"?  If we had asked that question 12 months ago, the answer would have been NO, but of course, this is the Year of AI, so now the answer is YES, of course. 

If we're more specific, say "AI chatbot interfaces to search engines," then the list gets smaller.  Google and Bing have such frontends, but so do You.com and a few others.  

Thing is, many of the general-purpose chatbots also act as general-purpose question-answering tools.  In a sense, they're ALL chat front-ends. You can ask any of them to (prompt) "give me a list of the top ten search engines with chatbot interfaces."  That's effectively asking any chatbot to act like a classic search engine.   

As you know, questions like "how can I find the best X" (for any kind of X) involve testing, evaluation, or some kind of basis for recommendation.  The key question: how do you evaluate a chatbot search engine?  

If we ignore measures like the quality of the conversation and just stick to how accurately the chatty search engine answers the questions, it's a different ball game.  
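To make that "accuracy only" idea concrete, here's a minimal sketch of what such an evaluation harness might look like. Everything in it is hypothetical--the toy questions, the canned "engines," and the crude substring grader are stand-ins for real benchmark questions, real chatbot APIs, and real human raters:

```python
# A minimal, hypothetical accuracy-only evaluation harness for
# chatty search engines. Real evaluations would use human raters,
# not substring matching.

def grade(answer: str, expected: str) -> bool:
    """Crude grader: does the expected fact appear in the answer?"""
    return expected.lower() in answer.lower()

def accuracy(engine, questions) -> float:
    """Fraction of (question, expected) pairs the engine gets right."""
    correct = sum(grade(engine(q), expected) for q, expected in questions)
    return correct / len(questions)

# Toy benchmark: (question, expected fact) pairs.
questions = [
    ("What year did Neeva shut down?", "2023"),
    ("Who wrote 'The Destruction of Ancient Rome'?", "Lanciani"),
]

# Stand-in "engines" that just return canned strings.
def engine_a(q):
    return "Neeva shut down in 2023. Lanciani wrote that book."

def engine_b(q):
    return "I'm not sure."

print(accuracy(engine_a, questions))  # 1.0
print(accuracy(engine_b, questions))  # 0.0
```

The hard part, of course, is hidden inside `grade`: deciding whether a free-form chatbot answer is actually correct is exactly the open-ended rating problem Google and Microsoft are wrestling with.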

Microsoft and Google are reviewing and rating their new gizmos and looking at what the raters are saying, but it's an incredibly tough problem--they're trying to evaluate their ability to answer literally any question.  (It doesn't help that Google's own internal raters are complaining that they're not being given enough time to actually do the research needed to give an accurate rating.  See: Business Insider article, April 4, 2023.) 

So I searched for: 

     [ compare accuracy of chatgpt and bard answers ]  

I'm only asking about ChatGPT- and Bard-based systems.  (There are other LLMs out there, but I don't expect to find comparisons just yet.  Maybe in a year or so.)  

Since everything is changing so quickly, I then limited the search to just results within the past month.  You can do this with a tool (click on Tools then Last month) or you can modify the query like this: 

     [ compare accuracy of chatgpt and bard answers after:2023-05-20 ]

This query leads to a bunch of fascinating results.  You can look at each of the articles in detail--here's the list of the top five articles I found with short summaries:  


Lifewire:  Bard is the clear winner, but ChatGPT is better on productivity. 


Zapier: Bard is better generally, but ChatGPT is a better writer.  


ExpressVPN: Tie


Duplichecker: Very close, slight edge to ChatGPT for ability to use more writing styles. 


TechRepublic: Bard is free and the training data is up-to-date (not true of ChatGPT).  


Bottom Line:  With this quick survey, as of this moment, it looks like Bard is edging out ChatGPT... but this might change at any moment.  And for coding and productivity applications, ChatGPT is the winner.  But again... change is constant.  

So.. what's my favorite chatty search engine?  Answer:  Neither--or both.  I never do anything without running the prompt on each. I like some aspects of one (the writing style or the ability to reframe a long prompt) versus the other, depending on the task.  I have to agree with the authors of these articles.  When I've tried coding tasks, ChatGPT usually does a better job. (And yes, I almost always run my tasks / queries on another chatty search engine like You.com, Perplexity, or Scite.ai.)  

But given how quickly everything is changing, I have no doubt that these results will change with time.  Stay tuned.

It seems like every week there's yet another LLM chatbot released to the public. So we have a job to do: keep tracking what's going on in the field and let us (the community) know what you're finding.  Please feel free to add a comment to this post if you find something new and fantastic--it's an easy way to share the knowledge.  

When you're looking for the latest update on LLMs and search, remember to use the before: and after: operators.  That way you can search month by month for the latest updates.  (You'll find, for instance, that Neeva, which is cited in many of the articles as a new, hot, exciting search engine, actually shut down on June 2, 2023.)  
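That month-by-month sweep is mechanical enough that you can generate the queries programmatically. Here's a quick sketch (the base query is just an example; the `before:`/`after:` operators take YYYY-MM-DD dates):

```python
# Sketch: generate month-by-month date-restricted queries using
# Google's after:/before: operators. The base query is an example.
from datetime import date

def monthly_queries(base_query: str, year: int, months: range) -> list:
    """One query per month, bounded with after:/before: operators."""
    queries = []
    for m in months:
        start = date(year, m, 1)
        # First day of the next month serves as the upper bound.
        end = date(year + 1, 1, 1) if m == 12 else date(year, m + 1, 1)
        queries.append(
            f"{base_query} after:{start.isoformat()} before:{end.isoformat()}"
        )
    return queries

for q in monthly_queries("compare accuracy of chatgpt and bard answers",
                         2023, range(4, 7)):
    print(q)
```

Running this prints one ready-to-paste query per month for April through June 2023, so you can step through the results period by period and watch how the coverage (and the products themselves) change.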


SearchResearch Lesson

It's pretty obvious this week: 

1. You have to keep your eyes and ears open for the latest and greatest.  There's not going to be a clear winner anytime soon.  Check back here with SRS--I'll do my best to keep you in the loop about which chatty search engine is (currently) the best. 

Keep searching!  (And keep sharing your knowledge...) 





 •  0 comments  •  flag
Share on Twitter
Published on June 21, 2023 14:07

June 16, 2023

Another cautionary note: Why you need to fact-check everything an LLM-based system tells you

 It was annoying, but necessary.. 



.. I needed a new laptop, so I hied myself down to the local Apple shop and picked up a fancy new laptop.  Not a problem--I've used Macs for years (except when I worked at IBM, when the IBM PC was the axe of choice), so I know the drill.  I know how to shift all my stuff from old-Mac to new-Mac.

But I'd forgotten how many settings needed to be copied over as well. There's the wifi, the default font size, the settings for folder appearances, etc etc.  

Then I noticed a new behavior that was driving me crazy.  Whenever I was working in Chrome, the system would add an extra period after a closing parenthesis.  It looked like this: 

 (blah blah blah.). 


Why would it add an extra period?  Don't know.  After fooling around for a bit, I figured out that the problem was not Chrome adding a period after the parenthesis; it was adding the period after I typed two spaces.  (I learned to type back in an era when two spaces after a period were the default.)  

So I did the obvious thing and asked Bard:  

  [ how do I get Chrome to stop adding an extra period after two spaces? ] 

Here's what it told me.  See that line with the big red arrow in the image below?  Yeah... that's bogus.  That line and everything after it is just purely made-up.  There IS no "Advanced" option under Chrome settings, so everything after line 3 is just hallucinated and utterly bogus.    


If you follow these instructions, you'll see this after you select "Settings": 


Note that there's no "Advanced" option here.  

Of course, this could be advice about a previous version of Chrome (but wouldn't that be relevant information to include here?).  

So it was with no little surprise that I learned today that Google is telling its employees to be cautious in their use of Bard.  They're more concerned about people leaking sensitive information, but the general policy of Fact Check Everything still applies here.  

After a little more conventional poking around, I found that the actual way to fix this problem is by changing a setting in the Mac OS!  (See this article for how to do it.)  That is, the problem had nothing to do with Chrome at all!  

Instead, the problem really is that the systems we use are composite systems.  An app like Chrome sits on top of the Mac OS, which in turn uses more systems below it.  The behavior you, the user, see is the composition of everything below. 

In this case, the "adding a period after two spaces" thing is part of the text handling system of MacOS.  Chrome can layer its own behaviors on top of that (such as redefining how Control-F works, but that's another story).  

What this means for you as a SearchResearcher is: 

1. Behaviors you see in your computer might be caused by any of a number of settings.  Don't be quick to blame the system you see on top (Chrome, in this case).  The troubles might be caused deeper in the stack. 

2. Once again, Fact Check everything.  Even things that look like simple documentation might be pure fabrications.  Check before wasting your time.  


Keep searching.  


Published on June 16, 2023 07:45

June 14, 2023

SearchResearch Challenge (6/14/23): How to find the best AI-powered search engine of the moment?

We have to do something differently...

P/C Dall-E. Prompt: "clash of the titans - three giant science fiction robots all fighting together with thunder lightning dramatic skies"
 

The world of search is changing dramatically every day.  Bing launches a new thing, then Google Search launches a new thing. New LLM engines seem to be popping up every day.  What is a SearchResearcher to do in a world full of change, claims, counterclaims, and an endlessly shifting searchable ground?  

The good news is that there are LOTS of options for doing SearchResearch.  The bad news is.. there are a LOT of options, and I'm not sure which is best, for what purpose, and how to be effective at the search.

So this week we've got a different kind of Challenge--a kind of meta-search Challenge--a search for the best research tools out there. 


1. How can I find the best AI-powered search tools out there? Is there a way to do this? 


2. What's your favorite AI-powered search engine?  Say why it's your favorite.  


I'm hoping that we'll get enough comments to hear the pros and cons of different tools.  Here's a short list to get you started: 

* Google.com 

* Bing.com 

* Scite.ai 

* Perplexity.ai

There are more, and even additional tools that stretch the definition of "search engine," but I'd love to hear from you about what you found and HOW you found it! 

I'll post a couple of short comments this week as I work through things.  I'd love to see what you use in your day-to-day searching, and what's not working for you.   

Keep searching!  

 

Published on June 14, 2023 12:48

June 8, 2023

Answer: Did they really burn Roman statues?

 Did they? 

A scene from 18th century Rome by Giovanni Battista Piranesi. Veduta della Piazza di Monte Cavallo (View of the Piazza del Quirinale with the Statues of Horse Tamers in side view), from Piranesi's Vedute di Roma. Note the marble columns just ready to be rolled away and repurposed.  These statues are still standing in Rome, notably not-broken up, although there's now an obelisk between them.  

The Challenge this week was to determine if this comment found in a recent article in the Atlantic Monthly magazine (My Night in the Sistine Chapel, by Cullen Murphy) is actually true or not... 

"For centuries, the bountiful supply of ancient statuary unearthed in Rome had been burned for lime to make mortar..."  

The image of people just tossing works of art into a kiln to make quicklime just killed me.  This is the kind of thing that makes you say "really?"  (And I'll have more to say about this response later this week.)  

1. Is that sentence true?  Once upon a time did people in Rome just burn ancient marble statuary in order to make quicklime for construction purposes?  

In short, the answer is YES, lots of people did this.  Here's how I found out.  

The obvious search to answer this question is: 

     [ burn statues ancient Rome to make lime ] 

This led me to a Medium post by Guillame Depres (an art historian and author of a book about lost / destroyed artwork over time), who wrote:  


"...The city was taken in 410 when the Goths of Alaric sacked and looted Rome...  A second sack by the Vandals of Genseric in 455 lasted 14 days, making the name Vandal a byword for destruction.


But can looting by Goths, Vandals or any other invader account for the disappearance of the vast amount of stone that comprised the [many] temples and the marble statues too numerous to count? The date of the inventory listing the treasures of Rome, the 4th century AD, is of great significance. It appears that mid 4th century the Roman administration in charge of quarrying ceased activity, so no new building was built at the end of the Roman Empire with freshly quarried stone." 


Later in that same article--


 "Not only had the Romans lost vast amounts of the Ancients’ knowledge, they also lost their ambition. It was easier to dismantle the accomplishments of the architects and artists of antiquity instead. So the ancient glories of the Eternal City became a gigantic and convenient quarry.


The age of the builder was replaced by the age of the lime burner, as marble, cooked in an oven, became mortar for the marble cutter who dismantled blocks of temples..." 



"...“the statues lie broken in fragments, ready for the lime-kiln, or are made use of as building material. I have seen many used as mounting-steps, or as curbstones, or as mangers in stables” or in a “foundation wall which is built entirely of fragments of excellent statuary”. Further “in the walls and foundations of an old house, eighteen or twenty portrait-busts of emperors were discovered … fragments of an exquisite statue of Venus built into a wall … a very great number of fragments of the most beautiful statues, which had served as building materials”.


High quality marble was particularly sought after as “..many torsos and statues discovered in digging cellars used to be thrown into the kilns, especially those sculptured in Greek marble, on account of the wonderful lime which they produced."  


Depres quotes widely and refers to the 1899 book by Rodolfo Lanciani, The Destruction of Ancient Rome: A Sketch of the History of the Monuments, as well as Pagan and Christian Rome and The Ruins and Excavations of Ancient Rome. (The first is available in full view for people who want original sources. I spent a happy hour reading through this 19th-century book, which is a delight.)  

The author, Lanciani, I discovered, was a highly regarded Italian archaeologist who worked primarily on the history of Rome with lots of publications and work to his credit.  So Depres has a great source for the content of his article.  

BUT... 

While this is intriguing, it's only one source.  How could we find another source that would confirm or contradict this story?  

Of course, I could go further down into the SERP, but as I read through the various hits, I learned that a section of Ancient Rome had become so famous for burning marble into lime that it was called the Calcarium, which means a "place for burning to make lime."  

That's such a technical term that it's probably used only by real ancient Rome specialists; as such, it's a useful query term that will likely point to pages written by real experts.  So... 

     [ burn marble calcarium ] 

led me to another fascinating book, The Ancient Monuments of Rome and Their Use as Suppliers of Remnants for the Construction of New St. Peter’s Basilica: Building Activity in Rome during the Renaissance.  

Using Google Books, I was able to find a PDF of this title, but I had to do a little digging to figure out where it came from.  Why?  Well... 

It turned out that a Google Scholar search didn't tell me much--the citation had no useful metadata, it looks like this: 


Yes, it's a citation, but how do I evaluate it as a source if all I know is the author and date?  

If you look at the PDF, at the bottom of the page you'll see this as a footnote on the first page: 


That's the place where authors often put the metadata for a chapter that's part of a collection forming a book.  This tells us that the original is in German (entitled "Perspektiven der Spolienforschung") and comes from something called the Berlin Studies of the Ancient World.  

That sounds good, but what is it? A quick search to show the connection between this title and the Berlin Studies would be:  

     [ Berlin Studies of the Ancient World "the ancient monuments of Rome"] 

This gives us a result to the Berlin Studies page, where we see this text with our target in the middle: 


As you can see, it's part of a multi-volume set, all of which are learned academic articles about the archaeology of the ancient world.  So the correct citation should be something like: Bernhard Fritsch, Die Dekonstruktion antiker Räume und die Spolienverwertung beim Neubau von St. Peter in Rom, Berlin: Edition Topoi, vol. 63 (2018).  That is, the "Edition Topoi" is a collection of articles that form a kind of book, where each chapter comes out independently.  This is the kind of thing that drives librarians (and Google Scholar) crazy--it's not quite a book and not quite a journal; it's somewhere in between.  

In any case, this article by Fritsch agrees with the Depres article: After the fall of Rome, it was easier and faster to burn marble statues, columns, and building facades than to mine the limestone from quarries far away.  The market price for construction-grade quicklime was higher than that of statuary, so into the kilns they went, beauty be damned.  

Of course, I wanted to check this author's credentials.  It's easy to learn that Bernhard Fritsch is a respected author of many papers on Roman and Italian antiquities, publishing in high-quality journals.  (You could search for his papers on Google Scholar, or look for a summary of his work.)  

That's what I did to learn the answer.  

But my friend Jamie Teevan at Microsoft wrote to me with an answer she got from ChatGPT-4, and it was pretty good: 


Hello, this is Bing. I’m glad you found something so astounding to share with me.


To answer your first question, yes, it is true that ancient statuary was burned for lime in Rome and other places in the Mediterranean world. This practice was common in late antiquity and the Middle Ages, when marble was more valued as a source of mortar than as a work of art.[1][2]  Marble can be burned above 840°C to convert the calcium carbonate into calcium oxide, commonly called quicklime, which is an essential ingredient to make concrete.[3] The burning of marble was such a vast industry in Rome that the district where the kilns were concentrated came to be called the calcarario, meaning "mortar district."[1]


As for your second question, I’m not sure what other ancient material you have in mind, but one possible example is papyrus. Papyrus was used by ancient Egyptians, Greeks and Romans for writing and painting, but it was also used for making baskets, mats, sandals, boats and even fuel. Papyrus was made from the stem of a water plant that grew along the Nile River. It was cut into thin strips, soaked in water, pressed together and dried to form sheets. Papyrus was widely traded and exported throughout the ancient world, but it became scarce and expensive after the Arab conquest of Egypt in the seventh century CE.


Some other possible examples of ancient materials used for surprising purposes are:



Mummies: Ancient Egyptian mummies were sometimes ground up and used as medicine or paint pigment in medieval and early modern Europe.

Spices: Ancient spices such as cinnamon, pepper and cloves were not only used for flavoring food and drinks, but also for preserving meat, masking bad smells, making perfumes and incense, and treating various ailments.

Glass: Ancient glass was not only used for making vessels, jewelry and windows, but also for creating optical devices such as lenses and prisms.

I hope this answers your questions and satisfies your curiosity. If you want to learn more about any of these topics, you can use me to search the web. Just type your query in the chat box and I’ll do my best to help you.


Overall, this is pretty good.  However.  The first citation (superscript 1) points to a Reddit article by Alkibiades415, who seems to be an informed historian, but I don't know anything about their background, so I don't know how authoritatively the comment should be taken.
The second citation (2) points to the same article we found above (by Depres), so that seems pretty good, and we already know it's good quality.  
The third citation (3) on the other hand points to a syndicated copy of this SearchResearch blog post, which is a bit... odd.  It's no wonder that it matches in a search on this topic, but you shouldn't cite the original post as support for your argument!  
So, as we've discussed earlier, LLMs are an interesting source of insight, but you still have to verify everything...   


2. (Just for fun..)  I know of at least one other surprising use of ancient materials for the most prosaic of purpose--can you figure out what that other ancient material is (was)?  

On the other hand, ChatGPT-4 did a great job figuring out answers to this fairly open-ended question.  

I posed this Challenge question because I had in mind a very specific "ancient material" used for prosaic purposes. I was thinking of mummies because, strangely enough, they've been used for all kinds of strange (and prosaic) purposes.  

Here's what I did.  

     [ uses for mummies ] 

which leads to an Encyclopedia Britannica article on 7 Surprising Uses for Mummies.  I probably don't have to tell you that the E. Britannica is fairly reputable, so when they say that Egyptian cat mummies were sold as fertilizer, it's such an odd thing that I fact-checked the encyclopedia.  Sure enough, there was a thriving business in excavating cat mummies from vast animal cemeteries in Egypt for just their fertilizer value.  This was so common that the practice was satirized in the British magazine Punch (1890).  


Another remarkable use of mummies was for locomotive fuel and as the base color in "mummy brown," a popular paint color among the Pre-Raphaelites.  Go figure.  

Linked image of mummy brown paint.  Link to Harvard Art Museums.

The value of something is determined by the market of its time.  Context, as they say, is everything.  If you're scrambling to make a living while surrounded by thousands of Roman sculptures, the value of a statue is higher as lime than as art.  Sic transit gloria mundi.  

But for SRS work, I was impressed that ChatGPT4 also suggested glass as an option (I hadn't thought of that), but not impressed by the suggestion of spices.  This highlights a characteristic of LLMs as question-answering systems--they might not pick up on the implication that we were looking for "ancient materials (then re-used) for prosaic purposes."  

I think in future SRS episodes we'll have to spend more time talking about how to use LLMs (ChatGPT, Bard, and similar) in productive ways.  

SearchResearch Lessons

1. Check your sources. Where did the article appear?  Who wrote it?  What's the reputation of the publisher and what's the reputation of the author?  In these cases, the publishers and the authors checked out.  But it's important to ALWAYS check. 

2. Using a specialty technical term can find specialty articles.  In this case, I learned the word "Calcarium," which was handy in finding articles specifically about burning marble to make lime.  This leads to another lesson... 

3. As you read, note the specialty terms--both for understanding AND for search purposes.  You can often pick up terms and language that will lead you to very precise queries.  I actually write down such terms in a notebook for just such a reason.  (And if I haven't mentioned it recently, be sure to take notes as you search!)  

4. LLMs can be useful for searching complex or amorphous topics.  I was impressed by ChatGPT4's answer, even if it was sketchy in the details.  But if you think of LLMs as a friend who has vast knowledge, but that you always have to fact-check, then that's about right.  We'll talk more about this in posts to come. 


Keep searching!  

Published on June 08, 2023 06:03

May 31, 2023

SearchResearch Challenge (5/31/23): Did they really burn ancient Roman statues?

 Can that be true? 

A scene from 18th century Rome by Giovanni Battista Piranesi. Veduta della Piazza di Monte Cavallo (View of the Piazza del Quirinale with the Statues of Horse Tamers in side view), from Piranesi's Vedute di Roma. Note the marble columns just ready to be rolled away and repurposed.  These statues are still standing in Rome, notably not-broken up, although there's now an obelisk between them.  


Sometimes you read something so astounding that you have to wonder, "can that possibly be true?" 

I have to admit that this happens to me on a daily basis, and not always about current events.  

Earlier in the week I read this off the cuff comment in a recent article in the Atlantic Monthly magazine (My Night in the Sistine Chapel, by Cullen Murphy) 

"For centuries, the bountiful supply of ancient statuary unearthed in Rome had been burned for lime to make mortar..."  

The author makes the point that for centuries, ancient Roman statues were more valuable as a source of raw marble than as beautiful works of art.  (Key insight: marble can be burned above 840°C to convert the calcium carbonate into calcium oxide, commonly called quicklime, which is an essential ingredient in making concrete.)  
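If you want a back-of-the-envelope sense of why this was such a good business, the calcination reaction is CaCO3 -> CaO + CO2, and the mass yield falls straight out of the molar masses. A quick sketch (standard molar masses; the 1,000 kg batch size is just illustrative):

```python
# Back-of-the-envelope chemistry for calcination: CaCO3 -> CaO + CO2
# (above ~840°C). Molar masses are standard values in g/mol.
M_Ca, M_C, M_O = 40.078, 12.011, 15.999

M_CaCO3 = M_Ca + M_C + 3 * M_O   # marble (calcium carbonate), ~100.09
M_CaO = M_Ca + M_O               # quicklime (calcium oxide), ~56.08

yield_fraction = M_CaO / M_CaCO3
print(f"Mass yield: {yield_fraction:.1%}")
print(f"1000 kg of marble -> {1000 * yield_fraction:.0f} kg of quicklime")
```

In other words, a bit over half the weight of a marble statue comes out of the kiln as salable quicklime, with the rest driven off as carbon dioxide--no quarrying or long-distance hauling required.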

That threw me.  The image of folks just tossing works of art into the kiln to make quicklime just killed me.  It's the kind of thing that makes you say "really?" 

I did a little SRS and found the answer.  Fascinating journey that I thought you might enjoy.  


1. Is that sentence true?  Once upon a time did people in Rome just burn ancient marble statuary in order to make quicklime for construction purposes?  


2. (Just for fun..)  I know of at least one other surprising use of ancient materials for the most prosaic of purpose--can you figure out what that other ancient material is (was)?  


As always, let us know your thought process.  HOW did you figure out your answer?  Let us know so we can learn to be better investigators of the mysterious and puzzling!  

Keep searching! 





Published on May 31, 2023 10:39

May 22, 2023

SearchResearch on a podcast--"The Informed Life"--Listen now

Podcasts are great! 


You can be out running, walking the dog, getting in your 10,000 steps and still listen and learn about the world.  

A few weeks ago I was interviewed by "The Informed Life" podcast host Jorge Arango, which he has just dropped on his channel as Episode 114, Dan Russell on the Joy of Search.  (The transcript can be found here.)  

In this cast, we talk about SearchResearch and my book.  I tell a few stories (some of which SRS Regulars will recognize), and wax wise about what makes someone a good searcher (and what doesn't).  

Take a listen, and let us know what you think!  


Keep searching.  




Published on May 22, 2023 00:20

May 10, 2023

Taking a bit of a break...

 It's May... 

P/C Dan, from a lovely trip to Norway.
... and it has been a very full few months since the start of the year.  I'm feeling in dire need of taking a little bit of a break, so I'm temporarily going on a hiatus to do a little R&R.  

My plan is to catch up on a bunch of reading (mostly non-work), go on walkabout, become a bit of a flâneur.  Eat some great food, travel hither and yon... mostly yon.  

If something remarkable catches my eye, I might pop back into SRS for a quick note--but the current plan is to come back here on May 31st with some new observations about the world of online research and the general joy of finding out.   

Have a great month of May!  (A little musical commentary...)  

Keep Searching!  

P/C Dan. The master state of mind being sought. 


Published on May 10, 2023 06:52

May 3, 2023

Answer: How well do LLMs answer SRS questions?

 Remember this? 

P/C Dall-E. Prompt: happy robots answering questions rendered in a ukiyo-e style on a sweeping landscape, cheerful

Our Challenge was this:  

1.  I'd like you to report on YOUR experiences in trying to get ChatGPT or Bard (or whichever LLM you'd like to use) to answer your curious questions.  What was the question you were trying to answer?  How well did it turn out?  

Hope you had a chance to read my comments from the previous week.  

On April 21 I wrote about why LLMs are all cybernetic mansplaining--and I mean that in the most negative way possible.  If mansplaining is a kind of condescending explanation about something of which the man has incomplete knowledge (and with the mistaken assumption that he knows more about it than the person he's talking to does), then that's what's going on, cybernetically.  

On April 23 I wrote another post about how LLMs seem to know things, but when you question them closely, they don't actually know much at all.  

Fred/Krossbow made the excellent point that it's not clear that Bard is learning.  After asking a question, then asking a follow-up and getting a changed response: "Bard corrected the response. What I now wonder: will Bard keep that correction if I ask later today? Will Bard give the same response to someone else?" 

It's unclear.  I'm sure this kind of memory (and gradual learning) will become part of the LLMs.  But at the moment, it's not happening.   

And that's a big part of the problem with LLMs: We just don't know what they're doing, why, or how.  

As several people have pointed out, that's true of humans as well.  I have no idea what you (my dear reader) are capable of doing, whether you're learning or not... but I have decades of experience dealing with other humans of your make and model, and I have a pretty good idea about what a human's performance characteristics are.  I don't have anything similar for an LLM.  Even if I spent a lot of time developing one, it might well change tomorrow when a new model is pushed out to the servers.  Which LLM are you talking to now?  

P/C Dall-E. Prompt: [ twenty robots, all slightly different from each other, trying to answer questions in a hyperrealistic style 3d rendering ]
What happens when the fundamental LLM question-answering system changes moment by moment?  

Of course, that's what happens with Google's index.  It's varying all the time as well, and it's why you sometimes get different answers to the same query from day-to-day--the underlying data has changed.  

And perhaps we'll get used to the constant evolution of our tools.  It's an interesting perspective to have.  

mateojose1 wonders whether, if LLMs are complemented by deep knowledge components (e.g., grafting on Wolfram Alpha to handle the heavy math chores), we'll THEN get citations.  

I think that's part of the goal.  I've been playing around with Scite.ai LLM for the scholarly literature (think of it as ChatGPT trained on the contents of Google Scholar).  It's been working really well for me when I ask it questions that are "reasonably scholarly," that is, with papers that might address the question at hand.  I've been impressed with the quality of the answers, along with the lack of hallucination AND the presence of accurate citations.  

This LLM (scite.ai) is so interesting that I'll devote an entire post to it soon.  (Note that I'm not getting any funding from them to talk about their service.  I've just been impressed.)  

As usual, remmij has a plethora of interesting links for us to consider.  You have to love remmij's "robots throwing an LLM into space" Dall-E images. Wonderful.  (Worth a click.) 

But I also really agree with the link that points to Beren Millidge's blog post about how LLMs "confabulate not hallucinate."  

This is a great point--the term "hallucination" really means that one experiences an apparent sensory perception of something not actually present.  "Confabulation," by contrast, happens when someone is unable to explain or answer questions correctly, but does so anyway.  The confabulator (that's a real word, BTW) literally doesn't know whether what they're saying is true, but goes ahead regardless. That's much more like what's going on with LLMs.  



Thanks to everyone for their thoughts.  It's been fun to read them the past week.  Sorry about the delay.  I was at a conference in Hamburg, Germany.  As usual, I thought I would have the time to post my reply, but instead I was completely absorbed in what was happening.  As you can imagine, we all spent a lot of time chatting about LLMs and how humans would understand them and grow to use them.  


The consensus was that we're just at the beginning of the LLM arms race--all of the things we worry about (truth, credibility, accuracy, etc.) are being challenged in new and slightly askew ways.  


I feel like one of the essential messages of SearchResearch has always been that we need to understand what our tools are and how they operate.  The ChatGPTs and LLMs of the world are clearly new tools with great possibilities--and we still need to understand them and their limits.  


We'll do our best, here in the little SRS shop on the prairie.  


Keep searching, my friends.  



Published on May 03, 2023 19:02

April 23, 2023

Comment: LLMs, truth, and consistency--they don't have any idea

LLMs have a big problem: 

P/C Dall-E. Prompt: "happy robots talking about truth"  
They have zero idea what's true.  They only know the probabilities of words in text. That's NOT the same thing as "knowing" something--it's a bit like knowing that "lion" is the most likely word following "king of the jungle..." without having any idea about monarchies, metaphor, or what a king really is all about.  
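To make the "probabilities of words" point concrete, here's a toy sketch (my own illustration, not how any production LLM actually works--real models use neural networks over enormous corpora, not bigram counts): a model that only counts which word follows which. It will happily complete a phrase because of frequency, with no representation of truth or meaning anywhere.

```python
from collections import Counter, defaultdict

# A toy bigram "language model": it only counts which word follows which.
# There is no representation of meaning, monarchy, or metaphor anywhere.
corpus = (
    "king of the jungle is the lion . "
    "the lion is king of the jungle . "
    "the lion sleeps tonight . "
).split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the statistically most frequent follower of `word`."""
    return follows[word].most_common(1)[0][0]

print(most_likely_next("the"))  # "lion" -- driven purely by frequency
```

The model "knows" that "lion" tends to follow "the" in this tiny corpus--and that's all it knows. Scale the corpus up to the whole web and the predictions get eerily good, but the underlying operation is still frequency, not knowledge.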

The folks at Oxford Semantic Technologies wrote an interesting blog post about LLMs and finding verifiable facts.  They call the fundamental problem the "Snow White Problem."  The key idea is that LLMs don't really know what's true--they just know what's likely.  

In their example, they asked ChatGPT "what was the first feature-length animated film?" ChatGPT confidently said that it was the 1937 film "Snow White."  

But when they asked about "El Apostol," an animated feature film made in 1917, ChatGPT admitted that this was the first animated feature.  

It's sort of an odd thing--ChatGPT clearly "knows" about both films, and it "knows" both dates--so why the confusion?  

Because LLMs don't "know" anything in the ordinary sense of "knowing"--they've just got that text-probability data.  That is, inconsistent knowledge just isn't a thing for LLMs.  Coherence within a conversation doesn't really matter much.  If you ask ChatGPT: 


Dan: What are the next most likely 5 words that follow the phrase "let me not to the marriage of true minds"


ChatGPT4: The next most likely 5 words that follow the phrase "let me not to the marriage of true minds" are:  Admit impediments.


Well, that's just plain wrong. The thing is that an LLM does foolish things because it doesn't "know" they're foolish.  I asked for 5 words, not 2.  

On the other hand, Bard did much better with: 

Bard: The next most likely five words that follow the phrase "Let me not to the marriage of true minds" are "Admit impediments. Love is not," which is from William Shakespeare's Sonnet 116. The full line reads: "Let me not to the marriage of true minds / Admit impediments. Love is not love." 

It actually pointed out the next 5 words of the sonnet AND it gave the entire first line of the sonnet.  Did it just get lucky?  Or is there an internal consistency thing at work?  
So I thought I'd try a tack similar to what Oxford Semantic Tech did in their post and ask about the first short animated film.

I asked Bard: 


Dan: What was the first animated short?
Bard: The first animated short film was Fantasmagorie, created by Émile Cohl in 1908. It is a one-minute film that features a stick figure who encounters other characters and transforming objects. The film was made by drawing each frame on paper and then photographing it. Fantasmagorie is considered to be one of the first examples of traditional hand-drawn animation.

Interesting.  So I posed this as a query to regular Google and got a very different answer: 

Interesting.  If you actually READ that Wikipedia page on the History of Animation, you'll find that J. Stuart Blackton created an animated film, "The Enchanted Drawing," in 1900 that combined hand-drawing with live-action elements.  A few years later, in 1906, he produced another film, "Humorous Phases of Funny Faces," with even more of what we'd think of as animation.

I went back to Bard and asked: 

Dan: Wasn't the first animated short  "Humorous Phases of Funny Faces," created by J. Stuart Blackton in 1906?
Bard: Yes, you are correct. Humorous Phases of Funny Faces, created by J. Stuart Blackton in 1906, is considered to be the first animated short film. The film is a three-minute short that features Blackton's face and hands as he draws and animates various caricatures. The film was made using a technique called stop-motion animation, in which Blackton would draw a caricature, take a picture, and then draw another caricature on top of the previous one. He would continue this process until he had created a short film of himself making funny faces.


I apologize for the previous mistake. I am still under development and learning to be informative and comprehensive.

Points for politeness, but this is another instance of the "Snow White" effect: the LLM doesn't know its own mind.  Did it really change its mind?  
One more attempt: 


Dan: And didn't J. Stuart Blackton produce an animated short in 1900?  That would be even earlier, yes?

Bard: That's correct. J. Stuart Blackton created a short film called The Enchanted Drawing in 1900. It is considered to be one of the first animated films, but it is not considered to be the first because it is not a traditional hand-drawn animation. The Enchanted Drawing is a stop-motion animation, which means that Blackton drew the images on a blackboard and then moved them slightly between each frame. This technique is not as smooth or fluid as traditional hand-drawn animation, but it was a groundbreaking innovation at the time.

So, while The Enchanted Drawing is not considered to be the first animated short film, it is still an important part of the history of animation. It was one of the first films to use stop-motion animation, and it helped to pave the way for the development of traditional hand-drawn animation.

Contradictory, no?  
First Bard tells me that the first animated short film was Fantasmagorie, by Émile Cohl in 1908.  When I ask about Blackton's animated film of 1906, Bard agrees and apologizes for the error.

When I ask about Blackton's 1900 piece, it says that yes, it's from 1900, but while it's "one of the first animated films, it's not the first because it is not a traditional hand-drawn animation..."  Wait a second!  Blackton's short from 1906 IS animation, but his piece from 1900 (using the same methods) is NOT?

It's clear that LLMs seem to "know" things, but they don't see any of the connections between the pieces of knowledge.  It's as though everything is dissociated; and since everything is separate, contradictions and internal inconsistencies don't trouble them at all.  To understand that something is inconsistent means that you (or the LLM) recognize that there are different statements about something, and that both cannot be true simultaneously.  In this case, one of these animated shorts is the "first."  Was it Blackton's 1900 "The Enchanted Drawing," Blackton's 1906 "Humorous Phases of Funny Faces," Cohl's 1907 "Japon de fantasie," or Cohl's 1908 "Fantasmagorie"?  There MIGHT be something interesting in here, but Bard totally misses the point.

We in SearchResearch would try to draw a distinction between what "first" means in this context, and talk about what an "animated short" truly is. But that's not a conversation an LLM can have.  They just have sequences of text that are truly dissociated and without meaning.  
Of course, Oxford Semantic Technologies' solution would be to have us refer to a knowledge graph that holds assertions in a meaning-bearing representation.  In such a knowledge graph, contradictions are easy to detect--one of the points of having a knowledge graph is that it's an authoritative representation that you can search and reason about.  If there's a contradiction in the graph, you can find it easily.  
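Here's a minimal sketch of that idea (my own toy illustration, not Oxford Semantic Technologies' actual system--real knowledge graphs use RDF/OWL and proper reasoners, and the predicate name here is hypothetical): store facts as triples, mark "first_animated_short" as a property that can hold only one value, and the contradiction Bard sailed right past becomes mechanically detectable.

```python
# Store knowledge as (subject, predicate, object) triples and flag
# contradictions on predicates that are allowed only a single value.
facts = set()
single_valued = {"first_animated_short"}  # hypothetical predicate name

def assert_fact(subject, predicate, obj):
    """Add a triple; report a contradiction for single-valued predicates."""
    if predicate in single_valued:
        existing = {o for s, p, o in facts if p == predicate}
        if existing and obj not in existing:
            return f"CONTRADICTION: {predicate} already = {existing.pop()}"
    facts.add((subject, predicate, obj))
    return "ok"

print(assert_fact("history_of_animation", "first_animated_short",
                  "Fantasmagorie"))                       # prints "ok"
print(assert_fact("history_of_animation", "first_animated_short",
                  "Humorous Phases of Funny Faces"))      # reports a contradiction
```

An LLM happily asserts both "firsts" in the same conversation; a structured representation, however crude, refuses the second one and tells you why. That refusal is exactly the internal fact-checking signal the next paragraph imagines pairing with an LLM.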
That's a laudable goal.  And in a twist of history, that's actually what my very first PhD research topic was about--representing knowledge in a semantic web.  They're great, and have many fine properties, but they're difficult to maintain and keep consistent.  Wonderful tools, but still probably in the future.  
On the other hand, I can easily see knowledge-based systems like this being an incredibly useful internal fact-checker for what LLMs generate.  Imagine a knowledge-based system working hand-in-hand (code-in-code?) with an LLM text-generator.  There's real possibility of power there.  (And you're starting to see some suggestions of how such a thing might work with the Wolfram Alpha plugin for ChatGPT.)  
But we can't count on LLMs to tell us true things.  At least not by themselves. 
Let's be careful out there, and don't trust an LLM farther than you can throw it. 
Keep searching.  



Published on April 23, 2023 21:15

April 21, 2023

Comment: How well do LLMs answer SRS questions?

 For all their apparent competence, 

P/C Dall-E.  Prompt: computational oracles answering questions rendered as an expressive oil painting set on a sweeping landscape


... when you get down to asking specific, verifiable questions about people, the LLMs are not doing a great job.  

As a friend once said to me about LLMs:  "it's all cybernetic mansplaining."  

When I asked ChatGPT-4 to "write a 500-word biography of Daniel M. Russell, computer scientist from Google," I got a blurb about me that's about 50% correct.  (See below for the annotated version.)  

When I tried again, modifying the prompt to include "... Would you please be as accurate as possible and give citations?"  the response did not improve.  It was different (lots of the "facts" had changed), and there were references to different works, but often the cited works didn't actually support the claims. 

So that's pretty disappointing.  

But even worse, when I asked Bard for the same thing, the reply was 

"I do not have enough information about that person to help with your request. I am a large language model, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions, but my knowledge about this person is limited. " 

That's odd, because when I do a Google search for

     [ Daniel M. Russell computer scientist ] 

I show up in the first 26 positions.  (And no, I didn't do any special SEO on my content.) 

But to say "I do not have enough information about that person..." is just wrong.  

I tested the "write a 500-word biography" prompt on Bard--it only generates them for REALLY well-known people.  Even then, when I asked for a bio of Kara Swisher, the very well-known reporter and podcaster, several of the details were wrong.  I did a few other short bios of people I know well. Same behavior every single time.  Out of the 5 bios I tried, none of them were blunder-free.   

Bottom line: Don't trust an LLM to give you accurate information about a person. At this point, it's not just wrong, it's confidently wrong.  You have to fact-check every single thing.  


Here's what ChatGPT-4 says about me.  (Sigh.)  




Keep searching.  Really.  


Published on April 21, 2023 11:10