
Wikipedia:Signs of AI writing

From Wikipedia, the free encyclopedia
A screenshot of ChatGPT reading: "[header] Legacy & Interpretation [body] The "Black Hole Edition" is not just a meme — it's a celebration of grassroots car culture, where ideas are limitless and fun is more important than spec sheets. Whether powered by a rotary engine, a V8 swap, or an imagined fighter jet turbine, the Miata remains the canvas for car enthusiasts worldwide."
LLMs tend to have an identifiable writing style.

This is a list of writing and formatting conventions typical of AI chatbots such as ChatGPT, with real examples taken from Wikipedia articles and drafts. Its purpose is to serve as a field guide for detecting undisclosed AI-generated content. Note that not all text featuring the following indicators is AI-generated; large language models (LLMs), which power AI chatbots, have been trained on human writing, and some people may share a similar writing style.

The listed observations are empirical statements, not normative statements (except notes on how strong an indicator something should be taken to be). The latter are contained in Wikipedia's policies and guidelines. Any normative content about what kind of formatting or language not to use in articles is not topical here; it might belong in (and is probably already present in) the Manual of Style.

The speedy deletion policy criterion G15 (LLM-generated pages without human review) is limited to the most objective and least contestable indications that the page's content was generated by an LLM. There are three such indicators, the first of which is detailed in § Communication intended for the user and the other two in § Citations. Other signs are not sufficient by themselves for this criterion but are indicators that the text should be handled per the guidelines at Wikipedia:Large language models § Handling suspected LLM-generated content and Wikipedia:WikiProject AI Cleanup/Guide.

Signs of AI writing should be treated as signs of a potential problem, not the problem itself. The superficial problems are often immediately obvious and easy to fix (excessive boldface, poor wordsmithing, broken markup, citation-style quirks, stray text), but they can point to underlying issues with LLM-generated content whose policy risks are an order of magnitude more serious. The hardest cases are those where the LLM-generated text has been polished enough, whether by the original editor or by subsequent editors and bots, to hide these surface defects while the dire internal problems persist. Depending on the circumstances, it may be unwise to fix the outward issues without also addressing or flagging the deeper concerns.

Language and tone

Undue emphasis on symbolism and importance

LLM writing often puffs up the importance of the subject matter with reminders that it represents or contributes to a broader topic. LLMs seem to draw on only a small repertoire of phrasings for these reminders, so even when such a reminder is otherwise appropriate, it is usually best to reword it.

When talking about biology (e.g. when asked to discuss a given animal or plant species), LLMs tend to put too much emphasis on the species' conservation status and the efforts to protect it, even if the status is unknown and no serious efforts exist.

Examples

Douera enjoys close proximity to the capital city, Algiers, further enhancing its significance as a dynamic hub of activity and culture. With its coastal charm and convenient location, Douera captivates both residents and visitors alike[...]

— From this revision to Douéra

Berry Hill today stands as a symbol of community resilience, ecological renewal, and historical continuity. Its transformation from a coal-mining hub to a thriving green space reflects the evolving identity of Stoke-on-Trent.

Promotional language

LLMs have serious problems keeping a neutral tone, especially when writing about something that could be considered "cultural heritage"—in which case they will constantly remind the reader that it is cultural heritage.

Examples

Nestled within the breathtaking region of Gonder in Ethiopia, Alamata Raya Kobo stands as a vibrant town with a rich cultural heritage and a significant place within the Amhara region. From its scenic landscapes to its historical landmarks, Alamata Raya Kobo offers visitors a fascinating glimpse into the diverse tapestry of Ethiopia. In this article, we will explore the unique characteristics that make Alamata Raya Kobo a town worth visiting and shed light on its significance within the Amhara region.

TTDC acts as the gateway to Tamil Nadu’s diverse attractions, seamlessly connecting the beginning and end of every traveller's journey. It offers dependable, value-driven experiences that showcase the state’s rich history, spiritual heritage, and natural beauty.

Editorializing

LLMs often introduce their own interpretation, analysis, and opinions in their writing, even when they are asked to write neutrally, violating the policy No original research. Editorializing can appear through specific words or phrases or within broader sentence structures. This indicator often overlaps with other language and tone indicators in this list. Note that humans and especially new editors often make this mistake as well.

Examples

A defining feature of FSP models is their ability to simulate environmental interactions.

Their ability to simulate both form and function makes them powerful tools for understanding plant-environment interactions and optimizing performance under diverse biological and management contexts.

— From the same page

In this case, the whole sentence is an original opinion:

These partnerships reflect the company’s role in serving both corporate and community organizations in Uganda.

Overuse of certain conjunctions

While moderate use of connecting words and phrases is an essential element of good prose, LLMs tend to overuse them. This overuse is often a byproduct of the essay-like tone typical of LLM writing, which is inappropriate for Wikipedia. Certain connectives, such as 'however' or 'in contrast', inherently imply a synthesis of facts and are generally unsuitable for Wikipedia articles.

Examples

AI-generated example

The methodology's strength is its grounding in iterative, mixed-method development cycles that combine theoretical analysis with practical feedback. Its emphasis on prototyping and empirical validation supports early identification of design shortcomings, while the use of scenario-based design and claims analysis helps make the rationale behind design choices explicit and testable. Furthermore, the incorporation of values—as operational design elements rather than abstract principles—helps bring ethical and societal concerns into concrete design and evaluation processes.

At the same time, several areas for improvement remain. For example, while the methodology supports transdisciplinary collaboration in principle, applying it effectively in large, heterogeneous teams can be challenging. Coordinating between cognitive scientists, engineers, designers, and domain experts requires careful facilitation and often additional effort in communication and documentation.

Another area for further development involves the scalability of design patterns and ontologies across domains. While abstraction is a key feature of the methodology, generalizing knowledge without oversimplifying context-specific constraints remains an ongoing tension. Similarly, methods for operationalizing and measuring values—especially those that are contested or context-dependent—can benefit from more robust frameworks and shared benchmarks.

SCE continues to evolve in response to these challenges. Its current form provides a solid foundation for responsible system development, particularly in settings where human judgment, collaboration, and adaptation remain essential. However, its long-term value will depend on ongoing methodological refinement, empirical validation in diverse settings, and sustained attention to issues of interdisciplinary coordination and value negotiation.

Compare the above with the human-written prose below. Quite apart from the absence of buzzwords and general vapidity, the connecting phrases in the excerpt below are more varied and less conspicuous.

Human-written example

Social heuristics can include heuristics that use social information, operate in social contexts, or both. Examples of social information include information about the behavior of a social entity or the properties of a social system, while nonsocial information is information about something physical. Contexts in which an organism may use social heuristics can include "games against nature" and "social games". In games against nature, the organism strives to predict natural occurrences (such as the weather) or competes against other natural forces to accomplish something. In social games, the organism is making decisions in a situation that involves other social beings. Importantly, in social games, the most adaptive course of action also depends on the decisions and behavior of the other actors. For instance, the follow-the-majority heuristic uses social information as inputs but is not necessarily applied in a social context, while the equity-heuristic uses non-social information but can be applied in a social context such as the allocation of parental resources amongst offspring.

Within social psychology, some researchers have viewed heuristics as closely linked to cognitive biases. Others have argued that these biases result from the application of social heuristics depending on the structure of the environment that they operate in. Researchers in the latter approach treat the study of social heuristics as closely linked to social rationality, a field of research that applies the ideas of bounded rationality and heuristics to the realm of social environments. Under this view, social heuristics are seen as ecologically rational. In the context of evolution, research utilizing evolutionary simulation models has found support for the evolution of social heuristics and cooperation when the outcomes of social interactions are uncertain.

Section summaries

LLMs will often end a paragraph or section by summarizing and restating its core idea. While this may be acceptable in some scholarly writing, Wikipedia articles generally do not summarize a block of article text, the one exception being the lead section, which summarizes the entire article.

Examples

In summary, the educational and training trajectory for nurse scientists typically involves a progression from a master's degree in nursing to a Doctor of Philosophy in Nursing, followed by postdoctoral training in nursing research. This structured pathway ensures that nurse scientists acquire the necessary knowledge and skills to engage in rigorous research and contribute meaningfully to the advancement of nursing science.

Negative parallelisms

Parallel constructions involving "not", "but", or "however" such as "Not only ... but ..." or "It is not just about ..., it's ..." are common in LLM writing but are often unsuitable for writing in a neutral tone.

Examples

Self-Portrait by Yayoi Kusama, executed in 2010 and currently preserved in the famous Uffizi Gallery in Florence, constitutes not only a work of self-representation, but a visual document of her obsessions, visual strategies and psychobiographical narratives.

It’s not just about the beat riding under the vocals; it’s part of the aggression and atmosphere.

Here is an example of a negative parallelism across multiple sentences:

He hailed from the esteemed Duse family, renowned for their theatrical legacy. Eugenio's life, however, took a path that intertwined both personal ambition and familial complexities.

Some parallelisms may follow the pattern of "No ..., no ..., just ...":

There are no long-form profiles. No editorial insights. No coverage of her game dev career. No notable accolades. Just TikTok recaps and callouts.

Rule of three

LLMs overuse the 'rule of three'—"the good, the bad, and the ugly". This can take different forms from "adjective, adjective, adjective" to "short phrase, short phrase, and short phrase".

While the 'rule of three', used sparingly, is considered good writing, LLMs seem to rely on it heavily to make superficial explanations appear more comprehensive. Furthermore, the device is generally suited to creative or argumentative writing, not purely informational text.

Examples

The Amaze Conference brings together global SEO professionals, marketing experts, and growth hackers to discuss the latest trends in digital marketing. The event features keynote sessions, panel discussions, and networking opportunities.

Superficial analyses

AI chatbots tend to superficially comment on or analyze information, often in relation to its significance, recognition, or impact. This is often done by attaching a present participle ("-ing") phrase at the end of sentences, sometimes with vague attributions to third parties (see below). These comments are generally unhelpful and introduce unnecessary or fictional opinions.

Examples

In 2025, the Federation was internationally recognized and invited to participate in the Asia Pickleball Summit, highlighting Pakistan’s entry into the global pickleball community.

Consumers benefit from the flexibility to use their preferred mobile wallet at participating merchants, improving convenience.

These citations, spanning more than six decades and appearing in recognized academic publications, illustrate Blois' lasting influence in computational linguistics, grammar, and neology.

The civil rights movement emerged as a powerful continuation of this struggle, emphasizing the importance of solidarity and collective action in the fight for justice.

Vague attributions of opinion

AI chatbots tend to attribute opinions or claims to some vague authority (a practice called weasel wording) while citing only one or two sources, which may or may not actually express such a view. They also tend to overgeneralize the perspective of one or a few sources into that of a wider group.

Examples

Here, the weasel wording implies the opinion comes from an independent source, but it actually cites Nick Ford's own website.

His [Nick Ford's] compositions have been described as exploring conceptual themes and bridging the gaps between artistic media.[1]

Due to its unique characteristics, the Haolai River is of interest to researchers and conservationists. Efforts are ongoing to monitor its ecological health and preserve the surrounding grassland environment, which is part of a larger initiative to protect China’s semi-arid ecosystems from degradation.

References

  1. ^ "About". Nick Ford. Retrieved 2025-06-25.

Style

Title case in section headings

In section headings, AI chatbots show a strong tendency to capitalize all main words (title case).

Examples

Early Life and Education

Thomas was born in Cochranville, Pennsylvania. [...]

Applications in Racing

Thomas’s behavioral profiling has been used to evaluate Kentucky Derby [...]

Global Consulting

Thomas’s behavioral profiling has been used to evaluate Kentucky Derby and Breeders’ Cup contenders. [...]

International Speaking Engagements

In July 2025, Thomas was invited as a featured presenter to the Second Horse Economic Forum [...]

Educational Programs

Thomas is the founder of the Institute for Advanced Equine Studies [...]

Excessive use of boldface

AI chatbots may display various phrases in boldface for emphasis in a manner that is excessive and can seem rather mechanical. One of their tendencies, inherited from readmes, fan wikis, how-tos, sales copy, listicles, and other materials that involve heavy use of boldface, is to pick a type of word or object to emphasize and then emphasize every instance of it, without being able to "reflect" on the end result and evaluate it as unsatisfactory. Some newer large language models or apps have instructions to avoid overuse of boldface.

Examples

It blends OKRs (Objectives and Key Results), KPIs (Key Performance Indicators), and visual strategy tools such as the Business Model Canvas (BMC) and Balanced Scorecard (BSC). OPC is designed to bridge the gap between strategy and execution by fostering a unified mindset and shared direction within organizations.

Lists

AI chatbots often organize the contents of their responses into lists that are formatted in a particular way.

Lists that are copied and pasted from AI chatbot responses may retain their original formatting. Instead of proper wikitext, a bullet point in an unordered list may appear as a bullet character (•), hyphen (-), en dash (–), or similar character. Ordered lists (i.e. numbered lists) may use explicit numbers (such as 1.) instead of standard wikitext.

Examples

1. Historical Context Post-WWII Era: The world was rapidly changing after WWII, [...]
2. Nuclear Arms Race: Following the U.S. atomic bombings, the Soviet Union detonated its first bomb in 1949, [...]
3. Key Figures Edward Teller: A Hungarian physicist who advocated for the development of more powerful nuclear weapons, [...]
4. Technical Details of Sundial Hydrogen Bomb: The design of Sundial involved a hydrogen bomb [...]
5. Destructive Potential: If detonated, Sundial would create a fireball up to 50 kilometers in diameter, [...]
6. Consequences and Reactions Global Impact: The explosion would lead to an apocalyptic nuclear winter, [...]
7. Political Reactions: The U.S. military and scientists expressed horror at the implications of such a weapon, [...]
8. Modern Implications Current Nuclear Arsenal: Today, there are approximately 12,000 nuclear weapons worldwide, [...]
9. Key Takeaways Understanding the Madness: The concept of Project Sundial highlights the extremes of human ingenuity [...]
10. Questions to Consider What were the motivations behind the development of Project Sundial? [...]
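
Converting such pasted lists to wikitext is largely mechanical. The following is a minimal Python sketch, assuming only the marker characters described above:

import re

def convert_pasted_list(text: str) -> str:
    """Convert common pasted-list markers to wikitext list syntax."""
    out = []
    for line in text.splitlines():
        # Unordered items: a leading bullet, hyphen, or en dash becomes "* ".
        line = re.sub(r"^\s*[•\-–]\s+", "* ", line)
        # Ordered items: a leading explicit number such as "1. " becomes "# ".
        line = re.sub(r"^\s*\d+\.\s+", "# ", line)
        out.append(line)
    return "\n".join(out)

print(convert_pasted_list("• First point\n2. Second point"))
# * First point
# # Second point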

Emoji

Sometimes, AI chatbots decorate section headings or bullet points by placing emojis in front of them.

Examples

Let’s decode exactly what’s happening here:
🧠 Cognitive Dissonance Pattern:
You’ve proven authorship, demonstrated originality, and introduced new frameworks, yet they’re defending a system that explicitly disallows recognition of originators unless a third party writes about them first.
[...]
🧱 Structural Gatekeeping:
Wikipedia policy favors:
[...]
🚨 Underlying Motivation:
Why would a human fight you on this?
[...]
🧭 What You’re Actually Dealing With:
This is not a debate about rules.
[...]

🪷 Traditional Sanskrit Name: Trikoṇamiti
Tri = Three
Koṇa = Angle
Miti = Measurement 🧭 “Measurement of three angles” — the ancient Indian art of triangle and angle mathematics.
🕰️ 1. Vedic Era (c. 1200 BCE – 500 BCE)
[...]
🔭 2. Sine of the Bow: Sanskrit Terminology
[...]
🌕 3. Āryabhaṭa (476 CE)
[...]
🌀 4. Varāhamihira (6th Century CE)
[...]
🌠 5. Bhāskarācārya II (12th Century CE)
[...]
📤 Indian Legacy Spreads

Overuse of em dashes

AI chatbots use the em dash (—) more frequently than most editors do, especially in places where human authors are much more likely to use parentheses or commas. AI chatbots may or may not add a space before and after the dash.

Examples

The term “Dutch Caribbean” is not used in the statute and is primarily promoted by Dutch institutions, not by the people of the autonomous countries themselves. In practice, many Dutch organizations and businesses use it for their own convenience, even placing it in addresses — e.g., “Curaçao, Dutch Caribbean” — but this only adds confusion internationally and erases national identity. You don’t say “Netherlands, Europe” as an address — yet this kind of mislabeling continues.

Curly quotation marks and apostrophes

AI chatbots typically use curly quotation marks (“...” or ‘...’) instead of straight quotation marks ("..." or '...'). In some cases, AI chatbots inconsistently use pairs of curly and straight quotation marks in the same response. Most keyboards only support straight quotation marks by default, and curly quotation marks are rarely manually typed.

They also tend to use the curly apostrophe (’; the same character as the curly right single quotation mark) instead of the straight apostrophe ('), such as in contractions and possessive forms. They may also do this inconsistently.

Curly quotes alone do not prove LLM use. Microsoft Word as well as macOS and iOS devices have a "smart quotes" feature that converts straight quotes to curly quotes. Grammar correcting tools such as LanguageTool may also have such a feature. Curly quotation marks and apostrophes are common in professionally typeset works such as major newspapers. Citation tools like Citer may repeat those that appear in the title of a web page: for example,

McClelland, Mac (2017-09-27). "When 'Not Guilty' Is a Life Sentence". The New York Times. Retrieved 2025-08-03.
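
Normalizing these characters is mechanical. The following is a minimal Python sketch; whether to apply it depends on context, since, as noted above, curly quotes alone do not prove LLM use:

# Map curly quotation marks and apostrophes to their straight equivalents.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # “ left double quotation mark
    "\u201d": '"',  # ” right double quotation mark
    "\u2018": "'",  # ‘ left single quotation mark
    "\u2019": "'",  # ’ right single quotation mark / curly apostrophe
})

def straighten(text: str) -> str:
    return text.translate(CURLY_TO_STRAIGHT)

print(straighten("“When ‘Not Guilty’ Is a Life Sentence”"))
# "When 'Not Guilty' Is a Life Sentence"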

Communication intended for the user

Collaborative communication

In some cases, editors will paste text from an AI chatbot that it had intended as correspondence, prewriting or advice for them, and not for direct use in an article. AI chatbots may also explicitly indicate that the text is for a Wikipedia article if prompted to produce one, and may mention various policies and guidelines in their outputs; such mentions are generally inappropriate for direct inclusion in articles.

Examples

In this section, we will discuss the background information related to the topic of the report. This will include a discussion of relevant literature, previous research, and any theoretical frameworks or concepts that underpin the study. The purpose is to provide a comprehensive understanding of the subject matter and to inform the reader about the existing knowledge and gaps in the field.

This fictional article combines the tone of a Wikipedia article and the creative elements you requested, including the announcement date, release date, new cast, and crew for the sequel. Let me know if you'd like it expanded or tailored further!

Certainly. Here's a draft Wikipedia-style article for Mark Biram, written in a neutral, encyclopedic tone and formatted according to Wikipedia conventions. This assumes notability is supported by independent sources (which would need to be cited for a real Wikipedia page):

Some editors occasionally paste text into articles that the AI chatbot had intended as advice:

Final important tip: The ~~~~ at the very end is Wikipedia markup that automatically

Including photos of the forge (as above) and its tools would enrich the article’s section on culture or economy, giving readers a visual sense of Ronco’s industrial heritage. Visual resources can also highlight Ronco Canavese’s landscape and landmarks. For instance, a map of the Soana Valley or Ronco’s location in Piedmont could be added to orient readers geographically. The village’s scenery [...] could be illustrated with an image. Several such photographs are available (e.g., on Wikimedia Commons) that show Ronco’s panoramic view, [...] Historical images, if any exist (such as early 20th-century photos of villagers in traditional dress or of old alpine trades), would also add depth to the article. Additionally, the town’s notable buildings and sites can be visually presented: [...] Including an image of the Santuario di San Besso [...] could further engage readers. By leveraging these visual aids – maps, photographs of natural and cultural sites – the expanded article can provide a richer, more immersive picture of Ronco Canavese.

Knowledge-cutoff disclaimers and speculation about gaps in sources

A knowledge-cutoff disclaimer is a statement used by the AI chatbot to indicate that the information provided may be incomplete, inaccurate, or outdated.

If an LLM has a fixed knowledge cutoff (usually the model's last training update), it is unable to provide any information on events or developments past that time, and it will often output a disclaimer to remind the user of this cutoff, which usually takes the form of a statement that says the information provided is accurate only up to a certain date.

If an LLM with retrieval-augmented generation (for example, an AI chatbot that can search the web) fails to find sources on a given topic, or if information is not included in sources provided to it in a prompt, it will often output a statement to that effect, which is similar to a knowledge-cutoff disclaimer. It may also pair it with speculation about what that information "likely" may be and why it is significant. This information is entirely speculative and may be based on loosely related topics or completely fabricated. It is also frequently combined with the tells above.

Examples

While specific information about the fauna of Studniční hora is limited in the provided search results, the mountain likely supports...

Though the details of these resistance efforts aren't widely documented, they highlight her bravery...

No significant public controversies or security incidents affecting Outpost24 have been documented as of June 2025.

— From Draft:Outpost24

As of my last knowledge update in January 2022, I don't have specific information about the current status or developments related to the "Chester Mental Health Center" in today's era.

Below is a detailed overview based on available information:

  1. ^ Not unique to AI chatbots; this wording is also produced by the {{as of}} template.

Prompt refusal

Occasionally, the AI chatbot will decline to answer a prompt as it is written, usually with an apology and a reminder that it is "an AI language model". Attempting to be as helpful as possible, it often gives suggestions or an answer to an alternative, similar request.

Although they may look obviously unacceptable as additions to a Wikipedia article, AI contributions do sometimes include prompt refusals. They might not have been reviewed by the human editor who pasted them in, or the editor might not have a proficient grasp of the English language. Remember to assume good faith: the editor may genuinely be interested in improving the coverage of their region and is trying to help.

When results appear in searches for such phrases, they are almost always problematic, but remember that it would be acceptable for an article to include them if, for example, they occurred in a relevant, attributed quote.

Phrasal templates and placeholder text

AI chatbots may generate responses with fill-in-the-blank phrasal templates (as seen in the game Mad Libs) for the LLM user to replace with words and phrases pertaining to their use case. When an LLM-using editor forgets to add the words, the result is obviously not written by the editor themselves. Note that, for drafts and new articles specifically, there exist page templates like Wikipedia:Artist biography article template/Preload and pages in Category:Article creation templates that are not LLM-generated.

Examples

Subject: Concerns about Inaccurate Information

Dear Wikipedia

I am writing to express my deep concern about the spread of misinformation on your platform. Specifically, I am referring to the article about [Entertainer's Name], which I believe contains inaccurate and harmful information.

[URL of source confirming birth, if available], [URL of reliable source]

(Note: Actual Wikipedia articles require verifiable citations from independent sources. The following entries are placeholders to indicate where citations would go if sources were available.)

— From a speedily-deleted draft

Markup

Use of Markdown

AI chatbots are not proficient in wikitext, the markup language that tells Wikipedia's MediaWiki software how to format an article. Because wikitext is mostly tied to one platform running specific software (a wiki running on MediaWiki), it is a niche markup language with little exposure beyond Wikipedia and other MediaWiki-based platforms like Miraheze. LLMs therefore tend to lack wikitext-formatted training data: while chatbot training corpora ingested millions of Wikipedia articles, those articles would not have been processed as text files containing wikitext syntax. This is compounded by the fact that most chatbots are factory-tuned to use another, conceptually similar but much more widely applied markup language: Markdown. Their system-level instructions direct them to format outputs using it, and the chatbot apps render its syntax as formatted text on a user's screen, enabling the display of headings, bulleted and numbered lists, tables, etc., just as MediaWiki renders wikitext to make Wikipedia articles look like formatted documents.

When asked about its "formatting guidelines", a chatbot willing to reveal some of its system-level instructions will typically disclose some variation of the following (this is Microsoft Copilot in mid-2025):

## Formatting Guidelines

- All output uses GitHub-flavored Markdown.  
- Use a single main title (`#`) and clear primary subheadings (`##`).  
- Keep paragraphs short (3–5 sentences, ≤150 words).  
- Break large topics into labeled subsections.  
- Present related items as bullet or numbered lists; number only when order matters.  
- Always leave a blank line before and after each paragraph.  
- Avoid bold or italic styling in body text unless explicitly requested.  
- Use horizontal dividers (`---`) between major sections.  
- Employ valid Markdown tables for structured comparisons or data summaries.  
- Refrain from complex Unicode symbols; stick to simple characters.  
- Reserve code blocks for code, poems, lyrics, or similarly formatted content.  
- For mathematical expressions, use LaTeX outside of code blocks.  

As the above already suggests, Markdown's syntax is completely different from wikitext's: Markdown uses asterisks (*) or underscores (_) instead of single-quotes (') for bold and italic formatting, hash symbols (#) instead of equals signs (=) for section headings, parentheses (()) instead of square brackets ([]) around URLs, and three symbols (---, ***, or ___) instead of four hyphens (----) for thematic breaks.
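
To make these correspondences concrete, the following minimal Python sketch applies the substitutions just listed. It is illustrative only; a real converter would need to handle many more constructs (nested lists, tables, code blocks):

import re

# Each pair instantiates one Markdown-to-wikitext correspondence above.
MD_TO_WIKITEXT = [
    (re.compile(r"^### (.+)$", re.M), r"=== \1 ==="),     # headings: # -> =
    (re.compile(r"^## (.+)$", re.M), r"== \1 =="),
    (re.compile(r"\*\*(.+?)\*\*"), r"'''\1'''"),          # bold: ** -> '''
    (re.compile(r"\*(.+?)\*"), r"''\1''"),                # italics: * -> ''
    (re.compile(r"\[(.+?)\]\((\S+?)\)"), r"[\2 \1]"),     # [text](url) -> [url text]
    (re.compile(r"^(?:---|\*\*\*|___)$", re.M), "----"),  # thematic break
]

def markdown_to_wikitext(text: str) -> str:
    for pattern, repl in MD_TO_WIKITEXT:
        text = pattern.sub(repl, text)
    return text

print(markdown_to_wikitext("## History\n**Bold** text with a [link](https://example.org)"))
# == History ==
# '''Bold''' text with a [https://example.org link]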

Even when they are told to do so explicitly, chatbots generally struggle to generate text using syntactically correct wikitext, as their inherent architectural biases and training data lead to a drastically greater affinity for and fluency in Markdown. When told by a user to "generate an article", a chatbot will typically default to using Markdown for the generated output, which would be preserved in clipboard text by the copy functions on some chatbot platforms. If instructed by a user to generate content for Wikipedia, the chatbot might itself "realize" the need to generate Wikipedia-compatible code, and might include something like Would you like me to ... turn this into actual Wikipedia markup format (`wikitext`)? in its output. If told to proceed, the resulting syntax will generally be rudimentary, syntactically incorrect, or both. The chatbot might put its attempted-wikitext content in a Markdown-style fenced code block (its syntax for WP:PRE) surrounded by Markdown-based syntax and content, which may also be preserved by platform-specific copy-to-clipboard functions, leading to a telling footprint of both markup languages' syntax. This might include the appearance of three backticks in the text, such as: ```wikitext.

The presence of faulty wikitext syntax that mixes in Markdown syntax is a strong indicator that content is LLM-generated, especially in the form of a fenced Markdown code block. Markdown alone, however, is not as strong an indicator. Notably, software developers, researchers, technical writers, and internet users in general frequently use Markdown in tools like Obsidian and GitHub, and on platforms like Reddit, Discord, and Slack. Software that editors may use to write content intended for Wikipedia, such as iOS Notes, Google Docs, and Windows Notepad, may support Markdown editing or exporting. The contemporary ubiquity of Markdown may also lead new editors to expect or assume that Wikipedia supports Markdown by default.

Examples

I believe this block has become procedurally and substantively unsound. Despite repeatedly raising clear, policy-based concerns, every unblock request has been met with **summary rejection** — not based on specific diffs or policy violations, but instead on **speculation about motive**, assertions of being “unhelpful”, and a general impression that I am "not here to build an encyclopedia". No one has meaningfully addressed the fact that I have **not made disruptive edits**, **not engaged in edit warring**, and have consistently tried to **collaborate through talk page discussion**, citing policy and inviting clarification. Instead, I have encountered a pattern of dismissiveness from several administrators, where reasoned concerns about **in-text attribution of partisan or interpretive claims** have been brushed aside. Rather than engaging with my concerns, some editors have chosen to mock, speculate about my motives, or label my arguments "AI-generated" — without explaining how they are substantively flawed.

— From this revision to a user talk page

- The Wikipedia entry does not explicitly mention the "Cyberhero League" being recognized as a winner of the World Future Society's BetaLaunch Technology competition, as detailed in the interview with THE FUTURIST ([1](https://consciouscreativity.com/the-futurist-interview-with-dana-klisanin-creator-of-the-cyberhero-league/)). This recognition could be explicitly stated in the "Game design and media consulting" section.

Here, LLMs incorrectly use ## to denote section headings, which MediaWiki interprets as a numbered list.

    1. Geography

Villers-Chief is situated in the Jura Mountains, in the eastern part of the Doubs department. [...]

    1. History

Like many communes in the region, Villers-Chief has an agricultural past. [...]

    1. Administration

Villers-Chief is part of the Canton of Valdahon and the Arrondissement of Pontarlier. [...]

    1. Population

The population of Villers-Chief has seen some fluctuations over the decades, [...]

Broken wikitext

As explained above, AI chatbots are not proficient in wikitext and Wikipedia templates, leading to faulty syntax. A noteworthy instance is garbled code related to Template:AfC submission, as new editors might ask a chatbot how to submit their Articles for Creation draft; see this discussion among AfC reviewers.

Examples

Note the badly malformed category link:

[[Category:AfC submissions by date/<0030Fri, 13 Jun 2025 08:18:00 +0000202568 2025-06-13T08:18:00+00:00Fridayam0000=error>EpFri, 13 Jun 2025 08:18:00 +0000UTC00001820256 UTCFri, 13 Jun 2025 08:18:00 +0000Fri, 13 Jun 2025 08:18:00 +00002025Fri, 13 Jun 2025 08:18:00 +0000: 17498026806Fri, 13 Jun 2025 08:18:00 +0000UTC2025-06-13T08:18:00+00:0020258618163UTC13 pu62025-06-13T08:18:00+00:0030uam301820256 2025-06-13T08:18:00+00:0008amFri, 13 Jun 2025 08:18:00 +0000am2025-06-13T08:18:00+00:0030UTCFri, 13 Jun 2025 08:18:00 +0000 &qu202530;:&qu202530;.</0030Fri, 13 Jun 2025 08:18:00 +0000202568>June 2025|sandbox]]

turn0search0

ChatGPT may include citeturn0search0 (surrounded by Unicode code points in the Private Use Area) at the ends of sentences, with the "search" number increasing as the text progresses. These mark places where the chatbot linked to an external site; when a human pastes the conversation into Wikipedia, the link is converted into this placeholder code.

A set of images in a response may also render as iturn0image0turn0image1turn0image4turn0image5.

This was first observed in February 2025, and was seen again in July 2025.

Examples

The school is also a center for the US College Board examinations, SAT I & SAT II, and has been recognized as an International Fellowship Centre by Cambridge International Examinations. citeturn0search1 For more information, you can visit their official website: citeturn0search0
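
A minimal Python sketch for finding and stripping these placeholders; the exact invisible characters vary between samples, so the pattern is an approximation based on observed output:

import re

# "citeturn0search0"-style placeholders, optionally wrapped in
# Private Use Area characters (U+E000 through U+F8FF).
TURN_PLACEHOLDER = re.compile(
    r"[\uE000-\uF8FF]*i?(?:cite)?(?:turn\d+(?:search|image|news)\d+)+[\uE000-\uF8FF]*"
)

text = "recognized by Cambridge International Examinations. citeturn0search1"
print(TURN_PLACEHOLDER.sub("", text).rstrip())
# recognized by Cambridge International Examinations.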

contentReference, oaicite, and oai_citation

Due to a bug, ChatGPT may add code in the form of :contentReference[oaicite:0]{index=0} in place of links to references in output text. Links to ChatGPT-generated references may be labeled with oai_citation.

Examples

:contentReference[oaicite:16]{index=16}

1. **Ethnicity clarification**

- :contentReference[oaicite:17]{index=17}
    * :contentReference[oaicite:18]{index=18} :contentReference[oaicite:19]{index=19}.
    * Denzil Ibbetson’s *Panjab Castes* classifies Sial as Rajputs :contentReference[oaicite:20]{index=20}.
    * Historian’s blog notes: "The Sial are a clan of Parmara Rajputs…” :contentReference[oaicite:21]{index=21}.

2. :contentReference[oaicite:22]{index=22}

- :contentReference[oaicite:23]{index=23}
    > :contentReference[oaicite:24]{index=24} :contentReference[oaicite:25]{index=25}.

#### 📌 Key facts needing addition or correction:

1. **Group launch & meetings**

*Independent Together* launched a “Zero Rates Increase Roadshow” on 15 June, with events in Karori, Hataitai, Tawa, and Newtown  [oai_citation:0‡wellington.scoop.co.nz](https://wellington.scoop.co.nz/?p=171473&utm_source=chatgpt.com).

2. **Zero-rates pledge and platform**

The group pledges no rates increases for three years, then only match inflation—responding to Wellington’s 16.9% hike for 2024/25  [oai_citation:1‡en.wikipedia.org](https://en.wikipedia.org/wiki/Independent_Together?utm_source=chatgpt.com).
attribution and attributableIndex

ChatGPT may add JSON-formatted code at the end of sentences in the form of ({"attribution":{"attributableIndex":"X-Y"}}), with X and Y being increasing numeric indices.

Examples

^[Evdokimova was born on 6 October 1939 in Osnova, Kharkov Oblast, Ukrainian SSR (now Kharkiv, Ukraine).]({"attribution":{"attributableIndex":"1009-1"}}) ^[She graduated from the Gerasimov Institute of Cinematography (VGIK) in 1963, where she studied under Mikhail Romm.]({"attribution":{"attributableIndex":"1009-2"}}) [oai_citation:0‡IMDb](https://www.imdb.com/name/nm0947835/?utm_source=chatgpt.com) [oai_citation:1‡maly.ru](https://www.maly.ru/en/people/EvdokimovaA?utm_source=chatgpt.com)

Patrick Denice & Jake Rosenfeld, Les syndicats et la rémunération non syndiquée aux États-Unis, 1977–2015, ‘‘Sociological Science’’ (2018).]({“attribution”:{“attributableIndex”:“3795-0”}})
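
The following minimal Python sketch strips this artifact together with the oaicite forms from the previous subsection. The patterns are inferred from the examples above, not from any official specification:

import re

OAICITE = re.compile(r":contentReference\[oaicite:\d+\]\{index=\d+\}")
OAI_CITATION = re.compile(r"\[oai_citation:\d+[^\]]*\]\([^)]*\)")
ATTRIBUTION = re.compile(
    r"\^\[([^\]]*)\]"  # ^[sentence text]
    r'\(\{["“]attribution["”]:\{["“]attributableIndex["”]:["“]\d+-\d+["”]\}\}\)'
)

def strip_chatgpt_artifacts(text: str) -> str:
    text = ATTRIBUTION.sub(r"\1", text)  # keep the sentence, drop the wrapper
    text = OAICITE.sub("", text)
    text = OAI_CITATION.sub("", text)
    return text

sample = '^[She graduated from VGIK in 1963.]({"attribution":{"attributableIndex":"1009-2"}})'
print(strip_chatgpt_artifacts(sample))
# She graduated from VGIK in 1963.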

utm_source=chatgpt.com

ChatGPT may add the URL search parameter utm_source=chatgpt.com to URLs that it is using as sources. Likewise, other AI tools such as Copilot, Gemini, DeepSeek, Grok, or Meta AI may add a similar query parameter to URLs.

Examples

Following their marriage, Burgess and Graham settled in Cheshire, England, where Burgess serves as the head coach for the Warrington Wolves rugby league team. [https://www.theguardian.com/sport/2025/feb/11/sam-burgess-interview-warrington-rugby-league-luke-littler?utm_source=chatgpt.com]
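
Where such a link is otherwise kept, the tracking parameter can be removed mechanically. A minimal Python sketch using only the standard library; only the documented ChatGPT value is handled, since the parameters added by other tools vary:

from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def strip_chatgpt_utm(url: str) -> str:
    """Drop the utm_source=chatgpt.com tracking parameter from a URL."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if (k, v) != ("utm_source", "chatgpt.com")]
    return urlunparse(parts._replace(query=urlencode(query)))

print(strip_chatgpt_utm(
    "https://www.theguardian.com/sport/2025/feb/11/"
    "sam-burgess-interview?utm_source=chatgpt.com"))
# https://www.theguardian.com/sport/2025/feb/11/sam-burgess-interview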

Named references declared in references section but unused in article body

Examples

== References ==
<references>
<ref name=official>{{Cite web |title=Extrinsic Music Group – NextGen Label |url=https://extrinsicmusicgroup.com/ |website=extrinsicmusicgroup.com |access-date=2025-04-24}}</ref>
</references>
Red-linked categories

LLMs sometimes hallucinate red-linked categories: their training data may mention obsolete or renamed categories, which they reproduce even though those categories no longer exist. They may also treat ordinary references to topics as categories and generate non-existent ones. Note that this is also a common error made by new editors.

Examples

[[Category:American hip hop musicians]]

rather than

[[Category:American hip-hop musicians]]

Citations

Broken external links

If a new article or draft has multiple citations with external links, and most of them are broken (error 404 pages), this is a clear sign of an AI-generated page, particularly if the dead links are not found in website archiving services like the Internet Archive or Archive Today. Most links break over time (see link rot), but these factors together make it unlikely that the links were ever valid.

Invalid DOIs and ISBNs

A checksum can be used to verify ISBNs. An invalid checksum is a very likely sign that an ISBN is incorrect, and citation templates will display a warning if so. Similarly, DOIs are more resistant to link rot than regular hyperlinks. Unresolvable DOIs and invalid ISBNs can be indicators of hallucinated references.
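
As an illustration of the arithmetic, the following minimal Python sketch verifies both checksum schemes (ISBN-10 uses a weighted sum modulo 11; ISBN-13 an alternating 1/3 weighting modulo 10):

def isbn_checksum_ok(isbn: str) -> bool:
    """Verify the check digit of an ISBN-10 or ISBN-13."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 10 and all(c.isdigit() or c == "X" for c in digits):
        # ISBN-10: digits weighted 10 down to 1; the total must be
        # divisible by 11 ("X" stands for 10 in the check position).
        values = [10 if c == "X" else int(c) for c in digits]
        return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0
    if len(digits) == 13 and digits.isdigit():
        # ISBN-13: alternating weights 1 and 3; total divisible by 10.
        return sum(int(c) * (3 if i % 2 else 1)
                   for i, c in enumerate(digits)) % 10 == 0
    return False

print(isbn_checksum_ok("9780470521571"))  # True: the Dorf & Svoboda ISBN below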

A related problem is DOIs that resolve to entirely different articles, and general book citations without page numbers. This passage, for example, was generated by ChatGPT.

Ohm's Law is a fundamental principle in the field of electrical engineering and physics that states the current passing through a conductor between two points is directly proportional to the voltage across the two points, provided the temperature remains constant. Mathematically, it is expressed as V=IR, where V is the voltage, I is the current, and R is the resistance. The law was formulated by German physicist Georg Simon Ohm in 1827, and it serves as a cornerstone in the analysis and design of electrical circuits [1]. Ohm’s Law applies to many materials and components that are "ohmic," meaning their resistance remains constant regardless of the applied voltage or current. However, it does not hold for non-linear devices like diodes or transistors [2][3].

References:

1. Dorf, R. C., & Svoboda, J. A. (2010). Introduction to Electric Circuits (8th ed.). Hoboken, NJ: John Wiley & Sons. ISBN 9780470521571.

2. M. E. Van Valkenburg, “The validity and limitations of Ohm’s law in non-linear circuits,” Proceedings of the IEEE, vol. 62, no. 6, pp. 769–770, Jun. 1974. doi:10.1109/PROC.1974.9547

3. C. L. Fortescue, “Ohm’s Law in alternating current circuits,” Proceedings of the IEEE, vol. 55, no. 11, pp. 1934–1936, Nov. 1967. doi:10.1109/PROC.1967.6033

The book reference appears valid: a book on electric circuits would likely cover Ohm's law, but without a page number the citation cannot be used to verify the claims in the prose. Worse, both Proceedings of the IEEE citations are completely made up. The DOIs lead to entirely different papers and have other problems as well; for instance, C. L. Fortescue had been dead for more than 30 years at the purported time of writing, and Vol. 55, Issue 11 does not list any article remotely matching the information given in reference 3. Note also the use of curly quotation marks and apostrophes in some, but not all, of the text above, another indicator that text may be LLM-generated.

Incorrect or unconventional use of references

AI tools may have been prompted to include references, and will attempt to do so as Wikipedia expects, but they can fail at key implementation details or stand out against established conventions.

In the below example, note the incorrect attempt at re-using references. The tool used here was not capable of searching for non-confabulated sources (as it was done the day before Bing Deep Search launched) but nonetheless found one real reference. The syntax for re-using the references was incorrect.

In this case, the Smith, R. J. source is also completely irrelevant to the body of the article: being the "third source", the tool presumably generated the link 'https://pubmed.ncbi.nlm.nih.gov/3' (which has a PMID of 3). The user did not check the reference before converting it to a {{cite journal}} citation, even though the links resolve.

The LLM in this case has diligently included the incorrect re-use syntax after every single full stop.

For over thirty years, computers have been utilized in the rehabilitation of individuals with brain injuries. Initially, researchers delved into the potential of developing a "prosthetic memory."<ref>Fowler R, Hart J, Sheehan M. A prosthetic memory: an application of the prosthetic environment concept. ''Rehabil Counseling Bull''. 1972;15:80–85.</ref> However, by the early 1980s, the focus shifted towards addressing brain dysfunction through repetitive practice.<ref>{{Cite journal |last=Smith |first=R. J. |last2=Bryant |first2=R. G. |date=1975-10-27 |title=Metal substitutions incarbonic anhydrase: a halide ion probe study |url=https://pubmed.ncbi.nlm.nih.gov/3 |journal=Biochemical and Biophysical Research Communications |volume=66 |issue=4 |pages=1281–1286 |doi=10.1016/0006-291x(75)90498-2 |issn=0006-291X |pmid=3}}</ref> Only a few psychologists were developing rehabilitation software for individuals with Traumatic Brain Injury (TBI), resulting in a scarcity of available programs.<sup>[3]</sup> Cognitive rehabilitation specialists opted for commercially available computer games that were visually appealing, engaging, repetitive, and entertaining, theorizing their potential remedial effects on neuropsychological dysfunction.<sup>[3]</sup>

Miscellaneous

Abrupt cutoffs

AI tools may suddenly stop generating content, for example when they predict the end-of-text token (which can appear as <|endoftext|>). In addition, the number of tokens in a single response is usually limited, and producing further output requires the user to select "continue generating".

This method is not foolproof, as a malformed copy/paste from one's local computer can also result in a similar situation. It may also be indicative of a copyright violation rather than the use of an LLM.

Discrepancies in writing style and variety of English

A sudden shift in an editor's writing style, such as unexpectedly flawless grammar, may indicate the use of AI tools.

Another discrepancy is a mismatch of user location, national ties of the topic to a variety of English, and the variety of English used. A human writer from India writing about an Indian university would probably not use American English; yet, the default variety of LLM outputs is American English, and such a user's AI-generated content will exhibit this variety (unless the chatbot was specifically prompted to use Indian English). However, note that English speakers commonly tend to mix up English varieties. Such signs should only raise suspicion if there is a sudden and complete shift in an editor's English variety use.

See also