Framasoft @Framasoft


**Harald Sack** @lysander07@sigmoid.social · 6 d

Interesting (short) paper on game-based training and evaluation of agentic behaviour in LLMs: Leon Guertler, Bobby Cheng, Simon Yu, Bo Liu, Leshem Choshen, Cheston Tan: "TextArena"

https://arxiv.org/html/2504.11442v1

TextArena Soft-skill comparison. Frontier models and Humanity are compared across ten key skills. Each skill is normalised separately for presentation.

The image is an octagonal radar chart comparing four entries (Humanity, claude-3.5-sonnet, qwen-plus, and gpt-4o) across eight categories: logical reasoning, memory recall, bluff detection, theory of mind, pattern recognition, spatial reasoning, strategic planning, and adaptability. Each category is a labeled axis; each entry's score is a point on that axis, joined by lines to its points on the adjacent axes. The entries are drawn in different colors (orange, purple, turquoise, and dark red), and a legend at the bottom maps each color to its entry. The shape and size of each resulting polygon show how that entry performs in the individual categories.

#llms #AI #generativeai

**Jean-Sébastien Barboteu** @jsb@mastodon.mim-libre.fr · Apr 11

A great resource offering guidance on the new ways of assessing (#évaluation) students' written work. #algorithmes @canotech #education 4-minute video https://www.canotech.fr/a/37903/comment-evaluer-une-production-decrits-assistee-par-une-ia

www.canotech.fr · Continuing education for teachers - CanoTech · Diversify learning for primary and secondary school pupils with our continuing-education modules for teachers and educational teams

**Carthage** @Tunisia@turath.tn · Apr 10 *

Individual evaluation system for civil servants (fonction #publique) in Tunisia

https://chroniques.tn/2025/04/appui-a-linstauration-dun-nouveau-systeme-devaluation-individuelle-des-agents-de-la-fonction-publique-en-tunisie/

> A digital (#digitale) platform dedicated to the evaluation (#évaluation) process is also under development, illustrating the commitment to fully integrate digital (#numériques) tools into human resources management (#gestion).

Chroniques · Support for establishing a new individual evaluation system for civil servants in Tunisia - Chroniques · Tunisia is taking a new step in modernizing its public administration with the introduction of a new individual evaluation system for civil servants, a structural reform carried out as part of the project "Support for the modernization of the training and evaluation system…

#News #Gouvernent #Allemagne

**Jan R. Boehnke** @jrboehnke@mastodon.social · Apr 10

Mirjam Stieger (Lucerne Uni) and I are invited to present
"Contemporary #Evaluation of Interventions: Mobile, Digital, and Pragmatic"
at @unibern
https://www.bbs.unibe.ch/training/summer_course/index_eng.html

This is the annual Summer Course of the Doctoral Program Brain and Behavioral Sciences, and comprises a mixture of keynotes, masterclasses, hidden curriculum etc.

I am very much looking forward to it, and also very honoured to be invited once again to Bern to train #ECRs!

Doctoral Program Brain and Behavioral Sciences · Summer Course 2025

#RCT #EvaluationResearch #ResearchDesign

**tante** @tante@tante.cc · Apr 9 *

On correct but wrong responses

For a research project I am currently evaluating all kinds of generative AI models (mostly for visual artifacts but some text based ones as well). There also is somewhat of a push at my employer to use those systems more because of "efficiency". So we all know that LLMs fabricate facts, meaning: They produce text that is factually untrue. Happens a lot, those so-called hallucinations are a structural property of those kinds of systems. But I kept wondering about something else that I keep […]

https://tante.cc/2025/04/09/on-correct-but-wrong-responses/

Screenshot of a gemma3 based chatbot responding to the prompt "Write me a function in Python that gets a list of items and returns the sum of those items" with a manual implementation of a function doing that

#ai #correct #evaluation

**Steven Carneiro** @mischiefist@vivaldi.net · Apr 4

Scale Evaluation (for AI):
#ai #llm #machinelearning #evaluation #futurism

https://www.wired.com/story/this-tool-probes-frontier-ai-models-for-lapses-in-intelligence/

WIRED · Apr 2 · This Tool Probes Frontier AI Models for Lapses in Intelligence · By Will Knight

**gmcs** @gmcs2@piaille.fr · Apr 3

via @TheMetaNews Storm warning for the labs?
The more than 2,000 respondents to the CPESR/TheMetaNews barometer paint a decidedly mixed picture of their profession and institutions, somewhere between passion and "quiet quitting".
https://themeta.news/avis-de-tempete-sur-les-labos/
"They stop us from functioning, and then they blame us for being dysfunctional"
#ESR #financements #evaluation

TheMetaNews · Apr 2 · Storm warning for the labs? - TheMetaNews · Time to take stock. While many universities and research organizations denounce worsening budget conditions, what is the impact on working conditions and on how staff across higher education and research feel? A few weeks ago, the collective Conférence des praticiennes […]

**IRRJ** @IRRJ@sigmoid.social · Apr 2

Published at #IRRJ: "Don't Use LLMs to Make Relevance Judgments" by Ian Soboroff. #evaluation #relevance #llm https://doi.org/10.54195/irrj.19625

doi.org · Don't Use LLMs to Make Relevance Judgments | Information Retrieval Research

**Réseau Canopé** @reseau_canope@tube.reseau-canope.fr · Mar 26

Using generative AI to design teaching sequences

https://tube.reseau-canope.fr/videos/watch/fc57c133-100a-4915-98d3-3ad591b47609

PeerTube · Using generative AI to design teaching sequences · By Pix+Édu

#competencesnumeriques #evaluation #ia

**idw-Team** @team@idw-online.social · Mar 21

A small impression of the idw anniversary event on March 13 and 14 at the @BBAW in Berlin.

The ceremony featured several welcome addresses, an AI (#KI) talk by Prof. Katharina Zweig, and the presentation of the idw prize for #wisskomm: https://idw-online.de/de/news848988
After the #idwMV25 we held an engaging working conference on the topics of #socialmedia, #evaluation, and the press release as a multimedia package.

It was a great pleasure to celebrate with you!

Photos by Judith Affolter

Collage of pictures from the idw anniversary event, from top left to bottom right: idw chocolates, Dr. Svenja Niescken, Mike Zeits/Nicola Kuhrt, Post-its from the workshop, Ulf Richter on the podium, two pictures from the plenary, Leonie Wruck-Albrecht in the workshop, Svenja Niescken/Cordula Kleidt in front of the idw wall, graphic with Monika Dechene from the workshop, Prof. Katharina Zweig giving her AI talk, music by BALSAMICO with singer Peter Saueressig

**Dining & Cooking** @dnc@vive.im · Mar 17

Development of a tool to assess the compliance of cafeteria menus with the Mediterranean Diet | BMC Nutrition https://www.diningandcooking.com/1960084/development-of-a-tool-to-assess-the-compliance-of-cafeteria-menus-with-the-mediterranean-diet-bmc-nutrition/ #ClinicalNutrition #Evaluation #HealthPromotionAndDiseasePrevention #index #Mediterranean #MediterraneanDiet #MediterraneanFood #menus #PublicHealth

**Nicole Hennig** @nic221.bsky.social@bsky.brid.gy · Mar 14

Rebooting AI from the Ground Up | SXSW LIVE www.youtube.com/live/91I7AGb... (excellent talk by Dr. Rumman Chowdhury) #AI #ResponsibleAI #bias #jobs #evaluation #geopolitics

YouTube · Rebooting AI from the Ground Up | SXSW LIVE · By SXSW

**Kaa** @kfort@sciences.re · Mar 14

On Wednesday I gave a talk at CNRS headquarters on biases and, more generally, the problems of evaluating LLMs.
For those interested, the slides are here:
https://members.loria.fr/KFort/files/fichiers_cours/KarenFort_LLMEvaluation.pdf
#LLM #evaluation #IA #nlp #tal

**WIST Quotations** @wist@my-place.social · Mar 12

A quotation from Montaigne

We readily inquire, “Does he know Greek or Latin?” “Can he write poetry and prose?” But what matters most is what we put last: “Has he become better and wiser?” We ought to find out not who understands most but who understands best. We work merely to fill the memory, leaving the understanding and the sense of right and wrong empty.

[Nous enquerons volontiers, Sçait-il du Grec ou du Latin ? escrit-il en vers ou en prose ? mais, s’il est devenu meilleur ou plus advisé, c’estoit le principal, & c’est ce qui demeure derriere. Il falloit s’enquerir qui est mieux sçavant, non qui est plus sçavant. Nous ne travaillons qu’à remplir la memoire, & laissons l’entendement & la conscience vuide.]

Michel de Montaigne (1533-1592) French essayist
Essay, “Of Pedantry” [Du pedantisme] (1572-1578), Essays, Book 1, ch. 24 (1.24) (1595) [tr. Screech (1987), ch. 25]

Sourcing, notes, alternate translations: wist.info/montaigne-michel-de/…

#quote #quotes #quotation

**Félicien Breton ⏚** @breton@eldritch.cafe · Mar 10

"Trier, évaluer, la fabrique du mineur non accompagné" ("Sorting, assessing: the making of the unaccompanied minor") https://www.radiofrance.fr/franceculture/podcasts/lsd-la-serie-documentaire/trier-evaluer-la-fabrique-du-mineur-non-accompagne-3661421

France Culture · Trier, évaluer, la fabrique du mineur non accompagné: episode 2/4 of the podcast Jeunesses africaines en exil · AUDIO • Jeunesses africaines en exil (African youth in exile), episode 2/4: Trier, évaluer, la fabrique du mineur non accompagné. A new series from France Culture. Listen to LSD, la série documentaire, and discover our podcasts online.

#faireSemblant #métier #travail

**DionyZack** @DionyZack@piaille.fr · Mar 8

Revolut°Permanente Tést https://www.revolutionpermanente.fr/Test-36104?utm_source=dlvr.it&utm_medium=mastodon RP #Test #Évaluation #Apprentissage #Éducation #Savoir

Revolution Permanente · Mar 8 · Tést · By Wolfgang Mandelbaum

**Christof Schöch** @christof@fedihum.org · Mar 3 *

I'll soon be on my way to #DHd2025 too, and the anticipation is building!

For the @tcdh we have noted down everything we are doing there: https://tcdh.uni-trier.de/de/event/das-tcdh-bei-der-dhd2025-bielefeld

Do come by! It's about #Evaluation of #Keyness measures, #Shakespeare editions (#Editionen), #LLMs and #SetFit for metaphors (#Metaphern), vocabularies (#Vokabulare) for #LOD, and research data (#Forschungsdaten) for the history of DH scholarship (#Wissenschaftsgeschichte)!

With, among others, @ClaudiaBamberg, @cnDuKeli Julia Dudar, @moulin, Marina Spielberg, @MariaHinzmann, Julia Röttgermann.

tcdh.uni-trier.de · The TCDH at DHd2025 in Bielefeld | KOMPETENZZENTRUM - TRIER CENTER FOR DIGITAL HUMANITIES · Conference, 03.03.2025 - Under Constructions. Geisteswissenschaften und Data Humanities. The 11th conference of the association "Digital Humanities im deutschsprachigen Raum" takes place from March 3 to 7 at Bielefeld University. Several TCDH staff members are also attending the conference, whose guiding theme is "Under Constructions. Geisteswissenschaften und Data Humanities".

**Aude Caussarieu** @AudeCaussarieu@sciences.re · Feb 24 *

#Introduction #IntroductionFr

After 3 years on Mastodon, I'm switching instances to set up a proper public account for talking about work and the literature!

Science education research: I spent quite a lot of time in academia, and now I'm striking out on my own.

Topics I'm particularly interested in: #Maths4Sciences, #MesureEtIncertitudes, #QCM, #Evaluation, and the role of training in changing behaviour.

On a personal note, I have a few four-legged companions and I really need to spend time with them!

I'm trying to get to grips with free/libre tools: #inkscape, #Python, #R, #Libreoffice, #Zaclys, #Twine, #Scenari, #Framasoft ... But I still can't do anything with the command line... #GeekEnCarton

**Rod2ik** @rod2ik@mastodon.social · Feb 19

Evaluation (#Évaluation) of bachelor's (#licences) and master's (#masters) degrees: "It's a political attack (#attaque #politique) on the mass university (#université de #masse)," says researcher (#chercheur) Stéphane Bonnéry

https://www.humanite.fr/societe/acces-a-leducation/evaluation-des-licences-et-masters-cest-une-attaque-politique-contre-luniversite-de-masse-denonce-le-chercheur-stephane-bonnery

L'Humanité · Feb 16 · Evaluation of bachelor's and master's degrees: "It's a political attack on the mass university," says researcher Stéphane Bonnéry · By Camille Bauer

**Wissenschaft im Dialog** @wissenschaftimdialog@wisskomm.social · Feb 19

Tomorrow from 12 to 1 pm, the Impact Unit will present the free multiplier programme “Wissen, was wirkt!” ("Knowing what works!") for evaluating science communication (Wisskomm) in a digital information session. You can put your questions to the team and find out whether the programme is right for you. To register:
https://eveeno.com/infoveranstaltung_wissenwaswirkt

#Evaluation #Wisskomm #Wissenschaft

#evaluation