Methodological Challenges

Find further details of each talk in the Book of Abstracts here.

Those marked with ★ are eligible for nomination for a student researcher award. Find the full list of awards here.

You are welcome to use the comment function at the bottom of the page to comment on papers you have seen and/or submit questions that you would like to see raised in the discussion panel. If replying to an individual paper, please specify who you are talking to.

Panel chaired by Anna Marchi (@journolinguist).

Developing a complex query to build a specialised corpus: Reducing the issue of polysemous query terms. ★

Daniel Malone, Edge Hill University

[long paper]

A methodological proposal to realize a Systemic Functional Linguistics exam through Corpus Linguistics: comparing the textual strategies of the political discourse in English and Spanish

Virginia Mattioli, Pontificia Universidad Católica de Valparaíso

virginia.mattioli@pucv.cl
https://pucv.academia.edu/VirginiaMattioli

[long paper]

Analysing Intersectionality in Discourse: A Corpus-informed Methodology

Zainah Alshahrani & Michael Handford, Cardiff University

AlshahraniZT@cardiff.ac.uk
@ZainahGhareeb

[long paper]


Categorising keywords in discourse – a case study of texts on bacterial resistance ★

Natalie Dyke, Joachim Peters & Stefan Evert, FAU Erlangen-Nürnberg

[long paper]


Combining Collocational Analysis and Semantic Prosodies in a Large-Scale Corpus Study of Metaphor

Stefanie Ullmann, University of Cambridge

[long paper]

[withdrawn]


How critical is critical discourse analysis of advertising: Corpus-driven findings and methodological implications ★

Yixiong Chen & Csilla Weninger, National Institute of Education, Nanyang Technological University

chen023@e.ntu.edu.sg

[long paper]


Pragmatic annotation for digital discourse analysis – value, quality criteria, further development

Marcus Müller & Michael Bender, TU Darmstadt, Germany

marcus.mueller@tu-darmstadt.de
@marcus_dislab
@MimoBender

https://www.linglit.tu-darmstadt.de/institutlinglit/mitarbeitende/marcusmueller/index.en.jsp

[long paper]


Revisiting key-key-words: proposing a method for identifying unique keywords in a collection of corpora ★

Mark McGlashan, Birmingham City University
Alexandra Krendel, Lancaster University

Mark.McGlashan@bcu.ac.uk
@Mark_McGlashan
http://www.MarkMcGlashan.org

a.krendel@lancaster.ac.uk
@ALexiconArtist

[long paper]


‘YOT Talk’: Analysing discourse in youth justice assessment interviews.

Ralph Morton, Loughborough University

ralph.morton@bcu.ac.uk
@ralphmortonlang
https://www.bcu.ac.uk/english/staff/ralph-morton

[long paper]

22 thoughts on “Methodological Challenges”

  1. Veronika Koller June 16, 2020 — 9:03 pm

    Question for Alshahrani & Handford: Can you please explain what multi-keywords are? Thank you.


    1. Zaina Alshahrani June 17, 2020 — 3:42 am

      Many thanks, Veronika, for your question.
      Multi-keywords are produced by a corpus method in the Sketch Engine tool. It displays clusters of two or more words that appear to be outstanding or unusually frequent in the target corpus when compared to a general reference one. These clusters are either significantly key or culturally key. In my small specialized corpus, they are of the latter type, ‘culturally key’ multi-keywords.
      I hope this answers your question.


      1. @Zaina Alshahrani: So, basically, key n-grams, yes?


      2. Nicholas Groom June 17, 2020 — 8:24 am

        @Zaina Alshahrani: So, basically, ‘multi-keywords’ are key n-grams, yes?


      3. Yes, I think Key is the key in the two names. Different naming in different corpus tools.

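For anyone who wants to see roughly how keyness can be computed for multi-word units, here is a minimal sketch in Python. It uses a "simple maths" style keyness score (smoothed per-million frequency in the focus corpus divided by smoothed per-million frequency in the reference corpus); the toy corpora, the whitespace tokenisation, the bigram setting and the smoothing value are all invented for illustration and are not the exact computation behind the multi-keywords discussed above.

```python
from collections import Counter

def ngram_freqs(tokens, n=2):
    """Count n-grams (as space-joined strings) in a list of tokens."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def key_ngrams(focus_tokens, ref_tokens, n=2, smooth=1.0, top=10):
    """Rank n-grams from the focus corpus by a simple-maths-style keyness score:
    (per-million freq in focus + smooth) / (per-million freq in reference + smooth)."""
    focus, ref = ngram_freqs(focus_tokens, n), ngram_freqs(ref_tokens, n)
    f_total, r_total = sum(focus.values()), sum(ref.values())
    scores = {
        gram: ((count / f_total * 1e6) + smooth) / ((ref.get(gram, 0) / r_total * 1e6) + smooth)
        for gram, count in focus.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Hypothetical usage with tiny, pre-tokenised toy corpora.
specialised = "the muslim country is a muslim country in the region".split()
reference = "the country is in the region and the region is large".split()
print(key_ngrams(specialised, reference, n=2, top=5))
```

Ranking key bigrams (or longer clusters) in this way is, in effect, ranking key n-grams, which is the equivalence raised in the exchange above.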

  2. All the presentations are really interesting and thought-provoking. My question is addressed to Alshahrani and Handford: How did you identify the intersectional systems in the keyword and multi-keyword analysis?
    Thank you very much.


    1. Zaina Alshahrani June 17, 2020 — 9:45 am

      Mainly, the multi-keywords method exposed the intersectionality of the systems, e.g. “Muslim Country”. But before that we looked at the frequency lists and the single keywords to pinpoint the main single systems: “Nationality”, “Religion” and “Gender”.
      Hope this is clear.


      1. Malak Al Sharif June 17, 2020 — 12:13 pm

        Thank you so much, Zaina, that was clear enough.


  3. Thanks Marcus and Michael for the interesting presentation on pragmatic annotations. Could you give more information on how you could “assess” the homogeneity of the input? I was thinking about the work we did on Belgian French-speaking politicians (we’re in the Verbal aggression in politics panel tomorrow), where we did a lot of manual annotation 😉


    1. By homogeneity of the input we mean here that the ML framework learns better if the annotation categories/labels are assigned to segments with preferably similar forms (in terms of syntactic structure).
      This means that it is problematic if, for example, a category such as “positive evaluation” in an annotation scheme can be assigned both to complete sentences and to elliptical or fragmentary backchannel utterances in spoken-language transcripts. In this case, it is better to assign one label for evaluations in more complex, complete sentences and another for evaluations in backchannel behavior.
      For recurrent networks, the (stylistic, structural) homogeneity of the co-text plays a role as well. Corpora with rather uniform text forms are more suitable than corpora with very different ones. In our work, the most important aspect for assessing linguistic homogeneity for ML has been the syntactic structure in relation to the labels. It is therefore more a question of operationalisation in the annotation scheme, in terms of a form-label relation that is as consistent as possible.
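As a rough illustration of the point about form-label homogeneity, the sketch below shows the kind of relabelling step described above: a single "positive evaluation" category is split into a sentence-level label and a backchannel-level label before training, so that each label maps onto more uniform syntactic forms. The data format, the label names and the crude token-count heuristic are assumptions made for illustration; a real pipeline would use proper syntactic criteria rather than segment length.

```python
def split_evaluation_label(segments, min_sentence_tokens=4):
    """Relabel annotated segments so that one label no longer covers both
    full sentences and fragmentary backchannel utterances."""
    relabelled = []
    for text, label in segments:
        if label == "positive_evaluation":
            # Placeholder heuristic: short segments are treated as backchannels.
            if len(text.split()) >= min_sentence_tokens:
                label = "positive_evaluation_sentence"
            else:
                label = "positive_evaluation_backchannel"
        relabelled.append((text, label))
    return relabelled

# Hypothetical annotated segments (text, label).
segments = [
    ("I think that is a really convincing argument.", "positive_evaluation"),
    ("mhm, yes", "positive_evaluation"),
    ("The committee met on Tuesday.", "statement"),
]
print(split_evaluation_label(segments))
```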


  4. Irene (Stockholm uni) June 17, 2020 — 10:47 am

    @Yixiong Chen & Csilla Weninger: Thank you for your presentation. Critical studies should stand up to critique 🙂 I would appreciate getting your references, please. I do believe that you are right that collaborations with totally different fields are a good thing, to be able to explain our findings. That is probably also why so few of the texts you studied did so.
    A question: Did you check how many of the papers without an “explanation 2” had one author, and how many had two or more?


  5. For Mark and Alexandra: What a clear and helpful presentation, thank you! You mentioned that the complement keywords can offer an insight into the saliency of key-key-words – do you have a particular method for this? I am imagining something like comparing the number and frequency of Complements against the number and frequency of Key-Keys to determine whether subcorpora are more distinct from/similar to each other, and therefore how salient the Key-Keys are as a feature… Am I on the right track?


    1. And a second question…! What might be the differences between Complement Keywords as set out in your presentation and the Keywords that would appear for each subcorpus when using the whole Target corpus as Reference corpus?
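Purely to make the two questions above concrete, here is a small sketch of one possible operationalisation: treat key-key-words as keywords that are key in at least a given number of subcorpora, treat keywords unique to a single subcorpus as its "complement", and compare the sizes of the two groups as a crude measure of how distinct the subcorpora are. Both definitions and the placeholder keyword sets are assumptions made for illustration, not the method proposed in the presentation.

```python
from collections import Counter

def key_key_and_unique(keyword_sets, min_subcorpora=2):
    """Split keywords into key-key-words (key in >= min_subcorpora subcorpora)
    and unique keywords (key in exactly one subcorpus)."""
    counts = Counter(word for kws in keyword_sets for word in set(kws))
    key_key = {w for w, c in counts.items() if c >= min_subcorpora}
    unique = {w for w, c in counts.items() if c == 1}
    return key_key, unique

# Placeholder keyword sets for three subcorpora.
subcorpora = [
    {"immigration", "border", "policy"},
    {"immigration", "asylum", "policy"},
    {"economy", "policy"},
]
key_key, unique = key_key_and_unique(subcorpora)
print("key-key-words:", sorted(key_key))
print("unique keywords:", sorted(unique))
print("share of unique keywords:", len(unique) / (len(unique) + len(key_key)))
```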


  6. @Natalie Dyke, Joachim Peters & Stefan Evert:
    Great presentation, and a very useful and important study. It is very heartening to see that you found a convergence between manual and automatic analyses, as this may save us all a lot of work in the future!
    I have a simple question about your claim that “What researchers usually do is to group the keywords according to categories that rely on their prior familiarity with the discourse topic” (0:50).
    My question is: is this true? Is this what most keywords researchers actually do? If so, that is very worrying as it suggests that most researchers are simply doing it wrong. If the aim is to put keywords into qualitative categories, these categories should be grouped into categories generated inductively through a careful and detailed manual inspection of concordance lines. That is, the categories should represent how the keyword is used in the research corpus in question, and not prior assumptions about general usage. Keywords should never be interpreted as disembodied lists of words out of context.


    1. Sorry, typo!
      *If the aim is to put keywords into qualitative categories, these categories should be grouped …”
      Should say:
      “If the aim is to put keywords into qualitative categories, these categories should be generated …”


    2. Hi, I’m not sure if others are supposed to jump in, but I thought I would try to re-create the coordinator conversations at conferences.

      I don’t necessarily disagree with you, but I think the problem comes in when you determine the categories in the first place. Categories necessarily occur on a continuum of abstraction (at the least abstract, you probably have one category for each concordance line). I don’t know whether the level of abstraction of a given category for an analysis can be determined by looking at concordance lines. This doesn’t mean that you ignore context, but that some levels of abstraction will be necessary for some types of analyses, and other levels for different ones. So I understood “familiarity with the discourse topic” as referring to this problem of establishing the level of abstraction of the categories, although I’m not sure if the authors meant it like this.

      I’ve thought a lot about this in terms of conceptual metaphor theory: even though I think there are good methodological ways of determining metaphor at the linguistic level, it’s much harder (if it’s possible) at the conceptual level. I’ve never found a truly rigorous way of establishing conceptual domains, given all the different levels of abstraction that are possible.


      1. Sorry, typo. I said coordinator and I meant to say corridor.


    3. Thanks for the question! I might have phrased that part a bit unfortunately. However, I have seen a lot of work stressing the role of prior knowledge and discursive familiarity (which to some extent is certainly necessary to reach a sensible interpretation). My impression is that, because we usually focus on a particular topical area as a research interest, it’s pretty much inevitable that the categories will be informed by expectations/prior findings. This might also have to do with the role of keywords, which commonly serve as a “broad picture” before deeper analysis is done. So content familiarity with the overall discourse will probably have to go into the process in order to come up with useful groupings.
      However, in my impression the overall question of what we’re actually *doing* when forming these groups has largely been disregarded and is sometimes taken for granted in a way. Some papers make it look like a rather ad-hoc process, some explicitly map their findings onto aspects of social theory, but I feel like it’s often done rather selectively. Maybe foregrounding linguistic form, arranging words according to their linguistic features and drawing conclusions from different arrangements could contribute to making the process more explicit.


      1. We seem to have posted at roughly the same time, so in my reply I only saw Nicholas’ comment. In any case, thanks for jumping in @Joe!
        And yes, I think you’ve phrased it much more clearly than I have. The kinds of groups necessitating a distinction, the level at which to abstract (e.g. who the different actors are, what kinds of topics/arguments… are considered two sides of the same coin and which get their “own” category status) – I think all of this is often inevitably determined in part before the actual corpus study. In our case, for instance, Joachim’s analysis had established that essentially the same people might take the role of a “staff member”, “hospital representative” or “scientist” depending on how they were being framed. If this kind of grouping is taken as a starting point, it will most likely shape the further process of analysis.


      2. Nicholas Groom June 17, 2020 — 2:23 pm

        Of course, it is trivially obvious that categorisation will always be influenced by prior knowledge; I don’t dispute that at all. On the contrary, it would be foolish and naive to assume that one could read concordance lines in a totally ‘tabula rasa’ way. But what I understood you to be saying was that a lot of keywords researchers didn’t even look at concordances when categorising keywords – I thought you were saying that they just look at the decontextualised lists of keywords that their software spits out and put them into categories without even checking to see how they are being used in concordances, which is something that I would have a problem with. But perhaps I misunderstood your point here – if so, apologies!


      3. Nicholas Groom June 17, 2020 — 2:32 pm

        Also, I entirely agree with you that “foregrounding linguistic form, arranging words according to their linguistic features and drawing conclusions from different arrangements could contribute to making the process more explicit”; this is precisely how I start analysing a set of concordance lines. That is, I always start by looking at the surface formal environments or patterns around each keyword, and then ask what meaning(s) is/are being made by each pattern. I would never go straight to a semantic analysis – my preference is form -> function!
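As a small illustration of such a form-first pass, the sketch below counts the most frequent immediate left/right co-text patterns around a keyword in a set of concordance lines, before any functional interpretation is attempted. The concordance lines, the keyword and the one-word window are invented for illustration and are not taken from any of the studies discussed here.

```python
from collections import Counter

def surface_patterns(concordance_lines, keyword, window=1):
    """Count the immediate left/right co-text patterns around a keyword."""
    patterns = Counter()
    for line in concordance_lines:
        tokens = line.lower().split()
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i]) or "<start>"
                right = " ".join(tokens[i + 1:i + 1 + window]) or "<end>"
                patterns[f"{left} _ {right}"] += 1
    return patterns.most_common()

# Invented concordance lines for the keyword 'resistance'.
lines = [
    "the spread of antibiotic resistance in hospitals",
    "growing antibiotic resistance is a global threat",
    "resistance to treatment was reported",
]
print(surface_patterns(lines, "resistance"))
```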


  7. Irene (Stockholm uni) June 17, 2020 — 12:29 pm

    @Mark McGlashan & Alexandra Krendel: Thank you for this presentation! Nicely and pedagogically put!
    A comment and tip: I do believe that this is what you would get if you try the KWords application from the Czech National Corpus, where you can enter a reference text of your choice. They do not have access to reference corpora in English, for obvious reasons. In the results when you use the app, there is a dispersion graph, and keywords for each individual text that you enter. https://kwords.korpus.cz if anyone wants to try it out!

